Correlations of three variables

This post was kindly contributed by SAS ANALYSIS - go there to comment and to read the full post.

Question
There is an interesting question in statistics:
“There are 3 random variables X, Y and Z. The correlation between X and Y is 0.8 and the
correlation between X and Z is 0.8. What are the maximum and minimum correlations between Y and Z?”

Solutions
1. Geometric illustration
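If we treat the centered variables as vectors, the correlation between two variables is the cosine of the angle between them. We already know corr(X, Y) and corr(X, Z), so Y and Z each make an angle of arccos(0.8), about 36.9 degrees, with X. In this particular case, the angle between Y and Z can be zero, which means Y and Z point in the same direction (they are perfectly positively correlated) and the max value of corr(Y, Z) is 1. The min value of corr(Y, Z) occurs at the biggest possible angle between Y and Z, which is twice arccos(0.8), reached when Y and Z lie on opposite sides of X. The cosine of that angle is 0.28.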
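The 0.28 figure can be checked with the double-angle identity. As a short worked sketch, write θ for the angle that each of Y and Z makes with X (the symbol θ is introduced here for illustration). The largest angle between Y and Z is then 2θ, reached when Y and Z lie on opposite sides of X in the same plane:

```latex
\[
\cos\theta = 0.8, \qquad
\min \operatorname{corr}(Y, Z) = \cos(2\theta) = 2\cos^2\theta - 1 = 2(0.8)^2 - 1 = 0.28 .
\]
```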

2. Positive semi-definiteness property of the correlation matrix
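A correlation matrix must be positive semi-definite, so its determinant is greater than or equal to zero. Treating corr(Y, Z) as the unknown, this gives a quadratic inequality whose two roots are the boundaries: corr(Y, Z) ranges from 0.28 to 1. The code below finds the two roots numerically with the SOLVE function in PROC FCMP.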
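Spelled out for this example, with r standing for corr(Y, Z) (a symbol introduced here for illustration) and the rows and columns ordered X, Y, Z, the determinant condition works out as:

```latex
\[
\begin{aligned}
\det\begin{pmatrix} 1 & 0.8 & 0.8 \\ 0.8 & 1 & r \\ 0.8 & r & 1 \end{pmatrix}
  &= 1 + 2(0.8)(0.8)\,r - 0.8^2 - 0.8^2 - r^2 \\
  &= -r^2 + 1.28\,r - 0.28 \;\ge\; 0 \\
  &\Longleftrightarrow\; (r - 0.28)(r - 1) \le 0 \\
  &\Longleftrightarrow\; 0.28 \le r \le 1 .
\end{aligned}
\]
```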


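proc fcmp outlib=work.funcs.test1;
  /* Determinant of the 3x3 correlation matrix with
     corr(X,Y) = a, corr(X,Z) = b and corr(Y,Z) = x */
  function corrdet(x, a, b);
    return(-x**2 + 2*a*b*x - a**2 - b**2 + 1);
  endsub;
  /* Solve corrdet(x, a, b) = 0 for x, starting the search at ini */
  function solvecorr(ini, a, b);
    array solvopts[5] initial abconv relconv
          maxiter solvstat (.5 .001 1.0e-6 100);
    initial = ini;
    x = solve('corrdet', solvopts, 0, ., a, b);
    return(x);
  endsub;
quit;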

options cmplib = work.funcs;
data one;
  * Max value;
  upper = solvecorr(1, 0.8, 0.8);
  upper_check = corrdet(upper,0.8,0.8);
  * Min value;
  lower = solvecorr(-1, 0.8, 0.8);
  lower_check = corrdet(lower,0.8,0.8);
run;

Generalization
We can generalize the question to all possible values of corr(X, Y) and corr(X, Z). First we create two user-defined functions that return the maximum and the minimum values as the two roots of the quadratic. Then we draw the max values and min values in the same plot.
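For reference, the general bounds follow from the quadratic formula. In the sketch below, a = corr(X, Y) and b = corr(X, Z) match the function arguments used in the code, while r = corr(Y, Z) is a symbol introduced here for illustration:

```latex
\[
-r^2 + 2ab\,r - a^2 - b^2 + 1 \ge 0
\quad\Longleftrightarrow\quad
ab - \sqrt{(1-a^2)(1-b^2)} \;\le\; r \;\le\; ab + \sqrt{(1-a^2)(1-b^2)} ,
\]
\[
\text{since the discriminant is } 4a^2b^2 - 4(a^2 + b^2 - 1) = 4(1-a^2)(1-b^2) \ge 0
\text{ whenever } |a| \le 1 \text{ and } |b| \le 1 .
\]
```

With a = b = 0.8 this gives 0.64 ± 0.36, that is, 0.28 and 1, which agrees with the special case above.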


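proc fcmp outlib = work.funcs.test2;
  /* x is the discriminant of the quadratic in corr(Y,Z);
     the two roots are a*b +/- sqrt(x)/2 */
  function upper(a, b);
     x = 4*(a**2)*(b**2) - 4*(a**2+b**2-1);
     /* larger root = maximum correlation */
     if x ge 0 then y = 0.5*(2*a*b + sqrt(x));
     else y = .;
     return(y);
  endsub;
  function lower(a, b);
     x = 4*(a**2)*(b**2) - 4*(a**2+b**2-1);
     /* smaller root = minimum correlation */
     if x ge 0 then y = 0.5*(2*a*b - sqrt(x));
     else y = .;
     return(y);
  endsub;
quit;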

data two;
   do xy = -.99 to .99 by 0.01;
      do xz = -.99 to .99 by 0.01;
         upper = upper(xy, xz);
         lower = lower(xy, xz);
         output;
      end;
   end;
run;

proc template;
   define statgraph surface001;
   begingraph;
      layout overlay3d / cube = false rotate = 150 tilt = 30
         xaxisopts = (label="Correlation between X and Y") 
         yaxisopts = (label="Correlation between X and Z") 
         zaxisopts = (label="Boundaries of correlation between Y and Z") ;
      surfaceplotparm x = xy y = xz z = upper; 
      surfaceplotparm x = xy y = xz z = lower; 
      endlayout;
   endgraph;
   end;
run;

proc sgrender data =  two template = surface001;
run;
