This post was kindly contributed by SAS ANALYSIS - go there to comment and to read the full post.
Question
There is an interesting question in statistics:
“There are 3 random variables X, Y and Z. The correlation between X and Y is 0.8 and the
correlation between X and Z is 0.8. What are the maximum and minimum possible values of the correlation between Y and Z?”
Solutions
1. Geometric illustration
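If the centered variables are viewed as vectors, corr(Y, Z) is the cosine of the angle between Y and Z. We already know corr(X, Y) and corr(X, Z), so Y and Z each sit at an angle of arccos(0.8) from X. In this particular case the angle between Y and Z can be zero, which means Y and Z are perfectly correlated and the max value of corr(Y, Z) is 1. The min value of corr(Y, Z) occurs at the largest possible angle between Y and Z, which is twice arccos(0.8), and there corr(Y, Z) = 2*(0.8**2) - 1 = 0.28.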
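To make the geometric argument concrete, here is a minimal sketch of a DATA step that turns the two known correlations into angles and back into correlations. The data set name angle_check and all variable names are made up for illustration; ARCOS and COS are the standard SAS inverse-cosine and cosine functions.

```sas
data angle_check;
   r_xy = 0.8;                     /* corr(X, Y) from the question */
   r_xz = 0.8;                     /* corr(X, Z) from the question */
   theta_xy = arcos(r_xy);         /* angle between X and Y, in radians */
   theta_xz = arcos(r_xz);         /* angle between X and Z, in radians */
   /* Smallest angle between Y and Z gives the largest correlation */
   max_corr = cos(abs(theta_xy - theta_xz));
   /* Largest angle between Y and Z gives the smallest correlation */
   min_corr = cos(theta_xy + theta_xz);
   put max_corr= min_corr=;        /* write both bounds to the log */
run;
```

For 0.8 and 0.8 the log should show values of about 1 and 0.28, in line with the bounds stated above.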
2. Positive semi-definiteness property of the correlation matrix
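Because a correlation matrix is positive semi-definite, its determinant is greater than or equal to zero. Writing x for the unknown corr(Y, Z), this condition is a quadratic inequality in x, and its two roots are the boundaries, which here are 0.28 and 1. The code below finds the two roots numerically with the SOLVE function in PROC FCMP, starting the search once from 1 and once from -1.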
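Written out, the determinant condition looks as follows. The shorthand a = corr(X, Y), b = corr(X, Z) and x = corr(Y, Z) is chosen here to match the argument names of the corrdet function; the second line plugs in a = b = 0.8.

```latex
\det\begin{pmatrix} 1 & a & b \\ a & 1 & x \\ b & x & 1 \end{pmatrix}
  = 1 + 2abx - a^2 - b^2 - x^2 \;\ge\; 0

-x^2 + 1.28\,x - 0.28 \ge 0
  \;\Longleftrightarrow\; (x - 0.28)(x - 1) \le 0
  \;\Longleftrightarrow\; 0.28 \le x \le 1
```

The determinant alone is only a necessary condition in general, but since every correlation already lies between -1 and 1, the 2x2 principal minors are non-negative as well, so nothing is lost here.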
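proc fcmp outlib=work.funcs.test1;
   /* Determinant of the 3x3 correlation matrix, where x = corr(Y, Z),
      a = corr(X, Y) and b = corr(X, Z) */
   function corrdet(x, a, b);
      return(-x**2 + 2*a*b*x - a**2 - b**2 + 1);
   endsub;

   /* Find a root of corrdet in x, starting the search from ini */
   function solvecorr(ini, a, b);
      array solvopts[5] initial abconv relconv maxiter solvstat
                        (.5 .001 1.0e-6 100);
      initial = ini;
      /* The missing value (.) marks x as the argument to solve for */
      x = solve('corrdet', solvopts, 0, ., a, b);
      return(x);
   endsub;
quit;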
options cmplib = work.funcs;

data one;
   * Max value;
   upper = solvecorr(1, 0.8, 0.8);
   upper_check = corrdet(upper, 0.8, 0.8);
   * Min value;
   lower = solvecorr(-1, 0.8, 0.8);
   lower_check = corrdet(lower, 0.8, 0.8);
run;
Generalization
We can generalize the question to any values of corr(X, Y) and corr(X, Z). First we create two user-defined functions that return the maximum and the minimum of corr(Y, Z) directly from the closed-form roots of the quadratic. Then we evaluate both functions over a grid of correlations and draw the max values and min values in the same plot.
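For reference, the closed form comes from applying the quadratic formula to -x^2 + 2abx - a^2 - b^2 + 1 = 0, the expression returned by corrdet. The symbols a = corr(X, Y), b = corr(X, Z) and x = corr(Y, Z) are shorthand borrowed from that function's arguments.

```latex
\Delta = 4a^2b^2 - 4\,(a^2 + b^2 - 1) = 4\,(1 - a^2)(1 - b^2)

x_{\min} = ab - \sqrt{(1 - a^2)(1 - b^2)}, \qquad
x_{\max} = ab + \sqrt{(1 - a^2)(1 - b^2)}
```

With a = b = 0.8 this gives 0.64 - 0.36 = 0.28 and 0.64 + 0.36 = 1. The discriminant is never negative when a and b are valid correlations, so the missing-value branch in the functions acts only as a guard against out-of-range input.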
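proc fcmp outlib = work.funcs.test2;
   /* Larger root of the quadratic: the maximum of corr(Y, Z) */
   function upper(a, b);
      x = 4*(a**2)*(b**2) - 4*(a**2 + b**2 - 1);   /* discriminant */
      if x ge 0 then y = 0.5*(sqrt(x) + 2*a*b);
      else y = .;
      return(y);
   endsub;

   /* Smaller root of the quadratic: the minimum of corr(Y, Z) */
   function lower(a, b);
      x = 4*(a**2)*(b**2) - 4*(a**2 + b**2 - 1);   /* discriminant */
      if x ge 0 then y = 0.5*(-sqrt(x) + 2*a*b);
      else y = .;
      return(y);
   endsub;
quit;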
data two;
   do xy = -.99 to .99 by 0.01;
      do xz = -.99 to .99 by 0.01;
         upper = upper(xy, xz);
         lower = lower(xy, xz);
         output;
      end;
   end;
run;
proc template;
   define statgraph surface001;
      begingraph;
         layout overlay3d / cube = false rotate = 150 tilt = 30
            xaxisopts = (label="Correlation between X and Y")
            yaxisopts = (label="Correlation between X and Z")
            zaxisopts = (label="Boundaries of correlation between Y and Z") ;
            surfaceplotparm x = xy y = xz z = upper;
            surfaceplotparm x = xy y = xz z = lower;
         endlayout;
      endgraph;
   end;
run;

proc sgrender data = two template = surface001;
run;