This post was kindly contributed by SAS Programming for Data Mining Applications - go there to comment and to read the full post. |
Partial Least Squares (PLS) is one of several supervised dimension reduction techniques and has attracted attention in recent years. On the one hand, PLS generates a series of scores that maximize the linear correlation between the dependent and independent variables. On the other hand, PLS loadings can be regarded as a counterpart of the loadings in factor analysis, so we can rotate them and then eliminate variables that are not significant for prediction.
%macro PLSRotate(Loading, TransMat, PatternOut, PatternShort,
method=VARIMAX, threshold=0.25);
/* VARIMAX rotation of PLS loadings. Only variables having
large loadings after rotation will enter the final model.
Loading dataset contains XLoadings output from PROC PLS
and should have variable called NumberOfFactors
TransMat is the generated Transformation matrix;
PatternOut is the output Pattern after rotation;
PatternShort is the output Pattern with selected variables
*/
%local covars;
proc sql noprint;
select name into :covars separated by ' '
from sashelp.vcolumn
where libname="WORK" & memname=upcase("&Loading")
& upcase(name) NE "NUMBEROFFACTORS"
& type="num"
;
quit;
%put &covars;
data &Loading.(type=factor);
set &Loading;
_TYPE_='PATTERN';
_NAME_=compress('factor'||_n_);
run;
ods select none;
ods output OrthRotFactPat=&PatternOut;
ods output OrthTrans=&TransMat;
proc factor data=&Loading method=pattern rotate=&method simple;
var &covars;
run;
ods select all;
data &PatternShort;
set &PatternOut;
array _f{*} factor:;
_cntfac=0;
do _j=1 to dim(_f);
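/* zero out loadings whose absolute value is at or below the threshold */
_f[_j]=_f[_j]*(abs(_f[_j])>&threshold);
/* count surviving loadings of either sign, not only positive ones */
_cntfac+(_f[_j] ne 0);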
end;
if _cntfac>0 then output;
drop _cntfac _j;
run;
%mend;
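A minimal sketch of how the macro might be called is shown below. The dataset `mydata`, the response `y`, and the predictors `x1-x7` are hypothetical names used only for illustration. The DETAILS option is what makes PROC PLS produce the XLoadings table, and the loading dataset is assumed to live in WORK under a one-level name, because the macro looks its columns up there.

```sas
/* Hypothetical input: WORK.mydata with response y and predictors x1-x7 */
ods output XLoadings=xload;          /* capture the X loadings table      */
proc pls data=mydata method=pls nfac=3 details;
   model y = x1-x7;
run;

/* Rotate the loadings and keep variables with any |loading| > 0.25.
   The output dataset names below are arbitrary choices.               */
%PLSRotate(xload, TransMat, PatternOut, PatternShort,
           method=VARIMAX, threshold=0.25);

proc print data=PatternShort noobs;
run;
```

Note that the macro overwrites the input loading dataset in place (it adds _TYPE_ and _NAME_), so keep a copy if the raw XLoadings output is needed later.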
Here I try to replicate the case study in [1], which describes how to apply VARIMAX rotation to PLS loadings and discusses its properties. Even after various tweaks to the convergence criteria and singularity conditions, the PROC PLS output still differs slightly from the results reported in [1] for factors other than the leading one. Therefore, I will directly use the U=PS matrix given on p. 215.
data loading;
input factor1-factor3;
cards;
-0.9280 -0.0481 0.2750
0.0563 -0.8833 0.5306
-0.9296 -0.0450 0.2720
-0.7534 0.1705 -0.5945
0.5917 -0.0251 -0.6450
0.9082 0.3345 0.1118
-0.8086 0.4551 -0.3800
;
run;
proc transpose data=loading out=loading2;
run;
data loading2(type=factor);
retain _TYPE_ "PATTERN";
set loading2;
run;
ods select none;
ods output OrthRotFactPat=OrthRotationOut;
ods output OrthTrans=OrthTrans;
proc factor data=Loading2 method=pattern rotate=varimax simple;
var col1-col7;
run;
ods select all;
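The same rotation could also be run through the %PLSRotate macro defined earlier, which additionally applies the loading threshold. This is only a sketch: it assumes the macro has already been compiled in the session and that loading2 sits in WORK. The output dataset names (TransMat2, PatternOut2, PatternShort2) are arbitrary.

```sas
/* loading2 has character _TYPE_/_NAME_ plus numeric col1-col7,
   so the macro picks up col1-col7 as the variables to rotate.  */
%PLSRotate(loading2, TransMat2, PatternOut2, PatternShort2,
           method=VARIMAX, threshold=0.25);

/* Inspect the rotated pattern and the thresholded version */
proc print data=PatternOut2 noobs;
run;
proc print data=PatternShort2 noobs;
run;
```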
Reference:
[1] Huiwen Wang, Qiang Liu, Yongping Tu, “Interpretation of PLS Regression Models with VARIMAX Rotation”, Computational Statistics and Data Analysis, Vol. 48 (2005), pp. 207–219