The LOESS procedure
16. February 2008 05:48
PROC LOESS implements a nonparametric method for estimating local regression surfaces pioneered by Cleveland (1979); also refer to Cleveland et al. (1988) and Cleveland and Grosse (1991). This method is commonly referred to as loess, which is short for local regression.
PROC LOESS allows greater flexibility than traditional modeling tools because you can use it for situations in which you do not know a suitable parametric form of the regression surface. Furthermore, PROC LOESS is suitable when there are outliers in the data and a robust fitting method is necessary.
The main features of PROC LOESS are as follows:
Local Regression and the Loess Method
Assume that for $i = 1, \dots, n$, the $i$th measurement $y_i$ of the response $y$ and the corresponding measurement $x_i$ of the vector $x$ of $p$ predictors are related by

$$y_i = g(x_i) + e_i$$

where $g$ is the regression function and $e_i$ is a random error. The idea of local regression is that near $x = x_0$, the regression function $g(x)$ can be locally approximated by the value of a function in some specified parametric class. Such a local approximation is obtained by fitting a regression surface to the data points within a chosen neighborhood of the point $x_0$.
In the loess method, weighted least squares is used to fit linear or quadratic functions of the predictors at the centers of neighborhoods. The radius of each neighborhood is chosen so that the neighborhood contains a specified percentage of the data points. This fraction of the data in each local neighborhood, called the smoothing parameter, controls the smoothness of the estimated surface: larger values give smoother fits. Data points in a given local neighborhood are weighted by a smooth decreasing function of their distance from the center of the neighborhood.
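To make the local fitting step concrete, here is a minimal Python sketch of a loess fit evaluated at a single point x0, for one predictor only. It assumes tricube weights (1 - u^3)^3, the weight function of Cleveland's loess, and takes the neighborhood radius as the distance from x0 to its q-th nearest point, where q is the span times n. The names `tricube`, `solve`, and `loess_at` are hypothetical, and the sketch does not reproduce PROC LOESS internals such as predictor scaling, multiple predictors, kd-tree interpolation, or robustness iterations.

```python
import math


def tricube(u):
    """Tricube weight (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def solve(a, b):
    """Solve the small system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def loess_at(x0, xs, ys, span=0.5, degree=1):
    """Tricube-weighted local polynomial fit evaluated at x0 (single predictor).

    Hypothetical helper; assumes distinct, reasonably spread-out xs.
    """
    n = len(xs)
    # Neighborhood size q: span * n points, floored at degree + 3 (a sketch choice), capped at n
    q = min(n, max(degree + 3, math.ceil(span * n)))
    # Neighborhood radius: distance from x0 to its q-th nearest data point
    radius = sorted(abs(x - x0) for x in xs)[q - 1]
    w = [tricube((x - x0) / radius) for x in xs]
    # Weighted normal equations for the basis 1, (x - x0), ..., (x - x0)^degree
    p = degree + 1
    a = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    for xi, yi, wi in zip(xs, ys, w):
        basis = [(xi - x0) ** k for k in range(p)]
        for j in range(p):
            b[j] += wi * basis[j] * yi
            for k in range(p):
                a[j][k] += wi * basis[j] * basis[k]
    # Because the basis is centered at x0, the intercept is the fitted value at x0
    return solve(a, b)[0]
```

Evaluating `loess_at` over a grid of x0 values traces out the smooth curve; a local fit of degree d reproduces any polynomial of degree at most d exactly, which is a quick sanity check.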
The result of the procedure is essentially a curved regression line (or surface), useful at least for describing the data and as a diagnostic for whether a linear regression is appropriate.
Example 1.
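/* Save the LOESS OutputStatistics table to the data set PredLOESS */
ods output OutputStatistics=PredLOESS;
/* SCALE=SD rescales the predictors by their (trimmed) standard deviations,  */
/* DEGREE=2 fits local quadratics, and SELECT=GCV chooses the smoothing      */
/* parameter by generalized cross validation                                 */
proc loess data=ExperimentA;
   model Yield = Temperature Catalyst / scale=sd degree=2 select=gcv;
run;
ods output close;

/* For comparison: an additive model with a loess smoother for each predictor, */
/* with the smoothing parameters chosen by GCV; the output goes to PredGAM      */
proc gam data=ExperimentA;
   model Yield = loess(Temperature) loess(Catalyst) / method=gcv;
   output out=PredGAM;
run;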
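SELECT=GCV picks the smoothing parameter that minimizes generalized cross validation. As a rough illustration of that idea, not of SAS's implementation, the Python sketch below evaluates the textbook criterion GCV(span) = n * RSS / (n - trace(L))^2 for a tricube-weighted local linear smoother L (local linear rather than the quadratic DEGREE=2 fit above, for brevity) and returns the candidate span with the smallest value. PROC LOESS may scale GCV differently and searches candidate values its own way; `local_linear`, `gcv`, and `choose_span` are hypothetical names.

```python
import math


def tricube(u):
    """Tricube weight (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear(x0, xs, ys, span):
    """Tricube-weighted local linear fit at x0 (assumes distinct, spread-out xs)."""
    n = len(xs)
    q = min(n, max(4, math.ceil(span * n)))  # neighborhood size, floored at 4 points
    radius = sorted(abs(x - x0) for x in xs)[q - 1]
    w = [tricube((x - x0) / radius) for x in xs]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xs))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)


def gcv(xs, ys, span):
    """Textbook GCV: n * RSS / (n - trace(L))^2 for the linear smoother L."""
    n = len(xs)
    fitted = [local_linear(x, xs, ys, span) for x in xs]
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    # L[i][i] is the fit at x_i when y is the i-th unit vector (the fit is linear in y)
    trace = sum(
        local_linear(x, xs, [1.0 if j == i else 0.0 for j in range(n)], span)
        for i, x in enumerate(xs)
    )
    return n * rss / (n - trace) ** 2


def choose_span(xs, ys, spans):
    """Return the candidate span with the smallest GCV value."""
    return min(spans, key=lambda s: gcv(xs, ys, s))
```

The trace term acts as the effective number of parameters: small spans lower the RSS but raise trace(L), and GCV balances the two.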
Although LOESS provides a model of the response surface, it does not provide an equation describing the dependence, nor does it give direct information about interactions and nonlinearities. If the span (smoothing parameter) of the preferred LOESS fit is small, it is unlikely that a single common function can describe all the data. If the span is large, it is quite likely that such a common function can be found.
If there is more than one (or two) outlier or influential point (leverage point), we cannot simply drop one point, recompute the hat matrix, drop another point, and recompute the hat matrix again. A more comprehensive approach is needed, such as LOESS, M-estimation (introduced by Huber in 1973), S-estimation, LTS (least trimmed squares) estimation, or MM-estimation. All of these except LOESS are available in PROC ROBUSTREG.
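To show why M-estimation copes with outliers without deleting points one at a time, here is a minimal Python sketch of Huber M-estimation for a straight line, computed by iteratively reweighted least squares (IRLS). Residuals larger than c times a robust scale estimate get weight c/|u| instead of 1, so an outlier's influence is bounded. The tuning constant c = 1.345 and the scale estimate median(|residual|)/0.6745 are common textbook choices assumed here; PROC ROBUSTREG's algorithms, default weight functions, and constants may differ. All function names are hypothetical.

```python
def wls_line(xs, ys, w):
    """Weighted least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, xs))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def median(values):
    v = sorted(values)
    mid = len(v) // 2
    return v[mid] if len(v) % 2 else 0.5 * (v[mid - 1] + v[mid])


def huber_weight(u, c):
    """IRLS weight psi(u)/u for Huber's loss: 1 inside [-c, c], c/|u| outside."""
    a = abs(u)
    return 1.0 if a <= c else c / a


def huber_line(xs, ys, c=1.345, max_iter=200, tol=1e-10):
    """Huber M-estimate of a straight line by iteratively reweighted least squares."""
    w = [1.0] * len(xs)
    b0, b1 = wls_line(xs, ys, w)  # start from ordinary least squares
    for _ in range(max_iter):
        resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
        # Robust scale: median absolute residual / 0.6745 (about sigma for normal errors)
        s = median(abs(r) for r in resid) / 0.6745
        if s == 0.0:
            break
        w = [huber_weight(r / s, c) for r in resid]
        new_b0, new_b1 = wls_line(xs, ys, w)
        done = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1
```

For example, with points on y = 1 + 2x for x = 0, ..., 9 and 100 added to the last response, ordinary least squares gives a slope of about 7.45, while `huber_line` returns a slope close to 2.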
MARS (Multivariate Adaptive Regression Splines), which fits piecewise linear regressions, can also be useful, for example when the dependent variable is binary. It uses separate regression slopes in distinct intervals of the predictor variable space. PROC LOESS (Version 8), which uses weighted polynomial regression, kernel regression (in SAS/INSIGHT), and PROC TRANSREG (Version 8), which uses cubic polynomials in piecewise regression, are similar to MARS.
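The "separate slopes in distinct intervals" idea can be sketched with the hinge functions max(0, x - t) and max(0, t - x) that MARS uses as building blocks. The minimal Python sketch below, for one predictor, fits y = b0 + b1*max(0, x - t) + b2*max(0, t - x) by ordinary least squares and picks the knot t by trying each interior data value. It shows only the core idea: full MARS adds many such terms (and interactions) in a forward pass and prunes them in a backward pass. The function names are hypothetical.

```python
def solve(a, b):
    """Solve the small system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def least_squares(rows, ys):
    """Ordinary least squares via the normal equations (adequate for a small sketch)."""
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * y for r, y in zip(rows, ys)) for j in range(p)]
    return solve(xtx, xty)


def fit_hinge_pair(xs, ys, t):
    """Fit y = b0 + b1*max(0, x - t) + b2*max(0, t - x); return (coefficients, RSS)."""
    rows = [[1.0, max(0.0, x - t), max(0.0, t - x)] for x in xs]
    beta = least_squares(rows, ys)
    rss = sum((y - sum(b * v for b, v in zip(beta, row))) ** 2 for row, y in zip(rows, ys))
    return beta, rss


def best_knot(xs, ys):
    """Try each interior data value as the knot; keep (knot, coefficients, RSS) with smallest RSS."""
    best = None
    for t in sorted(set(xs))[1:-1]:
        beta, rss = fit_hinge_pair(xs, ys, t)
        if best is None or rss < best[2]:
            best = (t, beta, rss)
    return best
```

On data that really are a broken line with its kink at x = 5, `best_knot` recovers t = 5, and b1 and -b2 are the slopes to the right and left of the knot.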
