The LOESS procedure

by Irina 16. February 2008 05:48

PROC LOESS implements a nonparametric method, pioneered by Cleveland (1979), for estimating local regression surfaces; see also Cleveland et al. (1988) and Cleveland and Grosse (1991). This method is commonly referred to as loess, which is short for local regression.

PROC LOESS allows greater flexibility than traditional modeling tools because you can use it for situations in which you do not know a suitable parametric form of the regression surface. Furthermore, PROC LOESS is suitable when there are outliers in the data and a robust fitting method is necessary.

The main features of PROC LOESS are as follows:
  • fits nonparametric models
  • supports the use of multidimensional predictors
  • supports multiple dependent variables
  • supports both direct and interpolated fitting using kd trees
  • computes confidence limits for predictions
  • performs iterative reweighting to provide robust fitting when there are outliers in the data
  • supports scoring for multiple data sets
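Most of these features correspond to options of the MODEL and SCORE statements. The sketch below is an illustration, not part of the original post: it reuses the ExperimentA data from Example 1, while NewPoints is a hypothetical data set (here simply the first five predictor rows of ExperimentA). ITERATIONS= values greater than 1 request iterative (robust) reweighting, CLM requests confidence limits, and DIRECT requests a direct local fit at every point instead of kd-tree interpolation. Check the PROC LOESS documentation for your SAS release before relying on the exact option set.

```sas
/* Hypothetical scoring data: the first five predictor rows of ExperimentA */
data NewPoints;
   set ExperimentA(keep=Temperature Catalyst obs=5);
run;

proc loess data=ExperimentA;
   /* iterations=4 : robust fit via iterative reweighting        */
   /* clm          : confidence limits for the predicted values  */
   /* direct       : exact local fit at every point (no kd tree) */
   model Yield = Temperature Catalyst / scale=sd degree=2 select=gcv
                                        iterations=4 clm direct;
   /* Score the new points; additional SCORE statements could score other data sets */
   score data=NewPoints / print clm;
run;
```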
    Local Regression and the Loess Method

    Assume that for i = 1 to n, the ith measurement y_i of the response y and the corresponding measurement x_i of the vector x of p predictors are related by

    $$y_i = g(x_i) + e_i$$

    where g is the regression function and e_i is a random error. The idea of local regression is that near x = x_0, the regression function g(x) can be locally approximated by the value of a function in some specified parametric class. Such a local approximation is obtained by fitting a regression surface to the data points within a chosen neighborhood of the point x_0.


    In the loess method, weighted least squares is used to fit linear or quadratic functions of the predictors at the centers of neighborhoods. The radius of each neighborhood is chosen so that the neighborhood contains a specified percentage of the data points. The fraction of the data in each local neighborhood, called the smoothing parameter, controls the smoothness of the estimated surface. Data points in a given local neighborhood are weighted by a smooth decreasing function of their distance from the center of the neighborhood.
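In the standard formulation of Cleveland (1979) and Cleveland et al. (1988), the neighborhood and weighting just described can be written as follows for a local linear (degree 1) fit at a point x_0, with alpha the smoothing parameter and ||.|| a distance in predictor space. The tricube function T is the usual choice of weight; a degree-2 fit simply adds quadratic terms in (x_i - x_0).

```latex
\begin{aligned}
q &= \lfloor \alpha n \rfloor, \qquad
d(x_0) = \text{distance from } x_0 \text{ to its } q\text{-th nearest } x_i,\\[4pt]
w_i(x_0) &= T\!\left(\frac{\lVert x_i - x_0 \rVert}{d(x_0)}\right), \qquad
T(u) = \begin{cases} (1-u^3)^3, & 0 \le u < 1,\\ 0, & u \ge 1, \end{cases}\\[4pt]
(\hat\beta_0, \hat\beta_1) &= \arg\min_{\beta_0,\beta_1} \sum_{i=1}^{n} w_i(x_0)\,\bigl(y_i - \beta_0 - \beta_1^{\top}(x_i - x_0)\bigr)^2,\\[4pt]
\hat g(x_0) &= \hat\beta_0 .
\end{aligned}
```

A larger alpha gives larger neighborhoods and a smoother surface; a smaller alpha lets the fit follow the data more closely.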

  • The result of the procedure is essentially a curved regression line (or surface), useful at least for describing the data and as a diagnostic of whether a linear regression is appropriate.
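One quick way to use the loess curve as such a diagnostic is to overlay it on a straight-line fit. This is a sketch assuming SAS 9.2 or later (where PROC SGPLOT is available), reusing the ExperimentA variables from Example 1; a loess curve that bends well away from the line suggests the linear model is not adequate.

```sas
proc sgplot data=ExperimentA;
   /* Nonparametric loess curve of Yield against Temperature */
   loess x=Temperature y=Yield / legendlabel="Loess";
   /* Ordinary least-squares line for comparison (markers already drawn above) */
   reg x=Temperature y=Yield / nomarkers legendlabel="Linear fit";
run;
```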

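    Example 1. The same data fitted with PROC LOESS (one surface in both predictors) and with PROC GAM (an additive model with a separate loess smoother for each predictor):

       /* Loess surface in both predictors; GCV selects the smoothing parameter */
       ods output OutputStatistics=PredLOESS;
       proc loess data=ExperimentA;
          model Yield = Temperature Catalyst / scale=sd degree=2 select=gcv;
       run;
       ods output close;

       /* Additive model: one loess smoother per predictor, also tuned by GCV */
       proc gam data=ExperimentA;
          model Yield = loess(Temperature) loess(Catalyst) / method=gcv;
          output out=PredGAM;
       run;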


    Although LOESS provides a model of the response surface, it does not provide an equation describing the dependence, nor does it give explicit information about interactions and nonlinearities. If the span of the preferred LOESS fit is small, it is unlikely that a single common function can describe all the data; if the span is large, it is quite likely that such a function can be found.
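To see how the span affects the picture, PROC LOESS can fit several smoothing parameters in one run: the SMOOTH= option accepts a list of values. A sketch reusing the Example 1 data; the particular values are illustrative only.

```sas
proc loess data=ExperimentA;
   /* One loess fit per listed smoothing parameter (span).          */
   /* Small spans track local structure; large spans approach       */
   /* a single global polynomial.                                    */
   model Yield = Temperature / degree=2 smooth=0.3 0.5 0.7 0.9;
run;
```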
    If there are more than one or two outliers or influential points (leverage points), we cannot simply drop one point, recompute the hat matrix, drop another point, and recompute the hat matrix again. A more comprehensive approach is needed, such as LOESS, M-estimation (introduced by Huber in 1973), S-estimation, LTS-estimation, or MM-estimation. All of these except LOESS are available in PROC ROBUSTREG.
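In PROC ROBUSTREG the estimator is chosen with the METHOD= option, and the DIAGNOSTICS and LEVERAGE options flag outliers and leverage points in a single pass rather than by dropping points one at a time. A sketch reusing the Example 1 variables:

```sas
proc robustreg data=ExperimentA method=mm;
   /* MM-estimation; the other choices are method=m, method=s, method=lts */
   /* diagnostics: outlier diagnostics; leverage: leverage-point flags     */
   model Yield = Temperature Catalyst / diagnostics leverage;
run;
```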


    LOESS is not recommended for a binary dependent variable. LOESS can certainly handle multivariate data, but the fit is a weighted least squares model of linear and/or quadratic forms of the regressors, so applying it as is to categorical data is not a good idea. PROC GAM is an alternative: it can fit splines and other nonparametric models, as well as semiparametric and parametric models, and it can fit them to binary dependent variables too. Because a PROC GAM fit is not a simple linear system that can be fed into PROC SCORE for scoring new data, PROC GAM provides its own convenient SCORE statement for that purpose.
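A sketch of that route, assuming hypothetical data sets Train (with a 0/1 outcome Response and predictors Age and Income) and NewData (new Age and Income values). DIST=BINOMIAL requests a binary-response model, PARAM() enters a term linearly, and the SCORE statement applies the fitted model to new observations; check which response level is modeled in the PROC GAM documentation for your release.

```sas
proc gam data=Train;
   /* Semiparametric binary-response model (hypothetical variables): */
   /* Income enters linearly, Age through a smoothing spline          */
   model Response = param(Income) spline(Age) / dist=binomial;
   /* Score new observations with the fitted model */
   score data=NewData out=Scored;
run;
```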


  • MARS (Multivariate Adaptive Regression Splines), which fits piecewise linear regressions, can also be useful when the dependent variable is binary. It uses separate regression slopes in distinct intervals of the predictor space. PROC LOESS (Version 8), which uses weighted polynomial regression; kernel regression (in SAS/INSIGHT); and PROC TRANSREG (Version 8), which uses cubic polynomials in piecewise regression, are similar to MARS.
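PROC TRANSREG's SPLINE transformation gives a simple piecewise-polynomial fit in the same spirit. With DEGREE=1 the pieces are linear, closer to the MARS idea (though, unlike MARS, the knots are placed by NKNOTS= rather than selected adaptively); the default degree gives the piecewise cubic fit mentioned above. A sketch reusing the Example 1 variables:

```sas
proc transreg data=ExperimentA;
   /* Piecewise linear spline in Temperature with 3 interior knots */
   model identity(Yield) = spline(Temperature / degree=1 nknots=3);
   /* Save the fitted (predicted) values */
   output out=SplineFit predicted;
run;
```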



    About the author

    Irina Spivak
    Team Leader at G-Stat.



      Disclaimer

      The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

      © Copyright 2010
