Model selection in R featuring the lasso
Course Topics
The purpose of statistical model selection is to identify a parsimonious model: one that is as simple as possible while maintaining good predictive ability for the outcome of interest. Parsimony is a fundamental concept in statistical modeling across a wide variety of fields, and many model selection and variable subset selection approaches have been proposed. The lasso, or “least absolute shrinkage and selection operator,” provides a method of continuous subset selection. Rather than completely including or excluding predictors, the lasso shrinks the coefficients of unimportant predictors and can drive coefficients exactly to zero for variables that have low predictive value for the response.
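For readers who want the shrinkage idea in symbols, one common way to write the lasso estimate for a linear model is sketched here. The notation is an assumption introduced for illustration and may differ from that used in the lecture: y_i is the response, x_{ij} are the p predictors, and \lambda \ge 0 is the tuning parameter.

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta_0,\,\beta}
    \left\{
      \frac{1}{2} \sum_{i=1}^{n}
        \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

Setting \lambda = 0 recovers ordinary least squares. As \lambda grows, the absolute-value penalty shrinks the coefficients, and some of them become exactly zero, which is what makes the lasso act as a selection method.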
Implementing the lasso requires more technical groundwork than simpler routines such as forward, backward, or stepwise subset selection, or selection by information criteria such as AIC or BIC. However, the lasso avoids some of the high variability associated with subset selection and is computationally cheaper than comparing candidate subsets by information criteria when the number of candidate predictors is large (see, e.g., Hastie, Tibshirani, and Friedman 2009).
This short course includes lecture and computer laboratory components. In the lecture component, the mathematical formulation of the lasso will be briefly motivated, then compared and contrasted with other methods, including ordinary least squares, ridge regression, stepwise selection, and information criteria. During the laboratory portion, the lasso will be implemented in R on a classic prostate data set (Stamey et al. 1989), which includes 9 clinical measurements on 97 men. Specification of the lasso tuning parameter will be discussed and demonstrated via cross-validation, which is another important modeling concept. This course covers more advanced content than other LISA short courses and assumes basic R coding ability and familiarity with regression and model selection.
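As a rough preview of the laboratory workflow, the sketch below fits a lasso path and chooses the tuning parameter by cross-validation. It rests on several assumptions: it uses the glmnet package, which may not be the package used in the course, and it uses simulated data as a stand-in for the prostate data, with 97 rows and 8 hypothetical predictors named x1 to x8.

```r
# Sketch only: simulated data stand in for the prostate data used in the course.
library(glmnet)  # assumption: the glmnet package is installed

set.seed(1)
n <- 97  # same number of rows as the prostate data
p <- 8   # hypothetical number of predictors
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("x", 1:p)

# Only three of the eight simulated predictors truly affect the response.
beta_true <- c(2, -1.5, 0, 0, 1, 0, 0, 0)
y <- drop(x %*% beta_true) + rnorm(n)

# alpha = 1 requests the lasso penalty (alpha = 0 would give ridge regression).
fit <- glmnet(x, y, alpha = 1)

# 10-fold cross-validation over the sequence of lambda values.
cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)

# Two common choices of the tuning parameter:
cv_fit$lambda.min  # lambda with the smallest cross-validated error
cv_fit$lambda.1se  # largest lambda within one standard error of that minimum

# Coefficients at the more parsimonious choice; zeros print as ".".
coef(cv_fit, s = "lambda.1se")
```

Calling plot(cv_fit) displays the cross-validated error against log(lambda), and plot(fit, xvar = "lambda") shows how each coefficient shrinks toward zero as the penalty increases.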
Resources:
The Lasso Page:
Download R:
References:
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.
Stamey TA, Kabalin JN, McNeal JE, Johnstone IM, Freiha FS, Redwine EA, Yang N. Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. II. Radical prostatectomy treated patients. J Urol. 141: 1076-1083, 1989.
[video:https://vimeo.com/73334369]