Nonparametric regression analysis is regression without an assumption of linearity. The scope of nonparametric regression is very broad, ranging from "smoothing" the relationship between two variables in a scatterplot to multiple-regression analysis and generalized regression models (for example, logistic nonparametric regression for a binary response variable). Unthinkable only a few years ago, methods of nonparametric-regression analysis have been rendered practical by advances in statistics and computing, and are now a serious alternative to more traditional parametric-regression modelling.

This short course aims to provide
a broad introduction to nonparametric regression, covering the following topics
(as time permits): introduction to nonparametric regression; binning, local
averaging, and kernel estimators; local-polynomial regression ("loess");
robust nonparametric regression; regression and smoothing splines; statistical
inference for nonparametric regression; the role of nonparametric regression
in data analysis; nonparametric multiple regression, including additive regression
models; generalized nonparametric regression, including generalized additive
models.
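The local-averaging idea behind the binning and kernel estimators on this list can be sketched in a few lines of R. This is a minimal Nadaraya-Watson kernel estimator; the simulated data and bandwidth are illustrative choices, not part of the course materials:

```r
# Nadaraya-Watson kernel estimator: the fit at a focal value x0 is a
# weighted average of the y's, with weights declining in |x - x0|.
nw <- function(x0, x, y, h) {
  w <- dnorm((x - x0) / h)  # Gaussian kernel weights, bandwidth h
  sum(w * y) / sum(w)       # locally weighted mean
}

set.seed(123)
x <- runif(200, 0, 2 * pi)
y <- sin(x) + rnorm(200, sd = 0.2)
x0 <- seq(0, 2 * pi, length = 50)
yhat <- sapply(x0, nw, x = x, y = y, h = 0.4)
```

Shrinking the bandwidth h makes the fit rougher; enlarging it pushes the estimate toward a global average.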

| Topic | Materials |
| --- | --- |
| "Crash" Course in R | R script file, Tom Short's R reference card, exercises (R script for answers), Duncan.txt, Long.txt, Powers.txt |
| Nonparametric Regression | Lecture notes (corrected), R script file, exercises (R scripts for answers: part 1, part 2), R resources in nonparametric regression, loessPlot.R |

J. M. Chambers and T. J. Hastie, eds.,
*Statistical Models in S*. Pacific Grove, CA: Wadsworth, 1992. This volume
includes excellent introductions to three aspects of nonparametric regression,
which are valuable independently of any interest in S (i.e., R and S-PLUS): a chapter
on additive regression models (generalized additive models) by Hastie; another
on local polynomial regression (lowess or loess) models by Cleveland, Grosse,
and Shyu; and a third on regression and classification trees by Clark and Pregibon.

A. W. Bowman and A. Azzalini. *Applied
Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations.*
Oxford: Oxford University Press, 1997. An accessible treatment of nonparametric
regression and related methods, with a useful library of S programs and worked-out
S examples.

J. Fan and I. Gijbels. *Local Polynomial
Modelling and Its Applications.* London: Chapman and Hall, 1996. A technical
presentation of the theoretical underpinnings of local polynomial regression
estimators (such as lowess/loess). Includes an extensive set of references to
the journal literature.
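Local polynomial fits of the kind Fan and Gijbels analyze are available in R through the standard loess() function; the built-in cars data set and the settings below are illustrative:

```r
# Local quadratic regression (degree = 2 is loess's default);
# span controls the fraction of the data used in each local fit.
fit <- loess(dist ~ speed, data = cars, span = 0.75, degree = 2)
pred <- predict(fit, newdata = data.frame(speed = 10:20))
```

Smaller spans track the data more closely at the cost of higher variance.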

J. Fox. *Nonparametric Simple Regression:
Smoothing Scatterplots*, and J. Fox, *Multiple and Generalized Nonparametric
Regression*. Thousand Oaks, CA: Sage, 2000. These two monographs provide
the material for my lectures on nonparametric regression.

P. J. Green and B. W. Silverman. *Nonparametric
Regression and Generalized Linear Models: A Roughness Penalty Approach*.
London: Chapman and Hall, 1994. Describes smoothing splines, the major alternative
to local polynomial regression. A relatively difficult but very high-quality
text.
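R's built-in smooth.spline() implements the roughness-penalty approach Green and Silverman describe, choosing the penalty automatically by (generalized) cross-validation. The cars data set is an illustrative choice:

```r
# A smoothing spline minimizes the residual sum of squares plus a
# penalty on curvature; cv = FALSE selects the penalty by GCV.
ss <- smooth.spline(cars$speed, cars$dist, cv = FALSE)
ss$df  # equivalent degrees of freedom chosen by GCV
</imports>
```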

F. E. Harrell, Jr. *Regression
Modeling Strategies: With Applications to Linear Models, Logistic Regression,
and Survival Analysis*. New York: Springer, 2001. Although Harrell deals
very little with nonparametric regression per se, he does show how much the
same effect can be achieved in a linear (or generalized-linear) model through
the use of regression splines.
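Harrell's regression-spline strategy can be sketched with R's standard splines package: a spline basis for the predictor turns an ordinary linear model into a flexible one while keeping parametric inference. The data set and degrees of freedom here are illustrative:

```r
library(splines)
# ns() builds a natural cubic spline basis for speed with 4 df;
# lm() then fits it as an ordinary (parametric) linear model.
m <- lm(dist ~ ns(speed, df = 4), data = cars)
length(coef(m))  # intercept plus 4 spline-basis coefficients
```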

T. J. Hastie and R. J. Tibshirani. *Generalized
Additive Models*. London: Chapman and Hall, 1990. This is, for
the most part, a very readable book. Generalized additive models include additive
regression models, but extend additive nonparametric regression to other 'link'
functions, such as those used in logistic, probit, and Poisson regression.
The book provides a fine general introduction to nonparametric regression.

T. Hastie, R. Tibshirani, and J.
Friedman. *The Elements of Statistical Learning: Data Mining, Inference, and
Prediction*. New York: Springer, 2001. The focus of this book is much broader
than nonparametric regression, as it is usually conceived, but the authors include
excellent treatments of both spline- and kernel-based smoothing methods, among
others.

C. Loader. *Local Regression and
Likelihood*. New York: Springer, 1999. This is a wide-ranging and reasonably
accessible treatment of local polynomial estimation for a variety of statistical
problems, including density estimation, regression models, generalized regression
models, and survival models. Loader's book is associated with an excellent library
of S functions.

B. W. Silverman. *Density Estimation for Statistics and Data Analysis*.
London: Chapman and Hall, 1986. Kernel density estimation, smoothing the distribution
of one or more variables, is a relatively narrow topic in graphical data
analysis, but it is valuable in its own right and provides a basis for methods
of nonparametric regression. Silverman's short book is a paragon of clarity.

J. S. Simonoff. *Smoothing Methods
in Statistics*. New York: Springer, 1996. This book covers a variety of applications
of smoothing, including, but not limited to, nonparametric density estimation
and nonparametric regression. Simonoff develops a number of illustrative applications
and provides good references to the journal literature and to computer programs.
Some of the theoretical material is relatively difficult, but of the several
texts devoted to general ideas in smoothing with which I am familiar, this and
Bowman and Azzalini are the most accessible.

W. N. Venables and B. D. Ripley. *Modern
Applied Statistics with S, Fourth Edition*. New York: Springer, 2002. As the
title implies, this book has a broad focus, but it has good coverage of a wide
variety of nonparametric regression methods, and demonstrates their implementation
in S.

S. N. Wood. Modelling and smoothing
parameter estimation with multiple quadratic penalties. *Journal of the Royal
Statistical Society, Series B*, 62: 413-428, 2000.

S. N. Wood. mgcv: GAMs and generalized ridge regression for R. *R News*, 1(2):20-25,
2001.

S. N. Wood. Stable and efficient
multiple smoothing parameter estimation for generalized additive models. *Journal
of the American Statistical Association* 99:673-686, 2004.

These papers describe the mgcv
package in R, which contains a gam
function for fitting generalized additive models. The initials "mgcv"
stand for multiple generalized cross-validation, the method by which Wood selects
GAM smoothing parameters.
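A minimal use of the package, with simulated data from mgcv's own gamSim() helper (the sample size and model terms below are illustrative choices):

```r
library(mgcv)
set.seed(1)
dat <- gamSim(1, n = 200)  # simulated additive-model data
# s() requests a smooth term for each predictor; the smoothing
# parameters are selected automatically rather than fixed by the user.
m <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
summary(m)
```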

R can be downloaded from the Comprehensive R Archive Network (CRAN) web site; further information is available on the R home page.

Manuals

R is distributed with a set of manuals, which are also available at the CRAN web site.

A manual for S-PLUS Trellis Graphics
(also useful for the lattice package in R) is available
on the web.

Programming in S

R. A. Becker, J. M. Chambers, and
A. R. Wilks, *The New S Language: A Programming Environment for Data Analysis
and Statistics*. Pacific Grove, CA: Wadsworth, 1988. Defines S Version 2,
which forms the basis of the currently used S Versions 3 and 4, as well as R.
(Sometimes called the "Blue Book.")

J. M. Chambers, *Programming with
Data: A Guide to the S Language*. New York: Springer, 1998. Describes the
new features in S Version 4, including the newer formal object-oriented programming
system (also incorporated in R), by the principal designer of the S language.
Not an easy read. (The "Green Book.")

J. M. Chambers and T. J. Hastie,
eds., *Statistical Models in S*. Pacific Grove, CA: Wadsworth, 1992. An
edited volume describing the statistical modeling language in S, Versions 3
and 4, and R, and the object-oriented programming system used in S Version 3
and R (and available, for "backwards compatibility," in S Version 4). In addition,
the text covers S software for particular kinds of statistical models, including
linear models, nonlinear models, generalized linear models, local-polynomial
regression models, and generalized additive models. (The "White Book.")

R. Ihaka and R. Gentleman, R: A
language for data analysis and graphics. *Journal of Computational and Graphical
Statistics*, 5:299-314, 1996. The original published description of the R
project, now dated but still worth looking at.

W. N. Venables and B. D. Ripley,
*S Programming*. New York: Springer, 2000. The definitive treatment of
writing software in the various versions of S-PLUS and R, now slightly dated, particularly
with respect to R.

Selected Statistical Methods Programmed in S (beyond nonparametric regression)

A. C. Davison and D. V. Hinkley, *Bootstrap
Methods and their Application*. Cambridge: Cambridge University Press, 1997.
A comprehensive introduction to bootstrap resampling, associated with the boot
package (for S-PLUS and R, written by A. J. Canty). Somewhat more difficult
than Efron and Tibshirani.
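A minimal use of the boot package, bootstrapping the mean of R's built-in rivers data (the statistic and replicate count are illustrative):

```r
library(boot)
# boot() resamples cases; the statistic function receives the data
# and an index vector identifying the current bootstrap sample.
b <- boot(rivers, statistic = function(d, i) mean(d[i]), R = 999)
boot.ci(b, type = "perc")  # percentile confidence interval
```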

J. Fox, *An R and S-PLUS Companion
to Applied Regression*, Sage, 2002. Provides a general introduction to S,
with a focus on applied regression analysis and generalized linear models. Appendices
available on the book's
web site cover a variety of methods, including nonparametric regression, nonlinear
regression, and others.

B. Efron and R. J. Tibshirani, *An
Introduction to the Bootstrap*. London: Chapman and Hall, 1993. Another extensive
treatment of bootstrapping by its originator (Efron), also accompanied by an
S package, bootstrap (for both
S-PLUS and R, but somewhat less usable than boot).

J. L. Schafer, *Analysis of Incomplete
Multivariate Data*. London: Chapman and Hall, 1997. This text presents a
broadly applicable Bayesian treatment of missing-data problems, including methods
for multiple imputation. The most extensive implementation of the methods in
the book is in the missing library
in S-PLUS version 6. Schafer's norm,
cat, mix,
and pan packages are available
for earlier versions of S-PLUS and for R.

T. M. Therneau and P. M. Grambsch,
*Modeling Survival Data: Extending the Cox Model*. New York: Springer,
2000. An overview of both basic and advanced methods of survival analysis (event-history
analysis), with reference to S and SAS software. There are both S-PLUS and R
versions of Therneau's state-of-the-art survival
package.

W. N. Venables and B. D. Ripley.
*Modern Applied Statistics with S, Fourth Edition*. New York: Springer,
2002. An influential and wide-ranging treatment of data analysis using S. Many
of the facilities described in the book are programmed in the associated (and
indispensable) MASS, nnet,
and spatial packages, available
both for S-PLUS and R. This text is more advanced and has a broader focus than
my *R and S-PLUS Companion*.

Other Sources (Some Free)

The *R
News* newsletter is an excellent source of information on R.

See the R web site for a list of publications.

Last Modified: 25 January 2007 by J. Fox <jfox AT mcmaster.ca>