An R and S-PLUS Companion to Applied Regression. John FOX. Thousand Oaks, CA: Sage Publications, 2002, xvi + 312 pp., $79.95 (H), $39.95 (P), ISBN: 0-7619-2279-2.
The S statistical computing language, implemented as the freely available R and the commercial package S-Plus, provides a superb environment for data exploration, manipulation, analysis, and graphical display. In addition to very high-level built-in functions for statistical and graphical analysis, it includes a well-developed programming language, making it highly extensible.
Fox's book admirably succeeds in its stated purpose: "to teach the use of R and S-Plus in the context of applied regression analysis" (p. ix). It provides an excellent vehicle for students and others without previous experience with S in any form to become comfortable with the basics of the environment and particularly with the tools available for preliminary exploratory analysis and for fitting and evaluating linear and generalized linear models.
The first two chapters (through p. 84) take the reader through the basics of using S: elementary commands and syntax, reading and manipulating data, data frames, handling of vectors and matrices, S functions for basic statistical operations, and so on. The third chapter deals with the kind of exploratory univariate and bivariate numerical and graphical analysis that should precede the fitting of any multivariate model. Symmetrizing and variance-stabilizing transformations, including the Box-Cox transformation, are treated here.
Only three chapters deal explicitly with regression models. Chapter 4 introduces the fitting of linear models, including regression and analysis of variance. Fox's treatment of contrasts in ANOVA models exemplifies both his skill at clear explanation and his attention to the details of differences between R and S-Plus. Chapter 5 provides a similar introduction to generalized linear models. Chapter 6, on regression diagnostics, presents a unified framework for identifying outliers and influential points, for diagnosing and dealing with nonlinearity and nonconstant error variance, and for recognizing and coping with collinearity, in both linear and generalized linear models.
Many of the functions introduced in this chapter are from Fox's own R package, car (Companion to Applied Regression), which is available from the Comprehensive R Archive Network (CRAN) and can be installed through the menu system in R. S-Plus versions of car can be downloaded from the Web site for the book. Some functions in car mask less general built-in S functions of the same names. For example, car's anova function compares nested models for both linear and generalized linear models, using analysis of variance for the former and analysis of deviance for the latter.
The last two chapters deal with more advanced topics in general use of the S language, specifically advanced graphics and writing programs.
All statistical computing methods are introduced in the context of realistic analyses of a few real datasets. In most cases, the purpose of the analysis is stated, the required code or function calls are presented and explained, and the resulting output is interpreted.
Fox makes thoughtful and effective efforts to prevent the frustration that new users of sophisticated software all too often experience. For example, boxes highlight differences between R and S-Plus, and between different versions of S.
The Web site for the book, www.socsci.mcmaster.ca/jfox/Books/companion, includes the following features: a link to the Comprehensive R Archive Network, from which R and the car package for R can be downloaded; instructions for installing the current version of R for Windows; a link to the S-Plus Web site; a list of book errata and updates as a PDF file; links to other Web resources on R and S-Plus; a Web appendix to the book; scripts for all the examples in the text, by chapter and appendix; a section on obtaining help with R and S-Plus; and information for instructors. The section on obtaining help continues the effort to keep students from being frustrated by software problems that they lack the tools to solve. It discusses how to find what you need in the R and S-Plus online help and printed manuals, and it provides links to FAQs and other support resources for R and S-Plus, as well as links for joining the r-help and s-news e-mail lists. The author's usual penchant for practical advice reappears at the end of this section in a box reading:
Before posting to the r-help or s-news list: Posting a question to the r-help or s-news list should not be your first resort. The individuals who answer questions posted to these lists are volunteering their time. Before you post a question, you should make an attempt to answer it yourself using the resources listed here.
The Web appendix contains brief presentations, at a relatively advanced level (comparable to starred sections in the text), on the following topics: nonlinear regression and nonlinear least squares, nonparametric regression, robust regression, time-series regression and generalized least squares, Cox proportional-hazards regression for survival data, structural-equation models, bootstrapping regression models, and frames, environments, and scope in R and S-Plus. Script files are available for the examples in each appendix.
Given that Fox states in the Preface that the book is intended as "a companion to a text or course on modern applied regression. . . ," it is not surprising that the book cannot stand alone as a vehicle for learning the art and science of regression analysis. For some topics, Fox emphasizes how to produce graphical or numerical statistical output, with only cursory explanation of why that output is desirable or how to interpret it. His choices of when to include detailed explanations are sometimes unexpected. For example, Fox includes a page of technical explanation of the iteratively reweighted least squares algorithm for computing maximum-likelihood estimates of the coefficients in generalized linear models. Yet, despite noting that statistical texts are inconsistent in their use of the term "studentized residuals," he does not describe how the rstudent function in his car package computes them.
It is disappointing that the book includes no exercises. However, the information for instructors on the book's Web site includes directions for assembling datasets into an R package, so that an instructor can easily distribute data-analysis exercises to students.
All in all, Fox's book would make an excellent supplementary text for an applied regression course. It is specifically designed to accompany Fox's own textbook (Fox 1997), but it could easily be used with others, such as Hamilton (1999). An R and S-Plus Companion could also provide a very user-friendly introduction to S for self-study by a reader who is already familiar with the statistical concepts it addresses.
Fox, J. (1997), Applied Regression Analysis, Linear Models, and Related Methods, Thousand Oaks, CA: Sage Publications.
Hamilton, L. (1999), Regression with Graphics, Belmont, CA: Brooks/Cole.
Mary Kathryn COWLES
The University of Iowa