1.2 A COMPARISON OF MULTIPLE REGRESSION/CORRELATION AND ANALYSIS OF VARIANCE APPROACHES
MRC, ANOVA, and ANCOVA are each special cases of the general linear model in mathematical statistics.² The description of MRC in this book includes extensions of conventional MRC analysis to the point where it is essentially equivalent to the general linear model. It thus follows that any data analyzable by ANOVA/ANCOVA may be analyzed by MRC, whereas the reverse is not the case. For example, research designs that study how a scaled characteristic of participants (e.g., IQ) and an experimental manipulation (e.g., structured vs. unstructured tasks) jointly influence the subjects' responses (e.g., task performance) cannot readily be fit into the ANOVA framework. Even experiments with factorial designs with unequal cell sample sizes present complexities for ANOVA approaches because of the nonindependence of the factors, and standard computer programs now use a regression approach to estimate effects in such cases. The latter chapters of the book will extend the basic MRC model still further to include alternative statistical methods of estimating relationships.
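The IQ-by-task design mentioned above can be sketched as a regression with a product term. The following is a minimal illustration, not a method taken from the text: the data values, the 0/1 coding of task condition, and the helper name simple_ols are all assumptions made for the example. It relies on the fact that, with a single two-level factor, the model Y = b0 + b1(IQ) + b2(task) + b3(IQ x task) gives the same fit as separate least-squares lines within each condition, so the coefficients can be recovered from the two within-condition fits.

```python
# Hypothetical data: IQ (a scaled participant characteristic) and task
# performance under two experimental conditions. All values are invented.
iq = [80, 90, 100, 110, 120]
performance = {
    "unstructured": [27, 26, 32, 30, 35],  # task coded 0
    "structured": [31, 33, 42, 43, 51],    # task coded 1
}


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Fit one line per condition.
a0, s0 = simple_ols(iq, performance["unstructured"])
a1, s1 = simple_ols(iq, performance["structured"])

# Coefficients of Y = b0 + b1*IQ + b2*task + b3*(IQ*task), task coded 0/1.
b0 = a0        # intercept in the unstructured condition
b1 = s0        # IQ slope in the unstructured condition
b2 = a1 - a0   # condition difference at IQ = 0 (an extrapolation here)
b3 = s1 - s0   # difference in IQ slopes, i.e., the interaction

print(f"b0={b0:.3f}  b1={b1:.3f}  b2={b2:.3f}  b3={b3:.3f}")
```

A nonzero b3 indicates that the relationship of IQ to performance differs between the two task conditions. Because b2 refers to the condition difference at IQ = 0, an analyst would ordinarily center IQ before interpreting it.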
1.2.1 Historical Background
Historically, MRC arose in the biological and behavioral sciences around 1900 in the study of the natural covariation of observed characteristics of samples of subjects, including Galton's studies of the relationship between the heights of fathers and sons and Pearson's and Yule's work on educational issues (Yule, 1911). Somewhat later, ANOVA/ANCOVA grew out of the analysis of agricultural data produced by the controlled variation of treatment conditions in manipulative experiments. It is noteworthy that Fisher's initial statistical work in this area emphasized the multiple regression framework because of its generality (see Tatsuoka, 1993). However, multiple regression was often computationally intractable in the precomputer era: computations that take milliseconds by computer required weeks or even months to do by hand. This led Fisher to develop the computationally simpler, equal (or proportional) sample size ANOVA/ANCOVA model, which is particularly applicable to planned experiments. Thus multiple regression and ANOVA/ANCOVA approaches developed in parallel and, from the perspective of the substantive researchers who used them, largely independently. Indeed, in certain disciplines such as psychology and education, the association of MRC with nonexperimental, observational, and survey research led some scientists to perceive MRC to be less scientifically respectable than ANOVA/ANCOVA, which was associated with experiments. Close examination suggests that this guilt (or virtue) by association is unwarranted, the result of the confusion of data-analytic method with the logical considerations that govern the inference of causality. Experiments in which different treatments are applied to randomly assigned groups of subjects and there is no loss (attrition) of subjects permit unambiguous inference of causality; the observation of associations among variables in a group of randomly selected subjects does not.
Thus, interpretation of a finding of superior early school achievement of children who participate in Head Start programs compared to nonparticipating children depends on the design of the investigation (Shadish, Cook, & Campbell, 2002; West, Biesanz, & Pitts, 2000). For the investigator who randomly assigns children to Head Start versus Control programs, attribution of the effect to program content is straightforward. For the investigator who simply observes whether children whose parents select Head Start programs have higher school achievement than those who do not, causal inference becomes less certain. Many other possible differences (e.g., child IQ; parent education) may exist between the two groups of children that could potentially account for any findings. But each of the investigative teams may analyze their data using either ANOVA (or equivalently a t test of the mean difference in school achievement) or MRC (a simple one-predictor regression analysis of school achievement as a function of Head Start attendance with its identical t test). The logical status of causal inference is a function of how the data were produced, not how they were analyzed (see further discussion in several chapters, especially in Chapter 12).
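The equivalence of the two analyses can be checked numerically. The sketch below is an illustration under stated assumptions: the achievement scores are invented, Head Start attendance is coded 1 and Control 0, and the t test is the usual pooled-variance (equal variances assumed) version. It computes the two-group t statistic and the t statistic for the slope in a one-predictor regression of achievement on the attendance code.

```python
import math

# Hypothetical school achievement scores (invented for illustration).
control = [48, 52, 50, 47, 53]
head_start = [55, 51, 58, 54, 57, 55]

# --- Analysis 1: pooled-variance t test of the mean difference ---
n0, n1 = len(control), len(head_start)
m0 = sum(control) / n0
m1 = sum(head_start) / n1
ss0 = sum((y - m0) ** 2 for y in control)
ss1 = sum((y - m1) ** 2 for y in head_start)
pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
t_ttest = (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))

# --- Analysis 2: regression of achievement on a 0/1 attendance code ---
x = [0] * n0 + [1] * n1
y = control + head_start
n = len(y)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx                      # equals the mean difference m1 - m0
intercept = mean_y - slope * mean_x    # equals the Control mean m0
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
se_slope = math.sqrt(sse / (n - 2) / sxx)
t_regression = slope / se_slope

print(f"mean difference = {m1 - m0:.4f}, slope = {slope:.4f}")
print(f"t (t test) = {t_ttest:.4f}, t (regression) = {t_regression:.4f}")
```

The two t statistics agree, as do the slope and the mean difference, on n - 2 degrees of freedom in both cases. Nothing in the computation depends on whether children were randomly assigned or self-selected into the program, which is the point of the passage: the arithmetic is the same and only the design licenses the causal reading.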