
Introduction to Linear Regression Analysis
Description
A comprehensive and current introduction to the fundamentals of regression analysis
Introduction to Linear Regression Analysis, 6th Edition is a comprehensive and current examination of the foundations of linear regression analysis. For this fully updated sixth edition, the authors have added new material on generalized regression techniques and new examples to help the reader understand and retain the concepts taught in the book.
The new edition focuses on four key areas of improvement over the fifth edition:
* New exercises and data sets
* New material on generalized regression techniques
* The inclusion of JMP software in key areas
* Careful condensation of the text where possible
Introduction to Linear Regression Analysis skillfully blends theory and application in both the conventional and less common uses of regression analysis in today's cutting-edge scientific research. The text equips readers to understand the basic principles needed to apply regression model-building techniques in various fields of study, including engineering, management, and the health sciences.

Persons
DOUGLAS C. MONTGOMERY, PHD, is Regents Professor of Industrial Engineering and Statistics at Arizona State University. Dr. Montgomery is the co-author of several Wiley books including Introduction to Linear Regression Analysis, 5th Edition.
ELIZABETH A. PECK, PHD, is Logistics Modeling Specialist at the Coca-Cola Company in Atlanta, Georgia. Dr. Peck is co-author of Introduction to Linear Regression Analysis, 5th Edition.
G. GEOFFREY VINING, PHD, is Professor in the Department of Statistics at Virginia Polytechnic Institute and State University.
Content
Preface
About the Companion Website
1. Introduction
1.1 Regression and Model Building
1.2 Data Collection
1.3 Uses of Regression
1.4 Role of the Computer
2. Simple Linear Regression
2.1 Simple Linear Regression Model
2.2 Least-Squares Estimation of the Parameters
2.2.1 Estimation of β0 and β1
2.2.2 Properties of the Least-Squares Estimators and the Fitted Regression Model
2.2.3 Estimation of σ²
2.2.4 Alternate Form of the Model
2.3 Hypothesis Testing on the Slope and Intercept
2.3.1 Use of t Tests
2.3.2 Testing Significance of Regression
2.3.3 Analysis of Variance
2.4 Interval Estimation in Simple Linear Regression
2.4.1 Confidence Intervals on β0, β1, and σ²
2.4.2 Interval Estimation of the Mean Response
2.5 Prediction of New Observations
2.6 Coefficient of Determination
2.7 A Service Industry Application of Regression
2.8 Does Pitching Win Baseball Games?
2.9 Using SAS® and R for Simple Linear Regression
2.10 Some Considerations in the Use of Regression
2.11 Regression Through the Origin
2.12 Estimation by Maximum Likelihood
2.13 Case Where the Regressor x Is Random
2.13.1 x and y Jointly Distributed
2.13.2 x and y Jointly Normally Distributed: Correlation Model
Problems
3. Multiple Linear Regression
3.1 Multiple Regression Models
3.2 Estimation of the Model Parameters
3.2.1 Least-Squares Estimation of the Regression Coefficients
3.2.2 Geometrical Interpretation of Least Squares
3.2.3 Properties of the Least-Squares Estimators
3.2.4 Estimation of σ²
3.2.5 Inadequacy of Scatter Diagrams in Multiple Regression
3.2.6 Maximum-Likelihood Estimation
3.3 Hypothesis Testing in Multiple Linear Regression
3.3.1 Test for Significance of Regression
3.3.2 Tests on Individual Regression Coefficients and Subsets of Coefficients
3.3.3 Special Case of Orthogonal Columns in X
3.3.4 Testing the General Linear Hypothesis
3.4 Confidence Intervals in Multiple Regression
3.4.1 Confidence Intervals on the Regression Coefficients
3.4.2 CI Estimation of the Mean Response
3.4.3 Simultaneous Confidence Intervals on Regression Coefficients
3.5 Prediction of New Observations
3.6 A Multiple Regression Model for the Patient Satisfaction Data
3.7 Does Pitching and Defense Win Baseball Games?
3.8 Using SAS and R for Basic Multiple Linear Regression
3.9 Hidden Extrapolation in Multiple Regression
3.10 Standardized Regression Coefficients
3.11 Multicollinearity
3.12 Why Do Regression Coefficients Have the Wrong Sign?
Problems
4. Model Adequacy Checking
4.1 Introduction
4.2 Residual Analysis
4.2.1 Definition of Residuals
4.2.2 Methods for Scaling Residuals
4.2.3 Residual Plots
4.2.4 Partial Regression and Partial Residual Plots
4.2.5 Using Minitab®, SAS, and R for Residual Analysis
4.2.6 Other Residual Plotting and Analysis Methods
4.3 PRESS Statistic
4.4 Detection and Treatment of Outliers
4.5 Lack of Fit of the Regression Model
4.5.1 A Formal Test for Lack of Fit
4.5.2 Estimation of Pure Error from Near Neighbors
Problems
5. Transformations and Weighting To Correct Model Inadequacies
5.1 Introduction
5.2 Variance-Stabilizing Transformations
5.3 Transformations to Linearize the Model
5.4 Analytical Methods for Selecting a Transformation
5.4.1 Transformations on y: The Box-Cox Method
5.4.2 Transformations on the Regressor Variables
5.5 Generalized and Weighted Least Squares
5.5.1 Generalized Least Squares
5.5.2 Weighted Least Squares
5.5.3 Some Practical Issues
5.6 Regression Models with Random Effects
5.6.1 Subsampling
5.6.2 The General Situation for a Regression Model with a Single Random Effect
5.6.3 The Importance of the Mixed Model in Regression
Problems
6. Diagnostics for Leverage and Influence
6.1 Importance of Detecting Influential Observations
6.2 Leverage
6.3 Measures of Influence: Cook's D
6.4 Measures of Influence: DFFITS and DFBETAS
6.5 A Measure of Model Performance
6.6 Detecting Groups of Influential Observations
6.7 Treatment of Influential Observations
Problems
7. Polynomial Regression Models
7.1 Introduction
7.2 Polynomial Models in One Variable
7.2.1 Basic Principles
7.2.2 Piecewise Polynomial Fitting (Splines)
7.2.3 Polynomial and Trigonometric Terms
7.3 Nonparametric Regression
7.3.1 Kernel Regression
7.3.2 Locally Weighted Regression (Loess)
7.3.3 Final Cautions
7.4 Polynomial Models in Two or More Variables
7.5 Orthogonal Polynomials
Problems
8. Indicator Variables
8.1 General Concept of Indicator Variables
8.2 Comments on the Use of Indicator Variables
8.2.1 Indicator Variables versus Regression on Allocated Codes
8.2.2 Indicator Variables as a Substitute for a Quantitative Regressor
8.3 Regression Approach to Analysis of Variance
Problems
9. Multicollinearity
9.1 Introduction
9.2 Sources of Multicollinearity
9.3 Effects of Multicollinearity
9.4 Multicollinearity Diagnostics
9.4.1 Examination of the Correlation Matrix
9.4.2 Variance Inflation Factors
9.4.3 Eigensystem Analysis of X'X
9.4.4 Other Diagnostics
9.4.5 SAS and R Code for Generating Multicollinearity Diagnostics
9.5 Methods for Dealing with Multicollinearity
9.5.1 Collecting Additional Data
9.5.2 Model Respecification
9.5.3 Ridge Regression
9.5.4 Principal-Component Regression
9.5.5 Comparison and Evaluation of Biased Estimators
9.6 Using SAS to Perform Ridge and Principal-Component Regression
Problems
10. Variable Selection and Model Building
10.1 Introduction
10.1.1 Model-Building Problem
10.1.2 Consequences of Model Misspecification
10.1.3 Criteria for Evaluating Subset Regression Models
10.2 Computational Techniques for Variable Selection
10.2.1 All Possible Regressions
10.2.2 Stepwise Regression Methods
10.3 Strategy for Variable Selection and Model Building
10.4 Case Study: Gorman and Toman Asphalt Data Using SAS
Problems
11. Validation of Regression Models
11.1 Introduction
11.2 Validation Techniques
11.2.1 Analysis of Model Coefficients and Predicted Values
11.2.2 Collecting Fresh Data-Confirmation Runs
11.2.3 Data Splitting
11.3 Data from Planned Experiments
Problems
12. Introduction to Nonlinear Regression
12.1 Linear and Nonlinear Regression Models
12.1.1 Linear Regression Models
12.1.2 Nonlinear Regression Models
12.2 Origins of Nonlinear Models
12.3 Nonlinear Least Squares
12.4 Transformation to a Linear Model
12.5 Parameter Estimation in a Nonlinear System
12.5.1 Linearization
12.5.2 Other Parameter Estimation Methods
12.5.3 Starting Values
12.6 Statistical Inference in Nonlinear Regression
12.7 Examples of Nonlinear Regression Models
12.8 Using SAS and R
Problems
13. Generalized Linear Models
13.1 Introduction
13.2 Logistic Regression Models
13.2.1 Models with a Binary Response Variable
13.2.2 Estimating the Parameters in a Logistic Regression Model
13.2.3 Interpretation of the Parameters in a Logistic Regression Model
13.2.4 Statistical Inference on Model Parameters
13.2.5 Diagnostic Checking in Logistic Regression
13.2.6 Other Models for Binary Response Data
13.2.7 More Than Two Categorical Outcomes
13.3 Poisson Regression
13.4 The Generalized Linear Model
13.4.1 Link Functions and Linear Predictors
13.4.2 Parameter Estimation and Inference in the GLM
13.4.3 Prediction and Estimation with the GLM
13.4.4 Residual Analysis in the GLM
13.4.5 Using R to Perform GLM Analysis
13.4.6 Overdispersion
Problems
14. Regression Analysis of Time Series Data
14.1 Introduction to Regression Models for Time Series Data
14.2 Detecting Autocorrelation: The Durbin-Watson Test
14.3 Estimating the Parameters in Time Series Regression Models
Problems
15. Other Topics in the Use of Regression Analysis
15.1 Robust Regression
15.1.1 Need for Robust Regression
15.1.2 M-Estimators
15.1.3 Properties of Robust Estimators
15.2 Effect of Measurement Errors in the Regressors
15.2.1 Simple Linear Regression
15.2.2 The Berkson Model
15.3 Inverse Estimation-The Calibration Problem
15.4 Bootstrapping in Regression
15.4.1 Bootstrap Sampling in Regression
15.4.2 Bootstrap Confidence Intervals
15.5 Classification and Regression Trees (CART)
15.6 Neural Networks
15.7 Designed Experiments for Regression
Problems
Appendix A. Statistical Tables
Appendix B. Data Sets for Exercises
Appendix C. Supplemental Technical Material
C.1 Background on Basic Test Statistics
C.2 Background from the Theory of Linear Models
C.3 Important Results on SSR and SSRes
C.4 Gauss-Markov Theorem, Var(ε) = σ²I
C.5 Computational Aspects of Multiple Regression
C.6 Result on the Inverse of a Matrix
C.7 Development of the PRESS Statistic
C.8 Development of S²(i)
C.9 Outlier Test Based on R-Student
C.10 Independence of Residuals and Fitted Values
C.11 Gauss-Markov Theorem, Var(ε) = V
C.12 Bias in MSRes When the Model Is Underspecified
C.13 Computation of Influence Diagnostics
C.14 Generalized Linear Models
Appendix D. Introduction to SAS
D.1 Basic Data Entry
D.2 Creating Permanent SAS Data Sets
D.3 Importing Data from an EXCEL File
D.4 Output Command
D.5 Log File
D.6 Adding Variables to an Existing SAS Data Set
Appendix E. Introduction to R to Perform Linear Regression Analysis
E.1 Basic Background on R
E.2 Basic Data Entry
E.3 Brief Comments on Other Functionality in R
E.4 R Commander
References
Index
CHAPTER 1
INTRODUCTION
1.1 REGRESSION AND MODEL BUILDING
Regression analysis is a statistical technique for investigating and modeling the relationship between variables. Applications of regression are numerous and occur in almost every field, including engineering, the physical and chemical sciences, economics, management, life and biological sciences, and the social sciences. Regression analysis is used extensively in data mining and is a basic tool of data science and analytics. Because of its wide applicability to a range of problems, regression analysis may be the most widely used statistical technique.
As an example of a problem in which regression analysis may be helpful, suppose that an industrial engineer employed by a soft drink beverage bottler is analyzing the product delivery and service operations for vending machines. He suspects that the time required by a route deliveryman to load and service a machine is related to the number of cases of product delivered. The engineer visits 25 randomly chosen retail outlets having vending machines, and the in-outlet delivery time (in minutes) and the volume of product delivered (in cases) are observed for each. The 25 observations are plotted in Figure 1.1a. This graph is called a scatter diagram. This display clearly suggests a relationship between delivery time and delivery volume; in fact, the impression is that the data points generally, but not exactly, fall along a straight line. Figure 1.1b illustrates this straight-line relationship.
Figure 1.1 (a) Scatter diagram for delivery volume. (b) Straight-line relationship between delivery time and delivery volume.
If we let y represent delivery time and x represent delivery volume, then the equation of a straight line relating these two variables is
$$y = \beta_0 + \beta_1 x \tag{1.1}$$
where β0 is the intercept and β1 is the slope. Now the data points do not fall exactly on a straight line, so Eq. (1.1) should be modified to account for this. Let the difference between the observed value of y and the straight line (β0 + β1x) be an error ε. It is convenient to think of ε as a statistical error; that is, it is a random variable that accounts for the failure of the model to fit the data exactly. The error may be made up of the effects of other variables on delivery time, measurement errors, and so forth. Thus, a more plausible model for the delivery time data is
$$y = \beta_0 + \beta_1 x + \varepsilon \tag{1.2}$$
Equation (1.2) is called a linear regression model. Customarily x is called the independent variable and y is called the dependent variable. However, this often causes confusion with the concept of statistical independence, so we refer to x as the predictor or regressor variable and y as the response variable. Because Eq. (1.2) involves only one regressor variable, it is called a simple linear regression model.
To gain some additional insight into the linear regression model, suppose that we can fix the value of the regressor variable x and observe the corresponding value of the response y. Now if x is fixed, the random component ε on the right-hand side of Eq. (1.2) determines the properties of y. Suppose that the mean and variance of ε are 0 and σ², respectively. Then the mean response at any value of the regressor variable is
$$E(y \mid x) = \mu_{y|x} = E(\beta_0 + \beta_1 x + \varepsilon) = \beta_0 + \beta_1 x$$
Notice that this is the same relationship that we initially wrote down following inspection of the scatter diagram in Figure 1.1a. The variance of y given any value of x is
$$\mathrm{Var}(y \mid x) = \sigma^2_{y|x} = \mathrm{Var}(\beta_0 + \beta_1 x + \varepsilon) = \sigma^2$$
Thus, the true regression model μy|x = β0 + β1x is a line of mean values; that is, the height of the regression line at any value of x is just the expected value of y for that x. The slope, β1, can be interpreted as the change in the mean of y for a unit change in x. Furthermore, the variability of y at a particular value of x is determined by the variance of the error component of the model, σ². This implies that there is a distribution of y values at each x and that the variance of this distribution is the same at each x.
Figure 1.2 How observations are generated in linear regression.
Figure 1.3 Linear regression approximation of a complex relationship.
For example, suppose that the true regression model relating delivery time to delivery volume is μy|x = 3.5 + 2x, and suppose that the variance is σ² = 2. Figure 1.2 illustrates this situation. Notice that we have used a normal distribution to describe the random variation in ε. Since y is the sum of a constant β0 + β1x (the mean) and a normally distributed random variable, y is a normally distributed random variable. For example, if x = 10 cases, then delivery time y has a normal distribution with mean 3.5 + 2(10) = 23.5 minutes and variance 2. The variance σ² determines the amount of variability or noise in the observations y on delivery time. When σ² is small, the observed values of delivery time will fall close to the line, and when σ² is large, the observed values of delivery time may deviate considerably from the line.
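The numerical example above can be checked by simulation. The following is a minimal sketch, assuming normally distributed errors as in the text; the function name, the sample size, and the random seed are illustrative choices, not part of the book.

```python
import random
import statistics

# Values taken from the example in the text: mean response 3.5 + 2x, error variance 2.
BETA0, BETA1, SIGMA2 = 3.5, 2.0, 2.0


def simulate_delivery_times(x, n, seed=0):
    """Draw n delivery times at a fixed volume x from y = beta0 + beta1*x + error."""
    rng = random.Random(seed)  # the seed is an arbitrary choice for repeatability
    sigma = SIGMA2 ** 0.5      # gauss() takes a standard deviation, not a variance
    return [BETA0 + BETA1 * x + rng.gauss(0.0, sigma) for _ in range(n)]


times = simulate_delivery_times(x=10, n=20000)
sample_mean = statistics.fmean(times)
sample_var = statistics.variance(times)
print(f"sample mean:     {sample_mean:.2f}  (model value 23.5)")
print(f"sample variance: {sample_var:.2f}  (model value 2)")
```

With a large number of draws at x = 10, the sample mean and variance should land close to the model values of 23.5 minutes and 2, which is the sense in which the regression line is a line of mean values.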
In almost all applications of regression, the regression equation is only an approximation to the true functional relationship between the variables of interest. These functional relationships are often based on physical, chemical, or other engineering or scientific theory, that is, knowledge of the underlying mechanism. Consequently, these types of models are often called mechanistic models. For example, the familiar physics equation momentum = mass × velocity is a mechanistic model.
Regression models, on the other hand, are thought of as empirical models. Figure 1.3 illustrates a situation where the true relationship between y and x is relatively complex, yet it may be approximated quite well by a linear regression equation. Sometimes the underlying mechanism is more complex, resulting in the need for a more complex approximating function, as in Figure 1.4, where a "piecewise linear" regression function is used to approximate the true relationship between y and x.
Generally, regression equations are valid only over the region of the regressor variables contained in the observed data. For example, consider Figure 1.5. Suppose that data on y and x were collected in the interval x1 ≤ x ≤ x2. Over this interval the linear regression equation shown in Figure 1.5 is a good approximation of the true relationship. However, suppose this equation were used to predict values of y for values of the regressor variable in the region x2 ≤ x ≤ x3. Clearly the linear regression model is not going to perform well over this range of x because of model error or equation error.
Figure 1.4 Piecewise linear approximation of a complex relationship.
Figure 1.5 The danger of extrapolation in regression.
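The extrapolation warning can be illustrated numerically. This is only a sketch under stated assumptions: the curved function `true_response` is a hypothetical stand-in for an unknown true relationship, and the two intervals are arbitrary; neither comes from the book.

```python
import math


def true_response(x):
    """Hypothetical curved relationship, standing in for the unknown true function."""
    return 10.0 * (1.0 - math.exp(-x / 4.0))


def fit_line(x, y):
    """Least-squares intercept and slope for a straight line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def max_abs_error(b0, b1, grid):
    """Largest gap between the true response and the fitted line on a grid."""
    return max(abs(true_response(x) - (b0 + b1 * x)) for x in grid)


# Data are "collected" only on 0 <= x <= 2 (the observed region).
observed_x = [i * 0.1 for i in range(21)]
b0, b1 = fit_line(observed_x, [true_response(x) for x in observed_x])

inside = max_abs_error(b0, b1, observed_x)
# The same line is then used on 2 <= x <= 8, beyond the observed region.
outside = max_abs_error(b0, b1, [2.0 + i * 0.1 for i in range(61)])
print(f"max error inside observed region: {inside:.3f}")
print(f"max error when extrapolating:     {outside:.3f}")
```

The straight line tracks the curve closely where data were observed and drifts away from it beyond that region, which is the model error the text describes.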
In general, the response variable y may be related to k regressors, x1, x2, ..., xk, so that
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \tag{1.3}$$
This is called a multiple linear regression model because more than one regressor is involved. The adjective linear is employed to indicate that the model is linear in the parameters β0, β1, ..., βk, not because y is a linear function of the x's. We shall see subsequently that many models in which y is related to the x's in a nonlinear fashion can still be treated as linear regression models as long as the equation is linear in the β's.
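To make "linear in the parameters" concrete, here is a minimal sketch in which y is a quadratic function of x, yet the model is fitted as a linear regression by treating x and x² as two regressors. The data are hypothetical and noise-free, and solving the normal equations by Gaussian elimination is just one simple way to obtain the least-squares solution; it is not presented as the book's computational method.

```python
def solve_linear_system(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - tail) / m[i][i]
    return beta


def fit_linear_model(rows, y):
    """Least squares via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free data from y = 1 + 2x + 0.5x^2, chosen so the answer is known.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x + 0.5 * x**2 for x in xs]

# Each row holds the regressors 1, x1 = x, x2 = x^2; the model is linear in the betas.
design = [[1.0, x, x**2] for x in xs]
beta = fit_linear_model(design, ys)
print([round(b, 6) for b in beta])
```

The recovered coefficients should match the generating values up to rounding, even though the fitted curve is not a straight line in x.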
An important objective of regression analysis is to estimate the unknown parameters in the regression model. This process is also called fitting the model to the data. We study several parameter estimation techniques in this book. One of these techniques is the method of least squares (introduced in Chapter 2). For example, the least-squares fit to the delivery time data is
where ŷ is the fitted or estimated value of delivery time corresponding to a delivery volume of x cases. This fitted equation is plotted in Figure 1.1b.
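As a rough illustration of fitting by least squares, the sketch below uses the standard closed-form estimates for a straight line (slope = Sxy / Sxx, intercept = ȳ minus slope times x̄). The volumes and times are invented for illustration; they are not the 25 delivery observations from the book, so the resulting coefficients are not the book's fitted equation.

```python
def fit_simple_linear_regression(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical delivery volumes (cases) and times (minutes), invented for illustration.
volumes = [2, 4, 6, 8, 10, 12]
times = [7.9, 11.2, 15.8, 19.1, 23.9, 27.3]

b0, b1 = fit_simple_linear_regression(volumes, times)
fitted = [b0 + b1 * v for v in volumes]
residuals = [t - f for t, f in zip(times, fitted)]
print(f"fitted line: y_hat = {b0:.3f} + {b1:.3f} x")
```

The residuals computed at the end are the raw material for the adequacy checks discussed next; with an intercept in the model they sum to zero up to floating-point error.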
The next phase of a regression analysis is called model adequacy checking, in which the appropriateness of the model is studied and the quality of the fit ascertained. Through such analyses the usefulness of the regression model may be determined. The outcome of adequacy checking may indicate either that the model is reasonable or that the original fit must be modified. Thus, regression analysis is an iterative procedure, in which data lead to a model and a fit of the model to the data is produced. The quality of the fit is then investigated, leading either to modification of the model or the fit or to adoption of the model. This process is illustrated...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, you will unfortunately not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.