
An Introduction to Categorical Data Analysis

Person
ALAN AGRESTI is Distinguished Professor Emeritus at the University of Florida. He has presented short courses on categorical data methods in 35 countries. He is the author of seven books, including the bestselling Categorical Data Analysis (Wiley), Foundations of Linear and Generalized Linear Models (Wiley), Statistics: The Art and Science of Learning from Data (Pearson), and Statistical Methods for the Social Sciences (Pearson).
Content
Preface
About the Companion Website
1 Introduction
1.1 Categorical Response Data
1.2 Probability Distributions for Categorical Data
1.3 Statistical Inference for a Proportion
1.4 Statistical Inference for Discrete Data
1.5 Bayesian Inference for Proportions *
1.6 Using R Software for Statistical Inference about Proportions *
Exercises
2 Analyzing Contingency Tables
2.1 Probability Structure for Contingency Tables
2.2 Comparing Proportions in 2 × 2 Contingency Tables
2.3 The Odds Ratio
2.4 Chi-Squared Tests of Independence
2.5 Testing Independence for Ordinal Variables
2.6 Exact Frequentist and Bayesian Inference *
2.7 Association in Three-Way Tables
Exercises
3 Generalized Linear Models
3.1 Components of a Generalized Linear Model
3.2 Generalized Linear Models for Binary Data
3.3 Generalized Linear Models for Counts and Rates
3.4 Statistical Inference and Model Checking
3.5 Fitting Generalized Linear Models
Exercises
4 Logistic Regression
4.1 The Logistic Regression Model
4.2 Statistical Inference for Logistic Regression
4.3 Logistic Regression with Categorical Predictors
4.4 Multiple Logistic Regression
4.5 Summarizing Effects in Logistic Regression
4.6 Summarizing Predictive Power: Classification Tables, ROC Curves, and Multiple Correlation
Exercises
5 Building and Applying Logistic Regression Models
5.1 Strategies in Model Selection
5.2 Model Checking
5.3 Infinite Estimates in Logistic Regression
5.4 Bayesian Inference, Penalized Likelihood, and Conditional Likelihood for Logistic Regression *
5.5 Alternative Link Functions: Linear Probability and Probit Models *
5.6 Sample Size and Power for Logistic Regression *
Exercises
6 Multicategory Logit Models
6.1 Baseline-Category Logit Models for Nominal Responses
6.2 Cumulative Logit Models for Ordinal Responses
6.3 Cumulative Link Models: Model Checking and Extensions *
6.4 Paired-Category Logit Modeling of Ordinal Responses *
Exercises
7 Loglinear Models for Contingency Tables and Counts
7.1 Loglinear Models for Counts in Contingency Tables
7.2 Statistical Inference for Loglinear Models
7.3 The Loglinear-Logistic Model Connection
7.4 Independence Graphs and Collapsibility
7.5 Modeling Ordinal Associations in Contingency Tables
7.6 Loglinear Modeling of Count Response Variables *
Exercises
8 Models for Matched Pairs
8.1 Comparing Dependent Proportions for Binary Matched Pairs
8.2 Marginal Models and Subject-Specific Models for Matched Pairs
8.3 Comparing Proportions for Nominal Matched-Pairs Responses
8.4 Comparing Proportions for Ordinal Matched-Pairs Responses
8.5 Analyzing Rater Agreement *
8.6 Bradley-Terry Model for Paired Preferences *
Exercises
9 Marginal Modeling of Correlated, Clustered Responses
9.1 Marginal Models Versus Subject-Specific Models
9.2 Marginal Modeling: The Generalized Estimating Equations (GEE) Approach
9.3 Marginal Modeling for Clustered Multinomial Responses
9.4 Transitional Modeling, Given the Past
9.5 Dealing with Missing Data *
Exercises
10 Random Effects: Generalized Linear Mixed Models
10.1 Random Effects Modeling of Clustered Categorical Data
10.2 Examples: Random Effects Models for Binary Data
10.3 Extensions to Multinomial Responses and Multiple Random Effect Terms
10.4 Multilevel (Hierarchical) Models
10.5 Latent Class Models *
Exercises
11 Classification and Smoothing *
11.1 Classification: Linear Discriminant Analysis
11.2 Classification: Tree-Based Prediction
11.3 Cluster Analysis for Categorical Responses
11.4 Smoothing: Generalized Additive Models
11.5 Regularization for High-Dimensional Categorical Data (Large p)
Exercises
12 A Historical Tour of Categorical Data Analysis *
Appendix: Software for Categorical Data Analysis
A.1 R for Categorical Data Analysis
A.2 SAS for Categorical Data Analysis
A.3 Stata for Categorical Data Analysis
A.4 SPSS for Categorical Data Analysis
Brief Solutions to Odd-Numbered Exercises
Bibliography
Examples Index
Subject Index
PREFACE
In recent years, the use of specialized statistical methods for categorical data has increased dramatically, particularly for applications in the biomedical and social sciences. Partly this reflects the development during the past few decades of sophisticated methods for analyzing categorical data. It also reflects the increasing methodological sophistication of scientists and applied statisticians, most of whom now realize that it is unnecessary and often inappropriate to use methods for continuous data with categorical responses.
This third edition of the book is a substantial revision of the second edition. The most important change is showing how to conduct all the analyses using R software. As in the first two editions, the main focus is presenting the most important methods for analyzing categorical data. The book summarizes methods that have long played a prominent role, such as chi-squared tests, but gives special emphasis to modeling techniques, in particular to logistic regression.
The presentation in this book has a low technical level and does not require familiarity with advanced mathematics such as calculus or matrix algebra. Readers should possess a background that includes material from a two-semester statistical methods sequence for undergraduate or graduate nonstatistics majors. This background should include estimation and significance testing and exposure to regression modeling.
This book is designed for students taking an introductory course in categorical data analysis, but I also have written it for applied statisticians and practicing scientists involved in data analyses. I hope that the book will be helpful to analysts dealing with categorical response data in the social, behavioral, and biomedical sciences, as well as in public health, marketing, education, biological and agricultural sciences, and industrial quality control.
The basics of categorical data analysis are covered in Chapters 1 to 7. Chapter 2 surveys standard descriptive and inferential methods for contingency tables, such as odds ratios, tests of independence, and conditional versus marginal associations. I feel that an understanding of methods is enhanced, however, by viewing them in the context of statistical models. Thus, the rest of the text focuses on the modeling of categorical responses. I prefer to teach categorical data methods by unifying their models with ordinary regression models. Chapter 3 does this under the umbrella of generalized linear models. That chapter introduces generalized linear models for binary data and count data. Chapters 4 and 5 discuss the most important such model for binary data, logistic regression. Chapter 6 introduces logistic regression models for multicategory responses, both nominal and ordinal. Chapter 7 discusses loglinear models for contingency tables and other types of count data.
I believe that logistic regression models deserve more attention than loglinear models, because applications more commonly focus on the relationship between a categorical response variable and some explanatory variables (which logistic regression models do) than on the association structure among several response variables (which loglinear models do). Thus, I have given main attention to logistic regression in these chapters and in later chapters that discuss extensions of this model.
Chapter 8 presents methods for matched-pairs data. Chapters 9 and 10 extend the matched-pairs methods to apply to clustered, correlated observations. Chapter 9 does this with marginal models, emphasizing the generalized estimating equations (GEE) approach, whereas Chapter 10 uses random effects to model the dependence more fully. Chapter 11 is a new chapter, presenting classification and smoothing methods. That chapter also introduces regularization methods that are increasingly important with the advent of data sets having large numbers of explanatory variables. Chapter 12 provides a historical perspective on the development of the methods. The text concludes with an appendix showing the use of R, SAS, Stata, and SPSS software for conducting nearly all methods presented in this book. Many of the chapters now also show how to use the Bayesian approach to conduct the analyses.
The material in Chapters 1 to 7 forms the heart of an introductory course in categorical data analysis. Sections that can be skipped if desired, to provide more time for other topics, include Sections 1.5, 2.5-2.7, 3.3 and 3.5, 5.4-5.6, 6.3-6.4, and 7.4-7.6. Instructors can choose sections from Chapters 8 to 12 to supplement the topics of primary importance. Sections and subsections labeled with an asterisk can be skipped for those wanting a briefer survey of the methods.
This book has a lower technical level than my book Categorical Data Analysis (3rd edition, Wiley 2013). I hope that it will appeal to readers who prefer a more applied focus than that book provides. For instance, this book does not attempt to derive likelihood equations, prove asymptotic distributions, or cite current research work.
Most methods for categorical data analysis require extensive computations. For the most part, I have avoided details about complex calculations, feeling that statistical software should relieve this drudgery. The text shows how to use R to obtain all the analyses presented. The Appendix discusses the use of SAS, Stata, and SPSS. The full data sets analyzed in the book are available at the text website www.stat.ufl.edu/~aa/cat/data. That website also lists typos and errors of which I have become aware since publication. The data files are also available at https://github.com/alanagresti/categorical-data.
Brief solutions to odd-numbered exercises appear at the end of the text. An instructor's manual will be included on the companion website for this edition: www.wiley.com/go/Agresti/CDA_3e. The aforementioned data sets will also be available on the companion website. Additional exercises are available there and at www.stat.ufl.edu/~aa/cat/Extra_Exercises, some taken from the 2nd edition to create space for new material in this edition and some being slightly more technical.
I owe very special thanks to Brian Marx for his many suggestions about the text over the past twenty years. He has been incredibly generous with his time, providing feedback drawn from teaching courses based on the book. I also thank those individuals who commented on parts of the manuscript or who made suggestions about examples or material to cover or provided other help such as noticing errors. Travis Gerke, Anna Gottard, and Keramat Nourijelyani gave me several helpful comments. Thanks also to Alessandra Brazzale, Debora Giovannelli, David Groggel, Stacey Handcock, Maria Kateri, Bernhard Klingenberg, Ioannis Kosmidis, Mohammad Mansournia, Trevelyan McKinley, Changsoon Park, Tom Piazza, Brett Presnell, Ori Rosen, Ralph Scherer, Claudia Tarantola, Anestis Touloumis, Thomas Yee, Jin Wang, and Sherry Wang. I also owe thanks to those who helped with the first two editions, especially Patricia Altham, James Booth, Jane Brockmann, Brian Caffo, Brent Coull, Al DeMaris, Anna Gottard, Harry Khamis, Svend Kreiner, Carla Rampichini, Stephen Stigler, and Larry Winner. Thanks to those who helped with material for my more advanced text (Categorical Data Analysis) that I extracted here, especially Bernhard Klingenberg, Yongyi Min, and Brian Caffo. Many thanks also to the staff at Wiley for their usual high-quality help.
A truly special by-product for me of writing books about categorical data analysis has been invitations to teach short courses based on them and spend research visits at many institutions around the world. With grateful thanks I dedicate this book to my hosts over the years. In particular, I thank my hosts in Italy (Adelchi Azzalini, Elena Beccalli, Rino Bellocco, Matilde Bini, Giovanna Boccuzzo, Alessandra Brazzale, Silvia Cagnone, Paula Cerchiello, Andrea Cerioli, Monica Chiogna, Guido Consonni, Adriano Decarli, Mauro Gasparini, Alessandra Giovagnoli, Sabrina Giordano, Paolo Giudici, Anna Gottard, Alessandra Guglielmi, Maria Iannario, Gianfranco Lovison, Claudio Lupi, Monia Lupparelli, Maura Mezzetti, Antonietta Mira, Roberta Paroli, Domenico Piccolo, Irene Poli, Alessandra Salvan, Nicola Sartori, Bruno Scarpa, Elena Stanghellini, Claudia Tarantola, Cristiano Varin, Roberta Varriale, Laura Ventura, Diego Zappa), the UK (Phil Brown, Bianca De Stavola, Brian Francis, Byron Jones, Gillian Lancaster, Irini Moustaki, Chris Skinner, Briony Teather), Austria (Regina Dittrich, Gilg Seeber, Helga Wagner), Belgium (Hermann Callaert, Geert Molenberghs), France (Antoine De Falguerolles, Jean-Yves Mary, Agnes Rogel), Germany (Maria Kateri, Gerhard Tutz), Greece (Maria Kateri, Ioannis Ntzoufras), the Netherlands (Ivo Molenaar, Marijte van Duijn, Peter van der Heijden), Norway (Petter Laake), Portugal (Francisco Carvalho, Adelaide Freitas, Pedro Oliveira, Carlos Daniel Paulino), Slovenia (Janez Stare), Spain (Elias Moreno), Sweden (Juni Palmgren, Elisabeth Svensson, Dietrich van Rosen), Switzerland (Anthony Davison, Paul Embrechts), Brazil (Clarice Demetrio, Bent Jörgensen, Francisco Louzada, Denise Santos), Chile (Guido Del Pino), Colombia (Marta Lucia Corrales Bossio, Leonardo Trujillo), Turkey (Aylin Alin), Mexico (Guillermina Eslava), Australia (Chris Lloyd), China (I-Ming Liu, Chongqi Zhang), Japan (Ritei Shibata), and New Zealand (Nye John, I-Ming Liu). 
Finally, thanks to...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The ePUB file format works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small display.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.