
An Introduction to Cochran-Mantel-Haenszel Testing and Nonparametric ANOVA
Description
Complete reference for applied statisticians and data analysts that uniquely covers the new statistical methodologies that enable deeper data analysis
An Introduction to Cochran-Mantel-Haenszel Testing and Nonparametric ANOVA provides readers with powerful new statistical methodologies that enable deeper data analysis. The book offers applied statisticians an introduction to the latest topics in nonparametrics. The worked examples with supporting R code provide analysts the tools they need to apply these methods to their own problems.
Co-authored by an internationally recognised expert in the field and an early career researcher with broad skills including data analysis and R programming, the book discusses key topics such as:
* NP ANOVA methodology
* Cochran-Mantel-Haenszel (CMH) methodology and design
* Latin squares and balanced incomplete block designs
* Parametric ANOVA F tests for continuous data
* Nonparametric rank tests (the Kruskal-Wallis and Friedman tests)
* CMH MS tests for the nonparametric analysis of categorical response data
Applied statisticians and data analysts, as well as students and professors in data analysis, can use this book to gain a complete understanding of the modern statistical methodologies that are allowing for deeper data analysis.
Persons
John Charles William Rayner is an Honorary Professorial Fellow, National Institute for Applied Statistics Research Australia, University of Wollongong, and Conjoint Professor of Statistics, School of Mathematical and Physical Sciences, University of Newcastle, Australia.
Glen Livingston, Jr., is a Lecturer, School of Mathematical and Physical Sciences, University of Newcastle, Australia.
Content
Preface
1 Introduction
1.1 What are the CMH and NP ANOVA tests?
1.2 Outline
1.3 R
1.4 Examples
2 The Basic CMH Tests
2.1 Genesis: Cochran (1954), and Mantel and Haenszel (1959)
2.2 The basic CMH tests
2.3 The Nominal CMH tests
2.4 The CMH mean scores test
2.5 The CMH correlation test
3 The Completely Randomised Design
3.1 Introduction
3.2 The design and parametric model
3.3 The Kruskal-Wallis tests
3.4 Relating the Kruskal-Wallis and ANOVA F tests
3.5 The CMH tests for the CRD
3.6 The KW tests are CMH MS tests
3.7 Relating the CMH MS and ANOVA F tests
3.8 Simulation study
3.9 Wald test statistics in the CRD
4 The Randomised Block Design
4.1 Introduction
4.2 The design and parametric model
4.3 The Friedman tests
4.4 The CMH test statistics in the RBD
4.5 The Friedman tests are CMH MS tests
4.6 Relating the CMH MS and ANOVA F tests
4.7 Simulation study
4.8 Wald test statistics in the RBD
5 The Balanced Incomplete Block Design
5.1 Introduction
5.2 The Durbin tests
5.3 The relationship between the adjusted Durbin statistic and the ANOVA F statistic
5.4 Simulation study
5.5 Orthogonal contrasts for balanced designs with ordered treatments
5.6 A CMH MS analogue test statistic for the BIBD
6 Unconditional Analogues of CMH Tests
6.1 Introduction
6.2 Unconditional univariate moment tests
6.3 Generalised correlations
6.4 Unconditional bivariate moment tests
6.5 Unconditional general association tests
6.6 Stuart's Test
7 Higher Moment Extensions to the Ordinal CMH Tests
7.1 Introduction
7.2 Extensions to the CMH mean scores test
7.3 Extensions to the CMH correlation test
7.4 Examples
8 Unordered Nonparametric ANOVA
8.1 Introduction
8.2 Unordered NP ANOVA for the CMH design
8.3 Singly ordered three-way tables
8.4 The Kruskal-Wallis and Friedman tests are NP ANOVA tests
8.5 Are the CMH MS and extensions NP ANOVA tests?
8.6 Extension to other designs
8.7 Latin squares
8.8 Balanced incomplete blocks
9 The Latin Square Design
9.1 Introduction
9.2 The Latin square design and parametric model
9.3 The RL test
9.4 Alignment
9.5 Simulation study
9.6 Examples
9.7 Orthogonal trend contrasts for ordered treatments
9.8 Technical derivation of the RL test
10 Ordered Nonparametric ANOVA
10.1 Introduction
10.2 Ordered NP ANOVA for the CMH design
10.3 Doubly ordered three-way tables
10.4 Extension to other designs
10.5 Latin square rank tests
10.6 Modelling the moments of the response variable
10.7 Lemonade sweetness data
10.8 Breakfast cereal data revisited
11 Conclusion
11.1 CMH or NP ANOVA?
11.2 Homosexual marriage data revisited for the last time!
11.3 Job satisfaction data
11.4 The end
A Appendix
A.1 Kronecker Products and Direct Sums
A.2 The Moore-Penrose Generalised Inverse
1 Introduction
1.1 What Are the CMH and NP ANOVA Tests?
The Cochran-Mantel-Haenszel (CMH) tests are a suite of four nonparametric (NP) tests used to test the null hypothesis of no association against various alternatives. They are applicable when several treatments are applied on several independent strata, or blocks. Every treatment is applied on every stratum. The responses are categorical, and so are recorded as counts. Two of the tests require scores, and so are called ordinal tests; the two that don't are called nominal tests. Ranks, or if observations are not distinct then mid-ranks, are often used as category scores. However, there are many other options, such as the class midpoints if the data are real-valued and continuous. Often 'natural' scores are convenient: just score the categories 1, 2, ...
From time to time in the material following we will refer to the CMH design, meaning that categorical responses are recorded for a number of treatments applied to a number of independent strata or blocks. The simplest CMH designs are arguably the completely randomised design (CRD) and the randomised block design (RBD). The methods do not directly apply to the balanced incomplete block design (BIBD) and the Latin square design (LSD). Both of these have, in a sense, missing observations.
One reason why the CMH tests are important is that they provide a third option for the two simplest designs: the CRD and the RBD. For the CRD, if the data are consistent with certain assumptions, the parametric one-way ANOVA test is available. If these assumptions aren't satisfied then the Kruskal-Wallis rank test is often applicable. If the responses are categorical then the CMH tests may be used. Similarly, for the RBD the options of most interest are the parametric two-way ANOVA test, the Friedman rank test, and the CMH tests.
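As a rough illustration of the parametric and rank-based options just listed, the sketch below uses only base R functions (aov, kruskal.test, friedman.test). The data values and the object names crd and rbd are hypothetical, chosen purely for illustration; they are not taken from the book or from its accompanying package.

```r
# Hypothetical CRD data: three treatments with five responses each.
crd <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 5)),
  response  = c(5.1, 4.8, 5.6, 5.0, 5.3,
                6.2, 5.9, 6.5, 6.1, 6.4,
                5.5, 5.7, 5.2, 5.8, 5.4)
)

# Parametric one-way ANOVA F test and the Kruskal-Wallis rank test.
anova_crd <- summary(aov(response ~ treatment, data = crd))
kw_crd    <- kruskal.test(response ~ treatment, data = crd)

# Hypothetical RBD data: three treatments, each applied once in four blocks.
rbd <- data.frame(
  block     = factor(rep(1:4, each = 3)),
  treatment = factor(rep(c("A", "B", "C"), times = 4)),
  response  = c(7.2, 8.1, 7.6,
                6.8, 7.9, 7.1,
                7.5, 8.4, 7.7,
                6.9, 8.0, 7.4)
)

# Parametric two-way ANOVA (treatments plus blocks) and the Friedman rank test.
anova_rbd <- summary(aov(response ~ treatment + block, data = rbd))
fr_rbd    <- friedman.test(response ~ treatment | block, data = rbd)

print(anova_crd)
print(kw_crd)
print(anova_rbd)
print(fr_rbd)
```

With three treatments, both rank tests refer their statistics to a chi-squared distribution with two degrees of freedom.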
Where possible we seek to give analyses that will prove to be as accessible as possible. The nominal tests are simply related to what are often called chi-squared tests, but we prefer to call them Pearson tests. These are well-understood and often available in 'click-and-point' packages such as JMP, and are on call in R. The Pearson tests are natural alternatives to the nominal CMH tests. We give a simple expression for the CMH correlation test. This expression involves familiar sums of squares and gives useful additional information as a corollary. For the CRD and the RBD we give simple expressions for the other CMH ordinal test, the CMH mean scores test.
This simplicity means that some analyses can be done by hand, meaning with pencil and paper. Some are better suited to 'click-and-point' packages such as JMP. Otherwise analyses are best done by computer packages.
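For readers who want to see a Pearson test in R, a minimal sketch using the base function chisq.test follows. The table of counts is hypothetical, and the example shows only the familiar Pearson statistic, not any of the CMH statistics themselves.

```r
# Hypothetical counts: two treatments by three response categories.
counts <- matrix(c(12,  8, 5,
                    7, 10, 9),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(treatment = c("A", "B"),
                                 response  = c("low", "medium", "high")))

# Pearson chi-squared test of no association between treatment and response.
pearson <- chisq.test(counts, correct = FALSE)
print(pearson)
```

The statistic is the usual sum of (observed - expected)^2 / expected over the cells, with (2 - 1)(3 - 1) = 2 degrees of freedom here.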
In summary the CMH tests are nonparametric tests for categorical response data. The applicable designs include, but are not limited to, the CRD and the RBD. These designs are of fundamental importance in many areas of application.
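To make the CMH data layout concrete, the following sketch builds a three-way table of counts (treatments by response categories by strata) and applies base R's mantelhaen.test, which for tables larger than 2 x 2 x K carries out a generalised CMH test of general association. This is base R functionality rather than the functions of the book's own package, the counts are hypothetical, and the ordinal CMH tests are not covered by this function.

```r
# Hypothetical counts: 2 treatments x 3 response categories x 3 strata.
# Within each stratum the values fill a 2 x 3 table column by column.
cmh_counts <- array(
  c(10, 6,  8, 9, 4, 11,   # stratum 1
    12, 7,  9, 8, 5, 10,   # stratum 2
     9, 5, 10, 9, 6, 12),  # stratum 3
  dim = c(2, 3, 3),
  dimnames = list(treatment = c("A", "B"),
                  response  = c("low", "medium", "high"),
                  stratum   = c("s1", "s2", "s3"))
)

# Generalised CMH test of no association between treatment and response,
# combining the information across the strata.
cmh <- mantelhaen.test(cmh_counts)
print(cmh)
```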
The nonparametric (NP) ANOVA tests are competitor tests for the ordinal CMH tests. They apply to data sets for which a fixed effects ANOVA is an appropriate analysis. These tests involve transforming the responses, and possibly, if treatments are ordered, the ordered treatment scores as well, both using orthonormal polynomials. The ANOVA is then applied to the transformed data of given degree. The resulting analysis permits testing univariate treatment effects and bivariate treatment effects. Under weak assumptions tests of different degrees do not affect each other. The NP ANOVA methodology is available more generally than the CMH methodology.
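The idea of transforming the responses and then applying an ANOVA to the transformed data of a given degree can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the book's implementation: it assumes the polynomials are taken to be orthonormal with respect to the empirical distribution of the responses, obtains them by rescaling the output of poly(), and uses hypothetical data.

```r
# Hypothetical responses for three treatments with five observations each.
y <- c(2, 3, 3, 4, 5,
       3, 4, 5, 5, 6,
       1, 2, 2, 3, 4)
treatment <- factor(rep(c("A", "B", "C"), each = 5))
n <- length(y)

# poly() returns columns that are orthogonal to the constant and to each
# other, each with unit sum of squares. Multiplying by sqrt(n) makes them
# orthonormal with respect to the empirical distribution of y.
P <- sqrt(n) * poly(y, degree = 2)[, 1:2]

# One-way ANOVA on the transformed responses of degree 1 and of degree 2.
fit_degree1 <- summary(aov(P[, 1] ~ treatment))
fit_degree2 <- summary(aov(P[, 2] ~ treatment))
print(fit_degree1)
print(fit_degree2)
```

The degree 1 analysis is equivalent to an ANOVA on the original responses, since the first polynomial is a linear rescaling of y; the degree 2 analysis looks at second moment effects.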
1.2 Outline
A fundamental aim of this book is to introduce users of statistics to new methods in CMH and related testing. Before discussing the new CMH methods, it is necessary to give the old, or basic methods. This will be done in Chapter 2.
This is followed by a discussion of the CMH tests in the CRD and RBD. This will introduce the Kruskal-Wallis and Friedman tests that can be shown to be CMH tests. Although CMH methods are not directly applicable to the BIBD and the LSD, consideration will subsequently also be given to these designs.
Data for the traditional CMH tests are given as tables of counts. Inference assumes that all marginal totals in such a table are known before the data are collected. This is conditional inference, inference conditional on these quantities being known. There are dual methods for which the marginals are not known. This is unconditional inference. Confusion can arise from a lack of clarity as to whether a particular test is conditional or not.
Next we turn to what we call nonparametric ANOVA. The data may be unranked or ranked, categorical or not. The primary objective is to analyse the data using an ANOVA model available via the linear model platform inherent in, for example, JMP and R. If only the responses are ordered the method enables higher moment effects to be scrutinised. If both treatments and responses are ordered, then as well as the usual order (1, 1) correlation, those of degree (1, 2) and (2, 1) may be assessed. These reflect umbrella effects. Higher degree generalised correlations may also be scrutinised.
Given that nonparametric ANOVA can assess higher degree effects, it is natural to generalise the ordered CMH tests so that they can do so too. It is then possible to give a comparison of analyses by both methods.
Our discussion will involve some well-known rank tests: the Kruskal-Wallis, Friedman, and Durbin tests. It is therefore sensible to make some general comments about ranking. Of particular interest is the treatment of ties. For general ranking methods see, for example, https://en.wikipedia.org/wiki/Ranking. For a treatment of ties in the sign test, see Rayner and Best (1999).
We now say what we mean by ranks. Given a set of data, $x_1, \ldots, x_n$, there exists a transformation $x_i \to x_{(i)}$ such that $x_{(1)} \le \ldots \le x_{(n)}$ are ordered. To say that $x_i$ has rank $r_i$, namely, $x_i = x_{(r_i)}$, means $x_{(r_i - 1)} \le x_i$ and $x_i \le x_{(r_i + 1)}$ (or the reverse set of inequalities). That the ranks are distinct means these inequalities are strict: $x_{(1)} < \ldots < x_{(n)}$. Ties occur when the ranks are not distinct. Of the many ways of dealing with ties, mid-ranks is perhaps the most used. Mid-ranks assigns to a group of tied data the mean of the ranks they would otherwise have been assigned. Thus if $x_{(r-1)} < x_{(r)} = x_{(r+1)} < x_{(r+2)}$ then the mid-rank for these data is $r + 1/2$.
Essentially ranking takes a set of $n$ observations, categorises them into $c$ categories, and assigns to these categories distinct scores, $b_1, \ldots, b_c$, say, with $b_1 < \ldots < b_c$. Clearly for untied data $c = n$ and $b_j = j$ for $j = 1, \ldots, n$. Suppose that if an observation of treatment $i$ falls in category $j$ then the indicator variable $n_{ij} = 1$, and zero otherwise. Then $\sum_j b_j n_{ij}$ is the rank for this observation.
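In R, mid-ranks are what the base function rank() returns with ties.method = "average" (its default). A small sketch with made-up data: the two tied observations would otherwise occupy ranks 1 and 2, so each receives the mean, 1.5.

```r
# Made-up data with one tie: the value 1 occurs twice.
x <- c(3, 1, 4, 1, 5)

# Mid-ranks: tied observations share the mean of the ranks they would
# otherwise have been assigned.
mid_ranks <- rank(x, ties.method = "average")
print(mid_ranks)  # 3.0 1.5 4.0 1.5 5.0
```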
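The view of ranking as categorising observations and attaching scores to the categories can be checked numerically. The sketch below, with made-up data and illustrative object names, builds a 0/1 indicator matrix of category membership and recovers the mid-ranks as indicator times score.

```r
# Made-up data with one tie.
x <- c(3, 1, 4, 1, 5)

# The categories are the distinct observed values, in increasing order.
categories <- sort(unique(x))

# Category scores: the mid-rank shared by every observation in the category.
scores <- as.vector(tapply(rank(x), x, mean))

# indicator[k, j] is 1 if observation k falls in category j, and 0 otherwise.
indicator <- outer(x, categories, "==") * 1

# Each observation's rank is the score of the category it falls in.
ranks_from_scores <- as.vector(indicator %*% scores)
print(ranks_from_scores)
```

Here there are four categories for five observations because of the tie; with untied data there would be as many categories as observations and the scores would simply be 1, 2, and so on.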
1.3 R
We have written an R package called CMHNPA, which will serve as an accompaniment to this text. The package contains all the data sets which are analysed as well as functions written for the statistical methods and techniques discussed. Within each of the chapters there is code where example data sets are used. If the output from the functions is excessive, it will sometimes be suppressed; however, the code will be presented for the reader to execute the functions themselves.
In the following set of example code, the package is loaded along with the car package. Code is also shown to attach a data set called dataset to the workspace. Attaching the data files to the workspace allows the variables within the data frame to be accessed directly when using the functions, as opposed to using dataframe$variable syntax.
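The original listing is not reproduced in this excerpt. A minimal sketch of what such set-up code might look like is given here. The requireNamespace() guards and the small stand-in data frame are assumptions added so the snippet runs even where the packages are not installed; in ordinary use, plain library(CMHNPA) and library(car) calls followed by attach() on one of the package's data sets would be the natural form.

```r
# Load the packages if they are installed
# (otherwise: install.packages(c("CMHNPA", "car"))).
for (pkg in c("CMHNPA", "car")) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)
  }
}

# 'dataset' is a hypothetical stand-in for one of the data frames analysed
# in the text; the values are made up.
dataset <- data.frame(treatment = factor(c("A", "B", "A", "B")),
                      response  = c(1, 2, 3, 4))

# After attaching, variables can be used directly rather than through
# the dataset$response syntax.
attach(dataset)
mean_response <- mean(response)
detach(dataset)

print(mean_response)
```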
All of the code that follows in this text has the type of code shown above omitted. Therefore, if the reader wishes to recreate the output in later chapters, the packages will need to be loaded, and the data set attached to the workspace.
The package is currently available from: https://cran.r-project.org/web/packages/CMHNPA/index.html. It will undergo ongoing development, so the output of functions may change, and additional options may be added to functions, over time.
This brief introduction will be ended with two examples. The first involves a non-standard design to which the CMH methods do not apply. The purpose is to demonstrate another nonparametric approach: the rank transform (RT) method discussed by Conover and Iman (1981). If, in performing a parametric test, the assumptions are found to be dubious, then the idea is to replace the original data by their ranks - usually mid-ranks if ties occur - and perform the intended analysis on these. It will often be the case that the assumptions will be closer to what is required. On the other hand, the hypotheses will now be about the ranks rather than the original data. Also the reader should be aware there are caveats concerning the rank transform method. See, for example, https://en.wikipedia.org/wiki/ANOVA_on_ranks.
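The rank transform idea described above is easy to sketch in base R: replace the responses by their mid-ranks and run the intended parametric analysis on the ranks. The data below are hypothetical and a one-way layout is assumed purely for illustration; the book's first example uses a different, non-standard design.

```r
# Hypothetical responses for three groups of three observations, with ties.
y <- c(12.1, 15.3, 11.8,
       15.3, 18.2, 17.5,
       20.4, 18.2, 22.0)
group <- factor(rep(c("A", "B", "C"), each = 3))

# Step 1: replace the data by their ranks (mid-ranks where ties occur).
r <- rank(y, ties.method = "average")

# Step 2: perform the intended parametric analysis on the ranks.
rt_table <- summary(aov(r ~ group))[[1]]
print(rt_table)
```

The hypotheses tested are now statements about the ranks rather than about the original responses.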
The second example is in archetypical CMH format. We return to the strawberry data in Chapters 8 and 10 and to the homosexual marriage data in Chapters 2, 3, 6, 7, and 11.
1.4 Examples
1.4.1 Strawberry Data
The data in Table 1.1 are from Pearce (1960). Pesticides are applied to strawberry plants to inhibit the growth...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.