
Statistics at Square Two
Description
An easy-to-follow exploration of intermediate statistical techniques used in medical research
In the newly revised third edition of Statistics at Square Two: Understanding Modern Statistical Applications in Medicine, a team of distinguished statisticians delivers an accessible and intuitive discussion of advanced statistical methods for readers and users of scientific medical literature. This will allow readers to engage critically with modern research as the authors explain the correct interpretation of results in the medical literature.
The book includes two brand new chapters covering meta-analysis and time-series analysis as well as new references to the many checklists that have appeared in recent years to enable better reporting of contemporary research. Most examples have been updated as well, and each chapter contains practice exercises and answers. Readers will also find sample code (in R) for many of the analyses, in addition to:
* A thorough introduction to models and data, including the different types of data, statistical models, and computer-intensive methods
* Comprehensive explorations of multiple linear regression, including the interpretation of computer output, diagnostic statistics such as influential points, and many uses of multiple regression
* Practical discussions of multiple logistic regression, survival analysis, Poisson regression and random effects models including their uses, examples in the medical literature, and strategies for interpreting computer output
Perfect for anyone hoping to better understand the statistics presented in contemporary medical research, Statistics at Square Two: Understanding Modern Statistical Applications in Medicine will also benefit postgraduate students studying statistics and medicine.
Authors
Michael J. Campbell is Emeritus Professor of Medical Statistics at the University of Sheffield in the United Kingdom.
Richard M. Jacques is a Senior Lecturer in Medical Statistics at the University of Sheffield in the United Kingdom.
Contents
Preface
1 Models, Tests and Data
1.1 Types of Data
1.2 Confounding, Mediation and Effect Modification
1.3 Causal Inference
1.4 Statistical Models
1.5 Results of Fitting Models
1.6 Significance Tests
1.7 Confidence Intervals
1.8 Statistical Tests Using Models
1.9 Many Variables
1.10 Model Fitting and Analysis: Exploratory and Confirmatory Analyses
1.11 Computer-intensive Methods
1.12 Missing Values
1.13 Bayesian Methods
1.14 Causal Modelling
1.15 Reporting Statistical Results in the Medical Literature
1.16 Reading Statistics in the Medical Literature
2 Multiple Linear Regression
2.1 The Model
2.2 Uses of Multiple Regression
2.3 Two Independent Variables
2.3.1 One Continuous and One Binary Independent Variable
2.3.2 Two Continuous Independent Variables
2.3.3 Categorical Independent Variables
2.4 Interpreting a Computer Output
2.4.1 One Continuous Variable
2.4.2 One Continuous Variable and One Binary Independent Variable
2.4.3 One Continuous Variable and One Binary Independent Variable with Their Interaction
2.4.4 Two Independent Variables: Both Continuous
2.4.5 Categorical Independent Variables
2.5 Examples in the Medical Literature
2.5.1 Analysis of Covariance: One Binary and One Continuous Independent Variable
2.5.2 Two Continuous Independent Variables
2.6 Assumptions Underlying the Models
2.7 Model Sensitivity
2.7.1 Residuals, Leverage and Influence
2.7.2 Computer Analysis: Model Checking and Sensitivity
2.8 Stepwise Regression
2.9 Reporting the Results of a Multiple Regression
2.10 Reading about the Results of a Multiple Regression
2.11 Frequently Asked Questions
2.12 Exercises: Reading the Literature
3 Multiple Logistic Regression
3.1 Quick Revision
3.2 The Model
3.2.1 Categorical Covariates
3.3 Model Checking
3.3.1 Lack of Fit
3.3.2 "Extra-binomial" Variation or "Over Dispersion"
3.3.3 The Logistic Transform is Inappropriate
3.4 Uses of Logistic Regression
3.5 Interpreting a Computer Output
3.5.1 One Binary Independent Variable
3.5.2 Two Binary Independent Variables
3.5.3 Two Continuous Independent Variables
3.6 Examples in the Medical Literature
3.6.1 Comment
3.7 Case-control Studies
3.8 Interpreting Computer Output: Unmatched Case-control Study
3.9 Matched Case-control Studies
3.10 Interpreting Computer Output: Matched Case-control Study
3.11 Example of Conditional Logistic Regression in the Medical Literature
3.11.1 Comment
3.12 Alternatives to Logistic Regression
3.13 Reporting the Results of Logistic Regression
3.14 Reading about the Results of Logistic Regression
3.15 Frequently Asked Questions
3.16 Exercise
4 Survival Analysis
4.1 Introduction
4.2 The Model
4.3 Uses of Cox Regression
4.4 Interpreting a Computer Output
4.5 Interpretation of the Model
4.6 Generalisations of the Model
4.6.1 Stratified Models
4.6.2 Time Dependent Covariates
4.6.3 Parametric Survival Models
4.6.4 Competing Risks
4.7 Model Checking
4.8 Reporting the Results of a Survival Analysis
4.9 Reading about the Results of a Survival Analysis
4.10 Example in the Medical Literature
4.10.1 Comment
4.11 Frequently Asked Questions
4.12 Exercises
5 Random Effects Models
5.1 Introduction
5.2 Models for Random Effects
5.3 Random vs Fixed Effects
5.4 Use of Random Effects Models
5.4.1 Cluster Randomised Trials
5.4.2 Repeated Measures
5.4.3 Sample Surveys
5.4.4 Multi-centre Trials
5.5 Ordinary Least Squares at the Group Level
5.6 Interpreting a Computer Output
5.6.1 Different Methods of Analysis
5.6.2 Likelihood and gee
5.6.3 Interpreting Computer Output
5.7 Model Checking
5.8 Reporting the Results of Random Effects Analysis
5.9 Reading about the Results of Random Effects Analysis
5.10 Examples of Random Effects Models in the Medical Literature
5.10.1 Cluster Trials
5.10.2 Repeated Measures
5.10.3 Comment
5.10.4 Clustering in a Cohort Study
5.10.5 Comment
5.11 Frequently Asked Questions
5.12 Exercises
6 Poisson and Ordinal Regression
6.1 Poisson Regression
6.2 The Poisson Model
6.3 Interpreting a Computer Output: Poisson Regression
6.4 Model Checking for Poisson Regression
6.5 Extensions to Poisson Regression
6.6 Poisson Regression Used to Estimate Relative Risks from a 2 × 2 Table
6.7 Poisson Regression in the Medical Literature
6.8 Ordinal Regression
6.9 Interpreting a Computer Output: Ordinal Regression
6.10 Model Checking for Ordinal Regression
6.11 Ordinal Regression in the Medical Literature
6.12 Reporting the Results of Poisson or Ordinal Regression
6.13 Reading about the Results of Poisson or Ordinal Regression
6.14 Frequently Asked Question
6.15 Exercises
7 Meta-analysis
7.1 Introduction
7.2 Models for Meta-analysis
7.3 Missing Values
7.4 Displaying the Results of a Meta-analysis
7.5 Interpreting a Computer Output
7.6 Examples from the Medical Literature
7.6.1 Example of a Meta-analysis of Clinical Trials
7.6.2 Example of a Meta-analysis of Case-control Studies
7.7 Reporting the Results of a Meta-analysis
7.8 Reading about the Results of a Meta-analysis
7.9 Frequently Asked Questions
7.10 Exercise
8 Time Series Regression
8.1 Introduction
8.2 The Model
8.3 Estimation Using Correlated Residuals
8.4 Interpreting a Computer Output: Time Series Regression
8.5 Example of Time Series Regression in the Medical Literature
8.6 Reporting the Results of Time Series Regression
8.7 Reading about the Results of Time Series Regression
8.8 Frequently Asked Questions
8.9 Exercise
Appendix 1 Exponentials and Logarithms
Appendix 2 Maximum Likelihood and Significance Tests
A2.1 Binomial Models and Likelihood
A2.2 The Poisson Model
A2.3 The Normal Model
A2.4 Hypothesis Testing: the Likelihood Ratio Test
A2.5 The Wald Test
A2.6 The Score Test
A2.7 Which Method to Choose?
A2.8 Confidence Intervals
A2.9 Deviance Residuals for Binary Data
A2.10 Example: Derivation of the Deviances and Deviance Residuals Given in Table 3.3
A2.10.1 Grouped Data
A2.10.2 Ungrouped Data
Appendix 3 Bootstrapping and Variance Robust Standard Errors
A3.1 The Bootstrap
A3.2 Example of the Bootstrap
A3.3 Interpreting a Computer Output: The Bootstrap
A3.3.1 Two-sample T-test with Unequal Variances
A3.4 The Bootstrap in the Medical Literature
A3.5 Robust or Sandwich Estimate SEs
A3.6 Interpreting a Computer Output: Robust SEs for Unequal Variances
A3.7 Other Uses of Robust Regression
A3.8 Reporting the Bootstrap and Robust SEs in the Literature
A3.9 Frequently Asked Question
Appendix 4 Bayesian Methods
A4.1 Bayes' Theorem
A4.2 Uses of Bayesian Methods
A4.3 Computing in Bayes
A4.4 Reading and Reporting Bayesian Methods in the Literature
A4.5 Reading about the Results of Bayesian Methods in the Medical Literature
Appendix 5 R codes
A5.1 R Code for Chapter 2
A5.3 R Code for Chapter 3
A5.4 R Code for Chapter 4
A5.5 R Code for Chapter 5
A5.6 R Code for Chapter 6
A5.7 R Code for Chapter 7
A5.8 R Code for Chapter 8
A5.9 R Code for Appendix 1
A5.10 R Code for Appendix 2
A5.11 R Code for Appendix 3
Answers to Exercises
Glossary
Index
1 Models, Tests and Data
Summary
This chapter covers some of the basic concepts in statistical analysis, which are covered in greater depth in Statistics at Square One. It introduces the idea of a statistical model and then links it to statistical tests. The use of statistical models greatly expands the utility of statistical analysis. In particular, they allow the analyst to examine how a variety of variables may affect the result.
1.1 Types of Data
Data can be divided into two main types: quantitative and qualitative. Quantitative data tend to be either continuous variables that one can measure (such as height, weight or blood pressure) or discrete variables (such as numbers of children per family or numbers of attacks of asthma per child per month). Thus, count data are discrete and quantitative. Continuous variables are often described as having a Normal distribution, or being non-Normal. Having a Normal distribution means that a histogram of the data would follow a particular "bell-shaped" curve. In practice, provided the data cluster about a single central point and the distribution is symmetric about this point, the distribution would commonly be considered close enough to Normal for most tests requiring Normality to be valid. Here one would expect the mean and median to be close. Non-Normal data tend to have asymmetric (skewed) distributions, and their means and medians differ. Examples of non-Normally distributed variables include ages and salaries in a population. Sometimes the asymmetry is caused by outlying points that are in fact errors in the data, and these need to be examined with care.
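A quick way to make the mean versus median point concrete is the minimal Python sketch below, which uses only the standard library `statistics` module. The data are invented for illustration. `mean_median_gap` is a hypothetical helper, not a method from the book. A gap near zero is consistent with symmetry but does not by itself establish Normality.

```python
import statistics

def mean_median_gap(values):
    """Mean minus median: close to zero for roughly symmetric data and
    clearly positive for right-skewed data. A rough screen, not a test."""
    return statistics.mean(values) - statistics.median(values)

# Invented illustrative data (not from the book).
diastolic_bp = [72, 76, 78, 80, 80, 82, 84, 88]  # roughly symmetric
salaries_k = [18, 21, 22, 24, 25, 27, 30, 150]   # right-skewed by one large value

print(mean_median_gap(diastolic_bp))  # 0.0: mean 80, median 80
print(mean_median_gap(salaries_k))    # 15.125: mean 39.625, median 24.5
```

The single value of 150 pulls the mean well above the median. As the text notes, a point like this should be checked in case it is a data error.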
Note that it is a misnomer to talk of "non-parametric" data instead of non-Normally distributed data. Parameters belong to models, and what is meant by "non-parametric" data is data to which we cannot apply models, although, as we shall see later, this is often too limited a view of statistical methods. An important feature of quantitative data is that you can deal with the numbers as having real meaning, so for example you can take averages of the data. This is in contrast to qualitative data, where the numbers are often convenient labels and have no quantitative value.
Qualitative data tend to be categories: people are male or female; European, American or Japanese; they have a disease or are in good health. Such data can be described as nominal or categorical. If there are only two categories, they are described as binary data. Sometimes the categories can be ordered, so for example a person can "get better", "stay the same" or "get worse". These are ordinal data. Often these will be scored, say, 1, 2, 3, but if you had two patients, one of whom got better and one of whom got worse, it makes no sense to say that on average they stayed the same (a statistician is someone with their head in the oven and their feet in the fridge, but on average they are comfortable!). The important feature of ordinal data is that they can be ordered, but there is no obvious weighting system. For example, it is unclear how to weight "healthy", "ill" or "dead" as outcomes. (Often, as we shall see later, either scoring by giving consecutive whole numbers to the ordered categories and treating the ordinal variable as a quantitative variable, or dichotomising the variable and treating it as binary, may work well.) Count data, such as numbers of children per family, appear ordinal, but here the important feature is that arithmetic is possible (2.4 children per family is meaningful). This is sometimes described as having ratio properties. A family with four children has twice as many children as a family with two, but if we had an ordinal variable with four categories, say "strongly agree", "agree", "disagree" and "strongly disagree", and scored them 1-4, we could not say that "strongly disagree", scored 4, is twice "agree", scored 2.
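The lack of a natural weighting for ordinal categories can be shown in a few lines. In this minimal Python sketch the outcomes are invented, and the alternative scoring of 5 for "worse" is a purely hypothetical choice that respects the same ordering. Two things follow. First, averaging one "better" and one "worse" gives the score for "same". Second, the apparent ranking of two groups reverses when only the scores change.

```python
import statistics

# Two scoring systems that respect the same ordering (better < same < worse).
EQUAL_STEPS = {"better": 1, "same": 2, "worse": 3}
WORSE_WEIGHTED = {"better": 1, "same": 2, "worse": 5}  # hypothetical alternative

def mean_score(outcomes, scores):
    """Average of the numeric scores assigned to a list of ordinal outcomes."""
    return statistics.mean([scores[o] for o in outcomes])

# Invented outcomes for two small groups (lower score = better outcome).
group_a = ["better", "better", "worse"]
group_b = ["same", "same", "same"]

# One patient better, one worse: the average equals the score for "same".
print(mean_score(["better", "worse"], EQUAL_STEPS))  # 2

# The apparent ranking of the groups depends on the arbitrary scores.
print(mean_score(group_a, EQUAL_STEPS) < mean_score(group_b, EQUAL_STEPS))        # True
print(mean_score(group_a, WORSE_WEIGHTED) < mean_score(group_b, WORSE_WEIGHTED))  # False
```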
Qualitative data can also be formed by categorising continuous data. Thus, blood pressure is a continuous variable, but it can be split into "normotension" or "hypertension". This often makes the data easier to summarise: for example, "10% of the population have hypertension" is easier to comprehend than a statement giving the mean and standard deviation of blood pressure in the population, although from the latter, given an assumption about the shape of the distribution such as Normality, one could deduce the former (and more besides). Note that qualitative data are not necessarily associated with qualitative research. Qualitative research is of rising importance and complements quantitative research. Its name derives from the fact that it does not quantify measures, but rather identifies themes, often using interviews and focus groups.
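Deducing the proportion with hypertension from a mean and standard deviation requires a distributional assumption, usually Normality. This minimal Python sketch uses `statistics.NormalDist` (Python 3.8+). The mean of 80 mmHg, standard deviation of 8 mmHg and cut-off of 90 mmHg are illustrative assumptions, not figures from the book.

```python
from statistics import NormalDist

def proportion_above(cutoff, mean, sd):
    """Proportion of a Normal(mean, sd) population lying above `cutoff`.
    Valid only to the extent that the Normal assumption holds."""
    return 1 - NormalDist(mu=mean, sigma=sd).cdf(cutoff)

# Hypothetical summary statistics for diastolic blood pressure (mmHg).
p = proportion_above(90, mean=80, sd=8)
print(f"{p:.1%}")  # 10.6%
```

If the real distribution were skewed, the same mean and standard deviation could correspond to a noticeably different proportion above the cut-off.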
It is a parody to suggest that statisticians prefer not to dichotomise data and researchers always do it, but there is a grain of truth in it. Decisions are often binary: treat or not treat. It helps to have a "cut-off", for example treat with an anti-hypertensive if diastolic blood pressure is >90 mmHg, although more experienced clinicians would take into account other factors related to the patient's condition and use the cut-off as the point at which their likelihood of treating increases. However, statisticians point out the loss of information when data are dichotomised, and are also suspicious of arbitrary cut-offs, which may have been chosen to present a conclusion desired by a researcher. Although there may be good reasons for a cut-off, these reasons are often opaque. For example, deaths from Covid are defined as deaths occurring within 30 days of a positive Covid test. Why 30 days, and not 4 weeks (which would be easier to implement) or 3 months? Clearly ten years is too long. In this case it probably matters little which period is chosen, but the example shows how cut-offs are often required and how their justification may be lost.
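A toy example shows both objections: the loss of information and the sensitivity to the choice of cut-off. In this minimal Python sketch the readings are invented. The 90 mmHg cut-off comes from the text, and 85 mmHg is an arbitrary alternative chosen only for comparison.

```python
def dichotomise(readings, cutoff):
    """Classify each reading as hypertensive (True) if it exceeds `cutoff`."""
    return [r > cutoff for r in readings]

def proportion_above_cutoff(readings, cutoff):
    """Share of readings classified as hypertensive at the given cut-off."""
    flags = dichotomise(readings, cutoff)
    return sum(flags) / len(flags)

# Invented diastolic readings (mmHg).
readings = [84, 87, 89, 91, 93, 130]

# 89 and 91 (2 mmHg apart) land in different groups,
# while 91 and 130 (39 mmHg apart) are treated as identical.
print(dichotomise(readings, 90))  # [False, False, False, True, True, True]

# Moving the cut-off changes the headline proportion.
print(proportion_above_cutoff(readings, 90))  # 0.5
print(proportion_above_cutoff(readings, 85))  # 5 of 6 readings
```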
1.2 Confounding, Mediation and Effect Modification
Much medical research can be simplified as an investigation of an input-output relationship. The inputs, or explanatory variables, are thought to be related to the outcome or effect. We wish to investigate whether one or more of the input variables are plausibly causally related to the effect. The relationship is complicated by other factors that are thought to be related to both the cause and the effect; these are confounding factors. A simple example would be the relationship between stress and high blood pressure. Does stress cause high blood pressure? Here the causal variable is a measure of stress, which we assume can be quantified as either a binary or a continuous variable, and the outcome is a blood pressure measurement. A confounding factor might be gender; men may be more prone to stress, but they may also be more prone to high blood pressure. If gender is a confounding factor, a study would need to take gender into account. A more precise definition of a confounder states that a confounder should "not be on the causal pathway". For example, stress may cause people to drink more alcohol, and it is the increased alcohol consumption that causes high blood pressure. In this case alcohol consumption is not a confounder, because it lies on the causal pathway, and is often termed a mediator.
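This minimal Python sketch shows why a confounder must be taken into account. It uses invented data in which stress has no effect on blood pressure within either gender. A crude comparison still shows a difference, because in these data men are both more often stressed and have higher blood pressure. Stratifying by gender and averaging the stratum-specific differences is used here only as the simplest form of adjustment. The regression methods covered later in the book are the more general tool.

```python
import statistics

# Invented data: (gender, stressed, systolic blood pressure in mmHg).
# Within each gender stress makes no difference, but men are both more
# often stressed and have higher blood pressure.
DATA = (
    [("M", True, bp) for bp in (136, 138, 140, 140, 140, 140, 142, 144)]
    + [("M", False, bp) for bp in (138, 142)]
    + [("F", True, bp) for bp in (118, 122)]
    + [("F", False, bp) for bp in (116, 118, 120, 120, 120, 120, 122, 124)]
)

def mean_bp(rows, stressed, gender=None):
    """Mean blood pressure for one stress group, optionally within one gender."""
    return statistics.mean(
        [bp for g, s, bp in rows if s == stressed and (gender is None or g == gender)]
    )

def crude_difference(rows):
    """Stressed minus not stressed, ignoring gender."""
    return mean_bp(rows, True) - mean_bp(rows, False)

def stratified_difference(rows):
    """Gender-specific differences averaged with weights equal to stratum size."""
    total, weighted = 0, 0.0
    for gender in sorted({g for g, _, _ in rows}):
        n = sum(1 for g, _, _ in rows if g == gender)
        diff = mean_bp(rows, True, gender) - mean_bp(rows, False, gender)
        weighted += n * diff
        total += n
    return weighted / total

print(crude_difference(DATA))       # 12: stressed 136 vs not stressed 124
print(stratified_difference(DATA))  # 0.0: no difference within either gender
```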
Another type of variable is an effect modifier. Again, it is easier to explain using an example. It is possible that older people are more likely than younger people to suffer high blood pressure when stressed. Age is not a confounder if older people are not more likely to be stressed than younger people. However, if we had two populations with different age distributions our estimate of the effect of stress on blood pressure would be different in the two populations if we didn't allow for age. Crudely, we wish to remove the effects of confounders, but study effect modifiers.
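The point about two populations can be checked with simple arithmetic. This minimal Python sketch assumes, purely for illustration, that stress raises blood pressure by 2 mmHg in younger people and by 10 mmHg in older people. With identical age-specific effects, the overall estimate differs markedly between populations with different age mixes. This is why an effect modifier is studied rather than simply adjusted away.

```python
# Hypothetical age-specific effects of stress on blood pressure (mmHg).
EFFECT_BY_AGE = {"younger": 2.0, "older": 10.0}

def overall_effect(age_mix, effects=EFFECT_BY_AGE):
    """Population-average effect: each age group's effect weighted by its share."""
    if abs(sum(age_mix.values()) - 1) > 1e-9:
        raise ValueError("age shares must sum to 1")
    return sum(share * effects[group] for group, share in age_mix.items())

young_population = {"younger": 0.8, "older": 0.2}
old_population = {"younger": 0.2, "older": 0.8}

print(overall_effect(young_population))  # about 3.6 mmHg
print(overall_effect(old_population))    # about 8.4 mmHg
```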
An important start in the analysis of data is to determine which variables are outputs and which are inputs, and, of the latter, which we wish to investigate as causal and which are confounders or effect modifiers. Of course, depending on the question, a variable might serve as any of these. In a survey of the effects of smoking on chronic bronchitis, smoking is a causal variable. In a clinical trial to examine the effects of cognitive behavioural therapy on smoking habit, smoking is an outcome. In the above study of stress and high blood pressure, smoking may also be a confounder.
A common error is to decide which of the variables are confounders by doing significance tests. One might see in a paper: "only variables that were significantly related to the output were included in the model." One problem with this is that it makes the research more difficult to repeat; a different researcher may get a different set of confounders. In later chapters we will discuss this approach, which often goes under the name of "stepwise" regression. We emphasise that significance tests are not a good method of choosing the variables to go in a model.
In summary, before any analysis is done, and preferably in the original protocol, the investigator should decide on the causal, outcome and confounder variables. An exploration of how variables relate in a model is given in Section 1.10.
1.3 Causal Inference
Causal inference is a new area of statistics that examines the relationship between a putative cause and an outcome. A useful and simple method of displaying a causal model is with a Directed Acyclic Graph (DAG).1 DAGs can be used to explain the definitions given in the previous section. There are two key features to DAGs: (1) they show...
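One way to make the definitions of Section 1.2 concrete is to write a DAG down as data. This minimal Python sketch (Python 3.9+ for `graphlib`) encodes a hypothetical DAG for the stress and blood pressure example and uses `graphlib.TopologicalSorter` to check that the graph has no directed cycles. It then applies a deliberately crude rule to label confounders and mediators. The node names, the `classify` helper and its one-step rule are illustrative assumptions, not the book's method. A full analysis would examine every path between exposure and outcome.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical DAG for the stress and blood pressure example of Section 1.2.
# Each edge points from a cause to its effect.
EDGES = [
    ("gender", "stress"),
    ("gender", "blood_pressure"),
    ("stress", "alcohol"),
    ("alcohol", "blood_pressure"),
    ("stress", "blood_pressure"),
]

def children_of(edges):
    """Map each node to the set of nodes it points to directly."""
    children = {}
    for cause, effect in edges:
        children.setdefault(cause, set()).add(effect)
        children.setdefault(effect, set())
    return children

def is_acyclic(edges):
    """True if the graph has no directed cycle (checked via a topological sort)."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
        parents.setdefault(cause, set())
    try:
        list(TopologicalSorter(parents).static_order())
    except CycleError:
        return False
    return True

def classify(edges, exposure, outcome):
    """Crude one-step labelling: a direct common cause of exposure and outcome
    is called a confounder; a node on exposure -> node -> outcome is a mediator.
    A full analysis would examine all paths (e.g. the back-door criterion)."""
    children = children_of(edges)
    roles = {}
    for node, kids in children.items():
        if node not in (exposure, outcome) and {exposure, outcome} <= kids:
            roles[node] = "confounder"
    for node in children[exposure]:
        if node != outcome and outcome in children[node]:
            roles[node] = "mediator"
    return roles

print(is_acyclic(EDGES))                           # True
print(classify(EDGES, "stress", "blood_pressure"))  # {'gender': 'confounder', 'alcohol': 'mediator'}
```

Adding an edge from `blood_pressure` back to `stress` would create a cycle, so `is_acyclic` would return `False` and the graph would no longer be a DAG.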
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The ePUB file format works well for novels and non-fiction books, i.e. "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe DRM, a "hard" form of copy protection. If the necessary requirements are not met, you will unfortunately not be able to open the eBook, so you will need to prepare your reading hardware before downloading.
Please note: we strongly recommend that you authorise your reading software with your personal Adobe ID after installing it.
For more information, see our ebook Help page.