
Medical Statistics from Scratch
Description
In an informal and friendly style, Medical Statistics from Scratch provides a practical foundation for everyone whose first interest is probably not medical statistics. Keeping the level of mathematics to a minimum, it clearly illustrates statistical concepts and practice with numerous real-world examples and cases drawn from current medical literature.
Medical Statistics from Scratch is an ideal learning partner for all medical students and health professionals needing an accessible introduction, or a friendly refresher, to the fundamentals of medical statistics.

Person
DAVID BOWERS, Leeds Institute of Health Sciences, School of Medicine, University of Leeds, Leeds, UK.
Content
Preface to the 4th Edition xix
Preface to the 3rd Edition xxi
Preface to the 2nd Edition xxiii
Preface to the 1st Edition xxv
Introduction xxvii
I Some Fundamental Stuff 1
1 First things first - the nature of data 3
Variables and data 3
Where are we going ...? 5
The good, the bad, and the ugly - types of variables 5
Categorical data 6
Nominal categorical data 6
Ordinal categorical data 7
Metric data 8
Discrete metric data 8
Continuous metric data 9
How can I tell what type of variable I am dealing with? 10
The baseline table 11
II Descriptive Statistics 15
2 Describing data with tables 17
Descriptive statistics. What can we do with raw data? 18
Frequency tables - nominal data 18
The frequency distribution 19
Relative frequency 20
Frequency tables - ordinal data 20
Frequency tables - metric data 22
Frequency tables with discrete metric data 22
Cumulative frequency 24
Frequency tables with continuous metric data - grouping the raw data 25
Open-ended groups 27
Cross-tabulation - contingency tables 28
Ranking data 30
3 Every picture tells a story - describing data with charts 31
Picture it! 32
Charting nominal and ordinal data 32
The pie chart 32
The simple bar chart 34
The clustered bar chart 35
The stacked bar chart 37
Charting discrete metric data 39
Charting continuous metric data 39
The histogram 39
The box (and whisker) plot 42
Charting cumulative data 44
The cumulative frequency curve with discrete metric data 44
The cumulative frequency curve with continuous metric data 44
Charting time-based data - the time series chart 47
The scatterplot 48
The bubbleplot 49
4 Describing data from its shape 51
The shape of things to come 51
Skewness and kurtosis as measures of shape 52
Kurtosis 55
Symmetric or mound-shaped distributions 56
Normalness - the Normal distribution 56
Bimodal distributions 58
Determining skew from a box plot 59
5 Measures of location - Numbers R us 62
Numbers, percentages, and proportions 62
Preamble 63
Numbers, percentages, and proportions 64
Handling percentages - for those of us who might need a reminder 65
Summary measures of location 67
The mode 68
The median 69
The mean 70
Percentiles 71
Calculating a percentile value 72
What is the most appropriate measure of location? 73
6 Measures of spread - Numbers R us - (again) 75
Preamble 76
The range 76
The interquartile range (IQR) 76
Estimating the median and interquartile range from the cumulative frequency curve 77
The boxplot (also known as the box and whisker plot) 79
Standard deviation 82
Standard deviation and the Normal distribution 84
Testing for Normality 86
Using SPSS 86
Using Minitab 87
Transforming data 88
7 Incidence, prevalence, and standardisation 92
Preamble 93
The incidence rate and the incidence rate ratio (IRR) 93
The incidence rate ratio 94
Prevalence 94
A couple of difficulties with measuring incidence and prevalence 97
Some other useful rates 97
Crude mortality rate 97
Case fatality rate 98
Crude maternal mortality rate 99
Crude birth rate 99
Attack rate 99
Age-specific mortality rate 99
Standardisation - the age-standardised mortality rate 101
The direct method 102
The standard population and the comparative mortality ratio (CMR) 103
The indirect method 106
The standardised mortality rate 107
III The Confounding Problem 111
8 Confounding - like the poor, (nearly) always with us 113
Preamble 114
What is confounding? 114
Confounding by indication 117
Residual confounding 119
Detecting confounding 119
Dealing with confounding - if confounding is such a problem, what can we do about it? 120
Using restriction 120
Using matching 121
Frequency matching 121
One-to-one matching 121
Using stratification 122
Using adjustment 122
Using randomisation 122
IV Design and Data 125
9 Research design - Part I: Observational study designs 127
Preamble 128
Hey ho! Hey ho! it's off to work we go 129
Types of study 129
Observational studies 130
Case reports 130
Case series studies 131
Cross-sectional studies 131
Descriptive cross-sectional studies 132
Confounding in descriptive cross-sectional studies 132
Analytic cross-sectional studies 133
Confounding in analytic cross-sectional studies 134
From here to eternity - cohort studies 135
Confounding in the cohort study design 139
Back to the future - case-control studies 139
Confounding in the case-control study design 141
Another example of a case-control study 142
Comparing cohort and case-control designs 143
Ecological studies 144
The ecological fallacy 145
10 Research design - Part II: getting stuck in - experimental studies 146
Clinical trials 147
Randomisation and the randomised controlled trial (RCT) 148
Block randomisation 149
Stratification 149
Blinding 149
The crossover RCT 150
Selection of participants for an RCT 153
Intention to treat analysis (ITT) 154
11 Getting the participants for your study: ways of sampling 156
From populations to samples - statistical inference 157
Collecting the data - types of sample 158
The simple random sample and its offspring 159
The systematic random sample 159
The stratified random sample 160
The cluster sample 160
Consecutive and convenience samples 161
How many participants should we have? Sample size 162
Inclusion and exclusion criteria 162
Getting the data 163
V Chance Would Be a Fine Thing 165
12 The idea of probability 167
Preamble 167
Calculating probability - proportional frequency 168
Two useful rules for simple probability 169
Rule 1. The multiplication rule for independent events 169
Rule 2. The addition rule for mutually exclusive events 170
Conditional and Bayesian statistics 171
Probability distributions 171
Discrete versus continuous probability distributions 172
The binomial probability distribution 172
The Poisson probability distribution 173
The Normal probability distribution 174
13 Risk and odds 175
Absolute risk and the absolute risk reduction (ARR) 176
The risk ratio 178
The reduction in the risk ratio (or relative risk reduction (RRR)) 178
A general formula for the risk ratio 179
Reference value 179
Number needed to treat (NNT) 180
What happens if the initial risk is small? 181
Confounding with the risk ratio 182
Odds 183
Why you can't calculate risk in a case-control study 185
The link between probability and odds 186
The odds ratio 186
Confounding with the odds ratio 189
Approximating the risk ratio from the odds ratio 189
VI The Informed Guess - An Introduction to Confidence Intervals 191
14 Estimating the value of a single population parameter - the idea of confidence intervals 193
Confidence interval estimation for a population mean 194
The standard error of the mean 195
How we use the standard error of the mean to calculate a confidence interval for a population mean 197
Confidence interval for a population proportion 200
Estimating a confidence interval for the median of a single population 203
15 Using confidence intervals to compare two population parameters 206
What's the difference? 207
Comparing two independent population means 207
An example using birthweights 208
Assessing the evidence using the confidence interval 211
Comparing two paired population means 215
Within-subject and between-subject variations 215
Comparing two independent population proportions 217
Comparing two independent population medians - the Mann-Whitney rank sums method 219
Comparing two matched population medians - the Wilcoxon signed-ranks method 220
16 Confidence intervals for the ratio of two population parameters 224
Getting a confidence interval for the ratio of two independent population means 225
Confidence interval for a population risk ratio 226
Confidence intervals for a population odds ratio 229
Confidence intervals for hazard ratios 232
VII Putting it to the Test 235
17 Testing hypotheses about the difference between two population parameters 237
Answering the question 238
The hypothesis 238
The null hypothesis 239
The hypothesis testing process 240
The p-value and the decision rule 241
A brief summary of a few of the commonest tests 242
Using the p-value to compare the means of two independent populations 244
Interpreting computer hypothesis test results for the difference in two independent population means - the two-sample t test 245
Output from Minitab - two-sample t test of difference in mean birthweights of babies born to white mothers and to non-white mothers 245
Output from SPSS: two-sample t test of difference in mean birthweights of babies born to white mothers and to non-white mothers 246
Comparing the means of two paired populations - the matched-pairs t test 248
Using p-values to compare the medians of two independent populations: the Mann-Whitney rank-sums test 248
How the Mann-Whitney test works 249
Correction for multiple comparisons 250
The Bonferroni correction for multiple testing 250
Interpreting computer output for the Mann-Whitney test 252
With Minitab 252
With SPSS 252
Two matched medians - the Wilcoxon signed-ranks test 254
Confidence intervals versus hypothesis testing 254
What could possibly go wrong? 255
Types of error 256
The power of a test 257
Maximising power - calculating sample size 258
Rule of thumb 1. Comparing the means of two independent populations (metric data) 258
Rule of thumb 2. Comparing the proportions of two independent populations (binary data) 259
18 The Chi-squared (χ²) test - what, why, and how? 261
Of all the tests in all the world - you had to walk into my hypothesis testing procedure 262
Using chi-squared to test for related-ness or for the equality of proportions 262
Calculating the chi-squared statistic 265
Using the chi-squared statistic 267
Yates's correction (continuity correction) 268
Fisher's exact test 268
The chi-squared test with Minitab 269
The chi-squared test with SPSS 270
The chi-squared test for trend 272
SPSS output for chi-squared trend test 274
19 Testing hypotheses about the ratio of two population parameters 276
Preamble 276
The chi-squared test with the risk ratio 277
The chi-squared test with odds ratios 279
The chi-squared test with hazard ratios 281
VIII Becoming Acquainted 283
20 Measuring the association between two variables 285
Preamble - plotting data 286
Association 287
The scatterplot 287
The correlation coefficient 290
Pearson's correlation coefficient 290
Is the correlation coefficient statistically significant in the population? 292
Spearman's rank correlation coefficient 294
21 Measuring agreement 298
To agree or not agree: that is the question 298
Cohen's kappa (κ) 300
Some shortcomings of kappa 303
Weighted kappa 303
Measuring the agreement between two metric continuous variables, the Bland-Altman plot 303
IX Getting into a Relationship 307
22 Straight line models: linear regression 309
Health warning! 310
Relationship and association 310
A causal relationship - explaining variation 312
Refresher - finding the equation of a straight line from a graph 313
The linear regression model 314
First, is the relationship linear? 315
Estimating the regression parameters - the method of ordinary least squares (OLS) 316
Basic assumptions of the ordinary least squares procedure 317
Back to the example - is the relationship statistically significant? 318
Using SPSS to regress birthweight on mother's weight 318
Using Minitab 319
Interpreting the regression coefficients 320
Goodness-of-fit, R² 320
Multiple linear regression 322
Adjusted goodness-of-fit: R̄² 324
Including nominal covariates in the regression model: design variables and coding 326
Building your model. Which variables to include? 327
Automated variable selection methods 328
Manual variable selection methods 329
Adjustment and confounding 330
Diagnostics - checking the basic assumptions of the multiple linear regression model 332
Analysis of variance 333
23 Curvy models: logistic regression 334
A second health warning! 335
The binary outcome variable 335
Finding an appropriate model when the outcome variable is binary 335
The logistic regression model 337
Estimating the parameter values 338
Interpreting the regression coefficients 338
Have we got a significant result? statistical inference in the logistic regression model 340
The Odds Ratio 341
The multiple logistic regression model 343
Building the model 344
Goodness-of-fit 346
24 Counting models: Poisson regression 349
Preamble 350
Poisson regression 350
The Poisson regression equation 351
Estimating β1 and β2 with the estimators b0 and b1 352
Interpreting the estimated coefficients of a Poisson regression, b0 and b1 352
Model building - variable selection 355
Goodness-of-fit 357
Zero-inflated Poisson regression 358
Negative binomial regression 359
Zero-inflated negative binomial regression 361
X Four More Chapters 363
25 Measuring survival 365
Preamble 366
Censored data 366
A simple example of survival in a single group 366
Calculating survival probabilities and the proportion surviving: the Kaplan-Meier table 368
The Kaplan-Meier curve 369
Determining median survival time 369
Comparing survival with two groups 370
The log-rank test 371
An example of the log-rank test in practice 372
The hazard ratio 372
The proportional hazards (Cox's) regression model - introduction 373
The proportional hazards (Cox's) regression model - the detail 376
Checking the assumptions of the proportional hazards model 377
An example of proportional hazards regression 377
26 Systematic review and meta-analysis 380
Introduction 381
Systematic review 381
The forest plot 383
Publication and other biases 384
The funnel plot 386
Significance tests for bias - Begg's and Egger's tests 387
Combining the studies: meta-analysis 389
The problem of heterogeneity - the Q and I² tests 389
27 Diagnostic testing 393
Preamble 393
The measures - sensitivity and specificity 394
The positive prediction and negative prediction values (PPV and NPV) 395
The sensitivity-specificity trade-off 396
Using the ROC curve to find the optimal sensitivity versus specificity trade-off 397
28 Missing data 400
The missing data problem 400
Types of missing data 403
Missing completely at random (MCAR) 403
Missing at Random (MAR) 403
Missing not at random (MNAR) 404
Consequences of missing data 405
Dealing with missing data 405
Do nothing - the wing and prayer approach 406
List-wise deletion 406
Pair-wise deletion 407
Imputation methods - simple imputation 408
Replacement by the Mean 408
Last observation carried forward 409
Regression-based imputation 410
Multiple imputation 411
Full Information Maximum Likelihood (FIML) and other methods 412
Appendix: Table of random numbers 414
References 415
Solutions to Exercises 424
Index 457
1
First things first - the nature of data
Learning objectives
When you have finished this chapter, you should be able to:
- Explain the difference between nominal, ordinal, and metric data.
- Identify the type of any given variable.
- Explain the non-numeric nature of ordinal data.
Variables and data
Let's start with some numbers. Have a look at Figure 1.1.
Figure 1.1 Some numbers. Actually, the birthweight (g) of a sample of 100 babies.
Source: Data from the Born in Bradford cohort study (Born in Bradford, 2012).
These numbers are actually the birthweights of a sample of 100 babies (measured in grams). We call these numbers sample data. These data arise from the variable birthweight. To state the blindingly obvious, a variable is something whose value can vary. Other variables could be blood type, age, parity, and so on; the values of these variables can change from one individual to another. When we measure a variable, we get data - in this case, the variable birthweight produces birthweight data.
Figure 1.2 contains more sample data, in this case, for the gender of the same 100 babies.
Figure 1.2 The gender of the sample of babies in Figure 1.1.
Moreover, Figure 1.3 contains sample data for the variable smoked while pregnant.
Figure 1.3 The variable "smoked while pregnant?" for the mothers of the babies in Figure 1.1.
The data in Figures 1.1-1.3 are known as raw data because they have not been organised or arranged in any way. This makes it difficult to see what interesting characteristics or features the data might contain. The data cannot tell its story, if you like. For example, it is not easy to observe how many babies had a low birthweight (less than 2500 g) from Figure 1.1, or what proportion of the babies were female from Figure 1.2. Moreover, this is for only 100 values. Imagine how much more difficult it would be for 500 or 5000 values. In the next four chapters, we will discuss a number of different ways that we can organise data so that it can tell its story. Then, we can see more easily what's going on.
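As a minimal sketch of what "organising" raw data can mean, the Python snippet below counts how many values in a list fall under the 2500 g low-birthweight threshold mentioned above. The eight birthweights are made-up values for illustration only (they are not the Born in Bradford data), and the function name is hypothetical.

```python
LOW_BIRTHWEIGHT_G = 2500  # threshold quoted in the text, in grams


def count_low_birthweight(weights_g):
    """Return how many birthweights are strictly below the threshold."""
    return sum(1 for w in weights_g if w < LOW_BIRTHWEIGHT_G)


# Hypothetical birthweights (g), invented purely for this example
sample = [3710, 2420, 3150, 2980, 2495, 4020, 2500, 3360]

n_low = count_low_birthweight(sample)
proportion_low = n_low / len(sample)
print(n_low, proportion_low)
```

A baby weighing exactly 2500 g is not counted here, because the text defines low birthweight as less than 2500 g.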
Exercise 1.1. Why do you think that the data in Figures 1.1-1.3 are referred to as "sample data"?
Exercise 1.2. What percentage of mothers smoked during their pregnancy? How does your value contrast with the evidence which suggests that about 20% of mothers in the United Kingdom smoked when pregnant?
Of course, we gather data not because it is nice to look at or we've got nothing better to do but because we want to answer a question. Such as "Do the babies of mothers who smoked while pregnant have a different (we're probably guessing lower) birthweight than the babies of mothers who did not smoke?" or "On average, do male babies have the same birthweight as female babies?" Later in the book, we will deal with methods which you can use to answer such questions (and ones more complex); however, for now, we need to stick with variables and data.
Where are we going ...?
- This book is an introduction to medical statistics.
- Medical statistics is about doing things with data.
- We get data when we determine the value of a variable.
- We need data in order to answer a question.
- What we can do with data depends on what type of data it is.
The good, the bad, and the ugly - types of variables
There are two major types of variable - categorical variables and metric variables; each of them can be further divided into two subtypes, as shown in Figure 1.4.
Figure 1.4 Types of variables.
Each of these variable types produces a different type of data. The differences in these data types are of great importance - some statistical methods are appropriate for some types of data but not for others, and applying an inappropriate procedure can result in a misleading outcome. It is therefore critical that you identify the sort of variable (and data) you are dealing with before you begin any analysis, and we need therefore to examine the differences in data types in a bit more detail. From now on, I will be using the word "data" rather than "variable" because it is the data we will be working with - but remember that data come from variables. We'll start with categorical data.
Categorical data
Nominal categorical data
Consider the gender data shown in Figure 1.2. These data are nominal categorical data (or just nominal data for short).
The data are "nominal" because they usually relate to named things, such as occupation, blood type, or ethnicity. They are emphatically not numeric. They are "categorical" because we allocate each value to a specific category. So, for example, we allocate each M value in Figure 1.2 to the category Male and each F value to the category Female. If we do this for all 100 values, we get:
Male 56
Female 44
Notice two things about these data, which are typical of all nominal data:
- The data do not have any units of measurement.
- The ordering of the categories is arbitrary. In other words, the categories cannot be ordered in any meaningful way. We could just as easily have written the number of males and females in the order:
Female 44
Male 56
By the way, allocating values to categories by hand is pretty tedious as well as error-prone, more so if there are a lot of values. In practice, you would use a computer to do this.
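One way a computer might do this allocation is sketched below in Python, using the standard library's `Counter`. The list `raw` is a hypothetical stand-in built to match the totals quoted above (56 M and 44 F); the actual order of values in Figure 1.2 is not reproduced here, and the function and label names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical mapping from the raw codes to category names
LABELS = {"M": "Male", "F": "Female"}


def tally(values, labels):
    """Allocate each raw value to its category and count the members."""
    counts = Counter(values)
    return {name: counts[code] for code, name in labels.items()}


# Stand-in for the 100 raw values; only the totals come from the text
raw = ["M"] * 56 + ["F"] * 44

table = tally(raw, LABELS)
for category, n in table.items():
    print(category, n)
```

Because the categories are nominal, the order in which the rows are printed carries no meaning; reversing `LABELS` would give an equally valid table.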
Exercise 1.3. Suggest a few nominal variables.
Ordinal categorical data
Let's now consider data from the Glasgow Coma Scale (GCS) (which some of you may be familiar with). As the name suggests, this scale is used to assess the level of consciousness after head injury. A patient's GCS score is judged by the sum of responses in three areas: eye opening response, verbal response, and motor response. Notice particularly that these responses are assessed rather than measured (as weight, height, or temperature would be). The GCS score can vary from 3 (deeply unconscious) to 15 (fully conscious). In other words, there are 13 possible categories of consciousness.
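The arithmetic behind "3 to 15, so 13 categories" can be checked with a short Python sketch. The component ranges used here (eye opening 1 to 4, verbal 1 to 5, motor 1 to 6) are the standard GCS ranges; they are not stated in the text above, so treat them as an assumption of this example, along with the function name.

```python
from itertools import product

# Standard GCS component ranges (assumed; not given in the text)
EYE = range(1, 5)     # eye opening response: 1 to 4
VERBAL = range(1, 6)  # verbal response: 1 to 5
MOTOR = range(1, 7)   # motor response: 1 to 6


def gcs_total(eye, verbal, motor):
    """Sum the three assessed responses into a single GCS score."""
    if eye not in EYE or verbal not in VERBAL or motor not in MOTOR:
        raise ValueError("response outside the assumed GCS range")
    return eye + verbal + motor


# Every total that some combination of responses can produce
possible_totals = sorted(
    {gcs_total(e, v, m) for e, v, m in product(EYE, VERBAL, MOTOR)}
)
print(possible_totals[0], possible_totals[-1], len(possible_totals))
```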
Suppose that we have two motorcyclists, let us call them Wayne and Kylie, who have been admitted to the emergency department with head injuries following a road traffic accident. Wayne has a GCS of 5 and Kylie a GCS of 10. We can say that Wayne's level of consciousness is less than that of Kylie (so we can order the values) but we can't say exactly by how much. We certainly cannot say that Wayne is exactly half as conscious as Kylie. Moreover, the levels of consciousness between adjacent scores are not necessarily the same; for example, the difference in the levels of consciousness between two patients with GCS scores of 10 and 11 may not be the same as that between patients with scores of 11 and 12. It's therefore important to recognise that we cannot quantify these differences.
GCS data are ordinal categorical (or just ordinal) data. They are ordinal because the values can be meaningfully ordered, and they are categorical because each value is assigned to a specific category. Notice two things about this variable, which are typical of all ordinal variables:
- The data do not have any units of measurement (so the same as that for nominal variables).
- The ordering of the categories is not arbitrary, as it is with nominal variables.
The seemingly numeric values of ordinal data, such as GCS scores, are not in fact real numbers but only numeric labels which we attach to category values (usually for convenience or for data entry to a computer). The reason is of course (to re-emphasise this important point) that GCS data, and the data generated by most other scales, are not properly measured but assessed in some way by a clinician or a researcher, working with the individual concerned. This is a characteristic of all ordinal data.
Because ordinal data are not real numbers, it is not appropriate to apply any of the rules of basic arithmetic to this sort of data. You should not add, subtract, multiply, or divide ordinal values. This limitation has marked implications for the sorts of analyses that we can do with such data - as you will see later in this book. Finally, we should note that ordinal data are almost always integer, that is, they have whole number values.
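To make the point concrete, here is a small Python sketch of a hypothetical `OrdinalScore` type that allows ordering but deliberately defines no arithmetic. It is only an illustration of the idea, not a recommended way to store clinical data; the class name and the use of `functools.total_ordering` are choices made for this example.

```python
from functools import total_ordering


@total_ordering
class OrdinalScore:
    """A numeric label that can be ordered but not added or subtracted."""

    def __init__(self, label):
        self.label = int(label)

    def __eq__(self, other):
        if not isinstance(other, OrdinalScore):
            return NotImplemented
        return self.label == other.label

    def __lt__(self, other):
        if not isinstance(other, OrdinalScore):
            return NotImplemented
        return self.label < other.label

    def __hash__(self):
        return hash(self.label)

    def __repr__(self):
        return f"OrdinalScore({self.label})"


wayne = OrdinalScore(5)
kylie = OrdinalScore(10)

# Ordering is meaningful: Wayne's level of consciousness is lower
print(wayne < kylie)

# Arithmetic is not defined, so Python refuses to compute a "difference"
try:
    kylie - wayne
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False
print(arithmetic_allowed)
```

Sorting a list of such scores works, since sorting only needs comparisons, whereas any attempt to add, subtract, multiply, or divide them raises a `TypeError`.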
Exercise 1.4. Suggest a few more scales with which you may be familiar from your clinical work.
Exercise 1.5. Explain why it would not...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.