Medical Statistics from Scratch

Name: Medical Statistics from Scratch | An Introduction for Health Professionals
Brand: Wiley
Price: 40.99 EUR
Availability: OnlineOnly

An Introduction for Health Professionals

David Bowers (Author)

Wiley (Publisher)

4th Edition

Published on 16 August 2019

495 pages

E-Book

ePUB with Adobe-DRM

978-1-119-52394-9 (ISBN)

€40.99 incl. 7% VAT

E-Book Single Licence

Content

Preface to the 4th Edition xix

Preface to the 3rd Edition xxi

Preface to the 2nd Edition xxiii

Preface to the 1st Edition xxv

Introduction xxvii

I Some Fundamental Stuff 1

1 First things first - the nature of data 3

Variables and data 3

Where are we going ...? 5

The good, the bad, and the ugly - types of variables 5

Categorical data 6

Nominal categorical data 6

Ordinal categorical data 7

Metric data 8

Discrete metric data 8

Continuous metric data 9

How can I tell what type of variable I am dealing with? 10

The baseline table 11

II Descriptive Statistics 15

2 Describing data with tables 17

Descriptive statistics. What can we do with raw data? 18

Frequency tables - nominal data 18

The frequency distribution 19

Relative frequency 20

Frequency tables - ordinal data 20

Frequency tables - metric data 22

Frequency tables with discrete metric data 22

Cumulative frequency 24

Frequency tables with continuous metric data - grouping the raw data 25

Open-ended groups 27

Cross-tabulation - contingency tables 28

Ranking data 30

3 Every picture tells a story - describing data with charts 31

Picture it! 32

Charting nominal and ordinal data 32

The pie chart 32

The simple bar chart 34

The clustered bar chart 35

The stacked bar chart 37

Charting discrete metric data 39

Charting continuous metric data 39

The histogram 39

The box (and whisker) plot 42

Charting cumulative data 44

The cumulative frequency curve with discrete metric data 44

The cumulative frequency curve with continuous metric data 44

Charting time-based data - the time series chart 47

The scatterplot 48

The bubbleplot 49

4 Describing data from its shape 51

The shape of things to come 51

Skewness and kurtosis as measures of shape 52

Kurtosis 55

Symmetric or mound-shaped distributions 56

Normalness - the Normal distribution 56

Bimodal distributions 58

Determining skew from a box plot 59

5 Measures of location - Numbers R us 62

Numbers, percentages, and proportions 62

Preamble 63

Numbers, percentages, and proportions 64

Handling percentages - for those of us who might need a reminder 65

Summary measures of location 67

The mode 68

The median 69

The mean 70

Percentiles 71

Calculating a percentile value 72

What is the most appropriate measure of location? 73

6 Measures of spread - Numbers R us - (again) 75

Preamble 76

The range 76

The interquartile range (IQR) 76

Estimating the median and interquartile range from the cumulative frequency curve 77

The boxplot (also known as the box and whisker plot) 79

Standard deviation 82

Standard deviation and the Normal distribution 84

Testing for Normality 86

Using SPSS 86

Using Minitab 87

Transforming data 88

7 Incidence, prevalence, and standardisation 92

Preamble 93

The incidence rate and the incidence rate ratio (IRR) 93

The incidence rate ratio 94

Prevalence 94

A couple of difficulties with measuring incidence and prevalence 97

Some other useful rates 97

Crude mortality rate 97

Case fatality rate 98

Crude maternal mortality rate 99

Crude birth rate 99

Attack rate 99

Age-specific mortality rate 99

Standardisation - the age-standardised mortality rate 101

The direct method 102

The standard population and the comparative mortality ratio (CMR) 103

The indirect method 106

The standardised mortality rate 107

III The Confounding Problem 111

8 Confounding - like the poor, (nearly) always with us 113

Preamble 114

What is confounding? 114

Confounding by indication 117

Residual confounding 119

Detecting confounding 119

Dealing with confounding - if confounding is such a problem, what can we do about it? 120

Using restriction 120

Using matching 121

Frequency matching 121

One-to-one matching 121

Using stratification 122

Using adjustment 122

Using randomisation 122

IV Design and Data 125

9 Research design - Part I: Observational study designs 127

Preamble 128

Hey ho! Hey ho! It's off to work we go 129

Types of study 129

Observational studies 130

Case reports 130

Case series studies 131

Cross-sectional studies 131

Descriptive cross-sectional studies 132

Confounding in descriptive cross-sectional studies 132

Analytic cross-sectional studies 133

Confounding in analytic cross-sectional studies 134

From here to eternity - cohort studies 135

Confounding in the cohort study design 139

Back to the future - case-control studies 139

Confounding in the case-control study design 141

Another example of a case-control study 142

Comparing cohort and case-control designs 143

Ecological studies 144

The ecological fallacy 145

10 Research design - Part II: getting stuck in - experimental studies 146

Clinical trials 147

Randomisation and the randomised controlled trial (RCT) 148

Block randomisation 149

Stratification 149

Blinding 149

The crossover RCT 150

Selection of participants for an RCT 153

Intention to treat analysis (ITT) 154

11 Getting the participants for your study: ways of sampling 156

From populations to samples - statistical inference 157

Collecting the data - types of sample 158

The simple random sample and its offspring 159

The systematic random sample 159

The stratified random sample 160

The cluster sample 160

Consecutive and convenience samples 161

How many participants should we have? Sample size 162

Inclusion and exclusion criteria 162

Getting the data 163

V Chance Would Be a Fine Thing 165

12 The idea of probability 167

Preamble 167

Calculating probability - proportional frequency 168

Two useful rules for simple probability 169

Rule 1. The multiplication rule for independent events 169

Rule 2. The addition rule for mutually exclusive events 170

Conditional and Bayesian statistics 171

Probability distributions 171

Discrete versus continuous probability distributions 172

The binomial probability distribution 172

The Poisson probability distribution 173

The Normal probability distribution 174

13 Risk and odds 175

Absolute risk and the absolute risk reduction (ARR) 176

The risk ratio 178

The reduction in the risk ratio (or relative risk reduction (RRR)) 178

A general formula for the risk ratio 179

Reference value 179

Number needed to treat (NNT) 180

What happens if the initial risk is small? 181

Confounding with the risk ratio 182

Odds 183

Why you can't calculate risk in a case-control study 185

The link between probability and odds 186

The odds ratio 186

Confounding with the odds ratio 189

Approximating the risk ratio from the odds ratio 189

VI The Informed Guess - An Introduction to Confidence Intervals 191

14 Estimating the value of a single population parameter - the idea of confidence intervals 193

Confidence interval estimation for a population mean 194

The standard error of the mean 195

How we use the standard error of the mean to calculate a confidence interval for a population mean 197

Confidence interval for a population proportion 200

Estimating a confidence interval for the median of a single population 203

15 Using confidence intervals to compare two population parameters 206

What's the difference? 207

Comparing two independent population means 207

An example using birthweights 208

Assessing the evidence using the confidence interval 211

Comparing two paired population means 215

Within-subject and between-subject variations 215

Comparing two independent population proportions 217

Comparing two independent population medians - the Mann-Whitney rank sums method 219

Comparing two matched population medians - the Wilcoxon signed-ranks method 220

16 Confidence intervals for the ratio of two population parameters 224

Getting a confidence interval for the ratio of two independent population means 225

Confidence interval for a population risk ratio 226

Confidence intervals for a population odds ratio 229

Confidence intervals for hazard ratios 232

VII Putting it to the Test 235

17 Testing hypotheses about the difference between two population parameters 237

Answering the question 238

The hypothesis 238

The null hypothesis 239

The hypothesis testing process 240

The p-value and the decision rule 241

A brief summary of a few of the commonest tests 242

Using the p-value to compare the means of two independent populations 244

Interpreting computer hypothesis test results for the difference in two independent population means - the two-sample t test 245

Output from Minitab - two-sample t test of difference in mean birthweights of babies born to white mothers and to non-white mothers 245

Output from SPSS: two-sample t test of difference in mean birthweights of babies born to white mothers and to non-white mothers 246

Comparing the means of two paired populations - the matched-pairs t test 248

Using p-values to compare the medians of two independent populations: the Mann-Whitney rank-sums test 248

How the Mann-Whitney test works 249

Correction for multiple comparisons 250

The Bonferroni correction for multiple testing 250

Interpreting computer output for the Mann-Whitney test 252

With Minitab 252

With SPSS 252

Two matched medians - the Wilcoxon signed-ranks test 254

Confidence intervals versus hypothesis testing 254

What could possibly go wrong? 255

Types of error 256

The power of a test 257

Maximising power - calculating sample size 258

Rule of thumb 1. Comparing the means of two independent populations (metric data) 258

Rule of thumb 2. Comparing the proportions of two independent populations (binary data) 259

18 The Chi-squared (χ²) test - what, why, and how? 261

Of all the tests in all the world - you had to walk into my hypothesis testing procedure 262

Using chi-squared to test for related-ness or for the equality of proportions 262

Calculating the chi-squared statistic 265

Using the chi-squared statistic 267

Yates' correction (continuity correction) 268

Fisher's exact test 268

The chi-squared test with Minitab 269

The chi-squared test with SPSS 270

The chi-squared test for trend 272

SPSS output for chi-squared trend test 274

19 Testing hypotheses about the ratio of two population parameters 276

Preamble 276

The chi-squared test with the risk ratio 277

The chi-squared test with odds ratios 279

The chi-squared test with hazard ratios 281

VIII Becoming Acquainted 283

20 Measuring the association between two variables 285

Preamble - plotting data 286

Association 287

The scatterplot 287

The correlation coefficient 290

Pearson's correlation coefficient 290

Is the correlation coefficient statistically significant in the population? 292

Spearman's rank correlation coefficient 294

21 Measuring agreement 298

To agree or not agree: that is the question 298

Cohen's kappa (κ) 300

Some shortcomings of kappa 303

Weighted kappa 303

Measuring the agreement between two metric continuous variables: the Bland-Altman plot 303

IX Getting into a Relationship 307

22 Straight line models: linear regression 309

Health warning! 310

Relationship and association 310

A causal relationship - explaining variation 312

Refresher - finding the equation of a straight line from a graph 313

The linear regression model 314

First, is the relationship linear? 315

Estimating the regression parameters - the method of ordinary least squares (OLS) 316

Basic assumptions of the ordinary least squares procedure 317

Back to the example - is the relationship statistically significant? 318

Using SPSS to regress birthweight on mother's weight 318

Using Minitab 319

Interpreting the regression coefficients 320

Goodness-of-fit, R² 320

Multiple linear regression 322

Adjusted goodness-of-fit: R̄² 324

Including nominal covariates in the regression model: design variables and coding 326

Building your model. Which variables to include? 327

Automated variable selection methods 328

Manual variable selection methods 329

Adjustment and confounding 330

Diagnostics - checking the basic assumptions of the multiple linear regression model 332

Analysis of variance 333

23 Curvy models: logistic regression 334

A second health warning! 335

The binary outcome variable 335

Finding an appropriate model when the outcome variable is binary 335

The logistic regression model 337

Estimating the parameter values 338

Interpreting the regression coefficients 338

Have we got a significant result? Statistical inference in the logistic regression model 340

The Odds Ratio 341

The multiple logistic regression model 343

Building the model 344

Goodness-of-fit 346

24 Counting models: Poisson regression 349

Preamble 350

Poisson regression 350

The Poisson regression equation 351

Estimating β₀ and β₁ with the estimators b₀ and b₁ 352

Interpreting the estimated coefficients of a Poisson regression, b₀ and b₁ 352

Model building - variable selection 355

Goodness-of-fit 357

Zero-inflated Poisson regression 358

Negative binomial regression 359

Zero-inflated negative binomial regression 361

X Four More Chapters 363

25 Measuring survival 365

Preamble 366

Censored data 366

A simple example of survival in a single group 366

Calculating survival probabilities and the proportion surviving: the Kaplan-Meier table 368

The Kaplan-Meier curve 369

Determining median survival time 369

Comparing survival with two groups 370

The log-rank test 371

An example of the log-rank test in practice 372

The hazard ratio 372

The proportional hazards (Cox's) regression model - introduction 373

The proportional hazards (Cox's) regression model - the detail 376

Checking the assumptions of the proportional hazards model 377

An example of proportional hazards regression 377

26 Systematic review and meta-analysis 380

Introduction 381

Systematic review 381

The forest plot 383

Publication and other biases 384

The funnel plot 386

Significance tests for bias - Begg's and Egger's tests 387

Combining the studies: meta-analysis 389

The problem of heterogeneity - the Q and I² tests 389

27 Diagnostic testing 393

Preamble 393

The measures - sensitivity and specificity 394

The positive prediction and negative prediction values (PPV and NPV) 395

The sensitivity-specificity trade-off 396

Using the ROC curve to find the optimal sensitivity versus specificity trade-off 397

28 Missing data 400

The missing data problem 400

Types of missing data 403

Missing completely at random (MCAR) 403

Missing at random (MAR) 403

Missing not at random (MNAR) 404

Consequences of missing data 405

Dealing with missing data 405

Do nothing - the wing and prayer approach 406

List-wise deletion 406

Pair-wise deletion 407

Imputation methods - simple imputation 408

Replacement by the mean 408

Last observation carried forward 409

Regression-based imputation 410

Multiple imputation 411

Full Information Maximum Likelihood (FIML) and other methods 412

Appendix: Table of random numbers 414

References 415

Solutions to Exercises 424

Index 457

1
First things first - the nature of data

Learning objectives

When you have finished this chapter, you should be able to:

Explain the difference between nominal, ordinal, and metric data.
Identify the type of any given variable.
Explain the non-numeric nature of ordinal data.

Variables and data

Let's start with some numbers. Have a look at Figure 1.1.

Figure 1.1 Some numbers. Actually, the birthweight (g) of a sample of 100 babies.

Source: Data from the Born in Bradford cohort study (Born in Bradford, 2012).

These numbers are actually the birthweights of a sample of 100 babies (measured in grams). We call these numbers sample data. These data arise from the variable birthweight. To state the blindingly obvious, a variable is something whose value can vary. Other variables could be blood type, age, parity, and so on; the values of these variables can change from one individual to another. When we measure a variable, we get data - in this case, the variable birthweight produces birthweight data.

Figure 1.2 contains more sample data, in this case, for the gender of the same 100 babies.

Figure 1.2 The gender of the sample of babies in Figure 1.1.

Figure 1.3 contains sample data for a third variable, smoked while pregnant, for the mothers of the same babies.

Figure 1.3 The variable "smoked while pregnant?" for the mothers of the babies in Figure 1.1.

The data in Figures 1.1-1.3 are known as raw data because they have not been organised or arranged in any way. This makes it difficult to see what interesting characteristics or features the data might contain; the data cannot tell their story, if you like. For example, it is not easy to see from Figure 1.1 how many babies had a low birthweight (less than 2500 g), or from Figure 1.2 what proportion of the babies were female. And this is for only 100 values; imagine how much more difficult it would be with 500 or 5000 values. In the next four chapters, we will discuss a number of different ways to organise data so that they can tell their story and we can see more easily what's going on.
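As a small illustration of why organising raw data helps, the Python sketch below answers the two questions just posed (how many babies weighed less than 2500 g, and what proportion were female) for a handful of values. The lists `birthweights_g` and `genders` are invented stand-ins, not the Born in Bradford data shown in Figures 1.1 and 1.2, and the helper functions are hypothetical.

```python
# Invented toy data: NOT the Born in Bradford values in Figures 1.1-1.2.
birthweights_g = [3710, 2420, 3160, 2990, 2280, 3540, 4010, 3300]
genders = ["M", "F", "F", "M", "F", "M", "M", "F"]

LOW_BIRTHWEIGHT_G = 2500  # low birthweight means less than 2500 g


def count_low_birthweight(weights, threshold=LOW_BIRTHWEIGHT_G):
    """Count babies whose birthweight is strictly below the threshold."""
    return sum(1 for w in weights if w < threshold)


def proportion_female(values):
    """Proportion of values recorded as 'F'."""
    return values.count("F") / len(values)


if __name__ == "__main__":
    print("Low birthweight babies:", count_low_birthweight(birthweights_g))  # 2
    print(f"Proportion female: {proportion_female(genders):.2f}")  # 0.50
```

Even with eight values, scanning the lists by eye is slower than asking a direct question of the data; with 100 or 5000 values the difference is much greater.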

Exercise 1.1. Why do you think that the data in Figures 1.1-1.3 are referred to as "sample data"?

Exercise 1.2. What percentage of mothers smoked during their pregnancy? How does your value compare with the evidence suggesting that about 20% of mothers in the United Kingdom smoked while pregnant?

Of course, we don't gather data because they are nice to look at or because we've got nothing better to do, but because we want to answer a question, such as "Do the babies of mothers who smoked while pregnant have a different (we're probably guessing lower) birthweight than the babies of mothers who did not smoke?" or "On average, do male babies have the same birthweight as female babies?" Later in the book, we will deal with methods you can use to answer such questions (and more complex ones); for now, however, we need to stick with variables and data.

Where are we going ...?

This book is an introduction to medical statistics.
Medical statistics is about doing things with data.
We get data when we determine the value of a variable.
We need data in order to answer a question.
What we can do with data depends on what type of data it is.

The good, the bad, and the ugly - types of variables

There are two major types of variable - categorical variables and metric variables; each of them can be further divided into two subtypes, as shown in Figure 1.4.

Figure 1.4 Types of variables.

Each of these variable types produces a different type of data. The differences between these data types are of great importance: some statistical methods are appropriate for some types of data but not for others, and applying an inappropriate procedure can produce a misleading result. It is therefore critical that you identify the sort of variable (and data) you are dealing with before you begin any analysis, so we need to examine the differences between data types in a bit more detail. From now on, I will use the word "data" rather than "variable" because it is the data we will be working with, but remember that data come from variables. We'll start with categorical data.

Categorical data

Nominal categorical data

Consider the gender data shown in Figure 1.2. These data are nominal categorical data (or just nominal data for short).

The data are "nominal" because they usually relate to named things, such as occupation, blood type, or ethnicity; in particular, they are not numeric. They are "categorical" because we allocate each value to a specific category. So, for example, we allocate each M value in Figure 1.2 to the category Male and each F value to the category Female. If we do this for all 100 values, we get:

Male 56
Female 44

Notice two things about these data, which are typical of all nominal data:

The data do not have any units of measurement.¹
The ordering of the categories is arbitrary. In other words, the categories cannot be ordered in any meaningful way.² We could just as easily have written the number of males and females in the order:

Female 44
Male 56

By the way, allocating values to categories by hand is tedious as well as error-prone, especially when there are a lot of values. In practice, you would use a computer to do this.
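For example, a minimal Python sketch using the standard library's `collections.Counter` can allocate each raw code to a named category and count the values in each. The list `gender_codes` is invented for illustration (it is not the data in Figure 1.2), and the code-to-name mapping is an assumption.

```python
from collections import Counter

# Invented example codes, standing in for the raw values in Figure 1.2.
gender_codes = ["M", "F", "M", "M", "F", "M", "F", "M"]

# Assumed mapping from each raw code to its named category.
CATEGORY_NAMES = {"M": "Male", "F": "Female"}


def tally_categories(codes):
    """Allocate each raw code to its category and count the values in each."""
    return Counter(CATEGORY_NAMES[code] for code in codes)


counts = tally_categories(gender_codes)
print(counts["Male"], counts["Female"])  # 5 3
```

Because nominal categories have no natural order, nothing is lost or gained by listing the counts as Male then Female or as Female then Male.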

Exercise 1.3. Suggest a few nominal variables.

Ordinal categorical data

Let's now consider data from the Glasgow Coma Scale (GCS), with which some of you may be familiar. As the name suggests, this scale is used to assess the level of consciousness after head injury. A patient's GCS score is the sum of the scores for responses in three areas: eye opening response, verbal response, and motor response. Notice particularly that these responses are assessed rather than measured (as weight, height, or temperature would be). The GCS score can vary from 3 (deeply unconscious) to 15 (fully conscious). In other words, there are 13 possible categories of consciousness.³
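To see where the range 3 to 15 and the 13 categories come from, here is a small Python sketch. The component ranges it uses (eye 1 to 4, verbal 1 to 5, motor 1 to 6) are those of the standard GCS and are not stated in the text above; the function `gcs_total` is hypothetical. The sum only builds the category label, and, as the rest of this section explains, the result should not then be treated as a real number.

```python
# Standard GCS component ranges (assumption: not given in the text above).
COMPONENT_RANGES = {"eye": (1, 4), "verbal": (1, 5), "motor": (1, 6)}


def gcs_total(eye: int, verbal: int, motor: int) -> int:
    """Combine the three assessed responses into a total GCS category label."""
    responses = {"eye": eye, "verbal": verbal, "motor": motor}
    for area, value in responses.items():
        low, high = COMPONENT_RANGES[area]
        if not low <= value <= high:
            raise ValueError(f"{area} response {value} outside {low}-{high}")
    return eye + verbal + motor


lowest = gcs_total(1, 1, 1)   # deeply unconscious
highest = gcs_total(4, 5, 6)  # fully conscious
print(lowest, highest, highest - lowest + 1)  # 3 15 13
```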

Suppose that we have two motorcyclists, let us call them Wayne and Kylie, who have been admitted to the emergency department with head injuries following a road traffic accident. Wayne has a GCS of 5 and Kylie a GCS of 10. We can say that Wayne's level of consciousness is less than that of Kylie (so we can order the values) but we can't say exactly by how much. We certainly cannot say that Wayne is exactly half as conscious as Kylie. Moreover, the levels of consciousness between adjacent scores are not necessarily the same; for example, the difference in the levels of consciousness between two patients with GCS scores of 10 and 11 may not be the same as that between patients with scores of 11 and 12. It's therefore important to recognise that we cannot quantify these differences.

GCS data are ordinal categorical (or just ordinal) data. They are ordinal because the values can be meaningfully ordered, and categorical because each value is assigned to a specific category. Notice two things about this variable, which are typical of all ordinal variables:

The data do not have any units of measurement (just as with nominal variables).
The ordering of the categories is not arbitrary, as it is with nominal variables.
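The two checks in the lists above (units of measurement, and whether the ordering is meaningful) can be sketched as a small decision rule. The helper `classify_variable` is hypothetical and for illustration only: it covers just the broad types named so far and assumes that any variable with units of measurement is metric, which is the contrast this chapter draws.

```python
from enum import Enum


class DataType(Enum):
    NOMINAL = "nominal categorical"
    ORDINAL = "ordinal categorical"
    METRIC = "metric"


def classify_variable(has_units: bool, meaningful_order: bool) -> DataType:
    """Hypothetical rule of thumb based on the two properties discussed above."""
    if has_units:
        # Measured quantities with units (e.g. birthweight in grams).
        return DataType.METRIC
    if meaningful_order:
        # No units, but the categories can be ranked (e.g. GCS score).
        return DataType.ORDINAL
    # No units and an arbitrary order (e.g. gender, blood type).
    return DataType.NOMINAL


print(classify_variable(has_units=False, meaningful_order=False).value)
print(classify_variable(has_units=False, meaningful_order=True).value)
print(classify_variable(has_units=True, meaningful_order=True).value)
```

In real data the hard part is answering the two questions honestly, not applying the rule; a numeric-looking column such as a GCS score still has no units.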

The seemingly numeric values of ordinal data, such as GCS scores, are not in fact real numbers but only numeric labels that we attach to category values (usually for convenience or for data entry to a computer). The reason, to re-emphasise this important point, is that GCS data, and the data generated by most other scales, are not properly measured but assessed in some way by a clinician or a researcher working with the individual concerned.⁴ This is a characteristic of all ordinal data.

Because ordinal data are not real numbers, it is not appropriate to apply any of the rules of basic arithmetic to this sort of data. You should not add, subtract, multiply, or divide ordinal values. This limitation has marked implications for the sorts of analyses that we can do with such data, as you will see later in this book. Finally, we should note that ordinal data are almost always integers, that is, they have whole-number values.
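One way to respect this restriction in code is a small type that supports ordering but defines no arithmetic operators. The Python sketch below is our own illustration under that assumption; the class `OrdinalScore` is hypothetical and not part of any statistics package. Comparing Wayne's and Kylie's GCS scores works, while adding or dividing them raises a `TypeError`.

```python
from functools import total_ordering


@total_ordering
class OrdinalScore:
    """A category label that can be ordered but deliberately not used in arithmetic."""

    def __init__(self, label: int):
        self.label = label

    def __eq__(self, other):
        if not isinstance(other, OrdinalScore):
            return NotImplemented
        return self.label == other.label

    def __lt__(self, other):
        if not isinstance(other, OrdinalScore):
            return NotImplemented
        return self.label < other.label

    def __hash__(self):
        return hash(self.label)

    def __repr__(self):
        return f"OrdinalScore({self.label})"


wayne, kylie = OrdinalScore(5), OrdinalScore(10)
print(wayne < kylie)  # True: the ordering is meaningful

try:
    kylie / wayne  # no __truediv__ defined, so Python refuses "twice as conscious"
except TypeError as err:
    print("Arithmetic refused:", err)
```

The design choice is to let the type system, rather than the analyst's memory, block operations such as averaging or taking ratios of scores.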

Exercise 1.4. Suggest a few more scales with which you may be familiar from your clinical work.

Exercise 1.5. Explain why it would not...

Schweitzer Fachinformationen
