
A Practical Approach to Using Statistics in Health Research


Persons
Adam Mackridge, Ph.D., is a Research Pharmacist at Betsi Cadwaladr University Health Board in North Wales. He has over 15 years of experience in planning, conducting and reporting health research. He received his PhD in Pharmacy Practice from Aston University in Birmingham, UK.
Philip Rowe, Ph.D., is a Visiting Research Fellow in the School of Pharmacy and Molecular Sciences at Liverpool John Moores University, Liverpool, UK. He is a Fellow of the Royal Statistical Society and has authored other statistically based books for Wiley.
Content
About the Companion Website
1 Introduction
1.1 At Whom is This Book Aimed?
1.2 At What Scale of Project is This Book Aimed?
1.3 Why Might This Book be Useful for You?
1.4 How to Use This Book
1.5 Computer Based Statistics Packages
1.6 Relevant Videos etc.
2 Data Types
2.1 What Types of Data are There and Why Does it Matter?
2.2 Continuous Measured Data
2.2.1 Continuous Measured Data - Normal and Non-Normal Distribution
2.2.2 Transforming Non-Normal Data
2.3 Ordinal Data
2.4 Categorical Data
2.5 Ambiguous Cases
2.5.1 A Continuously Varying Measure that has been Divided into a Small Number of Ranges
2.5.2 Composite Scores with a Wide Range of Possible Values
2.6 Relevant Videos etc.
3 Presenting and Summarizing Data
3.1 Continuous Measured Data
3.1.1 Normally Distributed Data - Using the Mean and Standard Deviation
3.1.2 Data With Outliers, e.g. Skewed Data - Using Quartiles and the Median
3.1.3 Polymodal Data - Using the Modes
3.2 Ordinal Data
3.2.1 Ordinal Scales With a Narrow Range of Possible Values
3.2.2 Ordinal Scales With a Wide Range of Possible Values
3.2.3 Dividing an Ordinal Scale Into a Small Number of Ranges (e.g. Satisfactory/Unsatisfactory or Poor/Acceptable/Good)
3.2.4 Summary for Ordinal Data
3.3 Categorical Data
3.4 Relevant Videos etc.
Appendix 1: An Example of the Insensitivity of the Median When Used to Describe Data from an Ordinal Scale With a Narrow Range of Possible Values
4 Choosing a Statistical Test
4.1 Identify the Factor and Outcome
4.2 Identify the Type of Data Used to Record the Relevant Factor
4.3 Statistical Methods Where the Factor is Categorical
4.3.1 Identify the Type of Data Used to Record the Outcome
4.3.2 Is Continuous Measured Outcome Data Normally Distributed or Can It Be Transformed to Normality?
4.3.3 Identify Whether Your Sets of Outcome Data Are Related or Independent
4.3.4 For the Factor, How Many Levels Are Being Studied?
4.3.5 Determine the Appropriate Statistical Method for Studies with a Categorical Factor
4.4 Correlation and Regression with a Measured Factor
4.4.1 What Type of Data Was Used to Record Your Factor and Outcome?
4.4.2 When Both the Factor and the Outcome Consist of Continuous Measured Values, Select Between Pearson and Spearman Correlation
4.5 Relevant Additional Material
5 Multiple Testing
5.1 What Is Multiple Testing and Why Does It Matter?
5.2 What Can We Do to Avoid an Excessive Risk of False Positives?
5.2.1 Use of Omnibus Tests
5.2.2 Distinguishing Between Primary and Secondary/Exploratory Analyses
5.2.3 Bonferroni Correction
6 Common Issues and Pitfalls
6.1 Determining Equality of Standard Deviations
6.2 How Do I Know, in Advance, How Large My SD Will Be?
6.3 One-Sided Versus Two-Sided Testing
6.4 Pitfalls That Make Data Look More Meaningful Than It Really Is
6.4.1 Too Many Decimal Places
6.4.2 Percentages with Small Sample Sizes
6.5 Discussion of Statistically Significant Results
6.6 Discussion of Non-Significant Results
6.7 Describing Effect Sizes with Non-Parametric Tests
6.8 Confusing Association with a Cause and Effect Relationship
7 Contingency Chi-Square Test
7.1 When Is the Test Appropriate?
7.2 An Example
7.3 Presenting the Data
7.3.1 Contingency Tables
7.3.2 Clustered or Stacked Bar Charts
7.4 Data Requirements
7.5 An Outline of the Test
7.6 Planning Sample Sizes
7.7 Carrying Out the Test
7.8 Special Issues
7.8.1 Yates Correction
7.8.2 Low Expected Frequencies - Fisher's Exact Test
7.9 Describing the Effect Size
7.9.1 Absolute Risk Difference (ARD)
7.9.2 Number Needed to Treat (NNT)
7.9.3 Risk Ratio (RR)
7.9.4 Odds Ratio (OR)
7.9.5 Case: Control Studies
7.10 How to Report the Analysis
7.10.1 Methods
7.10.2 Results
7.10.3 Discussion
7.11 Confounding and Logistic Regression
7.11.1 Reporting the Detection of Confounding
7.12 Larger Tables
7.12.1 Collapsing Tables
7.12.2 Reducing Tables
7.13 Relevant Videos etc.
8 Independent Samples (Two-Sample) T-Test
8.1 When Is the Test Applied?
8.2 An Example
8.3 Presenting the Data
8.3.1 Numerically
8.3.2 Graphically
8.4 Data Requirements
8.4.1 Variables Required
8.4.2 Normal Distribution of the Outcome Variable Within the Two Samples
8.4.3 Equal Standard Deviations
8.4.4 Equal Sample Sizes
8.5 An Outline of the Test
8.6 Planning Sample Sizes
8.7 Carrying Out the Test
8.8 Describing the Effect Size
8.9 How to Describe the Test, the Statistical and Practical Significance of Your Findings in Your Report
8.9.1 Methods Section
8.9.2 Results Section
8.9.3 Discussion Section
8.10 Relevant Videos etc.
9 Mann-Whitney Test
9.1 When Is the Test Applied?
9.2 An Example
9.3 Presenting the Data
9.3.1 Numerically
9.3.2 Graphically
9.3.3 Divide the Outcomes into Low and High Ranges
9.4 Data Requirements
9.4.1 Variables Required
9.4.2 Normal Distributions and Equality of Standard Deviations
9.4.3 Equal Sample Sizes
9.5 An Outline of the Test
9.6 Statistical Significance
9.7 Planning Sample Sizes
9.8 Carrying Out the Test
9.9 Describing the Effect Size
9.10 How to Report the Test
9.10.1 Methods Section
9.10.2 Results Section
9.10.3 Discussion Section
9.11 Relevant Videos etc.
10 One-Way Analysis of Variance (ANOVA) - Including Dunnett's and Tukey's Follow Up Tests
10.1 When Is the Test Applied?
10.2 An Example
10.3 Presenting the Data
10.3.1 Numerically
10.3.2 Graphically
10.4 Data Requirements
10.4.1 Variables Required
10.4.2 Normality of Distribution for the Outcome Variable Within the Three Samples
10.4.3 Standard Deviations
10.4.4 Sample Sizes
10.5 An Outline of the Test
10.6 Follow Up Tests
10.7 Planning Sample Sizes
10.8 Carrying Out the Test
10.9 Describing the Effect Size
10.10 How to Report the Test
10.10.1 Methods
10.10.2 Results Section
10.10.3 Discussion Section
10.11 Relevant Videos etc.
11 Kruskal-Wallis
11.1 When Is the Test Applied?
11.2 An Example
11.3 Presenting the Data
11.3.1 Numerically
11.3.2 Graphically
11.4 Data Requirements
11.4.1 Variables Required
11.4.2 Normal Distributions and Standard Deviations
11.4.3 Equal Sample Sizes
11.5 An Outline of the Test
11.6 Planning Sample Sizes
11.7 Carrying Out the Test
11.8 Describing the Effect Size
11.9 Determining Which Group Differs from Which Other
11.10 How to Report the Test
11.10.1 Methods Section
11.10.2 Results Section
11.10.3 Discussion Section
11.11 Relevant Videos etc.
12 McNemar's Test
12.1 When Is the Test Applied?
12.2 An Example
12.3 Presenting the Data
12.4 Data Requirements
12.5 An Outline of the Test
12.6 Planning Sample Sizes
12.7 Carrying Out the Test
12.8 Describing the Effect Size
12.9 How to Report the Test
12.9.1 Methods Section
12.9.2 Results Section
12.9.3 Discussion Section
12.10 Relevant Videos etc.
13 Paired T-Test
13.1 When Is the Test Applied?
13.2 An Example
13.3 Presenting the Data
13.3.1 Numerically
13.3.2 Graphically
13.4 Data Requirements
13.4.1 Variables Required
13.4.2 Normal Distribution of the Outcome Data
13.4.3 Equal Standard Deviations
13.4.4 Equal Sample Sizes
13.5 An Outline of the Test
13.6 Planning Sample Sizes
13.7 Carrying Out the Test
13.8 Describing the Effect Size
13.9 How to Report the Test
13.9.1 Methods Section
13.9.2 Results Section
13.9.3 Discussion Section
13.10 Relevant Videos etc.
14 Wilcoxon Signed Rank Test
14.1 When Is the Test Applied?
14.2 An Example
14.3 Presenting the Data
14.3.1 Numerically
14.3.2 Graphically
14.4 Data Requirements
14.4.1 Variables Required
14.4.2 Normal Distributions and Equal Standard Deviations
14.4.3 Equal Sample Sizes
14.5 An Outline of the Test
14.6 Planning Sample Sizes
14.7 Carrying Out the Test
14.8 Describing the Effect Size
14.9 How to Report the Test
14.9.1 Methods Section
14.9.2 Results Section
14.9.3 Discussion Section
14.10 Relevant Videos etc.
15 Repeated Measures Analysis of Variance
15.1 When Is the Test Applied?
15.2 An Example
15.3 Presenting the Data
15.3.1 Numerical Presentation of the Data
15.3.2 Graphical Presentation of the Data
15.4 Data Requirements
15.4.1 Variables Required
15.4.2 Normal Distribution of the Outcome Data
15.4.3 Equal Standard Deviations
15.4.4 Equal Sample Sizes
15.5 An Outline of the Test
15.6 Planning Sample Sizes
15.7 Carrying Out the Test
15.8 Describing the Effect Size
15.9 How to Report the Test
15.9.1 Methods Section
15.9.2 Results Section
15.9.3 Discussion Section
15.10 Relevant Videos etc.
16 Friedman Test
16.1 When Is the Test Applied?
16.2 An Example
16.3 Presenting the Data
16.3.1 Bar Charts of the Outcomes at Various Stages
16.3.2 Summarizing the Data via Medians or Means
16.3.3 Splitting the Data at Some Critical Point in the Scale
16.4 Data Requirements
16.4.1 Variables Required
16.4.2 Normal Distribution and Standard Deviations in the Outcome Data
16.4.3 Equal Sample Sizes
16.5 An Outline of the Test
16.6 Planning Sample Sizes
16.7 Follow Up Tests
16.8 Carrying Out the Tests
16.9 Describing the Effect Size
16.9.1 Median or Mean Values Among the Individual Changes
16.9.2 Split the Scale
16.10 How to Report the Test
16.10.1 Methods Section
16.10.2 Results Section
16.10.3 Discussion Section
16.11 Relevant Videos etc.
17 Pearson Correlation
17.1 Presenting the Data
17.2 Correlation Coefficient and Statistical Significance
17.3 Planning Sample Sizes
17.4 Effect Size and Practical Relevance
17.5 Regression
17.6 How to Report the Analysis
17.6.1 Methods
17.6.2 Results
17.6.3 Discussion
17.7 Relevant Videos etc.
18 Spearman Correlation
18.1 Presenting the Data
18.2 Testing for Evidence of Inappropriate Distributions
18.3 Rho and Statistical Significance
18.4 An Outline of the Significance Test
18.5 Planning Sample Sizes
18.6 Effect Size
18.7 Where Both Measures Are Ordinal
18.7.1 Educational Level and Willingness to Undertake Internet Research - An Example Where Both Measures Are Ordinal
18.7.2 Presenting the Data
18.7.3 Rho and Statistical Significance
18.7.4 Effect Size
18.8 How to Report Spearman Correlation Analyses
18.8.1 Methods
18.8.2 Results
18.8.3 Discussion
18.9 Relevant Videos etc.
19 Logistic Regression
19.1 Use of Logistic Regression with Categorical Outcomes
19.2 An Outline of the Significance Test
19.3 Planning Sample Sizes
19.4 Results of the Analysis
19.5 Describing the Effect Size
19.6 How to Report the Analysis
19.6.1 Methods
19.6.2 Results
19.6.3 Discussion
19.7 Relevant Videos etc.
20 Cronbach's Alpha
20.1 Appropriate Situations for the Use of Cronbach's Alpha
20.2 Inappropriate Uses of Alpha
20.3 Interpretation
20.4 Reverse Scoring
20.5 An Example
20.6 Performing and Interpreting the Analysis
20.7 How to Report Cronbach's Alpha Analyses
20.7.1 Methods Section
20.7.2 Results
20.7.3 Discussion
20.8 Relevant Videos etc.
Glossary
Videos
Index
Chapter 2
Data Types
2.1 What Types of Data are There and Why Does it Matter?
Before you can select a statistical method, you will need to identify what types of data you plan to collect. The choice of descriptive and analytical methods depends crucially on the type of data involved. There are three types:
- Continuous measured/Scale (such as blood pressure measured in mmHg).
- Ordinal (such as a Likert scale - Strongly disagree to Strongly agree).
- Categorical/Nominal (such as which ward a patient is on).
The first two types are concerned with the measurement of some characteristic. The final type is just a classification with no sense of measurement.
2.2 Continuous Measured Data
This is also known as "Interval" or "Scale" data. Clinical observations often produce continuous measured data: these include weights, volumes, timings, concentrations, pressures, etc. The important aspects of this type of data are:
- The characteristic being assessed varies continuously. For example, we measure blood pressure using discrete steps of one mmHg, but the reality is that pressure could be 91.25 mmHg (or any figure with an unlimited number of decimal places) - we just choose not to measure to this degree of precision.
- There is a large number of possible different values that might be recorded. For example, diastolic blood pressures typically vary over a range of 60 to 120 mmHg, giving 61 different recorded values.
- Each step up the scale of one unit is of equal size. E.g. the difference between pressures of 80 and 81 mmHg is exactly the same as that between 94 and 95 mmHg.
2.2.1 Continuous Measured Data - Normal and Non-Normal Distribution
Continuous measured data needs to be further subdivided according to whether it follows a normal distribution or not, as many statistical methods will only work with normally distributed data. Non-normal data needs alternative approaches. It is best to avoid statistical tests for normality (Kolmogorov-Smirnov, Anderson-Darling etc.) as the results are easily misinterpreted (see Rowe 2015, Chapter 4). However, there are a number of other approaches to determining whether your data is normally distributed or not.
Whether data is or is not normally distributed is most easily decided by preparing a histogram of the data. Unless your sample sizes are very large, group your results into a small number of bars as you only want to check the general shape of the distribution. Figure 2.1a shows an example of normally distributed data. This is commonly referred to as a "bell-shaped" distribution and it has three features:
- The greatest frequencies (tallest histogram bars) are somewhere near the middle of the range of values observed. What we do not want to see is the tallest histogram bars at either the far left or far right hand end of the horizontal scale (as shown in Figure 2.1b).
- The results are all clustered around a single point; they do not fall into two distinct groups of high and low values (as shown in Figure 2.1c). Normally distributed data is said to be "Unimodal."
- For both low and high values, frequencies decline steadily toward zero with no sudden cut-off (as shown in Figure 2.1d).
Figure 2.1 A continuously varying measure with (a) normal, (b) skewed, and (c) bimodal distribution. In (d) the highest and lowest values (tails) from an otherwise normal distribution are missing.
Figure 2.1b shows a common form of non-normality. The majority of individuals are clustered at the low end of the scale, and there is a long tail of high values. This is referred to as "Positive skew."
Figure 2.1c shows a further form of non-normality - bimodality. The data forms two distinct clusters of low and high values. (The term "Polymodality" applies to any case with more than one cluster of values.)
Figure 2.1d shows a final form of non-normality. Results are cut off suddenly at values of around 50 and 70; for a true normal distribution, there should be a more gradual decline in frequencies beyond these limits.
NOTE: Do not expect to get ideal bell-shaped distributions, especially with small samples. The distribution shown in Figure 2.1a is perfectly acceptable. We just don't want to see obvious deviations from normality such as seen in Figure 2.1b, c, or d.
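As a rough illustration of the histogram check described above, the following Python sketch groups a sample into a small number of equal-width bars and reports where the tallest bar falls. The function names (coarse_histogram, tallest_bar), the choice of eight bars, and both samples are assumptions made for this example; the samples are idealised values generated from evenly spaced quantiles, not real clinical data.

```python
import math
from statistics import NormalDist


def coarse_histogram(values, n_bins=8):
    """Count how many values fall into each of n_bins equal-width bars.

    Assumes the values are not all identical (the bar width would be zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value would land one past the last bar, so clamp it.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


def tallest_bar(counts):
    """Index of the tallest bar (the first one if several are tied)."""
    return counts.index(max(counts))


# Hypothetical, idealised samples built from evenly spaced quantiles.
n = 200
probs = [(i + 0.5) / n for i in range(n)]
bell_shaped = [NormalDist(80, 10).inv_cdf(p) for p in probs]
positively_skewed = [math.exp(NormalDist(0, 1).inv_cdf(p)) for p in probs]

for label, sample in [("bell-shaped", bell_shaped),
                      ("positively skewed", positively_skewed)]:
    counts = coarse_histogram(sample)
    print(f"{label}: tallest bar is number {tallest_bar(counts) + 1} of {len(counts)}")
    for count in counts:
        # One '#' per four observations gives a crude text histogram.
        print("  " + "#" * (count // 4))
```

A tallest bar at the first or last position matches the skewed pattern of Figure 2.1b. This check says nothing about bimodality or missing tails, which still need to be judged by eye from the histogram itself.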
The only important deviation from normality that is not easily detected from a histogram is where there are outlying extreme values in both the low and high tails (referred to as "Long-tailed" distributions). Figure 2.2 shows data that suffers from this problem. Because this form of non-normality is not always easy to diagnose from a histogram, an alternative method is required; the next paragraph shows how the problem can be detected.
Figure 2.2 Histogram of "Long-tailed" data, i.e. data that includes both low and high outlying values.
To detect whether your data has a long-tailed distribution, it is useful to produce a "normal probability plot," which uses the mean and standard deviation (SD) of the sample and looks for differences between the values that would be expected for a normal distribution and those you actually observe (the video mentioned at the end of the chapter shows how to produce these plots). An ideal line, on which all points should fall, is usually added to the graph by the software. Figure 2.3a shows such a plot for a set of data that is almost exactly normally distributed; all the points are very close to the line of perfect fit for normality. Figure 2.3b shows a plot for a long-tailed distribution. The points for the lowest values are all displaced to the left of the ideal line, i.e. these values are markedly lower than they should be for a true normal distribution ("low outliers"). At the other end of the scale, the highest points are to the right of the ideal line (values that are higher than they ought to be - "high outliers"). If your data produces a normal probability plot like that shown in Figure 2.3b, then you should treat your data as non-normal. In further chapters that discuss methods requiring normality, you will be advised how to handle non-normality.
Figure 2.3 Normal probability plots of (a) normally distributed data and (b) long-tailed data.
See the video listed at the end of the chapter for details of using a statistical package to check data normality.
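The idea behind a normal probability plot can be sketched in a few lines of Python. This is a minimal sketch rather than the procedure used by any particular statistics package: the plotting position (i + 0.5)/n is one common convention among several, the helper names are hypothetical, and the long-tailed sample is generated from a Laplace distribution purely for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def normal_plot_points(values):
    """Pair each sorted observation with the value expected at that rank
    from a normal distribution with the sample's own mean and SD."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Assumed plotting position: (i + 0.5) / n for the i-th sorted value.
    expected = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(ordered, expected))


def tail_deviations(values):
    """Observed minus expected value for the lowest and highest points."""
    points = normal_plot_points(values)
    lowest_observed, lowest_expected = points[0]
    highest_observed, highest_expected = points[-1]
    return (lowest_observed - lowest_expected,
            highest_observed - highest_expected)


def laplace_quantile(p):
    """Quantile of a standard Laplace distribution (heavier tails than normal)."""
    return math.log(2 * p) if p < 0.5 else -math.log(2 * (1 - p))


# Hypothetical, idealised samples built from evenly spaced quantiles.
n = 200
probs = [(i + 0.5) / n for i in range(n)]
near_normal = [NormalDist(60, 5).inv_cdf(p) for p in probs]
long_tailed = [60 + 3 * laplace_quantile(p) for p in probs]

for label, sample in [("near normal", near_normal), ("long-tailed", long_tailed)]:
    low, high = tail_deviations(sample)
    print(f"{label}: lowest point {low:+.2f}, highest point {high:+.2f} "
          "relative to the ideal line")
```

For the long-tailed sample the lowest point comes out well below its expected value and the highest point well above it, which is the pattern described for Figure 2.3b. Plotting the (observed, expected) pairs from normal_plot_points would give the graph itself.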
2.2.2 Transforming Non-Normal Data
When data is not normally distributed, it may be possible to manipulate it to bring it closer to normality. This is referred to as "data transformation."
- Data showing positive skew (as in Figure 2.1b) can often be returned to normality by a technique called "Log transformation." A video listed at the end of this chapter gives practical details.
- Data showing bimodality, severe absence of tails, or long tails (Figures 2.1c and d and 2.2) cannot easily be transformed to normality.
Rowe (2015), Chapters 4 and 21, gives further details on non-normal distributions and their transformation to normality.
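The effect of a log transformation on positively skewed data can be illustrated with a short Python sketch. The skewness helper below is a simple moment-based measure chosen for this example (it is not taken from the book), the sample is a hypothetical idealised one, and natural logarithms are used, although logarithms to base 10 would change only the scale and not the shape.

```python
import math
from statistics import NormalDist, mean, pstdev


def skewness(values):
    """Moment-based skewness: about 0 for symmetric data, positive when
    there is a long tail of high values."""
    m = mean(values)
    s = pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])


# Hypothetical positively skewed sample built from evenly spaced quantiles.
n = 200
probs = [(i + 0.5) / n for i in range(n)]
skewed = [math.exp(NormalDist(0, 1).inv_cdf(p)) for p in probs]

# Log transformation: replace every value by its logarithm.
# All values must be greater than zero for this to work.
log_transformed = [math.log(v) for v in skewed]

print(f"skewness before transformation: {skewness(skewed):.2f}")
print(f"skewness after transformation:  {skewness(log_transformed):.2f}")
```

The transformed values would then be checked for normality in the same way as any other data, for example with a histogram.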
2.3 Ordinal Data
Here, the characteristic to be measured is often subjective in nature. For example, we might assess how patients feel about a treatment they have received, using a score of (say) one to five with the following equivalences:
- 1 = Strongly dissatisfied
- 2 = Somewhat dissatisfied
- 3 = Neither satisfied nor dissatisfied
- 4 = Generally satisfied
- 5 = Very satisfied
The data consists of categorizations, but the important thing is that the categories have a natural order to them. There is a definite ranking from "Strongly dissatisfied" to "Very satisfied" via three intermediate grades.
This type of data has the following characteristics:
- It is discontinuous. Only these five integer values are available. There are no scores of 1.45 or any other fractional value.
- Ordinal scales usually allow only small numbers of possible values (five in the current case).
- It often cannot be assumed that all the steps up the scale are of equal significance. Although you may have scored gradings as 1, 2, 3, etc., you cannot assume that the difference between "Strongly dissatisfied" and "Somewhat dissatisfied" is of exactly the same importance as that between "Somewhat dissatisfied" and "Neither satisfied nor dissatisfied."
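One way to respect these characteristics when handling ordinal data is to tabulate how often each grade occurs, keeping the grades in their natural order, rather than treating the scores as measurements. The Python sketch below assumes the five-point satisfaction scale given above; the responses and the frequency_table helper are hypothetical and serve only as an illustration.

```python
from collections import Counter

# The five-point scale from the text, keyed by score.
SATISFACTION_SCALE = {
    1: "Strongly dissatisfied",
    2: "Somewhat dissatisfied",
    3: "Neither satisfied nor dissatisfied",
    4: "Generally satisfied",
    5: "Very satisfied",
}

# Hypothetical responses from twelve patients.
responses = [4, 5, 3, 4, 2, 4, 5, 1, 4, 3, 4, 5]


def frequency_table(scores, scale):
    """Return (score, label, count) rows in the natural order of the scale."""
    for score in scores:
        # Only the integer grades on the scale are allowed; 1.45 is not a grade.
        if score not in scale:
            raise ValueError(f"{score!r} is not a grade on this scale")
    counts = Counter(scores)
    return [(score, scale[score], counts[score]) for score in sorted(scale)]


for score, label, count in frequency_table(responses, SATISFACTION_SCALE):
    print(f"{score} {label:<36} {count}")
```

Grades that nobody chose still appear in the table with a count of zero, so the ordering of the scale stays visible.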
2.4 Categorical Data
This is also known as "Nominal" data. Here there is no intention to measure a characteristic: we are just categorizing. For example, advice might be provided by nurses, pharmacists, or physiotherapists. These do not form any sort of scale; they are just three different professions. Frequently there are just two options, obvious cases being Male/Female, Yes/No or Successful/Failed; these are referred to as "Binary" or "Dichotomous."
2.5 Ambiguous Cases
There are a couple of...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)