Makes mathematical and statistical analysis understandable to even the least math-minded biology student
This unique textbook aims to demystify statistical formulae for the average biology student. Written in a lively and engaging style, Statistics for Terrified Biologists, 2nd Edition draws on the author's 30 years of lecturing experience to teach statistical methods to even the most guarded of biology students. It presents basic methods using straightforward, jargon-free language. Students are taught to use simple formulae and how to interpret what is being measured with each test and statistic, while at the same time learning to recognize overall patterns and guiding principles. Complemented by simple examples and useful case studies, this is an ideal statistics resource tool for undergraduate biology and environmental science students who lack confidence in their mathematical abilities.
Statistics for Terrified Biologists presents readers with the basic foundations of parametric statistics, the t-test, analysis of variance, linear regression and chi-square, and guides them to important extensions of these techniques. It introduces them to non-parametric tests, and includes a checklist of non-parametric methods linked to their parametric counterparts. The book also provides many end-of-chapter summaries and additional exercises to help readers understand and practice what they've learned.
Presented in a clear and easy-to-understand style
Makes statistics tangible and enjoyable for even the most hesitant student
Features multiple formulas to facilitate comprehension
Written by one of the foremost entomologists of his generation
This second edition of Statistics for Terrified Biologists is an invaluable guide that will be of great benefit to pre-health and biology undergraduate students.
Preface to the second edition xv
Preface to the first edition xvii
<b>1 How to use this book 1</b>
Introduction 1
The text of the chapters 1
What should you do if you run into trouble? 2
Elephants 3
The numerical examples in the text 3
Boxes 4
Spare-time activities 4
Executive summaries 5
Why go to all that bother? 5
The bibliography 7
<b>2 Introduction 9</b>
What are statistics? 9
Notation 10
Notation for calculating the mean 12
<b>3 Summarising variation 13</b>
Introduction 13
Different summaries of variation 14
Range 14
Total deviation 14
Mean deviation 15
Variance 16
Why <i>n</i> − 1? 17
Why are the deviations squared? 18
The standard deviation 19
The next chapter 21
Spare-time activities 21
<b>4 When are sums of squares NOT sums of squares? 23</b>
Introduction 23
Calculating machines offer a quicker method of calculating the sum of squares 24
Added squares 24
The correction factor 24
Avoid being confused by the term <i>sum of squares</i> 24
Summary of the calculator method for calculations as far as the standard deviation 25
Spare-time activities 26
<b>5 The normal distribution 27</b>
Introduction 27
Frequency distributions 27
The normal distribution 28
What percentage is a standard deviation worth? 30
Are the percentages always the same as these? 30
Other similar scales in everyday life 33
The standard deviation as an estimate of the frequency of a number occurring in a sample 33
From percentage to probability 34
Executive Summary 1 - The standard deviation 36
<b>6 The relevance of the normal distribution to biological data 39</b>
To recap 39
Is our observed distribution normal? 41
Checking for normality 42
What can we do about a distribution that clearly is not normal? 42
Transformation 42
Grouping samples 47
Doing nothing! 47
How many samples are needed? 47
Type 1 and Type 2 errors 48
Calculating how many samples are needed 49
<b>7 Further calculations from the normal distribution 51</b>
Introduction 51
Is A bigger than B? 52
The yardstick for deciding 52
The standard error of a difference between two means of three eggs 53
Derivation of the standard error of a difference between two means 53
Step 1: from variance of single data to variance of means 55
Step 2: from variance of single data to <i>variance of differences</i> 57
Step 3: The combination of Steps 1 and 2: the standard error of difference between means (s.e.d.m.) 58
Recap of the calculation of s.e.d.m. from the variance calculated from the individual values 61
The importance of the standard error of differences between means 61
Summary of this chapter 62
Executive Summary 2 - Standard error of a difference between two means 66
Spare-time activities 67
<b>8 The <i>t</i>-test 69</b>
Introduction 69
The principle of the <i>t</i>-test 70
The <i>t</i>-test in statistical terms 71
Why <i>t</i>? 71
Tables of the <i>t</i>-distribution 72
The standard <i>t</i>-test 75
The procedure 76
The actual <i>t</i>-test 81
<i>t</i>-test for means associated with unequal variances 81
The s.e.d.m. when variances are unequal 82
A worked example of the <i>t</i>-test for means associated with unequal variances 85
The paired <i>t</i>-test 87
Pair when possible 90
Executive Summary 3 - The <i>t</i>-test 92
Spare-time activities 94
<b>9 One tail or two? 95</b>
Introduction 95
Why is the analysis of variance <i>F</i>-test one-tailed? 95
The two-tailed <i>F</i>-test 96
How many tails has the <i>t</i>-test? 98
The final conclusion on number of tails 99
<b>10 Analysis of variance (ANOVA): what is it? How does it work? 101</b>
Introduction 101
Sums of squares in ANOVA 102
Some 'made-up' variation to analyse by ANOVA 102
The sum of squares table 104
Using ANOVA to sort out the variation in Table C 104
Phase 1 104
Phase 2 105
SqADS: an important acronym 107
Back to the sum of squares table 108
How well does the analysis reflect the input? 109
End phase 109
Degrees of freedom in ANOVA 110
The completion of the end phase 112
The variance ratio 113
The relationship between <i>t</i> and <i>F</i> 114
Constraints on ANOVA 115
Adequate size of experiment 115
Equality of variance between treatments 117
Testing the homogeneity of variance 117
The element of chance: randomisation 118
Comparison between treatment means in ANOVA 119
The least significant difference 121
A caveat about using the LSD 123
Executive Summary 4 - The principle of ANOVA 124
<b>11 Experimental designs for analysis of variance (ANOVA) 129</b>
Introduction 129
Fully randomised 130
Data for analysis of a fully randomised experiment 131
Prelims 132
Phase 1 132
Phase 2 133
End phase 133
Randomised blocks 135
Data for analysis of a randomised block experiment 137
Prelims 138
Phase 1 139
Phase 2 140
End phase 141
Incomplete blocks 142
Latin square 145
Data for the analysis of a Latin square 145
Prelims 146
Phase 1 150
Phase 2 150
End phase 151
Further comments on the Latin square design 152
Split plot 154
Types of analysis of variance 154
One- and two-way analysis of variance 155
Fixed-, random-, and mixed-effects analysis of variance 156
Executive Summary 5 - Analysis of a one-way randomised block experiment 158
Spare-time activities 159
<b>12 Introduction to factorial experiments 163</b>
What is a factorial experiment? 163
Interaction: what does it mean biologically? 165
If there is no interaction 167
What if there IS interaction? 167
How about a biological example? 168
Measuring any interaction between factors is often the main/only purpose of an experiment 170
How does a factorial experiment change the form of the analysis of variance? 171
Degrees of freedom for interactions 171
The similarity between the <i>residual</i> in Phase 2 and the <i>interaction</i> in Phase 3 172
Sums of squares for interactions 172
<b>13 2-Factor factorial experiments 175</b>
Introduction 175
An example of a 2-factor experiment 175
Analysis of the 2-factor experiment 176
Prelims 176
Phase 1 177
Phase 2 177
End phase (of Phase 2) 178
Phase 3 179
End phase (of Phase 3) 183
Two important things to remember about factorials before tackling the next chapter 185
Analysis of factorial experiments with unequal replication 185
Executive Summary 6 - Analysis of a 2-factor randomised block experiment 188
Spare-time activity 190
<b>14 Factorial experiments with more than two factors - leave this out if you wish! 191</b>
Introduction 191
Different 'orders' of interaction 191
Example of a 4-factor experiment 192
Prelims 194
Phase 1 196
Phase 2 196
Phase 3 197
To the end phase 205
Spare-time activity 214
<b>15 Factorial experiments with split plots 217</b>
Introduction 217
Deriving the split plot design from the randomised block design 218
Degrees of freedom in a split plot analysis 221
Main plots 221
Sub-plots 222
Numerical example of a split plot experiment and its analysis 224
Calculating the sums of squares 225
End phase 229
Comparison of split plot and randomised block experiments 229
Uses of split plot designs 233
Spare-time activity 235
<b>16 The <i>t</i>-test in the analysis of variance 237</b>
Introduction 237
Brief recap of relevant earlier sections of this book 238
Least significant difference test 239
Multiple range tests 240
Operating the multiple range test 242
Testing differences between means 246
My rules for testing differences between means 246
Presentation of the results of tests of differences between means 247
The results of the experiments analysed by analysis of variance in Chapters 11-15 249
Fully randomised design (p. 131) 250
Randomised block experiment (p. 137) 251
Latin square design (p. 146) 253
2-Factor experiment (p. 176) 255
4-Factor experiment (p. 195) 257
Split plot experiment (p. 224) 259
Some final advice 261
Spare-time activities 261
<b>17 Linear regression and correlation 263</b>
Introduction 263
Cause and effect 264
Other traps waiting for you to fall into 264
Extrapolating beyond the range of your data 264
Is a straight line appropriate? 265
The distribution of variability 268
Regression 268
Independent and dependent variables 272
The regression coefficient (<i>b</i>) 272
Calculating the regression coefficient (<i>b</i>) 275
The regression equation 281
A worked example on some real data 282
The data 282
Calculating the regression coefficient (<i>b</i>), i.e. the slope of the regression line 282
Calculating the intercept (<i>a</i>) 284
Drawing the regression line 285
Testing the significance of the slope (<i>b</i>) of the regression 286
How well do the points fit the line? The coefficient of determination (<i>r</i><sup>2</sup>) 290
Correlation 291
Derivation of the correlation coefficient (<i>r</i>) 291
An example of correlation 292
Is there a correlation line? 293
Extensions of regression analysis 296
Nonlinear regression 297
Multiple linear regression 298
Multiple nonlinear regression 300
Executive Summary 7 - Linear regression 301
Spare-time activities 303
<b>18 Analysis of covariance (ANCOVA) 305</b>
Introduction 305
A worked example of ANCOVA 307
Data: cholesterol levels of subjects given different diets 307
Data: ages of subjects in experiment 308
Regression of cholesterol level on age 309
The structure of the ANCOVA table 312
Total sum of squares 313
Residual sum of squares 314
Corrected means 316
Test for significant difference between means 316
Executive Summary 8 - Analysis of covariance (ANCOVA) 319
Spare-time activity 320
<b>19 Chi-square tests 323</b>
Introduction 323
When not and where not to use <i>χ</i><sup>2</sup> 324
The problem of low frequencies 325
Yates' correction for continuity 325
The <i>χ</i><sup>2</sup> test for <i>goodness of fit</i> 326
The case of more than two classes 328
<i>χ</i><sup>2</sup> with heterogeneity 331
Heterogeneity <i>χ</i><sup>2</sup> analysis with 'covariance' 333
Association (or contingency) <i>χ</i><sup>2</sup> 335
2 × 2 contingency table 336
Fisher's exact test for a 2 × 2 table 338
Larger contingency tables 340
Interpretation of contingency tables 341
Spare-time activities 343
<b>20 Nonparametric methods (what are they?) 345</b>
Disclaimer 345
Introduction 346
Advantages and disadvantages of parametric and nonparametric methods 347
Where nonparametric methods score 347
Where parametric methods score 349
Some ways data are organised for nonparametric tests 349
The sign test 350
The Kruskal-Wallis analysis of ranks 350
Kendall's rank correlation coefficient 352
The main nonparametric methods that are available 353
Analysis of two replicated treatments as in the <i>t</i>-test (Chapter 8) 353
Analysis of more than two replicated treatments as in the analysis of variance (Chapter 11) 354
Correlation of two variables (Chapter 17) 354
Appendix A How many replicates? 355
Appendix B Statistical tables 365
Appendix C Solutions to spare-time activities 373
Appendix D Bibliography 393
Index 397
1
How to use this book
Introduction
Don't be misled! This book cannot replace effort on your part. All it can aspire to do is to make that effort effective. The detective thriller only succeeds because you have read it too fast and not really concentrated - with that approach, you'll find this book just as mysterious.
In fact, you may not get very far if you just read this book at any speed! You will only succeed if you interact with the text, and how you might do this is the topic of most of this chapter.
The text of the chapters
The chapters, particularly 2-8, develop a train of thought essential to the subject of analysing biological data. You just have to take these chapters in order and quite slowly. There is only one way I know for you to maintain the concentration necessary to comprehension, and that is for you to make your own summary notes as you go along.
My Head of Department when I first joined the staff at Reading used to define a university lecture as 'a technique for transferring information from a piece of paper in front of the lecturer to a piece of paper in front of the student, without passing through the heads of either'. That's why I stress making your own summary notes. You will retain very little by just reading the text; you'll find that after a while you've been thinking about something totally different while 'reading' several pages - we've all been there! The message you should take from my Head of Department's quote above is that just repeating in your writing what you are reading is little better than taking no notes at all: the secret is to digest what you have read and reproduce it in your own words and in summary form. Use plenty of headings and subheadings, boxes linked by arrows, cartoon drawings, etc. Another suggestion is to use different coloured pens for different recurring statistics, such as variance and correction factor. In fact, use anything that forces you to convert my text into as different a form as possible from the original; that will force you to concentrate, to involve your brain and to make it clear to you whether or not you have really understood that bit in the book so that it is safe to move on.
The actual process of making the notes is the critical step - you can throw the notes away at a later stage if you wish, though there's no harm in keeping them for a time for revision and reference.
So DON'T MOVE ON until you are ready. You'll only undo the value of previous effort if you persuade yourself that you are ready to move on when in your heart of hearts you know you are fooling yourself!
A key point in the book is Figure 7.5 on p. 64. Take real care to lay an especially good foundation up to there. If you really feel at home with this diagram, it is a sure sign that you have conquered any hang-ups and are no longer a 'terrified biologist'.
What should you do if you run into trouble?
The obvious first step is to go back to the point in the book where you last felt confident, and start again from there.
However, it often helps to see how someone else has explained the same topic, so it's a good idea to have a look at the relevant pages of a different statistics text (see Appendix D for some suggestions). You could also look up the topic on the Internet, where many statisticians have posted articles and the lectures they give to their students.
A third possibility is to see if someone can explain things to you face to face. Do you know or have access to someone who might be able to help? If you are at university, it could be a fellow student or even one of the staff. The person who tried to teach statistics to my class at university failed completely as far as I was concerned, but later on I found he could explain things to me quite brilliantly in a one-to-one situation.
Elephants
At certain points in the text you will find the sign of the elephant, i.e. [elephant symbol].
They say 'elephants never forget' and the symbol means just that: NEVER FORGET! I have used it to mark some key statistical concepts which, in my experience, people easily forget and as a result run into trouble later on and find it hard to see where they have gone wrong. So, take it from me that it is really well worth making sure these matters are firmly embedded in your memory.
The numerical examples in the text
As stated in the Preface to the First Edition, I soon learnt that biologists don't like x. For some reason they prefer a real number but are more prepared to accept, say, 45 as representing any number than they are an x! Therefore, in order to avoid 'algebra' as far as possible, I have used actual numbers to illustrate the working of statistical analyses and tests. You probably won't gain a lot by keeping up with me on a hand calculator as I describe the different steps of a calculation, but you should make sure at each step that you understand where each number in a calculation has come from and why it has been included in that way.
When you reach the end of each worked analysis or test, however, you should go back to the original source of the data in the book and try to rework on a hand calculator the calculations which follow from just those original data. Try not to look up later stages in the calculations unless you are irrevocably stuck, and then use the executive summary (if there is one at the end of the chapter) rather than the main text.
Boxes
There will be a lot of individual variation among readers of this book in the knowledge and experience of statistics they have gained in the past, and in their ability to grasp and retain statistical concepts. At certain points, therefore, some will be happy to move on without any further explanation from me or any further repetition of calculation procedures.
For those less happy to take things for granted at such points, I have placed the material and calculations they are likely to find helpful in boxes in order not to hold back or irritate the others. Calculations in the boxes may prove particularly helpful if, as suggested above, you are reworking a numerical example from the text and need to refer to a box to find out why you are stuck or perhaps where you went wrong.
Spare-time activities
These are numerical exercises you should be equipped to complete by the time you reach them at the end of several of the chapters.
That is the time to stop and do them. Unlike the within-chapter numerical examples, you should feel quite free to use any material in previous chapters or executive summaries to remind you of the procedures involved and guide you through them. Use a hand calculator and remember to write down the results of intermediate calculations. This will make it much easier for you to detect where you went wrong if your answers do not match the solution to that exercise given in Appendix C. Do read the beginning of that appendix early on: it explains that you should not worry or waste time recalculating if your numbers are similar, even if they are not identical. I can assure you, you will recognise - when you compare your figures with the 'solution' - if you have followed the statistical steps of the exercise correctly; you will also immediately recognise if you have not.
Doing these exercises conscientiously with a hand calculator or spreadsheet, and when you reach them in the book rather than much later, is really important. They are the best things in the book for impressing the subject into your long-term memory and for giving you confidence that you understand what you are doing.
The authors of most other statistics books recognise this and also include exercises. If you're willing, I would encourage you to gain more confidence and experience by going on to try the methods as described in this book on their exercises.
By the way, a blank spreadsheet such as Excel makes a grand substitute for a hand calculator, with the added advantage that repeat calculations (e.g. squaring numbers) can be copied and pasted from the first number to all the others.
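For readers who would rather script than use a spreadsheet, the same "square once, copy down the column" idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's own method of working: the five data values are made up, and the variable names (borrowed from the Chapter 4 headings "added squares" and "correction factor") assume the usual calculator route to the standard deviation, in which the sum of squares is the added squares minus the correction factor. Every intermediate result is kept and printed, in the spirit of writing down intermediate calculations.

```python
# Hypothetical data: any small set of measurements would do.
values = [4.0, 7.0, 5.0, 8.0, 6.0]

n = len(values)
total = sum(values)                        # sum of the observations

# The spreadsheet "copy and paste" step: square every number in the column.
squares = [x * x for x in values]
added_squares = sum(squares)               # sum of the squared observations

# Assumed (standard) definition: (sum of observations) squared, divided by n.
correction_factor = total ** 2 / n

sum_of_squares = added_squares - correction_factor
variance = sum_of_squares / (n - 1)        # divide by n - 1 degrees of freedom
standard_deviation = variance ** 0.5

# Record every intermediate result so a slip is easy to trace.
for name, value in [
    ("total", total),
    ("added squares", added_squares),
    ("correction factor", correction_factor),
    ("sum of squares", sum_of_squares),
    ("variance", variance),
    ("standard deviation", standard_deviation),
]:
    print(f"{name}: {value:.4f}")
```

Reworking the same five numbers on a hand calculator and comparing each printed line is a quick way to find the step at which the two sets of figures part company.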
Executive summaries
Certain chapters end with such a summary, which aims to condense the meat of the chapter into little over a page or so. The summaries provide a condensed reference source for the calculations scattered throughout the previous chapter, with hopefully enough explanatory wording to jog your memory about how the calculations were derived. They will therefore prove useful when you tackle the spare-time activities.
Why go to all that bother?
You might ask (and some of the reviews of the first edition did): why teach how to do statistical analyses on a hand calculator when we can type the data into a computer program and get all the calculations...