Makes mathematical and statistical analysis understandable to even the least math-minded biology student

This unique textbook aims to demystify statistical formulae for the average biology student. Written in a lively and engaging style, Statistics for Terrified Biologists, 2nd Edition draws on the author's 30 years of lecturing experience to teach statistical methods to even the most guarded of biology students. It presents basic methods using straightforward, jargon-free language. Students are taught to use simple formulae and how to interpret what is being measured with each test and statistic, while at the same time learning to recognize overall patterns and guiding principles. Complemented by simple examples and useful case studies, this is an ideal statistics resource tool for undergraduate biology and environmental science students who lack confidence in their mathematical abilities.

Statistics for Terrified Biologists presents readers with the basic foundations of parametric statistics, the t-test, analysis of variance, linear regression and chi-square, and guides them to important extensions of these techniques. It introduces them to non-parametric tests, and includes a checklist of non-parametric methods linked to their parametric counterparts. The book also provides many end-of-chapter summaries and additional exercises to help readers understand and practice what they've learned.

Presented in a clear and easy-to-understand style

Makes statistics tangible and enjoyable for even the most hesitant student

Features multiple formulas to facilitate comprehension

Written by one of the foremost entomologists of his generation

This second edition of Statistics for Terrified Biologists is an invaluable guide that will be of great benefit to pre-health and biology undergraduate students.

Preface to the second edition xv

Preface to the first edition xvii

<b>1 How to use this book 1</b>

Introduction 1

The text of the chapters 1

What should you do if you run into trouble? 2

Elephants 3

The numerical examples in the text 3

Boxes 4

Spare-time activities 4

Executive summaries 5

Why go to all that bother? 5

The bibliography 7

<b>2 Introduction 9</b>

What are statistics? 9

Notation 10

Notation for calculating the mean 12

<b>3 Summarising variation 13</b>

Introduction 13

Different summaries of variation 14

Range 14

Total deviation 14

Mean deviation 15

Variance 16

Why <i>n</i> − 1? 17

Why are the deviations squared? 18

The standard deviation 19

The next chapter 21

Spare-time activities 21

<b>4 When are sums of squares NOT sums of squares? 23</b>

Introduction 23

Calculating machines offer a quicker method of calculating the sum of squares 24

Added squares 24

The correction factor 24

Avoid being confused by the term <i>sum of squares</i> 24

Summary of the calculator method for calculations as far as the standard deviation 25

Spare-time activities 26

<b>5 The normal distribution 27</b>

Introduction 27

Frequency distributions 27

The normal distribution 28

What percentage is a standard deviation worth? 30

Are the percentages always the same as these? 30

Other similar scales in everyday life 33

The standard deviation as an estimate of the frequency of a number occurring in a sample 33

From percentage to probability 34

Executive Summary 1 - The standard deviation 36

<b>6 The relevance of the normal distribution to biological data 39</b>

To recap 39

Is our observed distribution normal? 41

Checking for normality 42

What can we do about a distribution that clearly is not normal? 42

Transformation 42

Grouping samples 47

Doing nothing! 47

How many samples are needed? 47

Type 1 and Type 2 errors 48

Calculating how many samples are needed 49

<b>7 Further calculations from the normal distribution 51</b>

Introduction 51

Is A bigger than B? 52

The yardstick for deciding 52

The standard error of a difference between two means of three eggs 53

Derivation of the standard error of a difference between two means 53

Step 1: from variance of single data to variance of means 55

Step 2: From variance of single data to <i>variance of differences</i> 57

Step 3: The combination of Steps 1 and 2: the standard error of difference between means (s.e.d.m.) 58

Recap of the calculation of s.e.d.m. from the variance calculated from the individual values 61

The importance of the standard error of differences between means 61

Summary of this chapter 62

Executive Summary 2 - Standard error of a difference between two means 66

Spare-time activities 67

<b>8 The <i>t</i>-test 69</b>

Introduction 69

The principle of the <i>t</i>-test 70

The <i>t</i>-test in statistical terms 71

Why <i>t</i>? 71

Tables of the <i>t</i>-distribution 72

The standard <i>t</i>-test 75

The procedure 76

The actual <i>t</i>-test 81

<i>t</i>-test for means associated with unequal variances 81

The s.e.d.m. when variances are unequal 82

A worked example of the <i>t</i>-test for means associated with unequal variances 85

The paired <i>t</i>-test 87

Pair when possible 90

Executive Summary 3 - The <i>t</i>-test 92

Spare-time activities 94

<b>9 One tail or two? 95</b>

Introduction 95

Why is the analysis of variance <i>F</i>-test one-tailed? 95

The two-tailed <i>F</i>-test 96

How many tails has the <i>t</i>-test? 98

The final conclusion on number of tails 99

<b>10 Analysis of variance (ANOVA): what is it? How does it work? 101</b>

Introduction 101

Sums of squares in ANOVA 102

Some 'made-up' variation to analyse by ANOVA 102

The sum of squares table 104

Using ANOVA to sort out the variation in Table C 104

Phase 1 104

Phase 2 105

SqADS: an important acronym 107

Back to the sum of squares table 108

How well does the analysis reflect the input? 109

End phase 109

Degrees of freedom in ANOVA 110

The completion of the end phase 112

The variance ratio 113

The relationship between <i>t</i> and <i>F</i> 114

Constraints on ANOVA 115

Adequate size of experiment 115

Equality of variance between treatments 117

Testing the homogeneity of variance 117

The element of chance: randomisation 118

Comparison between treatment means in ANOVA 119

The least significant difference 121

A caveat about using the LSD 123

Executive Summary 4 - The principle of ANOVA 124

<b>11 Experimental designs for analysis of variance (ANOVA) 129</b>

Introduction 129

Fully randomised 130

Data for analysis of a fully randomised experiment 131

Prelims 132

Phase 1 132

Phase 2 133

End phase 133

Randomised blocks 135

Data for analysis of a randomised block experiment 137

Prelims 138

Phase 1 139

Phase 2 140

End phase 141

Incomplete blocks 142

Latin square 145

Data for the analysis of a Latin square 145

Prelims 146

Phase 1 150

Phase 2 150

End phase 151

Further comments on the Latin square design 152

Split plot 154

Types of analysis of variance 154

One- and two-way analysis of variance 155

Fixed-, random-, and mixed-effects analysis of variance 156

Executive Summary 5 - Analysis of a one-way randomised block experiment 158

Spare-time activities 159

<b>12 Introduction to factorial experiments 163</b>

What is a factorial experiment? 163

Interaction: what does it mean biologically? 165

If there is no interaction 167

What if there IS interaction? 167

How about a biological example? 168

Measuring any interaction between factors is often the main/only purpose of an experiment 170

How does a factorial experiment change the form of the analysis of variance? 171

Degrees of freedom for interactions 171

The similarity between the <i>residual</i> in Phase 2 and the <i>interaction</i> in Phase 3 172

Sums of squares for interactions 172

<b>13 2-Factor factorial experiments 175</b>

Introduction 175

An example of a 2-factor experiment 175

Analysis of the 2-factor experiment 176

Prelims 176

Phase 1 177

Phase 2 177

End phase (of Phase 2) 178

Phase 3 179

End phase (of Phase 3) 183

Two important things to remember about factorials before tackling the next chapter 185

Analysis of factorial experiments with unequal replication 185

Executive Summary 6 - Analysis of a 2-factor randomised block experiment 188

Spare-time activity 190

<b>14 Factorial experiments with more than two factors - leave this out if you wish! 191</b>

Introduction 191

Different 'orders' of interaction 191

Example of a 4-factor experiment 192

Prelims 194

Phase 1 196

Phase 2 196

Phase 3 197

To the end phase 205

Spare-time activity 214

<b>15 Factorial experiments with split plots 217</b>

Introduction 217

Deriving the split plot design from the randomised block design 218

Degrees of freedom in a split plot analysis 221

Main plots 221

Sub-plots 222

Numerical example of a split plot experiment and its analysis 224

Calculating the sums of squares 225

End phase 229

Comparison of split plot and randomised block experiments 229

Uses of split plot designs 233

Spare-time activity 235

<b>16 The <i>t</i>-test in the analysis of variance 237</b>

Introduction 237

Brief recap of relevant earlier sections of this book 238

Least significant difference test 239

Multiple range tests 240

Operating the multiple range test 242

Testing differences between means 246

My rules for testing differences between means 246

Presentation of the results of tests of differences between means 247

The results of the experiments analysed by analysis of variance in Chapters 11-15 249

Fully randomised design (p. 131) 250

Randomised block experiment (p. 137) 251

Latin square design (p. 146) 253

2-Factor experiment (p. 176) 255

4-Factor experiment (p. 195) 257

Split plot experiment (p. 224) 259

Some final advice 261

Spare-time activities 261

<b>17 Linear regression and correlation 263</b>

Introduction 263

Cause and effect 264

Other traps waiting for you to fall into 264

Extrapolating beyond the range of your data 264

Is a straight line appropriate? 265

The distribution of variability 268

Regression 268

Independent and dependent variables 272

The regression coefficient (<i>b</i>) 272

Calculating the regression coefficient (<i>b</i>) 275

The regression equation 281

A worked example on some real data 282

The data 282

Calculating the regression coefficient (<i>b</i>), i.e. the slope of the regression line 282

Calculating the intercept (<i>a</i>) 284

Drawing the regression line 285

Testing the significance of the slope (<i>b</i>) of the regression 286

How well do the points fit the line? The coefficient of determination (<i>r</i><sup>2</sup>) 290

Correlation 291

Derivation of the correlation coefficient (<i>r</i>) 291

An example of correlation 292

Is there a correlation line? 293

Extensions of regression analysis 296

Nonlinear regression 297

Multiple linear regression 298

Multiple nonlinear regression 300

Executive Summary 7 - Linear regression 301

Spare-time activities 303

<b>18 Analysis of covariance (ANCOVA) 305</b>

Introduction 305

A worked example of ANCOVA 307

Data: cholesterol levels of subjects given different diets 307

Data: ages of subjects in experiment 308

Regression of cholesterol level on age 309

The structure of the ANCOVA table 312

Total sum of squares 313

Residual sum of squares 314

Corrected means 316

Test for significant difference between means 316

Executive Summary 8 - Analysis of covariance (ANCOVA) 319

Spare-time activity 320

<b>19 Chi-square tests 323</b>

Introduction 323

When not and where not to use <i>𝜒</i><sup>2</sup> 324

The problem of low frequencies 325

Yates' correction for continuity 325

The <i>𝜒</i><sup>2</sup> test for <i>goodness of fit</i> 326

The case of more than two classes 328

<i>𝜒</i><sup>2</sup> with heterogeneity 331

Heterogeneity <i>𝜒</i><sup>2</sup> Analysis with 'Covariance' 333

Association (or contingency) <i>𝜒</i><sup>2</sup> 335

2 x 2 contingency table 336

Fisher's exact test for a 2 x 2 table 338

Larger contingency tables 340

Interpretation of contingency tables 341

Spare-time activities 343

<b>20 Nonparametric methods (what are they?) 345</b>

Disclaimer 345

Introduction 346

Advantages and disadvantages of parametric and nonparametric methods 347

Where nonparametric methods score 347

Where parametric methods score 349

Some ways data are organised for nonparametric tests 349

The sign test 350

The Kruskal-Wallis analysis of ranks 350

Kendall's rank correlation coefficient 352

The main nonparametric methods that are available 353

Analysis of two replicated treatments as in the <i>t</i>-test (Chapter 8) 353

Analysis of more than two replicated treatments as in the analysis of variance (Chapter 11) 354

Correlation of two variables (Chapter 17) 354

Appendix A How many replicates? 355

Appendix B Statistical tables 365

Appendix C Solutions to spare-time activities 373

Appendix D Bibliography 393

Index 397

# 1 How to use this book

## Introduction

Don't be misled! This book cannot replace effort on your part. All it can aspire to do is to make that effort effective. The detective thriller only succeeds because you have read it too fast and not really concentrated - with **that** approach, you'll find this book just as mysterious.

In fact, you may not get very far if you just read this book at any speed! You will only succeed if you interact with the text, and how you might do this is the topic of most of this chapter.

## The text of the chapters

The chapters, particularly 2-8, develop a train of thought essential to the subject of analysing biological data. You just have to take these chapters in order and quite slowly. There is only one way I know for you to maintain the concentration necessary for comprehension, and that is for you to **make your own summary notes** as you go along.

My Head of Department when I first joined the staff at Reading used to define a university lecture as 'a technique for transferring information from a piece of paper in front of the lecturer to a piece of paper in front of the student, without passing through the heads of either'. That's why I stress **making your own summary notes**. You will retain very little by just reading the text; you'll find that after a while you've been thinking about something totally different while 'reading' several pages - we've all been there! The message you should take from my Head of Department's quote above is that just repeating in your writing what you are reading is little better than taking no notes at all: the secret is to digest what you have read and reproduce it in your own words and in summary form. Use plenty of headings and subheadings, boxes linked by arrows, cartoon drawings, etc. Another suggestion is to use different coloured pens for different recurring statistics, such as *variance* and *correction factor*. In fact, use anything that forces you to convert my text into as different a form as possible from the original; **that** will force you to concentrate, to involve your brain and to make it clear to you whether or not you have really understood that bit in the book so that it is safe to move on.

The actual process of making the notes is the critical step - you can throw the notes away at a later stage if you wish, though there's no harm in keeping them for a time for revision and reference.

So DON'T MOVE ON until you are ready. You'll only undo the value of previous effort if you persuade yourself that you are ready to move on when in your heart of hearts you know you are fooling yourself!

A key point in the book is Figure 7.5 on p. 64. Take real care to lay an especially good foundation up to there. If you **really** feel at home with this diagram, it is a sure sign that you have conquered any hang-ups and are no longer a 'terrified biologist'.

## What should you do if you run into trouble?

The obvious first step is to go back to the point in the book where you last felt confident, and start again from there.

However, it often helps to see how someone else has explained the same topic, so it's a good idea to have a look at the relevant pages of a different statistics text (see Appendix D for some suggestions). You could also look up the topic on the Internet, where many statisticians have posted articles and their lectures to students.

A third possibility is to see if someone can explain things to you face to face. Do you know or have access to someone who might be able to help? If you are at university, it could be a fellow student or even one of the staff. The person who tried to teach statistics to my class at university failed completely as far as I was concerned, but later on I found he could explain things to me quite brilliantly in a one-to-one situation.

## Elephants

At certain points in the text you will find the *sign of the elephant*.

They say 'elephants never forget' and the symbol means just that: NEVER FORGET! I have used it to mark some key statistical concepts which, in my experience, people easily forget and as a result run into trouble later on and find it hard to see where they have gone wrong. So, take it from me that it is really well worth making sure these matters are firmly embedded in **your** memory.

## The numerical examples in the text

As stated in the Preface to the First Edition, I soon learnt that biologists don't like *x*. For some reason they prefer a real number but are more prepared to accept, say, 45 as representing any number than they are an *x*! Therefore, in order to avoid 'algebra' as far as possible, I have used actual numbers to illustrate the working of statistical analyses and tests. You probably won't gain a lot by keeping up with me on a hand calculator as I describe the different steps of a calculation, but you should make sure at each step that you understand where each number in a calculation has come from and why it has been included in that way.

When you reach the end of each worked analysis or test, however, you should go back to the original source of the data in the book and try to rework on a hand calculator the calculations which follow from just those original data. Try not to look up later stages in the calculations unless you are irrevocably stuck, and then use the *executive summary* (if there is one at the end of the chapter) rather than the main text.

## Boxes

There will be a lot of individual variation among readers of this book in the knowledge and experience of statistics they have gained in the past, and in their ability to grasp and retain statistical concepts. At certain points, therefore, some will be happy to move on without any further explanation from me or any further repetition of calculation procedures.

For those less happy to take things for granted at such points, I have placed the material and calculations they are likely to find helpful in boxes in order not to hold back or irritate the others. Calculations in the boxes may prove particularly helpful if, as suggested above, you are reworking a numerical example from the text and need to refer to a box to find out why you are stuck or perhaps where you went wrong.

## Spare-time activities

These are numerical exercises you should be equipped to complete by the time you reach them at the end of several of the chapters.

That is the time to stop and do them. Unlike the within-chapter numerical examples, you should feel quite free to use any material in previous chapters or executive summaries to remind you of the procedures involved and guide you through them. Use a **hand calculator** and remember to write down the results of intermediate calculations. This will make it much easier for you to detect where you went wrong if your answers do not match the solution to that exercise given in Appendix C. Do read the beginning of that appendix early on: it explains that you should not worry or waste time recalculating if your numbers are similar, even if they are not identical. I can assure you, you will recognise - when you compare your figures with the 'solution' - if you have followed the statistical steps of the exercise correctly; you will also immediately recognise if you have not.

Doing these exercises conscientiously with a hand calculator or spreadsheet, and when you reach them in the book rather than much later, is really important. They are the best things in the book for impressing the subject into your long-term memory and for giving you confidence that you understand what you are doing.

The authors of most other statistics books recognise this and also include exercises. If you're willing, I would encourage you to gain more confidence and experience by going on to try the methods as described in this book on their exercises.

By the way, a blank spreadsheet such as Excel makes a grand substitute for a hand calculator, with the added advantage that repeat calculations (e.g. squaring numbers) can be copied and pasted from the first number to all the others.
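For readers who would rather script than use a spreadsheet, the same "copy the formula down the column" idea might be sketched in Python as follows. This is only an illustration: the five numbers are made up and the variable names are hypothetical, none of them come from the book. The sketch squares each number in turn and prints every intermediate result, in the spirit of writing down intermediate calculations.

```python
# Hypothetical data: these five values are made up purely for illustration.
values = [4, 7, 9, 12, 13]

# Repeat the same calculation (squaring) for every number, much as one would
# copy a spreadsheet formula from the first cell down to all the others.
squares = [x * x for x in values]

# Keep the intermediate results visible so that a slip is easy to trace later.
for x, sq in zip(values, squares):
    print(f"{x:>3} squared = {sq}")

# Add up the squared values as a final check figure.
total_of_squares = sum(squares)
print("Total of the squared values:", total_of_squares)
```

Printing each step rather than only the final total mirrors the advice above about recording intermediate calculations, so that a mismatch with a published solution can be traced to the step where it arose.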

## Executive summaries

Certain chapters end with such a summary, which aims to condense the meat of the chapter into little more than a page. The summaries provide a condensed reference source for the calculations scattered throughout the previous chapter, with hopefully enough explanatory wording to jog your memory about how the calculations were derived. They will therefore prove useful when you tackle the *spare-time activities*.

## Why go to all that bother?

You might ask (and some of the reviews of the first edition did): why teach how to do statistical analyses on a hand calculator when we can type the data into a computer program and get all the calculations...