Statistical Analysis of Geographical Data

Name: Statistical Analysis of Geographical Data | An Introduction
Brand: Wiley-Blackwell
Price: 38.99 EUR
Availability: OnlineOnly

An Introduction

Simon James Dadson (Author)

Wiley-Blackwell (Publisher)

Published on 8. March 2017

264 pages

E-Book

ePUB with Adobe-DRM

978-1-118-52514-2 (ISBN)

€38.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Preface

1 Dealing with data

1.1 The role of statistics in geography

1.1.1 Why do geographers need to use statistics?

1.2 About this book

1.3 Data and measurement error

1.3.1 Types of geographical data: nominal, ordinal, interval, and ratio

1.3.2 Spatial data types

1.3.3 Measurement error, accuracy and precision

1.3.4 Reporting data and uncertainties

1.3.5 Significant figures

1.3.6 Scientific notation (standard form)

1.3.7 Calculations in scientific notation

Exercises

2 Collecting and summarizing data

2.1 Sampling methods

2.1.1 Research design

2.1.2 Random sampling

2.1.3 Systematic sampling

2.1.4 Stratified sampling

2.2 Graphical summaries

2.2.1 Frequency distributions and histograms

2.2.2 Time series plots

2.2.3 Scatter plots

2.3 Summarizing data numerically

2.3.1 Measures of central tendency: mean, median and mode

2.3.2 Mean

2.3.3 Median

2.3.4 Mode

2.3.5 Measures of dispersion

2.3.6 Variance

2.3.7 Standard deviation

2.3.8 Coefficient of variation

2.3.9 Skewness and kurtosis

Exercises

3 Probability and sampling distributions

3.1 Probability

3.1.1 Probability, statistics and random variables

3.1.2 The properties of the normal distribution

3.2 Probability and the normal distribution: z-scores

3.3 Sampling distributions and the central limit theorem

Exercises

4 Estimating parameters with confidence intervals

4.1 Confidence intervals on the mean of a normal distribution: the basics

4.2 Confidence intervals in practice: the t-distribution

4.3 Sample size

4.4 Confidence intervals for a proportion

Exercises

5 Comparing datasets

5.1 Hypothesis testing with one sample: general principles

5.1.1 Comparing means: one-sample z-test

5.1.2 p-values

5.1.3 General procedure for hypothesis testing

5.2 Comparing means from small samples: one-sample t-test

5.3 Comparing proportions for one sample

5.4 Comparing two samples

5.4.1 Independent samples

5.4.2 Comparing means: t-test with unknown population variances assumed equal

5.4.3 Comparing means: t-test with unknown population variances assumed unequal

5.4.4 t-test for use with paired samples (paired t-test)

5.4.5 Comparing variances: F-test

5.5 Non-parametric hypothesis testing

5.5.1 Parametric and non-parametric tests

5.5.2 Mann-Whitney U-test

Exercises

6 Comparing distributions: the Chi-squared test

6.1 Chi-squared test with one sample

6.2 Chi-squared test for two samples

Exercises

7 Analysis of variance

7.1 One-way analysis of variance

7.2 Assumptions and diagnostics

7.3 Multiple comparison tests after analysis of variance

7.4 Non-parametric methods in the analysis of variance

7.5 Summary and further applications

Exercises

8 Correlation

8.1 Correlation analysis

8.2 Pearson's product-moment correlation coefficient

8.3 Significance tests of correlation coefficient

8.4 Spearman's rank correlation coefficient

8.5 Correlation and causality

Exercises

9 Linear regression

9.1 Least-squares linear regression

9.2 Scatter plots

9.3 Choosing the line of best fit: the 'least-squares' procedure

9.4 Analysis of residuals

9.5 Assumptions and caveats with regression

9.6 Is the regression significant?

9.7 Coefficient of determination

9.8 Confidence intervals and hypothesis tests concerning regression parameters

9.8.1 Standard error of the regression parameters

9.8.2 Tests on the regression parameters

9.8.3 Confidence intervals on the regression parameters

9.8.4 Confidence interval about the regression line

9.9 Reduced major axis regression

9.10 Summary

Exercises

10 Spatial statistics

10.1 Spatial data

10.1.1 Types of spatial data

10.1.2 Spatial data structures

10.1.3 Map projections

10.2 Summarizing spatial data

10.2.1 Mean centre

10.2.2 Weighted mean centre

10.2.3 Density estimation

10.3 Identifying clusters

10.3.1 Quadrat test

10.3.2 Nearest neighbour statistics

10.4 Interpolation and plotting contour maps

10.5 Spatial relationships

10.5.1 Spatial autocorrelation

10.5.2 Join counts

Exercises

11 Time series analysis

11.1 Time series in geographical research

11.2 Analysing time series

11.2.1 Describing time series: definitions

11.2.2 Plotting time series

11.2.3 Decomposing time series: trends, seasonality and irregular fluctuations

11.2.4 Analysing trends

11.2.5 Removing trends ('detrending' data)

11.2.6 Quantifying seasonal variation

11.2.7 Autocorrelation

11.3 Summary

Exercises

Appendix A: Introduction to the R package

Appendix B: Statistical tables

References

Index

1 Dealing with data

STUDY OBJECTIVES

Understand the nature and purpose of statistical analysis in geography.
View statistical analysis as a means of thinking critically with quantitative information.
Distinguish between the different types of geographical data and their uses and limitations.
Understand the nature of measurement error and the need to account for error when making quantitative statements.
Distinguish between accuracy and precision, and understand how to report the precision of geographical measurements.
Appreciate the methodological limitations of statistical data analysis.

1.1 The role of statistics in geography

1.1.1 Why do geographers need to use statistics?

Statistical analysis involves the collection, analysis and presentation of numerical information. It involves establishing the degree to which numerical summaries about observations can be justified, and provides the basis for forming judgements from empirical data.

Take the following media headlines, for example:

We know in the next 20 years the world population will increase to something like 8.3 billion people.

Sir John Beddington, UK Government Chief Scientist [1]

2010 hits global temperature high.

BBC News, 20th January 2011 [2]

Each of these statements invites critical scrutiny. The reliability of their sources encourages us to take them seriously, but how do we know that they are correct? It is hard enough to try to predict what one human being will do in any particular year, let alone what several billion are going to do in the next 20 years. How were these predictions made? How was the rate of change of world population calculated? What were the assumptions? What does the author mean by 'something like'? The number 8.3 billion is quite a precise number: why didn't the author just say 8 billion or almost 10 billion?

Similarly, how do we know that 2010 is the global temperature high, when temperature is only measured at a small number of measuring stations? How would we go on to investigate whether anthropogenic warming caused the record-breaking temperature in 2010 or whether it was just a fluke?

Statistical analysis provides tools that can help answer questions like these. This book introduces a set of techniques that allow you to make sure that the statistical statements you make in your own work are based on a sound interpretation of the data you collect.

There are four main reasons to use statistical techniques:

to describe and measure the things that you observe;
to characterize measurement error in your observations;
to test hypotheses and theories;
to predict and explain the relationships between variables.

1.2 About this book

One of the best ways to learn any mathematical skill is through repeated practice, so the approach taken in this book uses many examples. The presentation of each topic begins with an introduction to the theoretical principles: this is then followed by a worked example. Additional exercises are given to allow the reader to develop their understanding of the topics involved.

The use of computer packages is now common in statistical analysis in geography: it removes many of the tedious aspects of statistical calculation, leaving the analyst free to focus on experimental design, data collection, and interpretation. Nevertheless, it is essential to understand how the properties of the underlying data affect the value of the resulting statistics or the outcome of the test under evaluation.

Two kinds of computer software are referred to in this book. The more basic calculations can be performed using a spreadsheet such as Microsoft Excel. The advantages of Excel are that its user interface is well known and it is almost universally available in university departments and on student computers. For more advanced analysis, and in situations where the user wishes to process large quantities of data automatically, more specialized statistical software is preferable. This book therefore also refers to the open-source statistical package called 'R', which is freely available from http://www.r-project.org/. In addition to offering a comprehensive collection of well-documented statistical routines, R provides a scripting facility for automating complex data analysis tasks and can produce publication-quality graphics.

1.3 Data and measurement error

1.3.1 Types of geographical data: nominal, ordinal, interval, and ratio

Four main types of data are of interest to geographers: nominal, ordinal, interval, and ratio. Nominal data are recorded using categories. For example, if you were to interview a group of people and record their gender, the resulting data would be on a nominal, or categorical, scale. Similarly, if an ecologist were to categorize the plants found in an area by species and count the number of individuals in each category, the resulting dataset would be categorical, or nominal. The distinguishing property of nominal data is that the categories are simply names: they cannot be ranked relative to each other.
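As a minimal Python sketch (the survey records below are hypothetical), nominal observations can only be tallied by category, and the most frequent category (the mode) is one of the few summaries that makes sense for them:

```python
from collections import Counter

# Hypothetical survey: the species recorded for each individual plant observed.
observations = ["heather", "bracken", "heather", "bilberry", "heather", "bracken"]

# Nominal data can only be tallied by category; the categories have no order.
counts = Counter(observations)

# The mode (most frequent category) is a meaningful summary of nominal data;
# an average of category names has no meaning.
mode_category, mode_count = counts.most_common(1)[0]

print(dict(counts))               # {'heather': 3, 'bracken': 2, 'bilberry': 1}
print(mode_category, mode_count)  # heather 3
```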

Observations recorded on an ordinal scale can be put into an order relative to one another. For example, a study in which countries are ranked by their popularity as tourist destinations would result in an ordinal dataset. A requirement here is that it is possible to identify whether one observation is larger or smaller than another, based on some measure defined by the analyst.

In contrast with nominal and ordinal scale data, interval scale data are measured on a continuous scale on which the differences between measurements are meaningful. A good example is air temperature in degrees Celsius, which can be measured to a degree of precision dictated by the quality of the thermometer being used, among other factors. Whilst interval scale data can be meaningfully added and subtracted, ratios between them have no meaning, so they cannot sensibly be multiplied or divided. For example, it is correct to say that 30 °C is 10 °C hotter than 20 °C, but it is not correct to say that 200 °C is twice as hot as 100 °C. This is because the Celsius temperature scale, like the Fahrenheit scale, has an arbitrarily defined zero point.

Ratio scale data are similar to interval scale data, but they have a true zero point, so multiplication and division are valid operations. Mass is a good example: an adult with a mass of 70 kg is twice as heavy as a child with a mass of 35 kg. Temperature measured on the Kelvin scale, which has an absolute zero point, is therefore also a ratio scale measurement.
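The difference between the two scales can be checked numerically. The following is a small sketch, assuming only the standard offset of 273.15 between the Celsius and Kelvin scales: a temperature difference survives the change of origin, but the apparent Celsius "ratio" of 2 does not.

```python
def celsius_to_kelvin(t_c: float) -> float:
    """Convert a Celsius temperature to the Kelvin scale, which has a true zero."""
    return t_c + 273.15

hot_c, cool_c = 200.0, 100.0

# Differences are meaningful on an interval scale: shifting the origin to
# absolute zero leaves the 100-degree difference unchanged.
diff_c = hot_c - cool_c
diff_k = celsius_to_kelvin(hot_c) - celsius_to_kelvin(cool_c)

# Ratios are not: the Celsius "ratio" of 2 is an artefact of the arbitrary zero.
ratio_c = hot_c / cool_c
ratio_k = celsius_to_kelvin(hot_c) / celsius_to_kelvin(cool_c)

# Mass is on a ratio scale, so its ratio is meaningful.
mass_ratio = 70.0 / 35.0

print(diff_c, round(diff_k, 2))     # 100.0 100.0
print(ratio_c, round(ratio_k, 3))   # 2.0 1.268
print(mass_ratio)                   # 2.0
```

On the absolute (Kelvin) scale, 200 °C is only about 1.27 times "hotter" than 100 °C, which is why the Celsius ratio should not be interpreted.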

It is important from the outset of any investigation to be aware of the different types of geographical data that can be recorded, because some statistical techniques can only be applied to certain types of data. Whilst it is usually possible to convert interval or ratio data into ordinal or nominal data (e.g. rainfall totals can be ranked or put into categories), the reverse conversion is not possible, because information about the size of the differences between observations is lost.
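A minimal sketch of this one-way conversion, using hypothetical rainfall totals and hypothetical class boundaries (800 mm and 1200 mm), might look like this in Python. The tie rule used for ranking (tied values share the best rank, as in competition ranking) is one choice among several.

```python
def rank_descending(values: dict) -> dict:
    """Rank sites from largest value (rank 1) downwards. Tied values share the
    best (smallest) rank; this is one of several possible tie-handling rules."""
    ordered = sorted(values.values(), reverse=True)
    return {site: ordered.index(v) + 1 for site, v in values.items()}

def rainfall_class(mm: float) -> str:
    """Assign a category using hypothetical class boundaries (in mm)."""
    if mm < 800:
        return "dry"
    if mm < 1200:
        return "moderate"
    return "wet"

# Hypothetical annual rainfall totals (mm) at five gauges, A to E.
rainfall_mm = {"A": 812.0, "B": 1430.5, "C": 655.2, "D": 1430.5, "E": 998.7}

ranks = rank_descending(rainfall_mm)                              # ordinal
classes = {s: rainfall_class(v) for s, v in rainfall_mm.items()}  # categories

# The conversion is one-way: quite different totals give identical ranks,
# so the millimetre values cannot be recovered from the ranks.
other_mm = {"A": 900.0, "B": 2000.0, "C": 100.0, "D": 2000.0, "E": 950.0}

print(ranks)    # {'A': 4, 'B': 1, 'C': 5, 'D': 1, 'E': 3}
print(classes)  # {'A': 'moderate', 'B': 'wet', 'C': 'dry', 'D': 'wet', 'E': 'moderate'}
print(rank_descending(other_mm) == ranks)  # True
```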

1.3.2 Spatial data types

Geographers collect data about many different subjects, and some geographical datasets have a distinctly spatial component: they contain information about the location of a particular entity, or about how a particular quantity varies across a region of interest. In many contexts, it is advantageous to collect information on the locations of objects in space, or to record details of the spatial relationships between entities. The two main types of spatial data are vector data and raster (or gridded) data. Vector data are stored as a set of points, each tied to a known location in space (e.g. to represent towns, sampling locations, or places of interest). The points may be connected to form lines (e.g. to represent linear features such as roads, rivers and railways), and the lines may be connected to form polygons (e.g. to represent areas of different land cover, geological units, or administrative units).
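The vector hierarchy of points, lines and polygons, and the contrasting raster grid, can be sketched as simple Python data structures. This is an illustrative sketch only: the class names, the example features, and the convention that raster row 0 is the southern-most row are assumptions made here, not the structures used by any particular GIS package. The line length and the polygon area (shoelace formula, valid for simple planar polygons) show why storing the points in order matters.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float]  # (x, y), e.g. (easting, northing) in metres

@dataclass
class Point:
    """A single located feature, e.g. a town or sampling site."""
    name: str
    xy: Coord

@dataclass
class Line:
    """Points joined in order, e.g. a road or river centreline."""
    name: str
    vertices: List[Coord]

    def length(self) -> float:
        # Sum of straight-line distances between consecutive vertices.
        return sum(math.dist(a, b) for a, b in zip(self.vertices, self.vertices[1:]))

@dataclass
class Polygon:
    """A closed ring of points bounding an area (first vertex not repeated)."""
    name: str
    ring: List[Coord]

    def area(self) -> float:
        # Shoelace formula; assumes a simple (non-self-intersecting) polygon.
        n = len(self.ring)
        s = 0.0
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            s += x1 * y2 - x2 * y1
        return abs(s) / 2.0

@dataclass
class Raster:
    """A regular grid of cell values anchored at its lower-left corner."""
    origin: Coord
    cell_size: float
    values: List[List[float]]  # values[row][col]; row 0 is the southern-most row

    def value_at(self, x: float, y: float) -> float:
        # Find the cell containing (x, y); no bounds checking in this sketch.
        col = int((x - self.origin[0]) // self.cell_size)
        row = int((y - self.origin[1]) // self.cell_size)
        return self.values[row][col]

# Hypothetical features in a local coordinate system (metres).
site = Point("sampling site", (15.0, 5.0))
stream = Line("stream", [(0.0, 0.0), (30.0, 40.0), (30.0, 100.0)])
field = Polygon("field", [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)])
elevation = Raster(origin=(0.0, 0.0), cell_size=10.0, values=[[1.0, 2.0], [3.0, 4.0]])

print(stream.length())               # 110.0
print(field.area())                  # 5000.0
print(elevation.value_at(*site.xy))  # 2.0
```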

The locations of points must be given with reference to a coordinate system, which may be rectangular (i.e. given using eastings and northings in linear units such as metres) or spherical (i.e. given using latitudes and longitudes in angular units such as degrees), but which always requires the definition of unit vectors and a fixed point of origin. The most common spherical coordinate system is that of latitude and longitude, which measures the position of a point by its angular distance from an origin located at the intersection of the equator (zero latitude) and the Greenwich meridian (zero longitude). Thus the location of Buckingham Palace in London, UK, is 0.14°W, 51.50°N, indicating that it lies 0.14 degrees west of the Greenwich meridian and 51.50 degrees north of the equator.
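Software usually stores latitude and longitude as signed decimal degrees rather than with hemisphere letters. A small Python sketch, assuming the common convention that longitudes west of Greenwich and latitudes south of the equator are negative, converts between the two notations:

```python
def format_lon_lat(lon: float, lat: float, places: int = 2) -> str:
    """Format signed decimal degrees in hemisphere notation.

    Assumed sign convention: longitude negative west of Greenwich,
    latitude negative south of the equator.
    """
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("longitude must be in [-180, 180] and latitude in [-90, 90]")
    ew = "E" if lon >= 0 else "W"
    ns = "N" if lat >= 0 else "S"
    return f"{abs(lon):.{places}f}°{ew}, {abs(lat):.{places}f}°{ns}"

# Buckingham Palace, using the rounded values given in the text.
print(format_lon_lat(-0.14, 51.50))  # 0.14°W, 51.50°N
```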

Whilst spherical coordinate systems are commonly used in aviation and marine navigation, and are increasingly encountered since the arrival of GPS, terrestrial navigation usually uses rectangular coordinate systems. In order to use rectangular coordinates, the spherical form of the Earth must be represented on a flat surface. This is achieved using a map projection. An example of a map projection that is used to obtain a rectangular coordinate system is the Great Britain National Grid, in which locations are defined in metres east and north of a fixed origin located to the south-west of the Scilly Isles. Thus to give a grid...
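To illustrate what a map projection does, the sketch below implements one of the simplest projections, the equirectangular projection, on a spherical Earth. This is a deliberately simpler, swapped-in example: it is not the projection used for the Great Britain National Grid, and the Earth radius of 6 371 km is an assumed mean value.

```python
import math

# Assumption: a spherical Earth with this mean radius (real grids use an ellipsoid).
EARTH_RADIUS_M = 6_371_000.0

def equirectangular(lon_deg: float, lat_deg: float, lat0_deg: float = 0.0,
                    radius_m: float = EARTH_RADIUS_M) -> tuple:
    """Project longitude/latitude in degrees to planar (x, y) in metres.

    x = R * lon * cos(lat0), y = R * lat, with angles in radians and lat0 the
    standard parallel along which east-west distances are true to scale.
    """
    x = radius_m * math.radians(lon_deg) * math.cos(math.radians(lat0_deg))
    y = radius_m * math.radians(lat_deg)
    return x, y

# Buckingham Palace (the text's rounded values), with a standard parallel near London.
x, y = equirectangular(-0.14, 51.50, lat0_deg=51.5)
print(round(x), round(y))
```

On this sphere, one degree of latitude maps to R × π/180, roughly 111 km, while one degree of longitude shrinks with the cosine of the standard parallel. Choosing how to handle such distortions is the main difference between projections.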
