
Statistical Analysis of Geographical Data
Content
Preface
1 Dealing with data
1.1 The role of statistics in geography
1.1.1 Why do geographers need to use statistics?
1.2 About this book
1.3 Data and measurement error
1.3.1 Types of geographical data: nominal, ordinal, interval, and ratio
1.3.2 Spatial data types
1.3.3 Measurement error, accuracy and precision
1.3.4 Reporting data and uncertainties
1.3.5 Significant figures
1.3.6 Scientific notation (standard form)
1.3.7 Calculations in scientific notation
Exercises
2 Collecting and summarizing data
2.1 Sampling methods
2.1.1 Research design
2.1.2 Random sampling
2.1.3 Systematic sampling
2.1.4 Stratified sampling
2.2 Graphical summaries
2.2.1 Frequency distributions and histograms
2.2.2 Time series plots
2.2.3 Scatter plots
2.3 Summarizing data numerically
2.3.1 Measures of central tendency: mean, median and mode
2.3.2 Mean
2.3.3 Median
2.3.4 Mode
2.3.5 Measures of dispersion
2.3.6 Variance
2.3.7 Standard deviation
2.3.8 Coefficient of variation
2.3.9 Skewness and kurtosis
Exercises
3 Probability and sampling distributions
3.1 Probability
3.1.1 Probability, statistics and random variables
3.1.2 The properties of the normal distribution
3.2 Probability and the normal distribution: z-scores
3.3 Sampling distributions and the central limit theorem
Exercises
4 Estimating parameters with confidence intervals
4.1 Confidence intervals on the mean of a normal distribution: the basics
4.2 Confidence intervals in practice: the t-distribution
4.3 Sample size
4.4 Confidence intervals for a proportion
Exercises
5 Comparing datasets
5.1 Hypothesis testing with one sample: general principles
5.1.1 Comparing means: one-sample z-test
5.1.2 p-values
5.1.3 General procedure for hypothesis testing
5.2 Comparing means from small samples: one-sample t-test
5.3 Comparing proportions for one sample
5.4 Comparing two samples
5.4.1 Independent samples
5.4.2 Comparing means: t-test with unknown population variances assumed equal
5.4.3 Comparing means: t-test with unknown population variances assumed unequal
5.4.4 t-test for use with paired samples (paired t-test)
5.4.5 Comparing variances: F-test
5.5 Non-parametric hypothesis testing
5.5.1 Parametric and non-parametric tests
5.5.2 Mann-Whitney U-test
Exercises
6 Comparing distributions: the Chi-squared test
6.1 Chi-squared test with one sample
6.2 Chi-squared test for two samples
Exercises
7 Analysis of variance
7.1 One-way analysis of variance
7.2 Assumptions and diagnostics
7.3 Multiple comparison tests after analysis of variance
7.4 Non-parametric methods in the analysis of variance
7.5 Summary and further applications
Exercises
8 Correlation
8.1 Correlation analysis
8.2 Pearson's product-moment correlation coefficient
8.3 Significance tests of correlation coefficient
8.4 Spearman's rank correlation coefficient
8.5 Correlation and causality
Exercises
9 Linear regression
9.1 Least-squares linear regression
9.2 Scatter plots
9.3 Choosing the line of best fit: the 'least-squares' procedure
9.4 Analysis of residuals
9.5 Assumptions and caveats with regression
9.6 Is the regression significant?
9.7 Coefficient of determination
9.8 Confidence intervals and hypothesis tests concerning regression parameters
9.8.1 Standard error of the regression parameters
9.8.2 Tests on the regression parameters
9.8.3 Confidence intervals on the regression parameters
9.8.4 Confidence interval about the regression line
9.9 Reduced major axis regression
9.10 Summary
Exercises
10 Spatial statistics
10.1 Spatial data
10.1.1 Types of spatial data
10.1.2 Spatial data structures
10.1.3 Map projections
10.2 Summarizing spatial data
10.2.1 Mean centre
10.2.2 Weighted mean centre
10.2.3 Density estimation
10.3 Identifying clusters
10.3.1 Quadrat test
10.3.2 Nearest neighbour statistics
10.4 Interpolation and plotting contour maps
10.5 Spatial relationships
10.5.1 Spatial autocorrelation
10.5.2 Join counts
Exercises
11 Time series analysis
11.1 Time series in geographical research
11.2 Analysing time series
11.2.1 Describing time series: definitions
11.2.2 Plotting time series
11.2.3 Decomposing time series: trends, seasonality and irregular fluctuations
11.2.4 Analysing trends
11.2.5 Removing trends ('detrending' data)
11.2.6 Quantifying seasonal variation
11.2.7 Autocorrelation
11.3 Summary
Exercises
Appendix A: Introduction to the R package
Appendix B: Statistical tables
References
Index
1 Dealing with data
STUDY OBJECTIVES
- Understand the nature and purpose of statistical analysis in geography.
- View statistical analysis as a means of thinking critically with quantitative information.
- Distinguish between the different types of geographical data and their uses and limitations.
- Understand the nature of measurement error and the need to account for error when making quantitative statements.
- Distinguish between accuracy and precision, and understand how to report the precision of geographical measurements.
- Appreciate the methodological limitations of statistical data analysis.
1.1 The role of statistics in geography
1.1.1 Why do geographers need to use statistics?
Statistical analysis involves the collection, analysis and presentation of numerical information. It involves establishing the degree to which numerical summaries about observations can be justified, and provides the basis for forming judgements from empirical data.
Take the following media headlines, for example:
We know in the next 20 years the world population will increase to something like 8.3 billion people.
Sir John Beddington, UK Government Chief Scientist
2010 hits global temperature high.
BBC News, 20th January 2011
Each of these statements invites critical scrutiny. The reliability of their sources encourages us to take them seriously, but how do we know that they are correct? It is hard enough to try to predict what one human being will do in any particular year, let alone what several billion are going to do in the next 20 years. How were these predictions made? How was the rate of change of world population calculated? What were the assumptions? What does the author mean by 'something like'? The number 8.3 billion is quite a precise number: why didn't the author just say 8 billion or almost 10 billion?
Similarly, how do we know that 2010 is the global temperature high, when temperature is only measured at a small number of measuring stations? How would we go on to investigate whether anthropogenic warming caused the record-breaking temperature in 2010 or whether it was just a fluke?
Statistical analysis provides some of the tools that can answer some of these questions. This book introduces a set of techniques that allow you to make sure that the statistical statements that you make in your own work are based on a sound interpretation of the data that you collect.
There are four main reasons to use statistical techniques:
- to describe and measure the things that you observe;
- to characterize measurement error in your observations;
- to test hypotheses and theories;
- to predict and explain the relationships between variables.
1.2 About this book
One of the best ways to learn any mathematical skill is through repeated practice, so the approach taken in this book uses many examples. The presentation of each topic begins with an introduction to the theoretical principles: this is then followed by a worked example. Additional exercises are given to allow the reader to develop their understanding of the topics involved.
The use of computer packages is now common in statistical analysis in geography: it removes many of the tedious aspects of statistical calculation leaving the analyst to focus on experimental design, data collection, and interpretation. Nevertheless, it is essential to understand how the properties of the underlying data affect the value of the resulting statistics or the outcome of the test under evaluation.
Two kinds of computer software are referred to in this book. The more basic calculations can be performed using a spreadsheet such as Microsoft Excel. The advantages of Excel are that its user interface is well-known and it is almost universally available in university departments and on student computers. For more advanced analysis, and in situations where the user wishes to process large quantities of data automatically, more specialized statistical software is better. This book also refers to the open-source statistical package called 'R' which is freely available from http://www.r-project.org/. In addition to offering a comprehensive collection of well-documented statistical routines, the R software provides a scripting facility for automation of complex data analysis tasks and can produce publication-quality graphics.
1.3 Data and measurement error
1.3.1 Types of geographical data: nominal, ordinal, interval, and ratio
Four main types of data are of interest to geographers: nominal, ordinal, interval, and ratio. Nominal data are recorded using categories. For example, if you were to interview a group of people and record their gender, the resulting data would be on a nominal, or categorical, scale. Similarly, if an ecologist were to categorize the plant species found in an area by counting the number of individual plants observed in different categories, the resulting dataset would be categorical, or nominal. The distinguishing property of nominal data is that the categories are simply names - they cannot be ranked relative to each other.
Observations recorded on an ordinal scale can be put into an order relative to one another. For example, a study in which countries are ranked by their popularity as tourist destinations would result in an ordinal dataset. A requirement here is that it is possible to identify whether one observation is larger or smaller than another, based on some measure defined by the analyst.
In contrast with nominal and ordinal scale data, interval scale data are measured on a continuous scale where the differences between different measurements are meaningful. A good example is air temperature, which can be measured to a degree of precision dictated by the quality of the thermometer being used, among other factors. Whilst it is possible to add and subtract interval scale data, they cannot be multiplied or divided. For example, it is correct to say that 30 degrees is 10 degrees hotter than 20 degrees, but it is not correct to say that 200 degrees is twice as hot as 100 degrees. This is because the Celsius temperature scale, like the Fahrenheit scale, has an arbitrarily defined origin.
Ratio scale data are similar to interval scale data but a true zero point is required, and multiplication and division are valid operations when dealing with ratio scale data. Mass is a good example: an adult with a mass of 70 kg is twice as heavy as a child with a mass of 35 kg. Temperature measured on the Kelvin scale, which has an absolute zero point, is also defined as a ratio scale measurement.
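The difference between interval and ratio scales can be checked with a little arithmetic. The following is a minimal Python sketch, not taken from the book: the function name `celsius_to_kelvin` is illustrative, and the only outside fact assumed is the standard 273.15 offset between the Celsius and Kelvin scales.

```python
# Offset between the Celsius and Kelvin scales (standard value, not from the text).
KELVIN_OFFSET = 273.15


def celsius_to_kelvin(temp_c):
    """Convert a temperature in degrees Celsius to kelvin."""
    return temp_c + KELVIN_OFFSET


t_low_c, t_high_c = 100.0, 200.0

# Differences are meaningful on an interval scale: the gap is the same
# whether it is computed in degrees Celsius or in kelvin.
diff_c = t_high_c - t_low_c
diff_k = celsius_to_kelvin(t_high_c) - celsius_to_kelvin(t_low_c)

# Ratios are only meaningful on a ratio scale with a true zero.
naive_ratio = t_high_c / t_low_c  # looks like "twice as hot"
kelvin_ratio = celsius_to_kelvin(t_high_c) / celsius_to_kelvin(t_low_c)

print(f"difference: {diff_c:.1f} degC, {diff_k:.1f} K")
print(f"ratio of Celsius values: {naive_ratio:.2f}")
print(f"ratio of Kelvin values:  {kelvin_ratio:.2f}")
```

The Celsius values suggest a ratio of 2, but on the Kelvin scale the ratio is only about 1.27, which is why 200 degrees Celsius cannot be described as twice as hot as 100 degrees Celsius.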
It is important from the outset of any investigation to be aware of the different types of geographical data that can be recorded, because some statistical techniques can only be applied to certain types of data. Whilst it is usually possible to convert interval data into ordinal or nominal data (e.g. rainfall values can be ranked or put into categories), it is not possible to make the conversion the other way around.
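The one-way nature of this conversion can be illustrated with a short Python sketch. The rainfall totals, site names and class boundaries below are hypothetical values chosen for illustration; they do not come from the book.

```python
# Hypothetical annual rainfall totals in mm (illustrative values only).
rainfall_mm = {
    "site_a": 612.0,
    "site_b": 1480.5,
    "site_c": 903.2,
    "site_d": 455.8,
}

# Ordinal version: rank the sites from wettest (rank 1) to driest.
ordered_sites = sorted(rainfall_mm, key=rainfall_mm.get, reverse=True)
ranks = {site: rank for rank, site in enumerate(ordered_sites, start=1)}


def rainfall_class(value_mm, dry_below=500.0, wet_from=1000.0):
    """Assign a rainfall total to a category (boundaries are arbitrary assumptions)."""
    if value_mm < dry_below:
        return "dry"
    if value_mm >= wet_from:
        return "wet"
    return "moderate"


# Categorical version: each site is reduced to a class label.
classes = {site: rainfall_class(value) for site, value in rainfall_mm.items()}

print(ranks)
print(classes)
```

Both conversions discard information: the ranks no longer say how much wetter one site is than another, and two sites in the same class become indistinguishable, so the original rainfall totals cannot be recovered from either.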
1.3.2 Spatial data types
Geographers collect data about many different subjects. Some geographical datasets have distinctly spatial components to them. In other words, they contain information about the location of a particular entity, or information about how a particular quantity varies across a region of interest. In many contexts, it is advantageous to collect information on the locations of objects in space, or to record details of the spatial relationships between entities. The two main types of spatial data that can be used are vector data and raster (or gridded) data. Vector data consist of information that is stored as a set of points that are connected to known locations in space (e.g. to represent towns, sampling locations, or places of interest). The points may be connected to form lines (e.g. to represent linear features such as roads, rivers and railways), and the lines may be connected to form polygons (e.g. to represent areas of different land cover, geological units, or administrative units).
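One way to picture the two data models is as simple Python structures. This is a sketch under stated assumptions rather than the format used by any particular GIS package: the variable names, coordinates and grid values are hypothetical, and the area calculation uses the standard shoelace formula, which is valid for planar (rectangular) coordinates only.

```python
# Vector data: features built up from coordinate pairs (x, y).
sampling_site = (2.0, 3.0)                     # a point
river = [(0.0, 0.0), (1.0, 1.5), (2.0, 3.0)]   # a line: ordered, connected points
land_cover_unit = [                            # a polygon: a closed ring of points
    (0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0),
]


def is_closed_ring(vertices):
    """A polygon ring must end at the vertex where it starts."""
    return len(vertices) >= 4 and vertices[0] == vertices[-1]


def polygon_area(ring):
    """Planar area of a closed ring, computed with the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Raster (gridded) data: one value per cell, stored row by row.
# Here each cell holds a hypothetical land-cover code.
land_cover_grid = [
    [1, 1, 2],
    [1, 2, 2],
]

print(is_closed_ring(land_cover_unit), polygon_area(land_cover_unit))
print(len(land_cover_grid), "rows x", len(land_cover_grid[0]), "columns")
```

In the vector model the geometry is explicit in the coordinates, whereas in the raster model location is implied by a cell's row and column position within the grid.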
The locations of points must be given with reference to a coordinate system which may be rectangular (i.e. given using eastings and northings in linear units such as metres), or spherical (i.e. given using latitudes and longitudes in angular units such as degrees), but which always requires the definition of unit vectors and a fixed point of origin. The most common spherical coordinate system is that of latitude and longitude, which measures points by their angular distance from an origin which is located at the equator (zero latitude) and the Greenwich meridian (zero longitude). Thus the position of Buckingham Palace in London, UK, is 0.14°W, 51.50°N, indicating that it is 0.14 degrees west of Greenwich and 51.5 degrees north of the equator.
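When coordinates like these are stored for computation, the hemisphere letters are usually replaced by signs. The sketch below assumes the common convention that north and east are positive and south and west are negative; the function name `to_signed_degrees` is hypothetical and the convention itself is not stated in the text.

```python
def to_signed_degrees(value, hemisphere):
    """Convert an unsigned angle and hemisphere letter to signed decimal degrees.

    Assumes north and east are positive, south and west negative.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    sign = -1.0 if hemisphere in ("S", "W") else 1.0
    return sign * value


# Buckingham Palace, using the figures quoted in the text.
longitude = to_signed_degrees(0.14, "W")
latitude = to_signed_degrees(51.50, "N")

print(f"longitude = {longitude}, latitude = {latitude}")
```

Under this convention the point lies at longitude -0.14 and latitude 51.5, which places it just west of the Greenwich meridian and well north of the equator.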
Whilst spherical coordinate systems are commonly used in aviation and marine navigation, and more widely since the arrival of GPS, terrestrial navigation usually uses rectangular coordinate systems. In order to use rectangular coordinates, the spherical form of the Earth must be represented on a flat surface. This is achieved using a map projection. An example of a map projection that is used to obtain a rectangular coordinate system is the Great Britain National Grid, in which locations are defined in metres east and north of a fixed origin that is located to the south west of the Scilly Isles. Thus to give a grid...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e. 'flowing' text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a 'hard' copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.