
Paleontological Data Analysis
Description
An up-to-date edition of the indispensable guide to analysing paleontological data
Paleontology has developed in recent decades into an increasingly data-driven discipline, which brings to bear a huge variety of statistical tools. Applying statistical methods to paleontological data requires a discipline-specific understanding of which methods and parameters are the most appropriate ones, and how to account for statistical bias inherent in the fossil record. By guiding the reader to these and other fundamental questions in the statistical analysis of fossilized specimens, Paleontological Data Analysis has become the standard text for anyone with an interest in quantitative analysis of the fossil record. Now fully updated to reflect the latest statistical methods and disciplinary advances, it is an essential tool for practitioners and students alike.
Readers of the second edition of Paleontological Data Analysis will also find:
* New sections on machine learning, Bayesian inference, phylogenetic comparative methods, analysis of CT data, and much more
* New use cases and examples using PAST, R, and Python software packages
* Full color illustrations throughout
Paleontological Data Analysis is ideal for paleontologists, evolutionary biologists, taxonomists, and students in any of these fields.

Persons
Øyvind Hammer, PhD, is Professor of Paleontology at the University of Oslo, Norway. He has published very widely on paleontological subjects, and is co-author of the paleontological data analysis software PAST.
David A.T. Harper, DSc, is Emeritus Professor of Paleontology at Durham University, UK. He has published extensively, including numerous monographs and textbooks, and developed the software PAST along with Øyvind Hammer.
Content
Preface
Acknowledgements
1 Introduction
1.1 The nature of paleontological data
1.2 Advantages and pitfalls of paleontological data analysis
1.3 Software
References
2 Statistical concepts
2.1 The population and the sample
2.2 The frequency distribution of the population
2.3 The normal distribution
2.4 Cumulative probability
2.5 The statistical sample, estimation of distribution parameters
2.6 Null hypothesis significance testing
2.7 Bayesian inference
2.8 Exploratory data analysis
References
3 Introduction to data visualization
3.1 Graphic design principles
3.2 Line charts
3.3 Scatter plots
3.4 Histograms
3.5 Bar chart, box, and violin plots
3.6 Normal probability plot
3.7 Pie charts
3.8 Ternary plots
3.9 Heat maps, 3D plots, and Geographic Information System
3.10 Plotting with R and Python
References
4 Univariate and bivariate statistical methods
4.1 Parameter estimation and confidence intervals
4.2 Testing for distribution
4.3 Two-sample tests
4.4 Multiple-sample tests
4.5 Correlation
4.6 Bivariate linear regression
4.7 Generalized linear models
4.8 Polynomial and nonlinear regression
4.9 Mixture analysis
4.10 Counts and contingency tables
References
5 Introduction to multivariate data analysis
5.1 Multivariate distributions
5.2 Parametric multivariate tests - Hotelling's T²
5.3 Nonparametric multivariate tests - permutation test
5.4 Hierarchical cluster analysis
5.5 K-means and k-medoids cluster analysis
References
6 Morphometrics
6.1 The allometric equation
6.2 Principal components analysis
6.3 Multivariate allometry
6.4 Linear discriminant analysis
6.5 Multivariate analysis of variance
6.6 Fourier shape analysis in polar coordinates
6.7 Elliptic Fourier analysis
6.8 Hangle Fourier analysis
6.9 Eigenshape analysis
6.10 Landmarks and size measures
6.11 Procrustes fitting
6.12 PCA of landmark data
6.13 Thin-plate spline deformations
6.14 Principal and partial warps
6.15 Relative warps
6.16 Regression of warp scores
6.17 Common allometric component analysis
6.18 Landmarks in 3D
6.19 Disparity measures
6.20 Morphogroup identification with machine learning
6.21 Case study: the ontogeny of a Silurian trilobite
References
7 Directional and spatial data analysis
7.1 Analysis of directions and orientations in 2D
7.2 Analysis of directions and orientations in 3D
7.3 Spatial point pattern analysis
References
8 Analysis of tomographic and 3D-scan data
8.1 The technology of x-ray tomography
8.2 Processing of volume data
8.3 Functional morphology with 3D data
References
9 Estimating paleobiodiversity
9.1 Species richness estimation
9.2 Rarefaction and related methods
9.3 Diversity curves, origination, and extinction rates
9.4 Abundance-based biodiversity indices
9.5 Taxonomic distinctness
9.6 Comparison of diversity indices
9.7 Abundance models
References
10 Paleoecology and paleobiogeography
10.1 Paleobiogeography
10.2 Paleoecology
10.3 Association similarity indices for presence-absence data
10.4 Association similarity indices for abundance data
10.5 ANOSIM and PerMANOVA
10.6 Principal coordinates analysis
10.7 Non-metric multidimensional scaling
10.8 Correspondence analysis
10.9 Detrended correspondence analysis
10.10 Seriation
10.11 Nonlinear dimensionality reduction
10.12 Canonical correspondence analysis
10.13 Indicator species
10.14 Network analysis
10.15 Size-frequency and survivorship curves
10.16 Case study: Devonian paleobiogeography
References
11 Calibration - estimating paleoenvironments
11.1 Modern analog technique
11.2 Weighted averaging
11.3 Weighted averaging partial least squares
11.4 Which calibration method?
11.5 Case study: Late Holocene temperature inferred from chironomids
References
12 Time series analysis
12.1 Spectral analysis
12.2 Wavelet analysis
12.3 Autocorrelation
12.4 Cross-correlation
12.5 Runs test
12.6 Time Series Trends and Regression
12.7 Smoothing and filtering
References
13 Quantitative biostratigraphy
13.1 Zonation of a single section
13.2 Confidence intervals on stratigraphic ranges
13.3 Regional and global biostratigraphic correlation
13.4 Age models
References
14 Phylogenetic analysis
14.1 A dictionary of cladistics
14.2 Parsimony analysis
14.3 Characters
14.4 Algorithms for Parsimony Analysis
14.5 Character state reconstruction
14.6 Evaluation of characters and trees
14.7 Case study: the systematics of heterosporous ferns
14.8 Other methods for phylogenetic analysis
14.9 Phylogenetic Comparative Methods
References
Index
1 Introduction
1.1 The nature of paleontological data
Paleontology is a diverse field, with many different types of data and a corresponding variety of analytical methods. For example, the types of data used in ecological, morphological, and phylogenetic analyses are often quite different in both form and quality. Data relevant to the investigation of morphological variation in Ordovician brachiopods are quite different from those gathered from Neogene mammal-dominated assemblages for paleoecological analyses. Nevertheless, there is a surprising commonality of techniques that can be implemented in the investigation of such apparently contrasting data sets.
1.1.1 Univariate measurements
Perhaps the simplest type of data is straightforward, continuous measurements such as length or width. Such measurements are typically made on a number of specimens in one or more samples, commonly from collections from different localities or species. Typically, the investigator is interested in characterizing, and subsequently analyzing, the sets of measurements, and comparing two or more samples to see how they differ. Measurements in units of arbitrary origin (such as temperature on the Celsius or Fahrenheit scales) are known as interval data, while measurements in units of a fixed origin (such as distance and mass) are called ratio data. Methods for interval and ratio data are presented in chapter 4.
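The distinction between interval and ratio data can be made concrete with a small sketch. The values and helper functions below are illustrative assumptions, not taken from the book. The sketch shows that a ratio of two temperatures changes when the unit changes, whereas a ratio of two lengths does not.

```python
# Illustrative values only; the helper names are hypothetical.

def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9.0 / 5.0 + 32.0


def mm_to_inches(mm):
    """Convert a length from millimetres to inches."""
    return mm / 25.4


# Interval data: the zero point is arbitrary, so ratios depend on the unit.
t1_c, t2_c = 10.0, 20.0
ratio_celsius = t2_c / t1_c
ratio_fahrenheit = celsius_to_fahrenheit(t2_c) / celsius_to_fahrenheit(t1_c)

# Ratio data: the zero point is fixed, so ratios survive a change of unit.
len1_mm, len2_mm = 10.0, 20.0
ratio_mm = len2_mm / len1_mm
ratio_inches = mm_to_inches(len2_mm) / mm_to_inches(len1_mm)

print("temperature ratios:", ratio_celsius, ratio_fahrenheit)
print("length ratios:", ratio_mm, ratio_inches)
```

The temperature ratio is 2 in Celsius but 1.36 in Fahrenheit (68/50), so "twice as warm" has no fixed meaning. The length ratio is 2 in both units.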
1.1.2 Bivariate measurements
Also relatively easy to handle are bivariate interval or ratio measurements on a number of specimens. Each specimen is then characterized by a pair of values, such as both length and width of a given fossil. In addition to comparing samples, we will then typically wish to know if and how the two variables are interrelated. Do they, for example, fit on a straight line, indicating a static (isometric) mode of growth or do they exhibit more complex (anisometric) growth patterns? Bivariate data are discussed in chapters 4 and 6.
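As a rough sketch of how such a question might be approached, the example below fits a straight line to log-transformed length and width values. It assumes the allometric relationship width = b × length^a, under which a slope a close to 1 in log-log space is consistent with isometric growth. The measurements are hypothetical, and ordinary least squares is used only for simplicity; it is not necessarily the regression method the book recommends.

```python
import math


def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical length and width measurements (mm) for five specimens,
# constructed so that width is always 0.8 times length.
lengths = [10.0, 12.0, 15.0, 18.0, 20.0]
widths = [8.0, 9.6, 12.0, 14.4, 16.0]

# Fit the line in log-log space: log(width) = a * log(length) + log(b).
log_lengths = [math.log(v) for v in lengths]
log_widths = [math.log(v) for v in widths]
slope, intercept = least_squares_line(log_lengths, log_widths)

print("allometric exponent a:", round(slope, 3))
print("coefficient b:", round(math.exp(intercept), 3))
```

Because the hypothetical widths are an exact multiple of the lengths, the fitted exponent is 1 and the coefficient is 0.8. Real measurements would scatter around the line, and the exponent would need a confidence interval before isometry could be assessed.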
1.1.3 Multivariate morphometric measurements
The next step in increasing complexity is multivariate morphometric data, involving multiple variables such as length, width, thickness, and, say, distance between ribs (Fig. 1.1). We may wish to investigate the structure of such data and whether two or more samples are different. Multivariate data sets can be difficult to visualize, and special methods have been devised to emphasize any inherent structure. Special types of multivariate morphometric data include digitized outlines and the coordinates of landmarks. Multivariate methods useful for morphometrics are discussed in chapters 5 and 6.
Figure 1.1 Typical set of measurements made on conjoined valves of the living brachiopod Terebratulina from the west coast of Scotland. A: Measurements of the length of ventral (LENPV) and dorsal (LENBV) valves, together with hinge (HINWI) and maximum (MAXWI) widths. B: Thickness (THICK). C: Position of maximum width (POSMW) and numbers of ribs per mm at, say, 2.5 mm (RAT25) and 5.0 mm (RAT50) from the posterior margin. Together, these measurements can form the basis for univariate, bivariate, and multivariate analysis of the morphological features of this brachiopod genus.
1.1.4 Character matrices for phylogenetic analysis
A special type of morphological (or molecular) data is the character matrix for phylogenetic analysis (cladistics). In such a matrix, taxa are conventionally entered in rows and characters in columns. The state of each character for each taxon is typically coded with an integer. Character matrices and their analyses are treated in chapter 14.
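A character matrix of this kind could be represented as follows. This is a minimal sketch: the taxon names, characters, and codings are hypothetical, and the difference count at the end is only meant to show how rows are compared, not to stand in for a phylogenetic method.

```python
# Hypothetical taxa, characters, and codings, for illustration only.
taxa = ["Taxon A", "Taxon B", "Taxon C"]
characters = ["shell ribs", "spines", "coiling"]

# Rows are taxa, columns are characters; each state is coded as an integer.
character_matrix = [
    [0, 1, 0],  # Taxon A
    [1, 1, 0],  # Taxon B
    [1, 0, 2],  # Taxon C
]


def states_for_character(matrix, column):
    """Return the coded state of one character (column) for every taxon."""
    return [row[column] for row in matrix]


def count_differences(row_a, row_b):
    """Count the characters in which two taxa have different states."""
    return sum(1 for a, b in zip(row_a, row_b) if a != b)


print(characters[0], states_for_character(character_matrix, 0))
print("A vs B:", count_differences(character_matrix[0], character_matrix[1]))
print("A vs C:", count_differences(character_matrix[0], character_matrix[2]))
```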
1.1.5 Paleoecology and paleobiogeography - taxa in samples
In paleoecology and paleobiogeography, the most common data type consists of taxonomic counts at different localities or stratigraphic levels. Such data are typically given in an abundance matrix, either with taxa in rows and samples in columns or vice versa. Each cell contains a specimen count (or abundance) for a particular taxon in a particular sample (Table 1.1).
In some cases, we do not have specimen counts available; instead, we only know whether a taxon is present or absent in the sample. Such information is specified in a presence-absence table, where absences are typically coded with zeros and presences with ones. This type of binary data may be more relevant to biogeographic or stratigraphic analyses, where each taxon is weighted equally.
Typical questions include whether the samples are significantly different from each other, whether the biodiversity is different, whether there are well-separated groups of localities, or any indication of an environmental gradient. Methods for analyzing taxa-in-samples data are discussed in chapters 5, 9, and 10 (Fig. 1.2).
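The relationship between an abundance matrix and a presence-absence table can be sketched with the counts from Table 1.1. The dictionary layout and variable names below are assumptions made for illustration. Taxa are in rows and samples in columns.

```python
# Counts from Table 1.1; taxa in rows, samples (localities) in columns.
samples = ["Spooky Creek", "Scary Ridge", "Creepy Canyon"]
abundance = {
    "M. horridus": [0, 43, 15],
    "A. cornuta": [7, 12, 94],
    "P. giganteus": [23, 0, 32],
}

# Presence-absence table: 1 where the taxon occurs in the sample, else 0.
presence_absence = {
    taxon: [1 if count > 0 else 0 for count in counts]
    for taxon, counts in abundance.items()
}

# Number of taxa present in each sample (column sums of the binary table).
richness = [
    sum(row[j] for row in presence_absence.values())
    for j in range(len(samples))
]

# Total number of specimens in each sample (column sums of the counts).
totals = [
    sum(counts[j] for counts in abundance.values())
    for j in range(len(samples))
]

for name, n_taxa, n_specimens in zip(samples, richness, totals):
    print(name, "taxa:", n_taxa, "specimens:", n_specimens)
```

Converting to presence-absence discards the abundance information: Spooky Creek and Scary Ridge both contain two taxa, although they hold 30 and 55 specimens respectively.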
1.1.6 Time series
A time series is a sequence of values through time. Paleontological time series include diversity curves, geochemical data from fossils through time, and thicknesses of bands from a series of growth increments. For such a time series, we may wish to investigate whether there is any trend or periodicity. Some appropriate methods are given in chapter 12.
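One simple way to look for periodicity is the autocorrelation of the series at different lags. The sketch below uses a synthetic series with a period of four samples. The data and the particular estimator are assumptions made for illustration and are not drawn from the book.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of an evenly spaced series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [value - mean for value in series]
    denominator = sum(d * d for d in deviations)
    numerator = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return numerator / denominator


# Synthetic series repeating every four samples (hypothetical data).
series = [0.0, 1.0, 0.0, -1.0] * 5

for lag in (1, 2, 4):
    print("lag", lag, "autocorrelation", round(autocorrelation(series, lag), 2))
```

The autocorrelation is strongly positive at lag 4 (one full period) and strongly negative at lag 2 (half a period), which is the pattern expected from a periodic signal.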
1.1.7 Biostratigraphic data
Biostratigraphy is the correlation and zonation of sedimentary strata based on fossils. Biostratigraphic data are mainly of two different types. The first is a single table with localities in rows and events (such as the first or last appearance of a species) in columns. Each cell of the table contains the stratigraphic level, in meters, feet, or ranked order, of a particular event at a particular locality (Table 1.2).
Table 1.1 Example of an abundance matrix.
Taxon          Spooky Creek   Scary Ridge   Creepy Canyon
M. horridus    0              43            15
A. cornuta     7              12            94
P. giganteus   23             0             32
Figure 1.2 A muddy-sand community from the Early Jurassic. Community reconstruction is based on the presence of and partly the relative abundance of taxa.
After Benton and Harper (2020), modified from McKerrow (1978).
Table 1.2 Example of an event table.
Locality        M. horridus FAD   A. cornuta FAD   M. horridus FAD
Spooky Creek    3.2               4.0              7.6
Scary Ridge     1.4               5.3              4.2
Creepy Canyon   13.8              13.8             15.9
FAD, first appearance datum.
Such event tables form the input to biostratigraphic methods such as ranking-scaling and constrained optimization.
The second main type of biostratigraphic data consists of a single presence-absence matrix for each locality. Within each table, the samples are sorted according to stratigraphic level. Such data are used in the method of unitary associations.
Quantitative biostratigraphy is the subject of chapter 13 (Fig. 1.3).
Figure 1.3 Range chart of fossils, mainly graptolites, through Darriwilian and Sandbian (Middle and Upper Ordovician) strata in southern Sweden. The boundary between the two stages (green horizontal line) is fixed to coincide with the first appearance of the graptolite Nemagraptus gracilis.
Harper et al. (2022)/Geological Society of London.
1.2 Advantages and pitfalls of paleontological data analysis
During the last few decades, paleontology has comfortably joined the other sciences in its emphasis on quantitative methodologies. Statistical and explorative analytical methods, most of them depending on computers for their practical implementation, are now used in all branches of paleontology, from systematics and morphology to paleoecology and biostratigraphy. Over the last 100 years, techniques have evolved with available hardware, from the longhand calculations of the early twentieth century, through the time-consuming mainframe implementation of algorithms in the mid-twentieth century to the microcomputer revolution of the late twentieth century.
In general terms, there are three main components to any paleontological investigation. A detailed description of a taxon or community is followed by an analysis of the data (which may involve a look at ontogenetic or size-independent shape variation in a taxon, or at the population dynamics and structure of a community) and, finally, by a comparison with other relevant and usually similar paleontological units. Numerical techniques have greatly enhanced the description, analysis, and comparison of fossil taxa and assemblages. Scientific hypotheses can be more clearly framed and statistically tested with numerical data....
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.