Paleontological Data Analysis

Øyvind Hammer, David A. T. Harper (Authors)

Wiley-ISTE (Publisher)

2nd Edition

Published on 5 March 2024

740 pages

E-Book

ePUB with Adobe-DRM

978-1-119-93395-3 (ISBN)

€110.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Preface

Acknowledgements

1 Introduction

1.1 The nature of paleontological data

1.2 Advantages and pitfalls of paleontological data analysis

1.3 Software

References

2 Statistical concepts

2.1 The population and the sample

2.2 The frequency distribution of the population

2.3 The normal distribution

2.4 Cumulative probability

2.5 The statistical sample, estimation of distribution parameters

2.6 Null hypothesis significance testing

2.7 Bayesian inference

2.8 Exploratory data analysis

References

3 Introduction to data visualization

3.1 Graphic design principles

3.2 Line charts

3.3 Scatter plots

3.4 Histograms

3.5 Bar chart, box, and violin plots

3.6 Normal probability plot

3.7 Pie charts

3.8 Ternary plots

3.9 Heat maps, 3D plots, and Geographic Information System

3.10 Plotting with R and Python

References

4 Univariate and bivariate statistical methods

4.1 Parameter estimation and confidence intervals

4.2 Testing for distribution

4.3 Two-sample tests

4.4 Multiple-sample tests

4.5 Correlation

4.6 Bivariate linear regression

4.7 Generalized linear models

4.8 Polynomial and nonlinear regression

4.9 Mixture analysis

4.10 Counts and contingency tables

References

5 Introduction to multivariate data analysis

5.1 Multivariate distributions

5.2 Parametric multivariate tests - Hotelling's T²

5.3 Nonparametric multivariate tests - permutation test

5.4 Hierarchical cluster analysis

5.5 K-means and k-medoids cluster analysis

References

6 Morphometrics

6.1 The allometric equation

6.2 Principal components analysis

6.3 Multivariate allometry

6.4 Linear discriminant analysis

6.5 Multivariate analysis of variance

6.6 Fourier shape analysis in polar coordinates

6.7 Elliptic Fourier analysis

6.8 Hangle Fourier analysis

6.9 Eigenshape analysis

6.10 Landmarks and size measures

6.11 Procrustes fitting

6.12 PCA of landmark data

6.13 Thin-plate spline deformations

6.14 Principal and partial warps

6.15 Relative warps

6.16 Regression of warp scores

6.17 Common allometric component analysis

6.18 Landmarks in 3D

6.19 Disparity measures

6.20 Morphogroup identification with machine learning

6.21 Case study: the ontogeny of a Silurian trilobite

References

7 Directional and spatial data analysis

7.1 Analysis of directions and orientations in 2D

7.2 Analysis of directions and orientations in 3D

7.3 Spatial point pattern analysis

References

8 Analysis of tomographic and 3D-scan data

8.1 The technology of x-ray tomography

8.2 Processing of volume data

8.3 Functional morphology with 3D data

References

9 Estimating paleobiodiversity

9.1 Species richness estimation

9.2 Rarefaction and related methods

9.3 Diversity curves, origination, and extinction rates

9.4 Abundance-based biodiversity indices

9.5 Taxonomic distinctness

9.6 Comparison of diversity indices

9.7 Abundance models

References

10 Paleoecology and paleobiogeography

10.1 Paleobiogeography

10.2 Paleoecology

10.3 Association similarity indices for presence-absence data

10.4 Association similarity indices for abundance data

10.5 ANOSIM and PerMANOVA

10.6 Principal coordinates analysis

10.7 Non-metric multidimensional scaling

10.8 Correspondence analysis

10.9 Detrended correspondence analysis

10.10 Seriation

10.11 Nonlinear dimensionality reduction

10.12 Canonical correspondence analysis

10.13 Indicator species

10.14 Network analysis

10.15 Size-frequency and survivorship curves

10.16 Case study: Devonian paleobiogeography

References

11 Calibration - estimating paleoenvironments

11.1 Modern analog technique

11.2 Weighted averaging

11.3 Weighted averaging partial least squares

11.4 Which calibration method?

11.5 Case study: Late Holocene temperature inferred from chironomids

References

12 Time series analysis

12.1 Spectral analysis

12.2 Wavelet analysis

12.3 Autocorrelation

12.4 Cross-correlation

12.5 Runs test

12.6 Time Series Trends and Regression

12.7 Smoothing and filtering

References

13 Quantitative biostratigraphy

13.1 Zonation of a single section

13.2 Confidence intervals on stratigraphic ranges

13.3 Regional and global biostratigraphic correlation

13.4 Age models

References

14 Phylogenetic analysis

14.1 A dictionary of cladistics

14.2 Parsimony analysis

14.3 Characters

14.4 Algorithms for Parsimony Analysis

14.5 Character state reconstruction

14.6 Evaluation of characters and trees

14.7 Case study: the systematics of heterosporous ferns

14.8 Other methods for phylogenetic analysis

14.9 Phylogenetic Comparative Methods

References

Index

1 Introduction

1.1 The nature of paleontological data

Paleontology is a diverse field, with many different types of data and a corresponding variety of analytical methods. For example, the types of data used in ecological, morphological, and phylogenetic analyses are often quite different in both form and quality. Data relevant to the investigation of morphological variation in Ordovician brachiopods are quite different from those gathered from Neogene mammal-dominated assemblages for paleoecological analyses. Nevertheless, there is a surprising commonality of techniques that can be implemented in the investigation of such apparently contrasting data sets.

1.1.1 Univariate measurements

Perhaps the simplest type of data is straightforward, continuous measurements such as length or width. Such measurements are typically made on a number of specimens in one or more samples, commonly from collections from different localities or species. Typically, the investigator is interested in characterizing, and subsequently analyzing, the sets of measurements, and comparing two or more samples to see how they differ. Measurements in units of arbitrary origin (such as temperature on the Celsius or Fahrenheit scales) are known as interval data, while measurements in units of a fixed origin (such as distance and mass) are called ratio data. Methods for interval and ratio data are presented in chapter 4.
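The practical difference between interval and ratio data can be illustrated with a short Python sketch. The two temperatures are arbitrary illustrative values, not data from the book, and the helper function name is hypothetical. The point is only that differences are unaffected by where the zero of the scale sits, while ratios are not.

```python
# Interval vs. ratio data: a minimal illustration with two temperatures.
# The values 10 and 20 degrees Celsius are arbitrary illustrative numbers.

def celsius_to_kelvin(t_celsius: float) -> float:
    """Convert Celsius (arbitrary zero) to kelvin (fixed, physical zero)."""
    return t_celsius + 273.15

cold_c, warm_c = 10.0, 20.0
cold_k, warm_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(warm_c)

# Differences are meaningful on an interval scale: they survive the shift.
diff_c = warm_c - cold_c
diff_k = warm_k - cold_k

# Ratios are not: the apparent "twice as warm" depends on where zero sits.
ratio_c = warm_c / cold_c
ratio_k = warm_k / cold_k

print(f"difference: {diff_c:.2f} (Celsius) vs {diff_k:.2f} (kelvin)")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (kelvin)")
```

For ratio data such as length or mass, by contrast, a statement like "twice as long" holds regardless of the unit chosen.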

1.1.2 Bivariate measurements

Also relatively easy to handle are bivariate interval or ratio measurements on a number of specimens. Each specimen is then characterized by a pair of values, such as both length and width of a given fossil. In addition to comparing samples, we will then typically wish to know if and how the two variables are interrelated. Do they, for example, fit on a straight line, indicating a static (isometric) mode of growth, or do they exhibit more complex (anisometric) growth patterns? Bivariate data are discussed in chapters 4 and 6.
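As a rough sketch of how such a question might be examined, the Python example below fits a straight line to log-transformed length and width values. This assumes a power-law relation width = b * length^a, which is one common way to describe growth and not necessarily the procedure used in the book. The measurements are synthetic and all variable names are hypothetical. An exponent a close to 1 would be consistent with isometric growth.

```python
import numpy as np

# Hypothetical length/width measurements (mm) for ten specimens; synthetic,
# generated from width = 0.8 * length**0.9 with small multiplicative noise.
rng = np.random.default_rng(seed=1)
length = np.linspace(5.0, 30.0, 10)
width = 0.8 * length**0.9 * np.exp(rng.normal(0.0, 0.02, size=length.size))

# Ordinary least-squares fit of log(width) = a * log(length) + log(b).
# np.polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(np.log(length), np.log(width), deg=1)

print(f"allometric exponent a = {slope:.3f}")
print(f"coefficient b         = {np.exp(intercept):.3f}")
```

Ordinary least squares is used here only for brevity; whether it is the appropriate regression model for a given data set is a separate question.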

1.1.3 Multivariate morphometric measurements

The next step in increasing complexity is multivariate morphometric data, involving multiple variables such as length, width, thickness, and, say, distance between ribs (Fig. 1.1). We may wish to investigate the structure of such data and whether two or more samples are different. Multivariate data sets can be difficult to visualize, and special methods have been devised to emphasize any inherent structure. Special types of multivariate morphometric data include digitized outlines and the coordinates of landmarks. Multivariate methods useful for morphometrics are discussed in chapters 5 and 6.

Figure 1.1 Typical set of measurements made on conjoined valves of the living brachiopod Terebratulina from the west coast of Scotland. A: Measurements of the length of ventral (LENPV) and dorsal (LENBV) valves, together with hinge (HINWI) and maximum (MAXWI) widths. B: Thickness (THICK). C: Position of maximum width (POSMW) and numbers of ribs per mm at, say, 2.5 mm (RAT25) and 5.0 mm (RAT50) from the posterior margin. These measurements can form the basis for univariate, bivariate, and multivariate analysis of the morphological features of this brachiopod genus.

1.1.4 Character matrices for phylogenetic analysis

A special type of morphological (or molecular) data is the character matrix for phylogenetic analysis (cladistics). In such a matrix, taxa are conventionally entered in rows and characters in columns. The state of each character for each taxon is typically coded with an integer. Character matrices and their analyses are treated in chapter 14.
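A character matrix of this kind might be held in memory as sketched below in Python. The taxa, characters, and states are invented for illustration, and the use of -1 as a marker for missing data is an assumption of this sketch (character matrices are commonly written with "?" for missing cells).

```python
import numpy as np

# Hypothetical taxa and characters, invented for illustration only.
taxa = ["Taxon A", "Taxon B", "Taxon C", "Taxon D"]
characters = ["char 1", "char 2", "char 3"]

# Rows are taxa, columns are characters; each cell holds an integer state.
# -1 is an assumed integer placeholder for a missing observation.
MISSING = -1
matrix = np.array([
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, MISSING],
    [1, 2, 0],
])

# Distinct observed states per character (ignoring missing cells).
for name, column in zip(characters, matrix.T):
    states = sorted(set(column.tolist()) - {MISSING})
    print(f"{name}: states {states}")

def n_differences(i: int, j: int) -> int:
    """Count characters in which taxa i and j differ, skipping missing cells."""
    both_known = (matrix[i] != MISSING) & (matrix[j] != MISSING)
    return int(np.sum((matrix[i] != matrix[j]) & both_known))

print(taxa[0], "vs", taxa[3], "->", n_differences(0, 3), "differences")
```

The pairwise difference count is only a simple summary of the matrix; it is not a phylogenetic method in itself.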

1.1.5 Paleoecology and paleobiogeography - taxa in samples

In paleoecology and paleobiogeography, the most common data type consists of taxonomic counts at different localities or stratigraphic levels. Such data are typically given in an abundance matrix, either with taxa in rows and samples in columns or vice versa. Each cell contains a specimen count (or abundance) for a particular taxon in a particular sample (Table 1.1).

In some cases, we do not have specimen counts available; instead, we only know whether a taxon is present or absent in the sample. Such information is specified in a presence-absence table, where absences are typically coded with zeros and presences with ones. This type of binary data may be more relevant to biogeographic or stratigraphic analyses, where each taxon is weighted equally.
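The relation between the two data types can be sketched in Python using the counts from Table 1.1. The variable names are hypothetical; the conversion rule (a taxon is present if at least one specimen was counted) follows the coding described above.

```python
import numpy as np

# Abundance matrix from Table 1.1: taxa in rows, samples in columns.
taxa = ["M. horridus", "A. cornuta", "P. giganteus"]
samples = ["Spooky Creek", "Scary Ridge", "Creepy Canyon"]
abundance = np.array([
    [0, 43, 15],
    [7, 12, 94],
    [23, 0, 32],
])

# Presence-absence coding: 1 where at least one specimen was counted, else 0.
presence = (abundance > 0).astype(int)

# Simple per-sample summaries.
specimens_per_sample = abundance.sum(axis=0)  # total specimen count
taxa_per_sample = presence.sum(axis=0)        # number of taxa present

for name, n, s in zip(samples, specimens_per_sample, taxa_per_sample):
    print(f"{name}: {n} specimens, {s} taxa")
```

Note that the conversion discards information: Spooky Creek and Scary Ridge both contain two taxa, although their specimen counts differ considerably.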

Typical questions include whether the samples are significantly different from each other, whether the biodiversity is different, whether there are well-separated groups of localities, or any indication of an environmental gradient. Methods for analyzing taxa-in-samples data are discussed in chapters 5, 9, and 10 (Fig. 1.2).

1.1.6 Time series

A time series is a sequence of values through time. Paleontological time series include diversity curves, geochemical data from fossils through time, and thicknesses of bands from a series of growth increments. For such a time series, we may wish to investigate whether there is any trend or periodicity. Some appropriate methods are given in chapter 12.
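A minimal Python sketch of both questions is given below, assuming an evenly sampled series. The data are synthetic (a linear trend, a cycle of period 20 samples, and random noise), and the approach shown, a least-squares line followed by a simple FFT periodogram, is one elementary option and not necessarily the method the book recommends.

```python
import numpy as np

# Synthetic, evenly spaced series: linear trend + cycle of period 20 + noise.
rng = np.random.default_rng(seed=0)
t = np.arange(200)
series = (0.05 * t
          + 2.0 * np.sin(2 * np.pi * t / 20)
          + rng.normal(0.0, 0.5, size=t.size))

# Trend: least-squares straight line through the series.
slope, intercept = np.polyfit(t, series, deg=1)
detrended = series - (slope * t + intercept)

# Periodicity: periodogram of the detrended series via the real FFT.
power = np.abs(np.fft.rfft(detrended)) ** 2
freqs = np.fft.rfftfreq(t.size, d=1.0)  # cycles per sample
peak = int(np.argmax(power[1:])) + 1    # skip the zero-frequency term
dominant_period = 1.0 / freqs[peak]

print(f"estimated trend:  {slope:.3f} per sample")
print(f"dominant period:  {dominant_period:.1f} samples")
```

Real paleontological series are often unevenly sampled, in which case a plain FFT periodogram is not directly applicable.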

1.1.7 Biostratigraphic data

Biostratigraphy is the correlation and zonation of sedimentary strata based on fossils. Biostratigraphic data are mainly of two different types. The first is a single table with localities in rows and events (such as the first or last appearance of a species) in columns. Each cell of the table contains the stratigraphic level, in meters, feet, or ranked order, of a particular event at a particular locality (Table 1.2).

Table 1.1 Example of an abundance matrix.

                Spooky Creek   Scary Ridge   Creepy Canyon
M. horridus     0              43            15
A. cornuta      7              12            94
P. giganteus    23             0             32

Figure 1.2 A muddy-sand community from the Early Jurassic. Community reconstruction is based on the presence of and partly the relative abundance of taxa.

After Benton and Harper (2020), modified from McKerrow (1978).

Table 1.2 Example of an event table.

                M. horridus FAD   A. cornuta FAD   M. horridus FAD
Spooky Creek    3.2               4.0              7.6
Scary Ridge     1.4               5.3              4.2
Creepy Canyon   13.8              13.8             15.9

FAD, first appearance datum.

Such event tables form the input to biostratigraphic methods such as ranking-scaling and constrained optimization.
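To illustrate the difference between metric levels and ranked order, the Python sketch below converts the levels in Table 1.2 to ranks within each locality. It assumes that larger values lie higher in the section and that tied levels share a rank; both are assumptions of this sketch. It is not an implementation of ranking-scaling or constrained optimization, only a way of looking at the input.

```python
# Event table from Table 1.2: localities in rows, events in columns.
# Column labels are copied as printed in the table.
events = ["M. horridus FAD", "A. cornuta FAD", "M. horridus FAD"]
levels = {
    "Spooky Creek": [3.2, 4.0, 7.6],
    "Scary Ridge": [1.4, 5.3, 4.2],
    "Creepy Canyon": [13.8, 13.8, 15.9],
}

def rank_events(values):
    """Return 1-based ranks from lowest to highest level; ties share a rank."""
    distinct = sorted(set(values))
    return [distinct.index(v) + 1 for v in values]

ranked = {loc: rank_events(vals) for loc, vals in levels.items()}

for loc, ranks in ranked.items():
    print(f"{loc}: ranks by column = {ranks}")
```

In this small example the second and third columns occur in opposite order at Spooky Creek and Scary Ridge, and the first two columns coincide at Creepy Canyon. Conflicts and ties of this kind are what quantitative biostratigraphic methods have to reconcile.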

The second main type of biostratigraphic data consists of a single presence-absence matrix for each locality. Within each table, the samples are sorted according to stratigraphic level. Such data are used in the method of unitary associations.

Quantitative biostratigraphy is the subject of chapter 13 (Fig. 1.3).

Figure 1.3 Range chart of fossils, mainly graptolites, through Darriwilian and Sandbian (Middle and Upper Ordovician) strata in southern Sweden. The boundary between the two stages (green horizontal line) is fixed to coincide with the first appearance of the graptolite Nemagraptus gracilis.

Harper et al. (2022)/Geological Society of London.

1.2 Advantages and pitfalls of paleontological data analysis

During the last few decades, paleontology has comfortably joined the other sciences in its emphasis on quantitative methodologies. Statistical and explorative analytical methods, most of them depending on computers for their practical implementation, are now used in all branches of paleontology, from systematics and morphology to paleoecology and biostratigraphy. Over the last 100 years, techniques have evolved with available hardware, from the longhand calculations of the early twentieth century, through the time-consuming mainframe implementation of algorithms in the mid-twentieth century to the microcomputer revolution of the late twentieth century.

In general terms, there are three main components to any paleontological investigation. A detailed description of a taxon or community is followed by an analysis of the data (for example, an examination of ontogenetic or size-independent shape variation in a taxon, or of the population dynamics and structure of a community), and finally by a comparison with other relevant, usually similar, paleontological units. Numerical techniques have greatly enhanced the description, analysis, and comparison of fossil taxa and assemblages. Scientific hypotheses can be more clearly framed and statistically tested with numerical data....

Schweitzer Fachinformationen
