Handbook of Statistical Systems Biology

Name: Handbook of Statistical Systems Biology
Brand: Wiley
Price: 152.99 EUR
Availability: OnlineOnly

Michael Stumpf, David J. Balding, Mark Girolami (Authors)

Wiley (Publisher)

Published on 9 September 2011

530 pages

E-Book

ePUB with Adobe-DRM

978-1-119-95204-6 (ISBN)

€152.99 incl. 7% VAT

E-Book Single Licence

Available for download

Description

Systems Biology is now entering a mature phase in which the key issues are characterising uncertainty and stochastic effects in mathematical models of biological systems. The area is moving towards a full statistical analysis and probabilistic reasoning over the inferences that can be made from mathematical models. This handbook presents a comprehensive guide to the discipline for practitioners and educators, providing a full and detailed treatment of these important and emerging subjects. Leading experts in systems biology and statistics have come together to provide insight into the major ideas in the field, and in particular methods of specifying and fitting models, and estimating the unknown parameters.

This book:

* Provides a comprehensive account of inference techniques in systems biology.
* Introduces classical and Bayesian statistical methods for complex systems.
* Explores networks and graphical modeling as well as a wide range of statistical models for dynamical systems.
* Discusses various applications for statistical systems biology, such as gene regulation and signal transduction.
* Features statistical data analysis on numerous technologies, including metabolic and transcriptomic technologies.
* Offers an in-depth presentation of reverse engineering approaches.
* Provides colour illustrations to explain key concepts.

This handbook will be a key resource for researchers practising systems biology, and those requiring a comprehensive overview of this important field.

Reviews / Votes

"A very remarkable collection of essays. Strongly recommended to workers in this area." (International Statistical Review, 1 October 2013)

"I would highly recommend this book as a useful guide for the students and practitioners of systems biology." (Science Progress, 1 September 2012)

"This handbook will be a key resource for researchers practising systems biology, and those requiring a comprehensive overview of this important field." (Zentralblatt MATH, 2012)

Content

Chapter 1 Two challenges of systems biology.
Chapter 2 Introduction to Statistical Methods for Complex Systems.
Chapter 3 Bayesian Inference and Computation.
Chapter 4 Data Integration: Towards Understanding Biological Complexity.
Chapter 5 Control Engineering Approaches to Reverse Engineering Biomolecular Approaches.
Chapter 6 Algebraic Statistics and Methods in Systems Biology.

B. Technology-based Chapters.
Chapter 7 Transcriptomic Technologies and Statistical Data Analysis.
Chapter 8 Statistical Data Analysis in Metabolomics.
Chapter 9 Imaging and Single-Cell Measurement Technologies.
Chapter 10 Protein Interaction Networks and Their Statistical Analysis.

C. Networks and Graphical Models.
Chapter 11 Introduction to Graphical Modelling.
Chapter 12 Recovering Genetic Network from Continuous Data with Dynamic Bayesian Networks.
Chapter 13 Advanced Applications of Bayesian Networks in Systems Biology.
Chapter 14 Random Graph Models and Their Application to Protein-Protein Interaction Networks.
Chapter 15 Modelling Biological Networks Via Tailored Random Graphs.

D. Dynamical Systems.
Chapter 16 Nonlinear Dynamics: a Brief Introduction.
Chapter 17 Qualitative Inference for Dynamical Systems.
Chapter 18 Stochastic Dynamical Systems.
Chapter 19 State-Space models.
Chapter 20 Model Identification by Utilizing Likelihood-Based Methods.

E. Application Areas.
Chapter 21 Inference of Signalling Pathway Models.
Chapter 22 Modelling Transcription Factor Activity.
Chapter 23 Host-Pathogen Systems Biology.
Chapter 24 Statistical Metabolomics: Bayesian Challenges in the Analysis of Metabolomic Data.
Chapter 25 Systems Biology of microRNA.

Chapter 2

Introduction to Statistical Methods for Complex Systems

Tristan Mary-Huard and Stéphane Robin

Agro ParisTech and INRA, Paris, France

2.1 Introduction

The aim of the present chapter is to introduce and illustrate some concepts of statistical inference useful in systems biology. Here we limit ourselves to the classical, so-called ‘frequentist’ statistical inference where parameters are fixed quantities that need to be estimated. The Bayesian approach will be presented in Chapter 3.

Modelling and inference techniques are illustrated in three recurrent problems in systems biology:

Class comparison aims at assessing the effect of some treatment or experimental condition on some biological response. This requires proper statistical modelling to account for the experimental design, various covariates or stratifications or dependence between the measurements. As systems biology often deals with high-throughput technologies, it also raises multiple testing issues.

Class prediction refers to learning techniques that aim at building a rule to predict the status (e.g. well or ill) of an individual, based on a set of biological descriptors. An exhaustive list of classification algorithms is out of reach, but general techniques such as regularization or aggregation are of prime interest in systems biology where the number of variables often exceeds the number of observations by far. Evaluating the performance of a classifier also requires relevant tools.

Class discovery aims at uncovering some structure in a set of observations. These techniques include distance-based or model-based clustering methods and make it possible to identify distinct groups of individuals in the absence of a prior classification. However, the underlying structure may have more complex forms, each raising specific issues in terms of inference.

This chapter focuses on generic statistical concepts and methods that can be applied no matter which technology is used for the data acquisition. In practice, applications to any biological problem will necessitate both a relevant strategy for the data collection, and a careful tuning of the methods to obtain meaningful results. These two steps of data collection (or experimental design) and adaptation of the generic methods require taking into account the nature of the data. Therefore, they are dependent on the data acquisition technology, and will be discussed in Part B of this Handbook.

In this chapter, the data are assumed to arise from a static process. The analysis of a dynamic biological system would require more sophisticated methods, such as partial differential equations or network modelling. These topics are not discussed here as they will be reviewed in depth in Parts C and D.

Lastly, a basic knowledge of statistics is assumed, covering topics including point estimation (in particular maximum likelihood estimation), hypothesis testing, and a background in regression and linear models.

2.2 Class Comparison

We consider here the general problem of assessing the effect of some treatment, experimental condition or covariate on some response. We first address the problem of modelling the data resulting from the experiments, focusing on how to account for the dependency between the observations. We then turn to the problem of multiple testing, which is recurrent in high-throughput data analyses.

2.2.1 Models for Dependent Data

Many biological experiments aim at observing the effects of a given treatment (or combination of treatments) on a given response. ‘Treatment’ is used here in a very broad sense, including controlled experimental conditions, uncontrolled covariates, time, population structure, etc. In the following, $n$ will stand for the total number of experiments.

Linear (Gaussian) models (Searle 1971; Dobson 1990) provide a general framework to describe the influence of a set of controlled conditions and/or uncontrolled covariates, summarized in an $n \times p$ matrix $X$, on the observed response gathered in an $n$-dimensional vector $Y$ as

$$Y = X\theta + E \tag{2.1}$$

where $\theta$ is the $p$-dimensional vector containing all parameters. In the most classical setting, the response is supposed to be Gaussian, and the dependency structure between the observations is then fully specified by the (co-)variance matrix $\Sigma = \mathrm{Var}(Y) = \mathrm{Var}(E)$, which contains the variance of each observation on the diagonal, and the covariances between pairs of observations elsewhere. In the simplest setting, the responses are supposed to be independent with the same variance $\sigma^2$, that is $\Sigma = \sigma^2 I$.
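As an illustration of the independent, equal-variance case, the following minimal sketch simulates data from a linear Gaussian model and recovers the parameters by least squares. The sample size, design matrix, parameter values and variable names are all assumptions made for the example; they do not come from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 200, 3                       # n experiments, p parameters (hypothetical sizes)
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design matrix
theta = np.array([1.0, 2.0, -0.5])  # "true" parameters, chosen for illustration
sigma = 0.3                         # common standard deviation of the errors

# Simulate Y = X theta + E with independent errors of variance sigma^2
Y = X @ theta + sigma * rng.normal(size=n)

# Least-squares estimate of theta (also the ML estimate in this Gaussian setting)
theta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Unbiased variance estimate: residual sum of squares divided by n - p
residuals = Y - X @ theta_hat
sigma2_hat = residuals @ residuals / (n - p)

print(theta_hat, sigma2_hat)
```

With dependent observations the same model equation holds, but plain least squares no longer accounts for the covariance structure.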

2.2.1.1 Writing the Right (Mixed) Model

In more complex experiments, the assumption that observations are independent does not hold and the structure of the variance matrix $\Sigma$ needs to be adapted. Because it contains $n(n+1)/2$ parameters, the shape of $\Sigma$ has to be strongly constrained to allow good inference. We first present here some typical experimental settings, and the associated dependency structures.

Variance Components

Consider the study of the combined effects of the genotype (indexed by $i$) and of the cell type ($j$) on some gene expression. Several individuals ($k$) from each genotype are included and cells from each type are harvested in each of them. In such a setting the expected response is $\mathbb{E}(Y_{ijk}) = \mu_{ij}$, which is often decomposed into a genotype effect, a cell type effect and an interaction as $\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$.

The most popular way to account for the dependency between measures obtained on the same individual is to add a random term $U_{ik}$ associated with each individual. The complete model can then be written as

$$Y_{ijk} = \mu_{ij} + U_{ik} + E_{ijk} \tag{2.2}$$

where all $U_{ik}$ and $E_{ijk}$ are independent centred Gaussian variables with variance $\sigma_U^2$ and $\sigma_E^2$, respectively. The variance of one observation is then $\sigma_U^2 + \sigma_E^2$, where $\sigma_U^2$ is the ‘biological’ variance and $\sigma_E^2$ is the ‘technical’ one (Kerr and Churchill 2001). The random effect induces a uniform correlation between observations from the same individual since:

$$\mathrm{Cov}(Y_{ijk}, Y_{ij'k}) = \sigma_U^2 \quad (j \neq j') \tag{2.3}$$

and 0 if $(i, k) \neq (i', k')$. The matrix form of this model is a generalization of (2.1):

$$Y = X\theta + ZU + E \tag{2.4}$$

where $U$ gathers the individual effects and $Z$ describes the individual structure: each row corresponds to one measurement and each column to one individual and contains a 1 at the intersection if the measurement has been made on the individual, and a 0 otherwise. The denomination ‘mixed’ of ‘linear mixed models’ comes from the simultaneous presence of fixed and random effects. It corresponds to the simplest form of so-called ‘variance components’ models. The variance matrix corresponding to (2.3) is $\Sigma = \sigma_U^2 ZZ^\top + \sigma_E^2 I$. Applications of such a model to gene expression data can be found in Wolfinger et al. (2001) or Tempelman (2008).
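The way a random individual effect produces a uniform within-individual correlation can be checked numerically. The sketch below builds the incidence matrix for a small hypothetical design and the covariance matrix it implies; the design sizes, variance values and names (`sigma2_U`, `sigma2_E`, `Z`) are assumptions for illustration only.

```python
import numpy as np

n_individuals, n_cell_types = 3, 2   # hypothetical small design
sigma2_U, sigma2_E = 2.0, 0.5        # biological and technical variances (made up)

# One row per measurement, ordered individual by individual
individual_of = np.repeat(np.arange(n_individuals), n_cell_types)
n = individual_of.size

# Z[m, k] = 1 if measurement m was made on individual k, 0 otherwise
Z = (individual_of[:, None] == np.arange(n_individuals)[None, :]).astype(float)

# Covariance matrix implied by adding an individual random effect to the errors
Sigma = sigma2_U * Z @ Z.T + sigma2_E * np.eye(n)

# Correlation between two measurements made on the same individual
rho_within = sigma2_U / (sigma2_U + sigma2_E)

print(Sigma)
print(rho_within)
```

The resulting matrix is block-diagonal: every pair of measurements within an individual shares the same covariance, and measurements from different individuals are uncorrelated.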

Repeated Measurements

Consider a similar design where, in place of cell types, we compare successive harvesting times (indexed by $t$) within each individual. The uniform correlation within each individual given in (2.3) may then seem inappropriate, for it does not account for the delay between times of observation. A common dependency form is then the so-called ‘autoregressive’, which states that, for a common variance $\sigma^2$ and a correlation parameter $|\rho| < 1$,

$$\mathrm{Cov}(Y_{itk}, Y_{it'k}) = \sigma^2 \rho^{|t - t'|}$$

and 0 otherwise. This is to assume that the correlation decreases (at an exponential rate) with the time delay. Such a variance structure cannot be put in a simple matrix form similar to (2.4). Note that Equation (2.1) is still valid, but with nondiagonal variance matrix $\Sigma$.
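A first-order autoregressive covariance matrix is easy to build directly. The following sketch assumes equally weighted time lags and independent individuals; the function name `ar1_covariance`, the time points and the parameter values are hypothetical.

```python
import numpy as np

def ar1_covariance(times, sigma2, rho):
    """Covariance matrix with entries sigma2 * rho**|t - t'| for one individual."""
    times = np.asarray(times, dtype=float)
    lags = np.abs(times[:, None] - times[None, :])
    return sigma2 * rho ** lags

# Four harvesting times on one individual (illustrative values)
Sigma_ind = ar1_covariance([0, 1, 2, 3], sigma2=1.0, rho=0.5)

# Two independent individuals: block-diagonal covariance for the full response
Sigma = np.kron(np.eye(2), Sigma_ind)

print(Sigma_ind)
```

Each step away from the diagonal multiplies the covariance by `rho`, which is the exponential decay with the time delay described above.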

Spatial Dependency

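It is also desirable to account for spatial dependency when observations have some spatial localization. Suppose one wants to compare treatments (indexed by $i$), and that replicates ($k$) have respective localizations $\ell_{ik}$. A typical variance structure (Cressie 1993) is

$$\mathrm{Var}(Y_{ik}) = \sigma^2 + \tau^2, \qquad \mathrm{Cov}(Y_{ik}, Y_{i'k'}) = \tau^2 e^{-\delta \|\ell_{ik} - \ell_{i'k'}\|}$$

where $\sigma^2$ accounts for the measurement error variability, $\tau^2$ for the spatially structured variability, and $\delta$ controls the speed at which the dependency decreases with distance.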
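One common way to encode such a spatial structure is an exponential decay of the covariance with distance, plus a measurement-error term on the diagonal. The sketch below assumes that particular form; the function name `exponential_covariance`, the parameter names (`sigma2`, `tau2`, `delta`) and the locations are hypothetical choices for the example.

```python
import numpy as np

def exponential_covariance(locations, sigma2, tau2, delta):
    """Spatial covariance tau2 * exp(-delta * distance), plus sigma2 on the diagonal."""
    locations = np.asarray(locations, dtype=float)
    diff = locations[:, None, :] - locations[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise Euclidean distances
    return tau2 * np.exp(-delta * dist) + sigma2 * np.eye(len(locations))

# Three replicates at illustrative 2-D locations
locs = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
Sigma = exponential_covariance(locs, sigma2=0.1, tau2=1.0, delta=0.5)

print(Sigma)
```

Larger values of `delta` make the off-diagonal entries shrink faster as the distance between two replicates grows.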

The dependency structures described above can of course be combined. Also note that this list is far from exhaustive. The limitations often come from the software at hand or the specific computing developments that can be made. A large catalogue of such structures can be found in software such as SAS (2002-03) or R (www.r-project.org).

2.2.1.2 Inference

Some problems related to the inference of mixed linear models are still unresolved. We only provide here an introduction to the most popular approaches and emphasize some practical issues that can be faced when using them.

Estimation

Mixed model inference requires estimating both $\theta$ and $\Sigma$. We start with the estimation of $\Sigma$, which reduces to the estimation of a few variance parameters, such as the biological and technical variances in the examples given above.

Moment estimates can be obtained (Searle 1971; Demidenko 2004), typically for variance component models. Such estimates are often based on sums of squares, that is, squared distances between $Y$ and its projection on various linear spaces, such as span($X$), span($Z$) or span($X, Z$). The expectation of these sums of squares can often be related to the different variance parameters and the estimation then reduces to solving a set of linear equations.
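The moment approach can be sketched on the simplest balanced case: one random effect per individual and a single overall mean as the only fixed effect. This is a deliberately reduced setting, and the design sizes, variance values and variable names are assumptions for illustration. The within- and between-individual mean squares have expectations that are linear in the two variances, so equating them to their observed values gives the estimates.

```python
import numpy as np

rng = np.random.default_rng(2)

K, m = 500, 5                        # individuals, measurements per individual
sigma2_U, sigma2_E = 2.0, 0.5        # variances used to simulate (made up)
mu = 1.0                             # overall mean, the only fixed effect here

U = np.sqrt(sigma2_U) * rng.normal(size=(K, 1))   # one effect per individual
E = np.sqrt(sigma2_E) * rng.normal(size=(K, m))   # independent errors
Y = mu + U + E                                    # Y[k, j]

# Sums of squares: distances between Y and its projections
ind_means = Y.mean(axis=1, keepdims=True)
ss_within = ((Y - ind_means) ** 2).sum()
ss_between = m * ((ind_means - Y.mean()) ** 2).sum()

ms_within = ss_within / (K * (m - 1))   # expectation: sigma2_E
ms_between = ss_between / (K - 1)       # expectation: sigma2_E + m * sigma2_U

# Solve the two linear equations for the variance parameters
sigma2_E_hat = ms_within
sigma2_U_hat = (ms_between - ms_within) / m

print(sigma2_E_hat, sigma2_U_hat)
```

In unbalanced designs or with several random effects the same idea applies, but the linear system relating expected sums of squares to variances becomes less simple.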

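Denoting by $\mathcal{L}(Y; \theta, \Sigma)$ the likelihood of the Gaussian model, the maximum likelihood (ML) estimator is defined as

$$(\hat\theta, \hat\Sigma) = \arg\max_{(\theta, \Sigma)} \log \mathcal{L}(Y; \theta, \Sigma)$$

and can be used for all models. Unfortunately, ML variance estimates are known to be biased in many (almost all) situations, because both $\theta$ and $\Sigma$ have to be estimated at the same time. The most popular way to circumvent this problem consists of changing to a model where $\theta$ is known (Verbeke and Molenberghs 2000). Defining some matrix $T$ such that $TX = 0$, we may define the Gaussian vector $\tilde{Y} = TY$ which satisfies

$$\tilde{Y} \sim \mathcal{N}(0, T \Sigma T^\top).$$

The most natural choice for $T$ is the projector on the linear space orthogonal to span($X$). The so-called...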
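The effect of projecting the data on the space orthogonal to the design matrix can be illustrated in the independent, equal-variance case, where the bias of the ML variance estimate has a simple closed form. The sketch below is only an illustration under that assumption; the sizes, parameter values and names (`T`, `Y_tilde`, `sigma2_ml`, `sigma2_proj`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

n, p = 30, 4                         # hypothetical sizes
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
theta = rng.normal(size=p)           # arbitrary fixed effects
sigma2 = 1.0

# Projector onto the space orthogonal to span(X), so that T @ X = 0
T = np.eye(n) - X @ np.linalg.pinv(X)

Y = X @ theta + np.sqrt(sigma2) * rng.normal(size=n)
Y_tilde = T @ Y                      # its distribution no longer involves theta

rss = Y_tilde @ Y_tilde              # residual sum of squares
sigma2_ml = rss / n                  # ML estimate, biased downwards
sigma2_proj = rss / (n - p)          # estimate based on the projected data

print(sigma2_ml, sigma2_proj)
```

Because the projector has rank $n - p$, the expected residual sum of squares is $(n - p)\sigma^2$ in this setting, which is why dividing by $n$ underestimates the variance while dividing by $n - p$ does not.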
