Computational Network Analysis with R

Applications in Biology, Medicine and Chemistry

Matthias Dehmer, Yongtang Shi, Frank Emmert-Streib (Editors)

Wiley-VCH (Publisher)

1st Edition

Published on 9 August 2016

XVIII, 343 pages

E-Book

ePUB with Adobe-DRM

978-3-527-69437-2 (ISBN)

€151.99 incl. 7% VAT

Persons

Matthias Dehmer studied mathematics at the University of Siegen (Germany) and received his Ph.D. in computer science from the Technical University of Darmstadt (Germany). Afterwards, he was a research fellow at Vienna Bio Center (Austria), Vienna University of Technology, and University of Coimbra (Portugal). He obtained his habilitation in applied discrete mathematics from the Vienna University of Technology. Currently, he is Professor at UMIT - The Health and Life Sciences University (Austria) and also holds a position at the Universität der Bundeswehr München. His research interests are in applied mathematics, bioinformatics, systems biology, graph theory, complexity and information theory. He has written over 180 publications in his research areas.

Yongtang Shi studied mathematics at Northwest University (Xi'an, China) and received his Ph.D. in applied mathematics from Nankai University (Tianjin, China). He visited Technische Universität Bergakademie Freiberg (Germany), UMIT (Austria) and Simon Fraser University (Canada). Currently, he is an associate professor at the Center for Combinatorics of Nankai University. His research interests are in graph theory and its applications, especially in mathematical chemistry, computer science and information theory. He has written over 40 publications in graph theory and its applications.

Frank Emmert-Streib studied physics at the University of Siegen (Germany) and received his Ph.D. in theoretical physics from the University of Bremen (Germany). He received postdoctoral training at the Stowers Institute for Medical Research (Kansas City, USA) and the University of Washington (Seattle, USA). Currently, he is an associate professor of Computational Biology at Tampere University of Technology (Finland). His main research interests are in the fields of computational medicine, network biology and statistical genomics.

Editor

Matthias Dehmer

UMIT Health and Life Sciences University, Hall, Austria
ISNI: 0000 0001 2100 2265

Yongtang Shi

UMIT Health and Life Sciences University, Hall, Austria
ISNI: 0000 0004 1845 537X

Frank Emmert-Streib

UMIT Health and Life Sciences University, Hall, Austria
ISNI: 0000 0001 1563 7555

Series Editor

Matthias Dehmer

UMIT Health and Life Sciences University, Hall, Austria

Frank Emmert-Streib

UMIT Health and Life Sciences University, Hall, Austria
ISNI: 0000 0001 1563 7555

Content

Differential correlation technique to analyze biological networks: DiffCorr
Challenges of computational network analysis with R
Software and practices for visualizing network data in biology and medicine
Efficient anomaly detection in dynamic, attributed graphs by using R
Chemical informatics functionality in R
Biological network comparison
Degradation analysis in R using uDEMO
Penalized methods in high-dimensional Gaussian graphical models

Chapter 1
Using the DiffCorr Package to Analyze and Visualize Differential Correlations in Biological Networks

Atsushi Fukushima and Kozo Nishida

RIKEN Center for Sustainable Resource Science, 1-7-22 Suehirocho, Tsurumi, Yokohama, 230-0045, Japan

RIKEN Quantitative Biology Center, Laboratory for Biochemical Simulation, Osaka, Japan

1.1 Introduction

1.1.1 An Introduction to Omics and Systems Biology

In this century, high-throughput technologies are being harnessed in various applications to solve a diverse range of biological problems and to explore biological phenomena. Next-generation sequencers (NGS) can be used for measuring and monitoring thousands of small molecules simultaneously [1-4] and large genomic sequences can be acquired quickly and routinely. RNA sequencing with NGS (RNA-seq) measures nearly every transcript of a cellular system (i.e., the transcriptome) [5-7]. The term omics refers to the comprehensive analysis of biological systems. Omics approaches, including genomics, transcriptomics, and metabolomics, have become a promising way to inspect complex network interactions in cellular systems. To understand the organizing principles of cellular functions at different levels, an integrative approach using large-scale omics data, including genomics, transcriptomics, proteomics, and metabolomics, is required [8-10]. Although it means different things to different scientists, systems biology [11] is the study of the behavior of complex biological processes using integrated approaches that combine collections of omics-based data sets, quantitative measurements of the behavior of interacting cellular components, and mathematical/computational models to predict and describe complex dynamic behaviors.

1.1.2 Correlation Networks in Omics and Systems Biology

Molecular interactions can be expressed simply as a network by measuring associations among molecules in omics data (e.g., see [12, 13]). Typical network analysis is based on transcriptome data sets obtained from microarray experiments and RNA-seq. This is known as gene co-expression analysis (e.g., see reviews [14-17]). Correlations are special cases of association that can be quantified by measures such as the Pearson correlation coefficient, r (Figure 1.1a). It ranges from -1 to 1, where r = 1 represents a perfect positive linear relationship between gene expressions and r = -1 indicates a perfect negative linear relationship. Although r = 0 indicates no linear relationship between gene expressions, it does not mean that the two gene expressions are statistically independent. The Pearson correlation coefficient is not robust to outliers, and significance tests based on it assume that the data are approximately normally distributed. The Spearman rank correlation coefficient, on the other hand, is more robust to outliers; it measures a monotonic relationship between gene expressions. If the correlation between two gene expressions exceeds a threshold, these genes can be considered co-expressed. Such associations can be described as "co-expression networks" or, more generally, as "correlation networks," where nodes represent genes and links between nodes represent significant correlations that are above a given threshold. Typical co-expression network analysis is based on the correlation coefficient between preselected gene(s) and the rest of the genes in a data set; this is called a guide-gene approach [18]. Although a correlation does not always indicate a causal relationship, a network approach can provide clues about the regulatory mechanisms that underlie biological processes, and it has been used to characterize genes involved in specialized (secondary) metabolism in plants [14, 17, 19].
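The properties of the two correlation coefficients and the thresholding step can be illustrated with a short base R sketch. It uses simulated data only; the object names, the gene labels, and the cutoff of 0.8 are assumptions made for illustration and do not come from the chapter.

```r
# Illustrative sketch with simulated data; all object names are hypothetical.
set.seed(1)

# (1) r = 0 does not imply independence: y is fully determined by x,
#     yet the Pearson correlation is zero on a symmetric grid.
x <- seq(-1, 1, length.out = 21)
y <- x^2
r_quadratic <- cor(x, y, method = "pearson")

# (2) Monotonic but nonlinear relationship: Spearman equals 1,
#     while Pearson stays below 1.
u <- 1:20
v <- exp(u / 2)
r_pearson  <- cor(u, v, method = "pearson")
r_spearman <- cor(u, v, method = "spearman")

# (3) Thresholded co-expression network: rows are samples, columns are genes.
n_samples <- 30
base <- rnorm(n_samples)
expr <- cbind(
  geneA = base + rnorm(n_samples, sd = 0.1),
  geneB = base + rnorm(n_samples, sd = 0.1),
  geneC = rnorm(n_samples)
)
cor_mat <- cor(expr, method = "pearson")
threshold <- 0.8                   # arbitrary cutoff, chosen for illustration
adj <- abs(cor_mat) >= threshold   # link where |r| passes the cutoff
diag(adj) <- FALSE                 # no self-loops

# (4) Guide-gene approach: correlate one preselected gene with all others.
guide <- "geneA"
guide_cor <- cor_mat[guide, setdiff(colnames(cor_mat), guide)]
```

Here the adjacency matrix links genes on the absolute value of r, so strong negative correlations also produce edges; using the signed value instead is an equally reasonable choice and depends on the biological question.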

Figure 1.1 A gene-gene association measure and causal inferences in co-expression analysis. (a) Two major kinds of methods for measuring the association between gene expressions. Although the Pearson correlation coefficient (PCC) is widely used in co-expression analysis in plant science, it can only be used to estimate a linear relationship between variables. A gene-gene association is not always a linear correlation. In general, information-theoretic measures can estimate a nonlinear relationship. Note that the Spearman correlation coefficient (SCC) can estimate a nonlinear relationship such as a monotonic function. (b) The concept of differential co-expression networks.

1.1.3 Network Modules and Differential Network Approaches

When assessing gene co-expression network data generated from a high-throughput microarray system, one can visualize a giant network component consisting of a large number of interactions (e.g., see [20]). There are many approaches for summarizing such large-scale networks: graph clustering [21] has been used, and differential co-expressions or differential correlations [22] have been identified by means of network analysis using omics data. In general, graph clustering methods such as Markov clustering [23] and DPClus [24] can be used for detecting co-expressed modules or clusters in an unbiased manner. Graph clustering algorithms efficiently extract densely connected genes in co-expression networks. This approach has also provided insights into transcriptional organization in Arabidopsis thaliana (Arabidopsis) and Oryza sativa (rice) as well as Solanum lycopersicum (tomato) [25-29]. In addition to changes in mean abundance levels (the identification of so-called "differentially expressed genes (DEGs)" between two samples) and the detection of clustered molecules with similar profile patterns, changes in the correlation patterns between molecules, referred to as differential correlations, are also informative [22, 30]. Differential network approaches compare two different networks, for example, normal and disease networks (Figure 1.1b). This type of differential network strategy [31] has been applied to animals and plants [19, 22, 30, 32]. Differential correlation analysis in metabolomics has been used for dissecting complex metabolisms [33-35].
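The idea of extracting co-expressed modules can be sketched in base R. Note that this example swaps in hierarchical clustering on a correlation distance as a simple stand-in; it is not Markov clustering or DPClus, and it is not claimed to be what any of the cited studies used. The data are simulated and all names are hypothetical.

```r
# Sketch: module detection by hierarchical clustering (a stand-in for
# Markov clustering or DPClus) on simulated expression data.
set.seed(42)
n_samples <- 40
f1 <- rnorm(n_samples)   # hidden driver of module 1
f2 <- rnorm(n_samples)   # hidden driver of module 2
expr <- cbind(
  g1 = f1 + rnorm(n_samples, sd = 0.2),
  g2 = f1 + rnorm(n_samples, sd = 0.2),
  g3 = f1 + rnorm(n_samples, sd = 0.2),
  g4 = f2 + rnorm(n_samples, sd = 0.2),
  g5 = f2 + rnorm(n_samples, sd = 0.2),
  g6 = f2 + rnorm(n_samples, sd = 0.2)
)

cor_mat  <- cor(expr)                     # gene-by-gene Pearson correlations
dist_mat <- as.dist(1 - cor_mat)          # correlation distance
tree     <- hclust(dist_mat, method = "average")
modules  <- cutree(tree, k = 2)           # k = 2 is known here by construction

# List the genes assigned to each module.
split(names(modules), modules)
```

In real data the number of modules is not known in advance, so the cut height or k would have to be chosen by the analyst; that choice is outside the scope of this sketch.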

1.1.4 Aims of this Chapter

This chapter aims to (i) introduce the differential network concept in biological networks, (ii) demonstrate typical correlation network analysis using transcriptome and metabolome data sets, and (iii) highlight caveats in the correlation approach, including the influence of the experimental setup used to generate correlation networks and the statistical approaches applied to assess these networks. We illustrate the utility of our DiffCorr package [36] by demonstrating biologically relevant, differentially correlated molecules in transcriptome co-expression and metabolite-to-metabolite correlation networks. The R code used in this chapter can be downloaded from the GitHub repository: http://afukushima.github.io/diffcorrbook.

1.2 What is DiffCorr?

1.2.1 Background

There are a number of algorithms for detecting differential correlations in large-scale omics data sets. Typical approaches for identifying differential correlations include topological overlap in a graph [37-40], extension of the traditional F-statistic [41], an additive model [42], Fisher's z-test [30, 36], an interaction score based on Rényi relative entropy [43], the Haar basis [32], the combination of the graphical Gaussian model and the posterior odds ratio [44], the liquid association concept [45, 46], a combination of robust correlations and hypothesis testing (called ROS-DET (RObust Switching mechanisms DETector)) [47], random re-sampling methods [48], graph-theoretic statistics [49], and an empirical Bayesian approach [50, 51]. Liu and coworkers implemented several of these methods to identify differential co-expressions in their R package DCGL [52, 53] (see also the review by Kayano et al. [54]). A tool to identify differential correlation patterns in omics data in an efficient and unbiased manner is needed. The simplest technique for identifying differential correlations, based on Fisher's z-test of correlation coefficients, is not yet widely used and, to the best of our knowledge, is not implemented for omics data in the available R packages. We developed the DiffCorr package [36], a simple method for identifying pattern changes between two experimental conditions in correlation networks, which builds on a commonly used association measure, such as Pearson's correlation coefficient. DiffCorr calculates correlation matrices for each data set, identifies the first principal component-based "eigen-molecules" in the correlation networks, and tests differential correlations between the two groups based on Fisher's z-test [36].
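The notion of a first principal component-based "eigen-molecule" can be illustrated with base R. The following is a minimal sketch on simulated data and is not the DiffCorr implementation: it assumes that a module is summarized by the scores of its first principal component, computed here with prcomp(), and all object names are hypothetical.

```r
# Sketch: summarize one module of correlated molecules by its first
# principal component. Rows are samples, columns are molecules.
set.seed(7)
n_samples <- 25
driver <- rnorm(n_samples)   # hidden common profile of the module
module <- cbind(
  m1 = driver + rnorm(n_samples, sd = 0.3),
  m2 = driver + rnorm(n_samples, sd = 0.3),
  m3 = driver + rnorm(n_samples, sd = 0.3)
)

pca <- prcomp(module, center = TRUE, scale. = TRUE)
eigen_molecule <- pca$x[, 1]                        # one value per sample
var_explained  <- pca$sdev[1]^2 / sum(pca$sdev^2)   # share of variance on PC1

# The sign of a principal component is arbitrary, so agreement with the
# hidden profile is checked in absolute value.
agreement <- abs(cor(eigen_molecule, driver))
```

The resulting vector has one entry per sample and can be correlated with other modules or molecules in the same way as a single expression profile.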

1.2.2 Methods

Fisher's z-test was chosen to identify significant differences between two correlations because it is a stringent test and provides conservative estimates of true differential correlations among molecules between two experimental conditions in the omics data [36]. To test whether the two correlation coefficients were significantly different, we first transformed the correlation coefficients for each of the two conditions, rA and rB, into ZA and ZB, respectively. The Fisher's z-transformation of coefficient rA is defined by ZA = 1/2[log(1 + rA)/(1 -...
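Because the excerpt breaks off here, the following base R sketch shows the standard textbook form of the two-sample Fisher z-test rather than the package's own code. In that standard form, a correlation r is transformed to z = 1/2 log((1 + r)/(1 - r)), which equals atanh(r), and the difference between two transformed coefficients is divided by sqrt(1/(nA - 3) + 1/(nB - 3)), where nA and nB are the sample sizes. The function name fisher_z_diff and its arguments are hypothetical.

```r
# Standard two-sample Fisher z-test for the difference between two
# correlation coefficients. Function and argument names are hypothetical;
# this is not the DiffCorr API.
fisher_z_diff <- function(r_a, n_a, r_b, n_b) {
  z_a <- atanh(r_a)   # atanh(r) equals 0.5 * log((1 + r) / (1 - r))
  z_b <- atanh(r_b)
  se  <- sqrt(1 / (n_a - 3) + 1 / (n_b - 3))   # standard error of z_a - z_b
  z   <- (z_a - z_b) / se
  p   <- 2 * pnorm(-abs(z))   # two-sided p-value under the standard normal
  list(z = z, p.value = p)
}

# Example: a pair of molecules strongly correlated in condition A
# but only weakly correlated in condition B, 50 samples each.
res <- fisher_z_diff(r_a = 0.9, n_a = 50, r_b = 0.1, n_b = 50)
```

When many molecule pairs are tested at once, the resulting p-values need a multiple-testing correction; base R's p.adjust() is one option, although the correction actually used by the package is not stated in this excerpt.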
