
Computational Network Analysis with R
Description
With its easy-to-follow introduction to the theoretical background and application-oriented chapters, the book demonstrates that R is a powerful language for statistically analyzing networks and for tackling large-scale problems such as network sampling and bootstrapping.
Written by editors and authors with an excellent track record in the field, this is the ultimate reference for R in Network Analysis.


Persons
Yongtang Shi studied mathematics at Northwest University (Xi'an, China) and received his PhD in applied mathematics from Nankai University (Tianjin, China). He visited Technische Universität Bergakademie Freiberg (Germany), UMIT (Austria) and Simon Fraser University (Canada). Currently, he is an associate professor at the Center for Combinatorics of Nankai University. His research interests are in graph theory and its applications, especially the applications of graph theory in mathematical chemistry, computer science and information theory. He has written over 40 publications in graph theory and its applications.
Frank Emmert-Streib studied physics at the University of Siegen (Germany) gaining his PhD in theoretical physics from the University of Bremen (Germany). He received postdoctoral training from the Stowers Institute for Medical Research (Kansas City, USA) and the University of Washington (Seattle, USA). Currently, he is associate professor for Computational Biology at Tampere University of Technology (Finland). His main research interests are in the field of computational medicine, network biology and statistical genomics.
Content
Challenges of computational network analysis with R
Software and practices for visualizing network data in biology and medicine
Efficient anomaly detection in dynamic, attributed graphs by using R
Chemical informatics functionality in R
Biological network comparison
Degradation analysis in R using uDEMO
Penalized methods in high-dimensional Gaussian graphical models
Chapter 1
Using the DiffCorr Package to Analyze and Visualize Differential Correlations in Biological Networks
Atsushi Fukushima and Kozo Nishida
RIKEN Center for Sustainable Resource Science, 1-7-22 Suehirocho, Tsurumi, Yokohama, 230-0045, Japan
RIKEN Quantitative Biology Center, Laboratory for Biochemical Simulation, Osaka, Japan
1.1 Introduction
1.1.1 An Introduction to Omics and Systems Biology
In this century, high-throughput technologies are being harnessed in various applications to solve a diverse range of biological problems and to explore biological phenomena. Next-generation sequencers (NGS) can be used for measuring and monitoring thousands of small molecules simultaneously [1-4] and large genomic sequences can be acquired quickly and routinely. RNA sequencing with NGS (RNA-seq) measures nearly every transcript of a cellular system (i.e., the transcriptome) [5-7]. The term omics refers to the comprehensive analysis of biological systems, including genomics, transcriptomics, and metabolomics; these approaches have become a promising way to inspect complex network interactions in cellular systems. To understand the organizing principles of cellular functions at different levels, an integrative approach with large-scale omics data, including genomics, transcriptomics, proteomics, and metabolomics, is required [8-10]. Although it means different things to different scientists, systems biology [11] is the study of the behavior of complex biological processes using integrated approaches: a collection of omics-based data sets, quantitative measurements of the behavior of interacting cellular components, and mathematical/computational models to predict and describe complex dynamic behaviors.
1.1.2 Correlation Networks in Omics and Systems Biology
Molecular interactions can be represented simply as a network by measuring associations among molecules in omics data (e.g., see [12, 13]). Typical network analysis is based on transcriptome data sets obtained from microarray experiments or RNA-seq; this is known as gene co-expression analysis (e.g., see reviews [14-17]). Correlation is a special case of association and can be quantified with correlation-based measures such as the Pearson correlation coefficient, r (Figure 1.1a), which ranges from -1 to 1: r = 1 represents a perfect positive linear relationship between the expression levels of two genes, whereas r = -1 indicates a perfect negative linear relationship. Although r = 0 indicates no linear relationship, it does not mean that the two expression profiles are statistically independent. The Pearson correlation coefficient is sensitive to outliers, and its standard significance test assumes that the data follow a bivariate normal distribution. The Spearman rank correlation coefficient, in contrast, is more robust to outliers and measures a monotonic (not necessarily linear) relationship between gene expression profiles. If the correlation between two genes exceeds a threshold, the genes can be considered co-expressed. Such associations can be described as "co-expression networks" or, more generally, as "correlation networks," in which nodes represent genes and links between nodes represent significant correlations above a given threshold. Typical co-expression network analysis is based on the correlation coefficients between preselected gene(s) and the rest of the genes in a data set; this is called a guide-gene approach [18]. Although a correlation does not necessarily indicate a causal relationship, a network approach can provide clues about the regulatory mechanisms that underlie biological processes, and it has been used to characterize genes involved in plant specialized (secondary) metabolism [14, 17, 19].
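To make the thresholding and guide-gene steps concrete, here is a minimal sketch in Python (not the chapter's R code) that builds a correlation network from a samples-by-genes matrix and ranks all genes against one guide gene. The gene names, the toy data, and the cutoff of 0.8 are hypothetical choices for illustration; in practice the threshold is usually chosen from significance tests or domain knowledge.

```python
import numpy as np

def pearson_matrix(X):
    """Pearson correlation between the columns (genes) of X (samples x genes)."""
    return np.corrcoef(X, rowvar=False)

def correlation_edges(R, names, threshold):
    """List (gene_i, gene_j, r) for every gene pair with |r| >= threshold."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i, j]) >= threshold:
                edges.append((names[i], names[j], float(R[i, j])))
    return edges

def guide_gene_ranking(R, names, guide):
    """Rank all other genes by |r| with one preselected (guide) gene."""
    g = names.index(guide)
    others = [(names[k], float(R[g, k])) for k in range(len(names)) if k != g]
    return sorted(others, key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical toy data: 6 samples x 4 genes.
t = np.array([1, 2, 3, 4, 5, 6], dtype=float)
d = np.array([2, 5, 1, 6, 3, 4], dtype=float)
X = np.column_stack([t, 2 * t + 1, -t, d])
names = ["geneA", "geneB", "geneC", "geneD"]

R = pearson_matrix(X)
edges = correlation_edges(R, names, threshold=0.8)   # links of the co-expression network
ranking = guide_gene_ranking(R, names, "geneA")      # guide-gene view centered on geneA
```

Here geneA, geneB and geneC are perfectly (positively or negatively) correlated and form a triangle of links, while geneD stays unconnected and ends up last in the guide-gene ranking. Keeping negative correlations via |r| is a design choice of this sketch; some analyses keep only positive links.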
Figure 1.1 A gene-gene association measure and causal inferences in co-expression analysis. (a) Two major kinds of methods for measuring the association between gene expression profiles. Although the Pearson correlation coefficient (PCC) is widely used in co-expression analysis in plant science, it captures only linear relationships between variables. A gene-gene association is not always a linear correlation; in general, information-theoretic measures can estimate nonlinear relationships. Note that the Spearman correlation coefficient (SCC) can also capture a nonlinear relationship, provided that it is monotonic. (b) The concept of differential co-expression networks.
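The difference between linear and monotonic association in panel (a) can be checked numerically. This minimal Python sketch computes Spearman's rho as the Pearson correlation of ranks, which is exact only when there are no ties (an assumption of this simplified version).

```python
import numpy as np

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)

x = np.linspace(-3, 3, 61)

# Monotonic but nonlinear: exp preserves the ordering of x, so the ranks coincide.
r_exp, rho_exp = pearson(x, np.exp(x)), spearman(x, np.exp(x))

# Non-monotonic: y = x**2 is fully determined by x, yet Pearson's r is ~0
# because the relationship is symmetric around x = 0.
r_sq = pearson(x, x ** 2)

print(f"exp(x): Pearson {r_exp:.3f}, Spearman {rho_exp:.3f}")
print(f"x**2:   Pearson {r_sq:.3f}")
```

Spearman's rho is 1 for exp(x) while Pearson's r is clearly below 1; for x**2 neither a linear nor a monotonic measure detects the dependence, which is where the information-theoretic measures mentioned in the caption come in.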
1.1.3 Network Modules and Differential Network Approaches
When assessing gene co-expression network data generated from a high-throughput microarray system, one can visualize a giant network component made up of a large number of interactions (e.g., see [20]). There are many approaches for summarizing such large-scale networks: graph clustering [21] has been used, and differential co-expressions or differential correlations [22] have been identified by network analysis of omics data. In general, graph clustering methods such as Markov clustering [23] and DPClus [24] can be used for detecting co-expressed modules or clusters in an unbiased manner; these algorithms efficiently extract densely connected genes from co-expression networks. This approach has also provided insights into transcriptional organization in Arabidopsis thaliana (Arabidopsis), Oryza sativa (rice), and Solanum lycopersicum (tomato) [25-29]. Beyond differences in mean abundance (i.e., the identification of so-called "differentially expressed genes (DEGs)" between two samples) and the detection of clusters of molecules with similar profile patterns, changes in the correlation patterns between molecules, referred to as differential correlations, are also informative [22, 30]. Differential network approaches compare two different networks, for example, normal and disease networks (Figure 1.1b). This type of differential network strategy [31] has been applied to animals and plants [19, 22, 30, 32], and differential correlation analysis in metabolomics has been used for dissecting complex metabolism [33-35].
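A crude way to compare two networks, such as normal and disease networks, is to threshold each condition's correlation matrix and compare the resulting edge sets. The Python sketch below assumes two samples-by-genes matrices with the same gene columns and a hypothetical cutoff of 0.8; unlike a statistical test such as Fisher's z-test, this presence/absence comparison ignores sampling uncertainty and is shown only to illustrate the differential network idea.

```python
import numpy as np

def thresholded_edges(X, threshold):
    """Edge set {(i, j)} with |Pearson r| >= threshold between columns of X."""
    R = np.corrcoef(X, rowvar=False)
    n = R.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(R[i, j]) >= threshold}

def compare_networks(X_a, X_b, threshold):
    """Edges specific to condition A, specific to condition B, and shared."""
    edges_a = thresholded_edges(X_a, threshold)
    edges_b = thresholded_edges(X_b, threshold)
    return {"only_A": edges_a - edges_b,
            "only_B": edges_b - edges_a,
            "shared": edges_a & edges_b}

# Hypothetical toy data: 6 samples x 3 genes (columns 0, 1, 2) per condition.
t = np.array([1, 2, 3, 4, 5, 6], dtype=float)
d = np.array([2, 5, 1, 6, 3, 4], dtype=float)
X_normal = np.column_stack([t, 2 * t, d])       # genes 0 and 1 co-expressed
X_disease = np.column_stack([t, d, 3 * t + 1])  # genes 0 and 2 co-expressed

diff = compare_networks(X_normal, X_disease, threshold=0.8)
```

In this toy case the link between genes 0 and 1 exists only in the normal network and the link between genes 0 and 2 only in the disease network, so the two conditions share no edges.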
1.1.4 Aims of this Chapter
This chapter aims to (i) introduce the differential network concept in biological networks, (ii) demonstrate typical correlation network analysis using transcriptome and metabolome data sets, and (iii) highlight caveats in the correlation approach, including the influence of the experimental setup used to generate correlation networks and the statistical approaches applied to assess these networks. We illustrate the utility of our DiffCorr package [36] by demonstrating biologically relevant, differentially correlated molecules in transcriptome co-expression and metabolite-to-metabolite correlation networks. The R code used in this chapter can be downloaded from GitHub: http://afukushima.github.io/diffcorrbook.
1.2 What is DiffCorr?
1.2.1 Background
There are a number of algorithms for detecting differential correlations in large-scale omics data sets. Typical approaches for identifying differential correlations include topological overlap in a graph [37-40], an extension of the traditional F-statistic [41], an additive model [42], Fisher's z-test [30, 36], an interaction score based on Renyi relative entropy [43], the Haar basis [32], the combination of the graphical Gaussian model and the posterior odds ratio [44], the liquid association concept [45, 46], a combination of robust correlations and hypothesis testing (called ROS-DET (RObust Switching mechanisms DETector)) [47], random re-sampling methods [48], graph-theoretic statistics [49], and an empirical Bayesian approach [50, 51]. Liu and coworkers implemented several of these methods to identify differential co-expressions in their R package DCGL [52, 53] (see also the review by Kayano et al. [54]). A tool that identifies differential correlation patterns in omics data in an efficient and unbiased manner is needed. The simplest technique, based on Fisher's z-test of correlation coefficients, is not yet widely used and, to the best of our knowledge, is not implemented for omics data in the available R packages. We developed the DiffCorr package [36], a simple method for identifying pattern changes between two experimental conditions in correlation networks, which builds on a commonly used association measure such as Pearson's correlation coefficient. DiffCorr calculates correlation matrices for each data set, identifies the first principal component-based "eigen-molecules" in the correlation networks, and tests differential correlations between the two groups based on Fisher's z-test [36].
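As a hedged illustration of the general idea behind a first-principal-component "eigen-molecule" (not DiffCorr's actual code), the Python sketch below standardizes a module's samples-by-molecules matrix and takes the first principal component via an SVD. The sign flip that orients the scores with the module average is an assumption added for interpretability, and the sketch assumes no constant columns.

```python
import numpy as np

def eigen_molecule(X):
    """First principal component scores of a module (samples x molecules).

    Columns are standardized; the first left-singular vector scaled by the
    first singular value gives the per-sample scores.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, 0] * s[0]
    # The sign of a singular vector is arbitrary; orient it with the module mean.
    if np.corrcoef(scores, Z.mean(axis=1))[0, 1] < 0:
        scores = -scores
    explained = s[0] ** 2 / np.sum(s ** 2)  # fraction of variance captured
    return scores, explained

# Hypothetical module: three molecules that rise together across 6 samples.
t = np.array([1, 2, 3, 4, 5, 6], dtype=float)
module = np.column_stack([t, 2 * t + 1, 0.5 * t - 3])
scores, explained = eigen_molecule(module)
```

Because the three toy molecules are increasing linear transforms of one profile, the eigen-molecule tracks that profile exactly and explains all of the module's variance; with real data the explained fraction indicates how well one summary represents the module.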
1.2.2 Methods
Fisher's z-test was used to identify significant differences between two correlations because it is a stringent test and provides conservative estimates of true differential correlations among molecules between two experimental conditions in omics data [36]. To test whether the two correlation coefficients were significantly different, we first transformed the correlation coefficients for the two conditions, rA and rB, into ZA and ZB, respectively. Fisher's z-transformation of coefficient rA is defined by ZA = 1/2[log(1 + rA)/(1 -...
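The standard textbook form of Fisher's z-test for two independent correlations transforms each coefficient with Z = 1/2 ln((1 + r)/(1 - r)) and compares the statistic (ZA - ZB) / sqrt(1/(nA - 3) + 1/(nB - 3)) against a standard normal distribution, where nA and nB are the sample sizes of the two conditions. The Python sketch below implements that textbook form; the function names are hypothetical and it is not the DiffCorr implementation.

```python
import math

def fisher_z(r):
    """Fisher's z-transformation 0.5 * ln((1 + r) / (1 - r)), i.e. atanh(r)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1 + r) / (1 - r))

def diff_corr_test(r_a, n_a, r_b, n_b):
    """Two-sided z-test of H0: rho_A == rho_B for two independent samples."""
    if n_a <= 3 or n_b <= 3:
        raise ValueError("each sample needs more than 3 observations")
    se = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    z = (fisher_z(r_a) - fisher_z(r_b)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return z, p

# Hypothetical example: r = 0.9 in condition A vs r = 0.1 in condition B, n = 30 each.
z, p = diff_corr_test(0.9, 30, 0.1, 30)
```

With 30 samples per condition, a shift from r = 0.9 to r = 0.1 gives z of about 5 and a very small p-value, whereas identical correlations give z = 0 and p = 1. When thousands of molecule pairs are tested, the p-values would additionally need a multiple-testing correction.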
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)