Schweitzer Fachinformationen
When it comes to professional knowledge, Schweitzer Fachinformationen leads the way. Customers in law and consulting, as well as companies, public administrations, and libraries, receive complete solutions for procuring, managing, and using digital and print media.
William K. Scott, PhD, is Professor at the University of Miami Leonard M. Miller School of Medicine where he teaches design and analysis of human genomic studies. He has authored over 200 peer-reviewed articles on the genetic epidemiology of complex traits.
Marylyn D. Ritchie, PhD, is Professor in the Department of Genetics at the University of Pennsylvania, Perelman School of Medicine. She is also the Director of the Center for Translational Bioinformatics in the Institute for Biomedical Informatics. She has authored over 350 peer-reviewed articles on statistical genetics, translational bioinformatics and biomedical informatics.
List of Contributors
Foreword
1 Designing a Study for Identifying Genes in Complex Traits
William K. Scott, Marylyn D. Ritchie, Jonathan L. Haines, and Margaret A. Pericak-Vance
Introduction
Components of a Disease Gene Discovery Study
Define Disease Phenotype
Clinical Definition
Determining that a Trait Has a Genetic Component
Identification of Datasets
Develop Study Design
Family-Based Studies
Population-Based Studies
Approaches for Gene Discovery
Analysis
Genomic Analysis
Statistical Analysis
Bioinformatics
Follow-up
Variant Detection
Replication
Functional Studies
Keys to a Successful Study
Foster Interaction of Necessary Expertise
Develop Careful Study Design
References
2 Basic Concepts in Genetics
Kayla Fourzali, Abigail Deppen, and Elizabeth Heise
Introduction
Historical Contributions
Segregation and Linkage Analysis
Hardy-Weinberg Equilibrium
DNA, Genes, and Chromosomes
Structure of DNA
Genes and Alleles
Genes and Chromosomes
Genes, Mitosis, and Meiosis
When Genes and Chromosomes Segregate Abnormally
Inheritance Patterns in Mendelian Disease
Autosomal Recessive
Autosomal Dominant
X-linked Inheritance
Mitochondrial Inheritance
Y-linked
Genetic Changes Associated with Disease/Trait Phenotypes
Mutations Versus Polymorphisms
Point Mutations
Sickle Cell Anemia
Achondroplasia
Deletion/Insertion Mutations
Duchenne and Becker Muscular Dystrophy
Cystic Fibrosis
Charcot-Marie-Tooth Disease
Nucleotide Repeat Disorders
Susceptibility Versus Causative Genes
Summary
References
3 Determining the Genetic Component of a Disease
Allison Ashley Koch and Evadnie Rampersaud
Introduction
Study Design
Selecting a Study Population
Population-Based
Clinic-Based
Ascertainment
Single Affected Individual
Relative Pairs
Extended Families
Healthy or Unaffected Controls
Ascertainment Bias
Approaches to Determining the Genetic Component of a Disease
Co-segregation with Chromosomal Abnormalities and Other Genetic Disorders
Familial Aggregation
Family History Approach
Example of Calculating Attributable Fraction
Correlation Coefficients
Twin and Adoption Studies
Recurrence Risk in Relatives of Affected Individuals
Heritability
Example Using Correlation Coefficients to Calculate Heritability
Segregation Analysis
Summary
References
4 Study Design for Genetic Studies
Dana C. Crawford and Logan Dumitrescu
Introduction
Selecting a Study Population
Family-Based Studies (Linkage)
Family-Based Studies (Association)
Studies of Unrelated Individuals (Association)
Cohort Studies
Cross-Sectional Studies
Case-Control Studies
Other Study Designs
Biobanks
Other Biobanks
Biospecimens for Biobanks
Summary
References
5 Responsible Conduct of Research in Genetic Studies
Susan Estabrooks Hahn, Adam Buchanan, Chantelle Wolpert, and Susan H. Blanton
Introduction
Research Regulations and Genetics Research
Addressing Pertinent ELSI in Genetic Research
Genetic Discrimination
Privacy and Confidentiality
Certificate of Confidentiality
Coding Data and Samples
Secondary Subjects
Future Use of Samples/Data Sharing
Handling of Research Results
CLIA Regulations: Separation of Research and Clinical Laboratories
Releasing Children's Genetic Research Results
DNA Ownership
DNA Banking
Family Coercion
Practical Methods for Efficient High-Quality Genetic Research Services
The Investigator as the Genetic Study Coordinator
Time Spent
Recruitment
Support Groups and Organizations
Referrals from Health Care Providers
Research Databases and the Internet
Institution Databases
Medical Clinics
Recruitment by Family Members
Informed Consent
Vulnerable Populations
Minors
Persons with Cognitive Impairment
Data and Sample Collection
Sample Collection
Confirmation of Diagnosis
The Art of Field Studies
Referring for Additional Medical Services
Maintaining Contact with Participants
Future Considerations
References
6 Linkage Analysis
Susan H. Blanton
Disease Gene Discovery
Ability to Detect Linkage
Real World Example of LOD Score Calculation and Interpretation
Disease Gene Localization
Multipoint Analysis
Effects of Misspecified Model Parameters in LOD Score Analysis
Impact of Incorrect Disease Allele Frequency
Impact of Incorrect Mode of Inheritance
Impact of Incorrect Disease Penetrance
Impact of Incorrect Marker Allele Frequency
Control of Scoring Errors
Genetic Heterogeneity
Practical Approach for Model-Based Linkage Analysis of Complex Traits
Nonparametric Linkage Analysis
Identity by State and Identity by Descent
Methods for Nonparametric Linkage Analysis
Tests for Linkage Using Affected Sibling Pairs (ASP)
Test Based on Identity by State
Tests Based on Identity by Descent in ASPs
Simple Tests
Tests Applicable When IBD Status Cannot Be Determined
Multipoint Affected Sib-Pair Methods
Handling Sibships with More Than 2 Affected Siblings
Methods Incorporating Affected Relative Pairs
NPL Analysis
Fitting Population Parameters
Power Analysis and Experimental Design Considerations for Qualitative Traits
Factors Influencing Power of Sib-pair Methods
The Example of Testicular Cancer
Examples of Sib-Pair Methods for Mapping Complex Traits
Mapping Quantitative Traits
Measuring Genetic Effects in Quantitative Traits
Study Design for Quantitative Trait Linkage Analysis
Haseman-Elston Regression
Variance Components Linkage Analysis
Nonparametric Methods
The Future
Software Available
References
7 Data Management
Stephen D. Turner and William S. Bush
Developing a Data Organization Strategy
A Brief Overview of Data Normalization
Database Management System (DBMS) and Structured Query Language (SQL)
Partitioning Data by Type
Sequence-Level Data
Sample-Level Data
Database Implementation
Hardware and Software Requirements
Implementation and Performance Tuning
Interacting with the Database Directly
Security
Other Tools for Data Management and Manipulation
R
PLINK
SAMtools
Workflow Management and Cloud Computing
Conclusion
References
8 Linkage Disequilibrium and Association Analysis
Eden R. Martin and Ren-Hua Chung
Introduction
Linkage Disequilibrium
Measures of Allelic Association
Causes of Allelic Association
Mapping Genes Using Linkage Disequilibrium
Tests of Association
Case-Control Tests
Test Statistics
Measures of Disease Association and Impact
Assessing Confounding Bias
Family-Based Tests of Association
The Transmission/Disequilibrium Test
Tests Using Unaffected Sibling Controls
Tests Using Extended Pedigrees
Regression and Likelihood-Based Methods
Association Tests with Quantitative Traits
Analysis of Haplotype Data
Genome-Wide Association Studies (GWAS)
Special Populations
HapMap
1000 Genomes Project
Summary
References
9 Genome-Wide Association Studies
Jacob L. McCauley, Yogasudha Veturi, Shefali Setia Verma, and Marylyn D. Ritchie
Introduction
Definition of GWAS
Purpose of GWAS
Design
Technologies for High-Density Genotyping
Discrete and Quantitative Trait Analysis
Case-Control, Family-Based, and Cohort Study Designs
Statistical Power for Association and Correction for Testing Multiple Hypotheses
Data Analysis
Quality Control on Genotyping Call Data
Initial Genotyping Quality Control
Sample-Level Quality Control
SNP-Level Quality Control
Software Programs for Quality Control
Population Structure
Imputation
Genetic Association Testing
Meta-Analysis and "Mega-Analysis"
Whole-Genome Regression-Based GWAS
Conclusion
References
10 Bioinformatics of Human Genetic Disease Studies
Dale J. Hedges
Introduction
Common Threads Genome Analysis
A Brief Note on Study Design
Data Format Manipulation
Planning for Adequate Computational Resources
Storage
Processing and Memory
Networking
Genomics in the Cloud
Processing and Analysis of Genomic Data
Array-Based Data
DNA Arrays and High-Throughput Genotyping
Preprocessing and Initial Quality Control
Genotype Calling
Call Efficiency
Data Cleaning and Additional Quality Control
Inferring Structural Variation From SNP-based Array Data
A Note on Statistical Analysis and Interpretation of Results
Array-Based Analysis of Gene Expression
Batch Effects and Data Normalization
Differential Expression
Classification and Clustering Methods
Visualization of Expression Data
Pathway and Network Analyses
Direct Counting and Other Expression Assay Procedures
Additional Uses for Oligonucleotide Arrays
High-Throughput Sequencing Methods for Genomics
Introduction
High-Throughput Sequencing for Genotype Inference
Expression Analysis from High-Throughput Sequencing Data - RNA-Seq
ChIP-Seq and Methylation-based Sequences
Bioinformatics Resources
Annotation of Genomic Data
Genome Browsers as Versatile Tools
Bioinformatics Frameworks and Workflows
Crowdsourcing and Troubleshooting
Data Sharing
References
11 Complex Genetic Interactions/Data Mining/Dimensionality Reduction
William S. Bush and Stephen D. Turner
Human Diseases Are Complex
Complexity of Biological Systems
Genetic Heterogeneity
Statistical and Mathematical Concepts of Complex Genetic Models
Analytic Approaches to the Detection of Complex Interactions
Linkage Analysis/Genomic Sharing
Association Analysis
Genome-Wide Association Analysis
Conclusion
References
12 Sample Size, Power, and Data Simulation
Sarah A. Pendergrass and Marylyn D. Ritchie
Introduction
Sample Size and Power
Power Calculations and Simulation
Power Studies for Association Analysis
Software for Calculating Power for Association Studies, Family- or Population-Based
PGA: Power for Genetic Association Analyses
Fine-Mapping Power Calculator
Quanto
PAWE: Power for Association with Errors
PAWE-3D
GPC: Genetic Power Calculator
CaTS
INPower
Software for Calculating Power for Transmission Disequilibrium Testing (TDT) and Affected Sib-Pair Testing (ASP)
TDT-PC: Transmission Disequilibrium Test Power Calculator
TDTASP
TDTPOWER
ASP/ASPSHARE
Simulation Software for Association Study Power Assessment
Backward and Forward Model Simulations
Coalescent Model Simulation - Short Genetic Sequences
Larger Coalescent Simulated Models
Forward Model Simulations - Short Genetic Sequences
Forward Model Simulations - Large Genetic Sequences
Resampling Simulation Tools
Software for Simulation of Phenotypic Data
Power Simulations for Linkage Analysis
Definitions for Power Assessments for Linkage Analysis
Computer Simulation Methods for Linkage Analysis of Mendelian Disease
SIMLINK
SLINK: Simulation Program for Linkage Analysis
SUP: Slink Utility Program
ALLEGRO
MERLIN: Multipoint Engine for Rapid Likelihood Inference
SimPED
Power Studies for Linkage Analysis - Complex Disease
Inclusion of Unaffected Siblings
Affected Relative Pairs of Other Types
Other Considerations
Genomic Screening Strategies: One-Stage versus Two-Stage Designs
Software for Designing Linkage Analysis Studies of Complex Disease
SIMLA
Quantitative Traits
Extreme Discordant Pairs
Sampling Consideration for the Variance Component Method
Software for Designing Linkage Analysis Studies for Quantitative Traits
SOLAR: Sequential Oligogenic Linkage Analysis Routines
MERLIN: Multipoint Engine for Rapid Likelihood Inference
SimuPOP
Summary
References
Index
William K. Scott¹, Marylyn D. Ritchie², Jonathan L. Haines³, and Margaret A. Pericak-Vance¹
¹ Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
² Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
³ Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
Disease gene discovery in humans has a long history, predating even the identification of DNA as the genetic molecule (Watson and Crick 1953) and the determination of the number of human chromosomes (Ford and Hamerton 1956; Tjio and Levan 1956). In fact, as early as the 1930s some simple statistical methods for the analysis of genetic data had been developed (Bernstein 1931; Fisher 1935a,b). However, these methods were severely limited in their application (more on basic concepts of genetics in Chapter 2). Not only were genetic markers lacking (the ABO blood type was one of the few that had been described), but these methods were restricted to small, two- to three-generation pedigrees. All calculations were performed by hand, of course, making analysis laborious.
There were two hurdles to overcome before human disease gene discovery would become routine. First, appropriate statistical methods were lacking, as were ways of automating the calculations. Second, sufficient genetic markers to cover the human genome needed to be identified. Morton (1955), building on the work of Haldane and Smith (1947) and Wald (1947), described the use of maximum likelihood approaches in a sequential test for linkage between two loci. He used the term "LOD score" (for logarithm of the odds of linkage) for his test. This score is the basis for most modern genetic linkage analyses and represents a milestone in human disease gene discovery. However, the complex calculations had to be done by hand, severely limiting the use of this approach. Elston and Stewart (1971) described a general approach for calculating the likelihood of any non-consanguineous pedigree. This algorithm was extended by Lange and Elston (1975) to include pedigrees of arbitrary complexity. Soon thereafter, the first general-purpose computer program for linkage in humans, LIPED (Ott 1974), was described. Thus, the first of the two major hurdles was overcome.
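The LOD score described above can be illustrated with a minimal sketch. It assumes the simplest textbook setting, which the excerpt does not spell out: phase-known, fully informative meioses, so that the likelihood at recombination fraction θ is θ^k (1 − θ)^(n − k) for k recombinants among n meioses, compared against free recombination (θ = 0.5). The function names `lod_score` and `max_lod` are hypothetical and are not taken from LIPED or any other linkage package.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score, assuming phase-known, fully informative meioses.

    LOD(theta) = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**k * (1 - theta)**(n - k).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")

    non_recombinants = meioses - recombinants

    if theta == 0.0:
        # A single observed recombinant rules out complete linkage.
        if recombinants > 0:
            return float("-inf")
        log_l_theta = 0.0
    else:
        log_l_theta = (recombinants * math.log10(theta)
                       + non_recombinants * math.log10(1.0 - theta))

    # Likelihood under the null hypothesis of no linkage (theta = 0.5).
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int) -> tuple[float, float]:
    """Return (theta_hat, LOD at theta_hat) under the same assumptions.

    The maximum-likelihood estimate is k / n, capped at 0.5.
    """
    if meioses <= 0:
        raise ValueError("meioses must be positive")
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)


if __name__ == "__main__":
    theta_hat, z = max_lod(recombinants=1, meioses=10)
    print(f"theta_hat = {theta_hat:.2f}, LOD = {z:.2f}")
```

Because the score is a base-10 logarithm of an odds ratio, a LOD of 3 corresponds to odds of 1000:1 in favor of linkage at the tested θ. Real pedigrees are rarely phase-known or fully informative, which is why general likelihood algorithms and computer programs were needed.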
By the mid-1970s there were 40-50 red cell antigen and serum protein polymorphisms available as genetic markers. A few markers could be arranged into initial linkage groups, but these markers covered only approximately 5-15% of the human genome. In addition to this limited coverage, genotyping these polymorphisms was labor intensive, time consuming, and often quite technically demanding. This remaining hurdle was crossed with the description of restriction fragment length polymorphisms (RFLPs) by Botstein et al. (1980). Not only were these markers easier to genotype in a standard manner, but they were frequent in the genome, covering the remaining 85-95% of the genome for the first time.
With these tools in place, the field of human disease gene discovery blossomed. The first successful disease gene linkage using RFLPs was reported (Gusella et al. 1983), localizing the Huntington disease gene to chromosome 4p. This discovery marked the beginning of disease gene identification through the positional cloning approach. Early successes using positional cloning were for diseases inherited in Mendelian fashion: autosomal dominant, autosomal recessive, or X-linked. Although confounding factors such as genetic heterogeneity, variable penetrance, and phenocopies might exist for single-gene or Mendelian traits, it is generally possible with a known genetic model to determine the best and most efficient approach to identifying the responsible gene. The success of these tools is apparent since by mid-2017 over 3350 single-gene disorders had at least one causative genetic variant identified (OMIM, accessed May 2017 at http://omim.org).
However, the inheritance patterns for traits such as the common form of Alzheimer's disease, multiple sclerosis, and non-insulin-dependent diabetes (to name a few) do not fit any simple genetic explanation, making it far more difficult to determine the best approach to identifying the unknown underlying effect. In addition to the confounding factors involved in single-gene disorders, such as genetic heterogeneity and phenocopies, gene-gene and gene-environment interactions must be considered when a complex trait is dissected. The tools that enabled efficient mapping of Mendelian trait loci through positional cloning were not as effective in dissecting these more complex traits. New statistical tools, study designs, and genotyping technologies were needed to perform large-scale analysis of genetic factors underlying these complex traits. As these technologies were developed, a new approach to complex disease gene identification via genome-wide association studies (GWAS) was enabled. The shift to this approach was predicted by a seminal perspective published by Risch and Merikangas (1996), in which they showed that large-scale case-control analyses of complex traits would be a powerful and efficient method of identifying alleles underlying complex traits, once genotyping technology allowed the cost-effective determination of a dense map of genetic markers. The first GWAS was published in 2005 (Klein et al. 2005), identifying the association of variation in the CFH gene with age-related macular degeneration. This was simultaneously confirmed using alternate study designs (Edwards et al. 2005; Haines et al. 2005), proving that GWAS worked and allowing this new era of complex disease genetics to begin in earnest.
With the dawn of the GWAS era, a corresponding shift in the prevailing hypotheses for these studies occurred. No longer were studies solely searching for one or a few rare mutations in a single gene that cause a rare and devastating disease. Studies of common complex diseases were searching for multiple alterations in one or more genes acting alone or in concert to increase or decrease the risk of developing a trait. Early GWAS tended to test the "common disease-common variant" (CDCV) hypothesis: the risk for common diseases, across ethnic groups, arises from evolutionarily old variants that have had substantial time to spread throughout the human population. Many studies successfully identified thousands of variants associated with the risk of complex diseases. An interactive catalog of these variants is maintained by the National Human Genome Research Institute and the European Molecular Biology Laboratory at http://www.ebi.ac.uk/gwas. Despite these successes, many studies testing the CDCV hypothesis failed to explain all the heritable variation in the risk of the complex traits under study - a phenomenon termed "missing heritability" (Manolio et al. 2009). One explanation for this was that the effect of rare variants was not well studied by early GWAS - an alternative hypothesis termed the "common disease-rare variant" (CDRV) hypothesis. This hypothesis suggests that risk of common complex diseases arises from a larger number of rare variants in one or more genes, perhaps occurring more recently.
Just as exploration of the CDCV hypothesis was enabled by GWAS approaches and high-throughput genotyping technology, exploration of the CDRV hypothesis was enabled by advances in high-throughput sequencing technology and accompanying statistical analysis methods. Initial screens of coding-sequence variants in Mendelian traits via whole-exome sequencing (WES) were published by Ng et al. (2009, 2010) and Choi et al. (2009), demonstrating that in some cases, disease gene mapping could skip the positional cloning strategy and proceed directly to evaluating segregation of mutations in families. This proof of principle has been used to justify this approach for testing the CDRV hypothesis in complex traits, but it has met with mixed success. A successful example is the recent analysis of 50,000 individuals in the MyCode Community Health Initiative, which identified rare variants underlying cardiovascular traits and lipid levels (Dewey et al. 2016). The rapid and continuing decrease in whole-genome sequencing (WGS) costs suggests that within a few years, it will be possible (and perhaps commonplace) to test the CDRV hypothesis using WGS in large sample sizes - essentially performing genome-wide association for common and rare variants with direct genotype determination via sequencing.
Study design, laboratory methods, and analytic approaches differ by trait type (Mendelian or complex) and hypothesis being tested (rare disease-rare variant, Mendelian positional cloning; CDCV [GWAS]; CDRV [WES or WGS and individual variant or set-based association]). These approaches are described in the following sections.
Each genetically complex trait has its own peculiarities that require special attention. However, a guiding paradigm can be applied to most conditions. Originally, the general approach...
File format: ePUB
Copy protection: Adobe DRM (Digital Rights Management)
System requirements:
The ePUB file format is well suited to novels and non-fiction, that is, to "flowing" text without complex layout. On e-readers and smartphones, line and page breaks adapt automatically to the small display. Adobe DRM is a "hard" form of copy protection. If the necessary requirements are not met, you will unfortunately not be able to open the e-book, so you need to prepare your reading hardware before downloading. Please note: we strongly recommend authorizing the reading software with your personal Adobe ID after installing it.
Further information is available in our e-book help.