Genetic Analysis of Complex Diseases

William K. Scott, Marylyn D. Ritchie (Editors)

Wiley (Publisher)

3rd edition

Published on 11 November 2021

336 pages

E-book

ePUB with Adobe DRM

978-1-119-10407-0 (ISBN)

€112.99 incl. 7% VAT

E-book single license

Available for download

Contents

List of Contributors

Foreword

1 Designing a Study for Identifying Genes in Complex Traits
William K. Scott, Marylyn D. Ritchie, Jonathan L. Haines, and Margaret A. Pericak-Vance

Introduction

Components of a Disease Gene Discovery Study

Define Disease Phenotype

Clinical Definition

Determining that a Trait Has a Genetic Component

Identification of Datasets

Develop Study Design

Family-Based Studies

Population-Based Studies

Approaches for Gene Discovery

Analysis

Genomic Analysis

Statistical Analysis

Bioinformatics

Follow-up

Variant Detection

Replication

Functional Studies

Keys to a Successful Study

Foster Interaction of Necessary Expertise

Develop Careful Study Design

References

2 Basic Concepts in Genetics
Kayla Fourzali, Abigail Deppen, and Elizabeth Heise

Introduction

Historical Contributions

Segregation and Linkage Analysis

Hardy-Weinberg Equilibrium

DNA, Genes, and Chromosomes

Structure of DNA

Genes and Alleles

Genes and Chromosomes

Genes, Mitosis, and Meiosis

When Genes and Chromosomes Segregate Abnormally

Inheritance Patterns in Mendelian Disease

Autosomal Recessive

Autosomal Dominant

X-linked Inheritance

Mitochondrial Inheritance

Y-linked

Genetic Changes Associated with Disease/Trait Phenotypes

Mutations Versus Polymorphisms

Point Mutations

Sickle Cell Anemia

Achondroplasia

Deletion/Insertion Mutations

Duchenne and Becker Muscular Dystrophy

Cystic Fibrosis

Charcot-Marie-Tooth Disease

Nucleotide Repeat Disorders

Susceptibility Versus Causative Genes

Summary

References

3 Determining the Genetic Component of a Disease
Allison Ashley-Koch and Evadnie Rampersaud

Introduction

Study Design

Selecting a Study Population

Population-Based

Clinic-Based

Ascertainment

Single Affected Individual

Relative Pairs

Extended Families

Healthy or Unaffected Controls

Ascertainment Bias

Approaches to Determining the Genetic Component of a Disease

Co-segregation with Chromosomal Abnormalities and Other Genetic Disorders

Familial Aggregation

Family History Approach

Example of Calculating Attributable Fraction

Correlation Coefficients

Twin and Adoption Studies

Recurrence Risk in Relatives of Affected Individuals

Heritability

Example Using Correlation Coefficients to Calculate Heritability

Segregation Analysis

Summary

References

4 Study Design for Genetic Studies
Dana C. Crawford and Logan Dumitrescu

Introduction

Selecting a Study Population

Family-Based Studies (Linkage)

Family-Based Studies (Association)

Studies of Unrelated Individuals (Association)

Cohort Studies

Cross-Sectional Studies

Case-Control Studies

Other Study Designs

Biobanks

Other Biobanks

Biospecimens for Biobanks

Summary

References

5 Responsible Conduct of Research in Genetic Studies
Susan Estabrooks Hahn, Adam Buchanan, Chantelle Wolpert, and Susan H. Blanton

Introduction

Research Regulations and Genetics Research

Addressing Pertinent ELSI in Genetic Research

Genetic Discrimination

Privacy and Confidentiality

Certificate of Confidentiality

Coding Data and Samples

Secondary Subjects

Future Use of Samples/Data Sharing

Handling of Research Results

CLIA Regulations: Separation of Research and Clinical Laboratories

Releasing Children's Genetic Research Results

DNA Ownership

DNA Banking

Family Coercion

Practical Methods for Efficient High-Quality Genetic Research Services

The Investigator as the Genetic Study Coordinator

Time Spent

Recruitment

Support Groups and Organizations

Referrals from Health Care Providers

Research Databases and the Internet

Institution Databases

Medical Clinics

Recruitment by Family Members

Informed Consent

Vulnerable Populations

Minors

Persons with Cognitive Impairment

Data and Sample Collection

Sample Collection

Confirmation of Diagnosis

The Art of Field Studies

Referring for Additional Medical Services

Maintaining Contact with Participants

Future Considerations

References

6 Linkage Analysis
Susan H. Blanton

Disease Gene Discovery

Ability to Detect Linkage

Real World Example of LOD Score Calculation and Interpretation

Disease Gene Localization

Multipoint Analysis

Effects of Misspecified Model Parameters in LOD Score Analysis

Impact of Incorrect Disease Allele Frequency

Impact of Incorrect Mode of Inheritance

Impact of Incorrect Disease Penetrance

Impact of Incorrect Marker Allele Frequency

Control of Scoring Errors

Genetic Heterogeneity

Practical Approach for Model-Based Linkage Analysis of Complex Traits

Nonparametric Linkage Analysis

Identity by State and Identity by Descent

Methods for Nonparametric Linkage Analysis

Tests for Linkage Using Affected Sibling Pairs (ASP)

Test Based on Identity by State

Tests Based on Identity by Descent in ASPs

Simple Tests

Tests Applicable When IBD Status Cannot Be Determined

Multipoint Affected Sib-Pair Methods

Handling Sibships with More Than 2 Affected Siblings

Methods Incorporating Affected Relative Pairs

NPL Analysis

Fitting Population Parameters

Power Analysis and Experimental Design Considerations for Qualitative Traits

Factors Influencing Power of Sib-pair Methods

The Example of Testicular Cancer

Examples of Sib-Pair Methods for Mapping Complex Traits

Mapping Quantitative Traits

Measuring Genetic Effects in Quantitative Traits

Study Design for Quantitative Trait Linkage Analysis

Haseman-Elston Regression

Variance Components Linkage Analysis

Nonparametric Methods

The Future

Software Available

References

7 Data Management
Stephen D. Turner and William S. Bush

Developing a Data Organization Strategy

A Brief Overview of Data Normalization

Database Management System (DBMS) and Structured Query Language (SQL)

Partitioning Data by Type

Sequence-Level Data

Sample-Level Data

Database Implementation

Hardware and Software Requirements

Implementation and Performance Tuning

Interacting with the Database Directly

Security

Other Tools for Data Management and Manipulation

R

PLINK

SAMtools

Workflow Management and Cloud Computing

Conclusion

References

8 Linkage Disequilibrium and Association Analysis
Eden R. Martin and Ren-Hua Chung

Introduction

Linkage Disequilibrium

Measures of Allelic Association

Causes of Allelic Association

Mapping Genes Using Linkage Disequilibrium

Tests of Association

Case-Control Tests

Test Statistics

Measures of Disease Association and Impact

Assessing Confounding Bias

Family-Based Tests of Association

The Transmission/Disequilibrium Test

Tests Using Unaffected Sibling Controls

Tests Using Extended Pedigrees

Regression and Likelihood-Based Methods

Association Tests with Quantitative Traits

Analysis of Haplotype Data

Genome-Wide Association Studies (GWAS)

Special Populations

HapMap

1000 Genomes Project

Summary

References

9 Genome-Wide Association Studies
Jacob L. McCauley, Yogasudha Veturi, Shefali Setia Verma, and Marylyn D. Ritchie

Introduction

Definition of GWAS

Purpose of GWAS

Design

Technologies for High-Density Genotyping

Discrete and Quantitative Trait Analysis

Case-Control, Family-Based, and Cohort Study Designs

Statistical Power for Association and Correction for Testing Multiple Hypotheses

Data Analysis

Quality Control on Genotyping Call Data

Initial Genotyping Quality Control

Sample-Level Quality Control

SNP-Level Quality Control

Software Programs for Quality Control

Population Structure

Imputation

Genetic Association Testing

Meta-Analysis and "Mega-Analysis"

Whole-Genome Regression-Based GWAS

Conclusion

References

10 Bioinformatics of Human Genetic Disease Studies
Dale J. Hedges

Introduction

Common Threads Genome Analysis

A Brief Note on Study Design

Data Format Manipulation

Planning for Adequate Computational Resources

Storage

Processing and Memory

Networking

Genomics in the Cloud

Processing and Analysis of Genomic Data

Array-Based Data

DNA Arrays and High-Throughput Genotyping

Preprocessing and Initial Quality Control

Genotype Calling

Call Efficiency

Data Cleaning and Additional Quality Control

Inferring Structural Variation From SNP-based Array Data

A Note on Statistical Analysis and Interpretation of Results

Array-Based Analysis of Gene Expression

Batch Effects and Data Normalization

Differential Expression

Classification and Clustering Methods

Visualization of Expression Data

Pathway and Network Analyses

Direct Counting and Other Expression Assay Procedures

Additional Uses for Oligonucleotide Arrays

High-Throughput Sequencing Methods for Genomics

Introduction

High-Throughput Sequencing for Genotype Inference

Expression Analysis from High-Throughput Sequencing Data - RNA-Seq

ChIP-Seq and Methylation-based Sequences

Bioinformatics Resources

Annotation of Genomic Data

Genome Browsers as Versatile Tools

Bioinformatics Frameworks and Workflows

Crowdsourcing and Troubleshooting

Data Sharing

References

11 Complex Genetic Interactions/Data Mining/Dimensionality Reduction
William S. Bush and Stephen D. Turner

Human Diseases Are Complex

Complexity of Biological Systems

Genetic Heterogeneity

Statistical and Mathematical Concepts of Complex Genetic Models

Analytic Approaches to the Detection of Complex Interactions

Linkage Analysis/Genomic Sharing

Association Analysis

Genome-Wide Association Analysis

Conclusion

References

12 Sample Size, Power, and Data Simulation
Sarah A. Pendergrass and Marylyn D. Ritchie

Introduction

Sample Size and Power

Power Calculations and Simulation

Power Studies for Association Analysis

Software for Calculating Power for Association Studies, Family- or Population-Based

PGA: Power for Genetic Association Analyses

Fine-Mapping Power Calculator

Quanto

PAWE: Power for Association with Errors

PAWE-3D

GPC: Genetic Power Calculator

CaTS

INPower

Software for Calculating Power for Transmission Disequilibrium Testing (TDT) and Affected Sib-Pair Testing (ASP)

GPC: Genetic Power Calculator

TDT-PC: Transmission Disequilibrium Test Power Calculator

TDTASP

TDTPOWER

ASP/ASPSHARE

Simulation Software for Association Study Power Assessment

Backward and Forward Model Simulations

Coalescent Model Simulation - Short Genetic Sequences

Larger Coalescent Simulated Models

Forward Model Simulations - Short Genetic Sequences

Forward Model Simulations - Large Genetic Sequences

Resampling Simulation Tools

Software for Simulation of Phenotypic Data

Power Simulations for Linkage Analysis

Definitions for Power Assessments for Linkage Analysis

Computer Simulation Methods for Linkage Analysis of Mendelian Disease

SIMLINK

SLINK: Simulation Program for Linkage Analysis

SUP: Slink Utility Program

ALLEGRO

MERLIN: Multipoint Engine for Rapid Likelihood Inference

SimPED

Power Studies for Linkage Analysis - Complex Disease

Inclusion of Unaffected Siblings

Affected Relative Pairs of Other Types

Other Considerations

Genomic Screening Strategies: One-Stage versus Two-Stage Designs

Software for Designing Linkage Analysis Studies of Complex Disease

SIMLA

Quantitative Traits

Extreme Discordant Pairs

Sampling Consideration for the Variance Component Method

Software for Designing Linkage Analysis Studies for Quantitative Traits

SOLAR: Sequential Oligogenic Linkage Analysis Routines

MERLIN: Multipoint Engine for Rapid Likelihood Inference

SimuPOP

Summary

References

Index

1
Designing a Study for Identifying Genes in Complex Traits

William K. Scott¹, Marylyn D. Ritchie², Jonathan L. Haines³, and Margaret A. Pericak-Vance¹

¹ Dr. John T. Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA

² Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA

³ Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA

Introduction

Disease gene discovery in humans has a long history, predating even the identification of DNA as the genetic molecule (Watson and Crick 1953) and the determination of the number of human chromosomes (Ford and Hamerton 1956; Tjio and Levan 1956). In fact, as early as the 1930s some simple statistical methods for the analysis of genetic data had been developed (Bernstein 1931; Fisher 1935a,b). However, these methods were severely limited in their application (basic concepts of genetics are covered in Chapter 2). Not only were genetic markers lacking (the ABO blood type was one of the few that had been described), but the methods were also restricted to small pedigrees of two to three generations. All calculations were performed by hand, making analysis laborious.

There were two hurdles to overcome before human disease gene discovery would become routine. First, appropriate statistical methods were lacking, as were ways of automating the calculations. Second, sufficient genetic markers to cover the human genome needed to be identified. Morton (1955), building on the work of Haldane and Smith (1947) and Wald (1947), described the use of maximum likelihood approaches in a sequential test for linkage between two loci. He used the term "LOD score" (for logarithm of the odds of linkage) for his test. This score is the basis for most modern genetic linkage analyses and represents a milestone in human disease gene discovery. However, the complex calculations had to be done by hand, severely limiting the use of this approach. Elston and Stewart (1971) described a general approach for calculating the likelihood of any non-consanguineous pedigree. This algorithm was extended by Lange and Elston (1975) to include pedigrees of arbitrary complexity. Soon thereafter, the first general-purpose computer program for linkage in humans, LIPED (Ott 1974), was described. Thus, the first of the two major hurdles was overcome.
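To make the "logarithm of the odds" concrete, the following is a minimal sketch of a two-point LOD score in its simplest textbook form. It assumes phase-known, fully informative meioses in which recombinants can be counted directly; it is not Morton's full sequential procedure, nor the pedigree likelihood algorithms described above. The function names `lod_score` and `max_lod` are hypothetical and chosen only for illustration.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score at recombination fraction theta.

    Assumes phase-known, fully informative meioses (an illustrative
    simplification). LOD(theta) = log10[ L(theta) / L(0.5) ], where
    L(theta) = theta^k * (1 - theta)^(n - k) for k recombinants in n meioses.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")

    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Complete linkage is impossible once any recombinant is observed.
        if recombinants > 0:
            return float("-inf")
        log_l_theta = 0.0
    else:
        log_l_theta = (recombinants * math.log10(theta)
                       + non_recombinants * math.log10(1.0 - theta))

    # Null hypothesis: free recombination (theta = 0.5).
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int, steps: int = 500):
    """Grid search over theta in [0, 0.5]; returns (theta_hat, max LOD)."""
    best_lod, best_theta = max(
        (lod_score(recombinants, meioses, 0.5 * i / steps), 0.5 * i / steps)
        for i in range(steps + 1)
    )
    return best_theta, best_lod
```

For example, `max_lod(1, 10)` searches for the recombination fraction that best explains one recombinant in ten meioses. A LOD score of 3 (odds of 1000:1 in favor of linkage) is the threshold conventionally used to declare linkage.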

By the mid-1970s there were 40-50 red cell antigen and serum protein polymorphisms available as genetic markers. A few markers could be arranged into initial linkage groups, but these markers covered only approximately 5-15% of the human genome. In addition to this limited coverage, genotyping these polymorphisms was labor intensive, time consuming, and often quite technically demanding. This remaining hurdle was crossed with the description of restriction fragment length polymorphisms (RFLPs) by Botstein et al. (1980). Not only were these markers easier to genotype in a standard manner, but they were frequent in the genome, covering the remaining 85-95% of the genome for the first time.

With these tools in place, the field of human disease gene discovery blossomed. The first successful disease gene linkage using RFLPs was reported (Gusella et al. 1983), localizing the Huntington disease gene to chromosome 4p. This discovery marked the beginning of disease gene identification through the positional cloning approach. Early successes using positional cloning were for diseases inherited in Mendelian fashion: autosomal dominant, autosomal recessive, or X-linked. Although confounding factors such as genetic heterogeneity, variable penetrance, and phenocopies might exist for single-gene or Mendelian traits, it is generally possible with a known genetic model to determine the best and most efficient approach to identifying the responsible gene. The success of these tools is apparent since by mid-2017 over 3350 single-gene disorders had at least one causative genetic variant identified (OMIM, accessed May 2017 at http://omim.org).

However, the inheritance patterns for traits such as the common form of Alzheimer's disease, multiple sclerosis, and non-insulin-dependent diabetes (to name a few) do not fit any simple genetic explanation, making it far more difficult to determine the best approach to identifying the unknown underlying effect. In addition to the confounding factors involved in single-gene disorders, such as genetic heterogeneity and phenocopies, gene-gene and gene-environment interactions must be considered when a complex trait is dissected. The tools that enabled efficient mapping of Mendelian trait loci through positional cloning were not as effective in dissecting these more complex traits. New statistical tools, study designs, and genotyping technologies were needed to perform large-scale analysis of the genetic factors underlying them. As these technologies were developed, a new approach to complex disease gene identification via genome-wide association studies (GWAS) was enabled. The shift to this approach was predicted in a seminal perspective by Risch and Merikangas (1996), who showed that large-scale case-control analyses of complex traits would be a powerful and efficient method of identifying the underlying alleles once genotyping technology allowed the cost-effective determination of a dense map of genetic markers. The first GWAS was published in 2005 (Klein et al. 2005), identifying the association of variation in the CFH gene with age-related macular degeneration. This association was simultaneously confirmed using alternate study designs (Edwards et al. 2005; Haines et al. 2005), proving that GWAS worked and allowing this new era of complex disease genetics to begin in earnest.
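As an illustration of what a single-marker case-control comparison can look like, the sketch below runs an allelic chi-square test on a 2x2 table of allele counts. This is one simple form such a test can take, offered under stated assumptions; it is not the specific analysis used in any of the studies cited above. The function name `allelic_chi_square` and the example counts are hypothetical.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are alternate and reference
    alleles. Returns (chi2, p_value, odds_ratio).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    if min(row_totals + col_totals) == 0:
        raise ValueError("every row and column needs at least one allele")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    # Odds of carrying the alternate allele in cases relative to controls.
    if case_ref == 0 or control_alt == 0:
        odds_ratio = float("inf")
    else:
        odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return chi2, p_value, odds_ratio
```

With hypothetical counts of 60 alternate and 40 reference alleles in cases versus 40 and 60 in controls, `allelic_chi_square(60, 40, 40, 60)` gives a chi-square of 8.0 and an odds ratio of 2.25. In a genome-wide setting the same test is repeated for every marker, so the resulting p-values must be corrected for multiple testing.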

With the dawn of the GWAS era, a corresponding shift in the prevailing hypotheses for these studies occurred. No longer were studies solely searching for one or a few rare mutations in a single gene that cause a rare and devastating disease. Studies of common complex diseases were searching for multiple alterations in one or more genes acting alone or in concert to increase or decrease the risk of developing a trait. Early GWAS tended to test the "common disease-common variant" (CDCV) hypothesis: the risk for common diseases, across ethnic groups, arises from evolutionarily old variants that have had substantial time to spread throughout the human population. Many studies successfully identified thousands of variants associated with the risk of complex diseases. An interactive catalog of these variants is maintained by the National Human Genome Research Institute and the European Molecular Biology Laboratory at http://www.ebi.ac.uk/gwas. Despite these successes, many studies testing the CDCV hypothesis failed to explain all the heritable variation in the risk of the complex traits under study - a phenomenon termed "missing heritability" (Manolio et al. 2009). One explanation for this was that the effect of rare variants was not well studied by early GWAS - an alternative hypothesis termed the "common disease-rare variant" (CDRV) hypothesis. This hypothesis suggests that risk of common complex diseases arises from a larger number of rare variants in one or more genes, perhaps occurring more recently.

Just as GWAS approaches and high-throughput genotyping technology enabled exploration of the CDCV hypothesis, advances in high-throughput sequencing technology and accompanying statistical analysis methods enabled exploration of the CDRV hypothesis. Initial screens of coding-sequence variants in Mendelian traits via whole-exome sequencing (WES) were published by Ng et al. (2009, 2010) and Choi et al. (2009), demonstrating that in some cases, disease gene mapping could skip the positional cloning strategy and proceed directly to evaluating segregation of mutations in families. This proof of principle has been used to justify the same approach for testing the CDRV hypothesis in complex traits, where it has met with mixed success. One successful example is the analysis of 50,000 individuals in the MyCode Community Health Initiative, which identified rare variants underlying cardiovascular traits and lipid levels (Dewey et al. 2016). The rapid and continuing decrease in whole-genome sequencing (WGS) costs suggests that within a few years, it will be possible (and perhaps commonplace) to test the CDRV hypothesis using WGS in large sample sizes - essentially performing genome-wide association for common and rare variants with direct genotype determination via sequencing.

Study design, laboratory methods, and analytic approaches differ by trait type (Mendelian or complex) and by the hypothesis being tested (rare disease-rare variant [Mendelian positional cloning]; CDCV [GWAS]; CDRV [WES or WGS with individual-variant or set-based association]). These approaches are described in the following sections.

Components of a Disease Gene Discovery Study

Each genetically complex trait has its own peculiarities that require special attention. However, a guiding paradigm can be applied to most conditions. Originally, the general approach...
