Strategies in Biomedical Data Science

Driving Force for Innovation
Wiley (publisher)
  • published December 27, 2016
  • 464 pages
E-book | ePUB with Adobe DRM | System requirements
978-1-119-25597-0 (ISBN)
An essential guide to healthcare data problems, sources, and solutions
Strategies in Biomedical Data Science provides medical professionals with much-needed guidance toward managing the increasing deluge of healthcare data. Beginning with a look at our current top-down methodologies, this book demonstrates the ways in which both technological development and more effective use of current resources can better serve both patient and payer. The discussion explores the aggregation of disparate data sources, current analytics and toolsets, the growing necessity of smart bioinformatics, and more as data science and biomedical science grow increasingly intertwined. You'll dig into the unknown challenges that come along with every advance, and explore the ways in which healthcare data management and technology will inform medicine, politics, and research in the not-so-distant future. Real-world use cases and clear examples are featured throughout, and coverage of data sources, problems, and potential mitigations provides necessary insight for forward-looking healthcare professionals.
Big Data has been a topic of discussion for some time, with much attention focused on problems and management issues surrounding truly staggering amounts of data. This book offers a lifeline through the tsunami of healthcare data, to help the medical community turn their data management problem into a solution.
* Consider the data challenges personalized medicine entails
* Explore the available advanced analytic resources and tools
* Learn how bioinformatics as a service is quickly becoming reality
* Examine the future of IoT and the deluge of personal device data
The sheer amount of healthcare data being generated will only increase as both biomedical research and clinical practice trend toward individualized, patient-specific care. Strategies in Biomedical Data Science provides expert insight into the kind of robust data management that is becoming increasingly critical as healthcare evolves.
JAY A. ETCHINGS is the director of operations at Arizona State University's Research Computing program, where he is responsible for developing innovative architectures to progress fluid technical environments supporting highly computational workloads, peta-scale data analysis, next-generation cyber capabilities, and emerging network innovations.
Foreword
Acknowledgments
Introduction
Who Should Read This Book?
What's in This Book?
How to Contact Us
Chapter 1 Healthcare, History, and Heartbreak
Top Issues in Healthcare
Data Management
Biosimilars, Drug Pricing, and Pharmaceutical Compounding
Promising Areas of Innovation
Conclusion
Notes
Chapter 2 Genome Sequencing: Know Thyself, One Base Pair at a Time
Content contributed by Sheetal Shetty and Jacob Brill
Challenges of Genomic Analysis
The Language of Life
A Brief History of DNA Sequencing
DNA Sequencing and the Human Genome Project
Select Tools for Genomic Analysis
Conclusion
Notes
Chapter 3 Data Management
Content contributed by Joe Arnold
Bits about Data
Data Types
Data Security and Compliance
Data Storage
SwiftStack
OpenStack Swift Architecture
Conclusion
Notes
Chapter 4 Designing a Data-Ready Network Infrastructure
Research Networks: A Primer
ESnet at 30: Evolving toward Exascale and Raising Expectations
Internet2 Innovation Platform
Advances in Networking
InfiniBand and Microsecond Latency
The Future of High-Performance Fabrics
Network Function Virtualization
Software-Defined Networking
OpenDaylight
Conclusion
Notes
Chapter 5 Data-Intensive Compute Infrastructures
Content contributed by Dijiang Huang, Yuli Deng, Jay Etchings, Zhiyuan Ma, and Guangchun Luo
Big Data Applications in Health Informatics
Sources of Big Data in Health Informatics
Infrastructure for Big Data Analytics
Fundamental System Properties
GPU-Accelerated Computing and Biomedical Informatics
Conclusion
Notes
Chapter 6 Cloud Computing and Emerging Architectures
Cloud Basics
Challenges Facing Cloud Computing Applications in Biomedicine
Hybrid Campus Clouds
Research as a Service
Federated Access Web Portals
Cluster Homogeneity
Emerging Architectures (Zeta Architecture)
Conclusion
Notes
Chapter 7 Data Science
NoSQL Approaches to Biomedical Data Science
Using Splunk for Data Analytics
Statistical Analysis of Genomic Data with Hadoop
Extracting and Transforming Genomic Data
Processing eQTL Data
Generating Master SNP Files for Cases and Controls
Generating Gene Expression Files for Cases and Controls
Cleaning Raw Data Using MapReduce
Transpose Data Using Python
Statistical Analysis Using Spark
Hive Tables with Partitions
Conclusion
Notes
Appendix: A Brief Statistics Primer
Content contributed by Daniel Peñaherrera
Chapter 8 Next-Generation Cyberinfrastructures
Next-Generation Cyber Capability
NGCC Design and Infrastructure
Conclusion
Note
Conclusion
Appendix A The Research Data Management Survey: From Concepts to Practice
Brandon Mikkelsen and Jay Etchings
Appendix B Central IT and Research Support
Gregory D. Palmer
Appendix C HPC Working Example: Using Parallelization Programs Such as GNU Parallel and OpenMP with Serial Tools
Appendix D HPC and Hadoop: Bridging HPC to Hadoop
Appendix E Bioinformatics + Docker: Simplifying Bioinformatics Tools Delivery with Docker Containers
Glossary
About the Author
About the Contributors
Index


Never let the future disturb you.

You will meet it, if you have to, with the same weapons of reason which today arm you against the present.

-Marcus Aurelius

Some time ago, while I was engaged as a consultant, it became painfully obvious that the approaches to healthcare data management and overall infrastructure architecture were stuck in the Stone Age. While data and information technology (IT) professionals sprinted to remain on the cutting edge of top tech trends, much of the healthcare system remained a technical backwater. The many explanations for this include compliance controls, challenges associated with the rapid proliferation of data, and reliance on old systems with proprietary code where porting was more painful than the day-to-day operations. This state of affairs has been frustrating for all involved. But beyond the very real frustrations, there are far more important negative impacts. Technical inefficiencies increase costs, lead to a loss of research productivity, and hurt clinical outcomes. In other words, everyone suffers. When I talk to people about data management and IT support within the healthcare field, a recurring theme is that much is "lost in translation" between the various stakeholders: IT professionals, researchers, doctors, clinicians, and administrators.

Over the past 20 years, much of my time has been spent in medical and technical fields. I have held positions with two large insurance payer providers and have worked with the Centers for Medicare & Medicaid Services (CMS) as a recovery audit contractor. I have even worked clinically as an emergency medical technician with a strong background in exercise physiology. Seeking greater challenges led me to Las Vegas, Nevada, where I was fortunate to work on the first cloud-enabled centrally deterministic (Class 2) gaming systems for the state lottery. This was well before the term "cloud" had even arrived. At the close of the project, I returned to the medical field, joining a Fortune 50 payer provider ingesting targeted acquisitions.

My wide-ranging work experiences have shown me that medical and research professionals are usually not technology experts, and most do not desire to be. At the same time, computer scientists and infrastructure experts are not biologists, doctors, or researchers. This longtime disconnect paves the way for high-paid consultants to act as intermediaries brought in to work between IT and biomedical staff.

Not surprisingly, this does not work terribly well, nor does it best serve the medical and research communities. Consultants typically demand high compensation and often are not able to perform the sort of knowledge transfer necessary to make a meaningful and sustainable impact. There are many different permutations and possible explanations for this. But, in the end, I think it is at heart a failure to adequately translate or bridge biomedicine and IT.

The primary motivation for this book is to begin to create a sustainable and readily accessible bridge between IT and data technologists, on one hand, and the community of clinicians, researchers, and academics who deliver and advance healthcare, on the other hand. This book is thus a translational text that will hopefully work both ways. It can help IT staff learn more about clinical and research needs within biomedicine. It also can help doctors and researchers learn more about data and other technical tools that are potentially at their disposal.

My experience in healthcare has shown me that both IT professionals and biologists tend to become isolated or siloed in their professional worlds. This isolation hurts us all: IT staff, biologists, doctors, and patients alike. This is not to suggest that IT staff and data managers should get master's degrees in biology or epidemiology. Rather, I am suggesting that as IT staff and data managers learn more about the biomedical context of their work, they will be able to work better and more efficiently. Furthermore, as biomedicine becomes ever more dependent on computing and big data, there is more and more domain-specific technical knowledge to assimilate.

As IT and biomedicine innovate with increasing rapidity, I predict that we will see more and more hybrid job titles, such as health technologist and bioinformatician. In order to stay current, both IT professionals and biomedical professionals will need to become less isolated. This book begins to bring together these two fields that are so dependent on each other and have so much to offer each other. It is my sincere hope that this work will narrow the gap between those engaged in use-inspired research and those supporting that research from an infrastructure delivery perspective.

In the interest of creating as accessible a bridge text as possible between IT staff and biomedical personnel, this book is relatively nontechnical. For the most part, the aim is to offer a conceptual introduction to key topics in data management for the biomedical sciences. While a certain familiarity with IT, networking, and applications is assumed, you will find very little in the way of code examples. The goal is to equip you with some foundational concepts that will leave you prepared to seek out whatever additional information you and your institution might need.

I have worked in IT for over 20 years, but I am most inspired by how computing technologies can be used to solve human problems. I certainly appreciate elegant code and innovative technical solutions. But at the end of the day, it is the prospect of improving patient outcomes that keeps me engaged and driven to learn and continually extend the boundaries of the possible. One area of biomedical research that I find particularly inspiring is the potential to use targeted therapies to more effectively treat pediatric low-grade astrocytomas (PLGAs). PLGAs are by far the most common cancer of the brain among children. They are often fatal, and current chemotherapies frequently have lifelong side effects, including neurocognitive impairment. Dr. Joshua LaBaer, interim director of the Biodesign Institute at Arizona State University, is working to develop effective targeted therapies that reduce harmful effects on normal cells. Proceeds from this book support the ASU Research Foundation and the work of Dr. Joshua LaBaer, Director, The Biodesign Institute, Personalized Diagnostics and Virginia G. Piper Chair in Personalized Medicine.

In reflecting on the important roles to be played by humans and by computing, I am reminded of a frequently cited quote by Leo M. Cherne, an American economist and public servant, that is often inaccurately attributed to Albert Einstein: "The computer is incredibly fast, accurate, and stupid. Man is unbelievably slow, inaccurate, and brilliant. The marriage of the two is a force beyond calculation." As our capabilities to gather, analyze, and archive data dramatically improve, computing is likely to be increasingly valuable to biomedical research and clinical medicine. Yet let us always remember the need for humans, slow and inaccurate as we usually are.


Strategies in Biomedical Data Science is designed to help anyone who works with biomedical data. This certainly includes IT staff and systems administrators. These readers will hopefully gain a deeper understanding of particular challenges and solutions for biomedical data management. The target audience also includes bioscience researchers and clinical staff. While persons in these roles are not typically directly responsible for data management, they are most certainly concerned with and affected by how data is created, used, and archived. I hope these readers will gain a deeper understanding of how IT staff tend to approach systems architecture and data management. Quite frequently we focus on academic and other public research institutions. Such institutions are tremendously important for cutting-edge research and collaboration. Most of the best practices and scenarios presented in the book are, however, equally applicable to private-sector use cases.

All readers are welcome to work through this book in whatever order best suits their particular interests and needs.


Strategies in Biomedical Data Science offers a relatively high-level introduction to the cutting-edge and rapidly changing field of biomedical data. It provides biomedical IT professionals with much-needed guidance toward managing the increasing deluge of healthcare data. This book demonstrates ways in which both technological development and more effective use of current resources can better serve both patient and payer. The discussion explores the aggregation of disparate data sources, current analytics and tool sets, the growing necessity of smart bioinformatics, and more as data science and biomedical science grow increasingly intertwined. Real-world use cases and clear examples are featured throughout, and coverage of data sources, problems, and potential mitigation provides necessary insight for forward-looking healthcare professionals.

The book begins with an overview of current technical challenges in healthcare and then moves into topics in biomedical data management, including network infrastructure, compute infrastructure, cloud architecture, and finally next-generation cyberinfrastructures.

Many of the chapters include use cases and/or case studies. Use cases examine a general use case and typically focus on one application or technology....

File format: ePUB
Copy protection: Adobe DRM (Digital Rights Management)


Computer (Windows; MacOS X; Linux): Before downloading, install the free Adobe Digital Editions software (see the e-book help).

Tablet/smartphone (Android; iOS): Before downloading, install the free Adobe Digital Editions app (see the e-book help).

E-book readers: Bookeen, Kobo, Pocketbook, Sony, Tolino, and many others (not Kindle)

The ePUB file format is well suited to novels and nonfiction, that is, to "flowing" text without complex layout. On e-readers and smartphones, line and page breaks adapt automatically to the small display. Adobe DRM is a "hard" form of copy protection. If the necessary requirements are not met, you will unfortunately be unable to open the e-book, so you need to prepare your reading hardware before downloading.

When using the Adobe Digital Editions reading software, we strongly recommend authorizing it with your personal Adobe ID after installation.

Further information is available in our e-book help.
