Learn the algorithms and tools you need to build MapReduce applications with Hadoop and Spark for processing gigabyte-, terabyte-, or petabyte-sized datasets on clusters of commodity hardware. With this practical book, author Mahmoud Parsian, head of the big data team at Illumina, takes you step by step through the design of machine-learning algorithms, such as Naive Bayes and Markov Chain, and shows you how to apply them to clinical and biological datasets using MapReduce design patterns.
* Apply MapReduce algorithms to clinical and biological data, such as DNA-Seq and RNA-Seq
* Use the most relevant regression and analytical algorithms for different biological data types
* Apply t-test, joins, top-10, and correlation algorithms using MapReduce/Hadoop and Spark
Language
Place of publication
Target audience
Dimensions
Height: 230 mm
Width: 184 mm
Thickness: 42 mm
Weight
ISBN-13
978-1-4919-0618-7 (9781491906187)
Schweitzer classification
Mahmoud Parsian, Ph.D. in Computer Science, is a practicing software professional with 30 years of experience as a developer, designer, architect, and author. For the past 15 years, he has been involved in Java server-side, databases, MapReduce, and distributed computing. Dr. Parsian currently leads Illumina's Big Data team, which is focused on large-scale genome analytics and distributed computing. He leads and develops scalable regression algorithms; DNA sequencing and RNA sequencing pipelines using Java, MapReduce, Hadoop, HBase, and Spark; and open source tools. He is also the author of JDBC Recipes and JDBC Metadata (both from Apress).