
Introduction to Data Science
Data Wrangling and Visualization with R
Rafael A. Irizarry (Author)
Chapman & Hall/CRC (Publisher)
2nd edition
Published on 2 August 2024
Book
Hardcover
328 pages
978-1-032-11655-6 (ISBN)
Description
Unlike the first edition, the new edition has been split into two books.
Thoroughly revised and updated, this is the first book of the second edition of Introduction to Data Science: Data Wrangling and Visualization with R. It introduces skills that can help you tackle real-world data analysis challenges. These include R programming, data wrangling with dplyr, data visualization with ggplot2, file organization with UNIX/Linux shell, version control with Git and GitHub, and reproducible document preparation with Quarto and knitr. The new edition includes additional material on data.table, locales, and accessing data through APIs. The book is divided into four parts: R, Data Visualization, Data Wrangling, and Productivity Tools. Each part has several chapters, each meant to be presented as one lecture, and includes dozens of exercises. The second book will cover topics including probability, statistics, and prediction algorithms with R.
Throughout the book, we use motivating case studies. In each case study, we try to realistically mimic a data scientist's experience. For each of the skills covered, we start by asking specific questions and answer these through data analysis. Examples of the case studies included in the book are: US murder rates by state, self-reported student heights, trends in world health and economics, and the impact of vaccines on infectious disease rates.
This book is meant to be a textbook for a first course in Data Science. No previous knowledge of R is necessary, although some experience with programming may be helpful. Becoming a successful data analyst who applies the skills covered in this book requires understanding advanced statistical concepts, such as those covered in the second book. If you read and understand all the chapters, complete all the exercises in this book, and understand the statistical concepts, you will be well positioned to perform basic data analysis tasks and prepared to learn the more advanced concepts and skills needed to become an expert.
Reviews
Praise for the first edition:
"I think the book would be perfect for schools looking to make a transition to a model where introduction to data science takes the place of introduction to statistics and maybe introductory computer science."
- Arend Kuyper, Northwestern University
"A great introduction to data science and modern R programming, with tons of examples of application of the R abilities throughout the whole volume. The book suggests multiple links to the internet websites related to the topics under consideration that makes it an incredibly useful source of contemporary data science and programming, helping students and researchers in their projects."
- Technometrics
"Introduction to Data Science will teach you to juggle with your data and get maximum results from it using R. I highly recommend this book for students and everybody taking the first steps in data science using R."
- Maria Ivanchuk, ISCB News
Further details
Series
Edition
2nd edition
Language
English
Place of publication
Boca Raton
United Kingdom
Publishing group
Taylor & Francis Ltd
Target audience
For upper secondary school and university studies
Undergraduate Advanced and Undergraduate Core
Illustrations
2 Tables, black and white; 80 Line drawings, color; 56 Line drawings, black and white; 57 Halftones, color; 137 Illustrations, color; 56 Illustrations, black and white
Dimensions
Height: 260 mm
Width: 183 mm
Thickness: 23 mm
Weight
846 g
ISBN-13
978-1-032-11655-6 (9781032116556)
Copyright in bibliographic data and cover images is held by Nielsen Book Services Limited or by the publishers or by their respective licensors: all rights reserved.
Further editions
Other editions

E-Book
08/2024
2nd edition
Chapman and Hall
€82.99
Available as download

Previous edition

Book
11/2019
1st edition
Chapman & Hall/CRC
€104.78
Item is out of print; see new edition
Person
Rafael A. Irizarry is professor and chair of Data Science at the Dana-Farber Cancer Institute, professor of biostatistics at Harvard, and a fellow of the American Statistical Association and the International Society for Computational Biology. Prof. Irizarry is an applied statistician who has worked over the last 25 years in diverse areas, including genomics, sound engineering, and public health surveillance. He disseminates solutions to data analysis challenges as open source software, tools that are widely downloaded and used. Prof. Irizarry has also developed and taught several data science courses at Harvard as well as popular online courses.
Contents
Preface
Acknowledgements
Introduction
Part 1: R
1. Getting started
2. R basics
3. Programming basics
4. The tidyverse
5. data.table
6. Importing data
Part 2: Data Visualization
7. Visualizing data distributions
8. ggplot2
9. Data visualization principles
10. Data visualization in practice
Part 3: Data Wrangling
11. Reshaping data
12. Joining tables
13. Parsing dates and times
14. Locales
15. Extracting data from the web
16. String processing
17. Text analysis
Part 4: Productivity Tools
18. Organizing with Unix
19. Git and GitHub
20. Reproducible projects