Introduction to Data Science
Rafael A. Irizarry (Author)
Chapman & Hall/CRC (Publisher)
2nd Edition
Expected publication date: approx. 30 November 2026
Book
826 pages
978-1-032-51938-8 (ISBN)
Description
Unlike the first edition, the new edition has been split into two books, which have been brought together in this set.
Thoroughly revised and updated, the first book (Introduction to Data Science: Data Wrangling and Visualization with R) introduces skills that can help the reader tackle real-world data analysis challenges. These include R programming, data wrangling with dplyr, data visualization with ggplot2, file organization with the UNIX/Linux shell, version control with Git and GitHub, and reproducible document preparation with Quarto and knitr. It includes additional material on data.table, locales, and accessing data through APIs. The book is divided into four parts: R, Data Visualization, Data Wrangling, and Productivity Tools. Each part has several chapters, each meant to be presented as one lecture, and includes dozens of exercises.
The second book (Introduction to Data Science: Statistics and Prediction Algorithms Through Case Studies) teaches data science as a way of thinking statistically, not just as a collection of computational tools. Building on the topics covered in Introduction to Data Science: Data Wrangling and Visualization with R, it is designed for students with some programming experience and basic mathematical maturity. It builds the foundations of probability, statistical inference, regression, high-dimensional data analysis, and machine learning through real data examples and reproducible R code. It is suitable for a one-semester course in advanced data science.
More details
Edition
2nd edition
Language
English
Place of publication
United Kingdom
Publishing group
Taylor & Francis Ltd
Target group
College/higher education
Postgraduate, Undergraduate Advanced, and Undergraduate Core
Illustrations
14 Tables, black and white; 193 Line drawings, color; 158 Line drawings, black and white; 61 Halftones, color; 254 Illustrations, color; 158 Illustrations, black and white
Dimensions
Height: 254 mm
Width: 178 mm
Weight
453 g
ISBN-13
978-1-032-51938-8 (9781032519388)
Copyright in bibliographic data is held by Nielsen Book Services Limited or its licensors: all rights reserved.
Person
Rafael A. Irizarry is Professor and Chair of the Department of Data Science at Dana-Farber Cancer Institute and Professor of Applied Statistics at Harvard. His research focuses on genomics, and he has taught several data science courses.
Content
Vol 1: Preface; Acknowledgements; Introduction
Part 1: R 1. Getting started 2. R basics 3. Programming basics 4. The tidyverse 5. data.table 6. Importing data
Part 2: Data Visualization 7. Visualizing data distributions 8. ggplot2 9. Data visualization principles 10. Data visualization in practice
Part 3: Data Wrangling 11. Reshaping data 12. Joining tables 13. Parsing dates and times 14. Locales 15. Extracting data from the web 16. String processing 17. Text analysis
Part 4: Productivity Tools 18. Organizing with Unix 19. Git and GitHub 20. Reproducible projects
Vol 2: Distributions; Numerical Summaries; Comparing Groups; Connecting Data and Probability; Discrete Probability; Continuous Probability; Random Variables; Sampling Models and the Central Limit Theorem; Estimates and Confidence Intervals; Data-Driven Models; Bayesian Statistics; Hierarchical Models; Hypothesis Testing; Bootstrap; Introduction to Regression; The Linear Model Framework; Treatment Effect Models; Generalized Linear Models; Association Is Not Causation; Multivariable Regression; Working with Matrices in R; Applied Linear Algebra; Dimension Reduction; Regularization; Latent Factor Models; Notation and Terminology; Performance Metrics; Conditional Expectations and Smoothing; Resampling and Model Assessment; Supervised Learning Methods; Building Machine Learning Models; Unsupervised Learning: Clustering