From a review of the first edition: "Modern Data Science with R... is rich with examples and is guided by a strong narrative voice. What's more, it presents an organizing framework that makes a convincing argument that data science is a course distinct from applied statistics" (The American Statistician).
Modern Data Science with R is a comprehensive data science textbook for undergraduates that incorporates statistical and computational thinking to solve real-world data problems. Rather than focus exclusively on case studies or programming syntax, this book illustrates how statistical programming in the state-of-the-art R/RStudio computing environment can be leveraged to extract meaningful information from a variety of data in the service of addressing compelling questions.
The second edition is updated to reflect the growing influence of the tidyverse set of packages. All code in the book has been revised and styled to be more readable and easier to understand. New functionality from packages like sf, purrr, tidymodels, and tidytext is now integrated into the text. All chapters have been revised, and several have been split, re-organized, or re-imagined to meet the shifting landscape of best practice.
Benjamin S. Baumer is an associate professor in the Statistical & Data Sciences program at Smith College. He has been a practicing data scientist since 2004, when he became the first full-time statistical analyst for the New York Mets. Ben is a co-author of The Sabermetric Revolution and Analyzing Baseball Data with R. He received the 2019 Waller Education Award and the 2016 Significant Contributor Award from the Society for American Baseball Research.
Daniel T. Kaplan is the DeWitt Wallace emeritus professor of mathematics and computer science at Macalester College. He is the author of several textbooks on statistical modeling and statistical computing. Danny received the 2006 Macalester Excellence in Teaching award and the 2017 CAUSE Lifetime Achievement Award.
Nicholas J. Horton is Beitzel Professor of Technology and Society (Statistics and Data Science) at Amherst College. He is a Fellow of the ASA and the AAAS, co-chair of the National Academies Committee on Applied and Theoretical Statistics, recipient of a number of national teaching awards, author of a series of books on statistical computing, and actively involved in data science curriculum efforts to help students "think with data".
Background and motivation
Key features of this book
Changes in the second edition
Key role of technology
How to use this book
Part I: Introduction to Data Science
1. Prologue: Why data science?
What is data science?
Case study: The evolution of sabermetrics
2. Data visualization
The federal election cycle
Composing data graphics
Importance of data graphics: Challenger
Creating effective presentations
The wider world of data visualization
3. A grammar for graphics
A grammar for data graphics
Canonical data graphics in R
Extended example: Historical baby names
4. Data wrangling on one table
A grammar for data wrangling
Extended example: Ben's time with the Mets
5. Data wrangling on multiple tables
Extended example: Manny Ramirez
6. Tidy data
7. Iteration
Using across() with dplyr functions
The map() family of functions
Iterating over a one-dimensional vector
Iteration over subgroups
Extended example: Factors associated with BMI
8. Data science ethics
Role of data science in society
Some settings for professional ethics
Some principles to guide ethical action
Data and disclosure
Professional guidelines for ethical conduct
Part II: Statistics and Modeling
9. Statistical foundations
Samples and populations
Statistical models: Explaining variation
Confounding and accounting for other factors
The perils of p-values
10. Predictive modeling
Simple classification models
Extended example: Who has diabetes?
11. Supervised learning
Example: Evaluation of income models redux
Extended example: Who has diabetes this time?
12. Unsupervised learning
13. Simulation
Reasoning in reverse
Extended example: Grouping cancers
Key principles of simulation
Part III: Topics in Data Science
14. Dynamic and customized data graphics
Rich Web content using D3.js and htmlwidgets
Interactive Web apps with Shiny
Customization of ggplot2 graphics
Extended example: Hot dog eating
15. Database querying using SQL
From dplyr to SQL
The SQL universe
The SQL data manipulation language
Extended example: FiveThirtyEight flights
SQL vs R
16. Database administration
Constructing efficient SQL databases
Changing SQL data
Extended example: Building a database
17. Working with geospatial data
Motivation: What's so great about geospatial data?
Spatial data structures
Extended example: Congressional districts
Effective maps: How (not) to lie
Playing well with others
18. Geospatial computations
Extended example: Trail elevations at MacLeish
19. Text as data
Regular expressions using Macbeth
Extended example: Analyzing textual data from arXiv.org
20. Network science
Introduction to network science
Extended example: Six degrees of Kristen Stewart
Extended example: men's college basketball
21. Epilogue: Towards "big data"
Notions of big data
Tools for bigger data
Alternatives to R
Part IV: Appendices
A Packages used in this book
The mdsr package
B Introduction to R and RStudio
Fundamental structures and objects
C Algorithmic thinking
Extended example: Law of large numbers
Debugging and defensive coding
D Reproducible analysis and workflow
Scriptable statistical computing
Reproducible analysis with R Markdown
Projects and version control
E Regression modeling
Inference for regression
Assumptions underlying regression
F Setting up a database server
Connecting to SQL
"This text continues to be fantastic! There are a number of courses for which I would require this book and others that I would recommend it as a supplement. I would likely require it for courses focused on computing in R or courses in data science. I would include it as a recommended text in introductory and other statistics courses that used R as the software of choice, where this text could be used as a supplemental resource in how to use R to work with data." (Hunter Glanz, Cal Poly San Luis Obispo)
"Easy for students to read and relate to the exercises and examples. Many questions and hands-on activities with data sets to practice skills." (Lynn Collen, St. Cloud State Univ.)
"I used the first edition of this book as the primary text for an intermediate data science course a few years ago and I liked it very much...I think that the technical breadth, writing style, and level of difficulty are very clear strengths. Also, my students and I found the `tidyverse` approach to be particularly well-suited for teaching and learning R...and I love that the MDSR book includes such complete code. Students can program everything they see in the book, and often times there are tips & tricks for them to discover along the way just by studying expert code provided by the authors. This really sets MDSR apart from other books I considered for the course." (Matthew Beckman, Penn State University)