
Data Science: The Hard Parts
Description
This practical guide provides a collection of techniques and best practices that are generally overlooked in most data engineering and data science pedagogy. A common misconception is that great data scientists are experts in the "big themes" of the discipline: machine learning and programming. But most of the time, these tools can only take us so far. In practice, the smaller tools and skills really separate a great data scientist from a not-so-great one.
Taken as a whole, the lessons in this book make the difference between an average data scientist candidate and a qualified data scientist working in the field. Author Daniel Vaughan has collected, extended, and used these skills to create value and train data scientists from different companies and industries.
With this book, you will:
- Understand how data science creates value
- Deliver compelling narratives to sell your data science project
- Build a business case using unit economics principles
- Create new features for an ML model using storytelling
- Learn how to decompose KPIs
- Perform growth decompositions to find root causes for changes in a metric
Daniel Vaughan is head of data at Clip, the leading paytech company in Mexico. He's the author of Analytical Skills for AI and Data Science (O'Reilly).

Content
- Cover
- Copyright
- Table of Contents
- Preface
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- Part I. Data Analytics Techniques
- Chapter 1. So What? Creating Value with Data Science
- What Is Value?
- What: Understanding the Business
- So What: The Gist of Value Creation in DS
- Now What: Be a Go-Getter
- Measuring Value
- Key Takeaways
- Further Reading
- Chapter 2. Metrics Design
- Desirable Properties That Metrics Should Have
- Measurable
- Actionable
- Relevance
- Timeliness
- Metrics Decomposition
- Funnel Analytics
- Stock-Flow Decompositions
- P×Q-Type Decompositions
- Example: Another Revenue Decomposition
- Example: Marketplaces
- Key Takeaways
- Further Reading
- Chapter 3. Growth Decompositions: Understanding Tailwinds and Headwinds
- Why Growth Decompositions?
- Additive Decomposition
- Example
- Interpretation and Use Cases
- Multiplicative Decomposition
- Example
- Interpretation
- Mix-Rate Decompositions
- Example
- Interpretation
- Mathematical Derivations
- Additive Decomposition
- Multiplicative Decomposition
- Mix-Rate Decomposition
- Key Takeaways
- Further Reading
- Chapter 4. 2×2 Designs
- The Case for Simplification
- What's a 2×2 Design?
- Example: Test a Model and a New Feature
- Example: Understanding User Behavior
- Example: Credit Origination and Acceptance
- Example: Prioritizing Your Workflow
- Key Takeaways
- Further Reading
- Chapter 5. Building Business Cases
- Some Principles to Construct Business Cases
- Example: Proactive Retention Strategy
- Fraud Prevention
- Purchasing External Datasets
- Working on a Data Science Project
- Key Takeaways
- Further Reading
- Chapter 6. What's in a Lift?
- Lifts Defined
- Example: Classifier Model
- Self-Selection and Survivorship Biases
- Other Use Cases for Lifts
- Key Takeaways
- Further Reading
- Chapter 7. Narratives
- What's in a Narrative: Telling a Story with Your Data
- Clear and to the Point
- Credible
- Memorable
- Actionable
- Building a Narrative
- Science as Storytelling
- What, So What, and Now What?
- The Last Mile
- Writing TL;DRs
- Tips to Write Memorable TL;DRs
- Example: Writing a TL;DR for This Chapter
- Delivering Powerful Elevator Pitches
- Presenting Your Narrative
- Key Takeaways
- Further Reading
- Chapter 8. Datavis: Choosing the Right Plot to Deliver a Message
- Some Useful and Not-So-Used Data Visualizations
- Bar Versus Line Plots
- Slopegraphs
- Waterfall Charts
- Scatterplot Smoothers
- Plotting Distributions
- General Recommendations
- Find the Right Datavis for Your Message
- Choose Your Colors Wisely
- Different Dimensions in a Plot
- Aim for a Large Enough Data-Ink Ratio
- Customization Versus Semiautomation
- Get the Font Size Right from the Beginning
- Interactive or Not
- Stay Simple
- Start by Explaining the Plot
- Key Takeaways
- Further Reading
- Part II. Machine Learning
- Chapter 9. Simulation and Bootstrapping
- Basics of Simulation
- Simulating a Linear Model and Linear Regression
- What Are Partial Dependence Plots?
- Omitted Variable Bias
- Simulating Classification Problems
- Latent Variable Models
- Comparing Different Algorithms
- Bootstrapping
- Key Takeaways
- Further Reading
- Chapter 10. Linear Regression: Going Back to Basics
- What's in a Coefficient?
- The Frisch-Waugh-Lovell Theorem
- Why Should You Care About FWL?
- Confounders
- Additional Variables
- The Central Role of Variance in ML
- Key Takeaways
- Further Reading
- Chapter 11. Data Leakage
- What Is Data Leakage?
- Outcome Is Also a Feature
- A Function of the Outcome Is Itself a Feature
- Bad Controls
- Mislabeling of a Timestamp
- Multiple Datasets with Sloppy Time Aggregations
- Leakage of Other Information
- Detecting Data Leakage
- Complete Separation
- Windowing Methodology
- Choosing the Length of the Windows
- The Training Stage Mirrors the Scoring Stage
- Implementing the Windowing Methodology
- I Have Leakage: Now What?
- Key Takeaways
- Further Reading
- Chapter 12. Productionizing Models
- What Does "Production Ready" Mean?
- Batch Scores (Offline)
- Real-Time Model Objects
- Data and Model Drift
- Essential Steps in any Production Pipeline
- Get and Transform Data
- Validate Data
- Training and Scoring Stages
- Validate Model and Scores
- Deploy Model and Scores
- Key Takeaways
- Further Reading
- Chapter 13. Storytelling in Machine Learning
- A Holistic View of Storytelling in ML
- Ex Ante and Interim Storytelling
- Creating Hypotheses
- Feature Engineering
- Ex Post Storytelling: Opening the Black Box
- Interpretability-Performance Trade-Off
- Linear Regression: Setting a Benchmark
- Feature Importance
- Heatmaps
- Partial Dependence Plots
- Accumulated Local Effects
- Key Takeaways
- Further Reading
- Chapter 14. From Prediction to Decisions
- Dissecting Decision Making
- Simple Decision Rules by Smart Thresholding
- Precision and Recall
- Example: Lead Generation
- Confusion Matrix Optimization
- Key Takeaways
- Further Reading
- Chapter 15. Incrementality: The Holy Grail of Data Science?
- Defining Incrementality
- Causal Reasoning to Improve Prediction
- Causal Reasoning as a Differentiator
- Improved Decision Making
- Confounders and Colliders
- Selection Bias
- Unconfoundedness Assumption
- Breaking Selection Bias: Randomization
- Matching
- Machine Learning and Causal Inference
- Open Source Codebases
- Double Machine Learning
- Key Takeaways
- Further Reading
- Chapter 16. A/B Tests
- What Is an A/B Test?
- Decision Criterion
- Minimum Detectable Effects
- Choosing the Statistical Power, Level, and P
- Estimating the Variance of the Outcome
- Simulations
- Example: Conversion Rates
- Setting the MDE
- Hypotheses Backlog
- Metric
- Hypothesis
- Ranking
- Governance of Experiments
- Key Takeaways
- Further Reading
- Chapter 17. Large Language Models and the Practice of Data Science
- The Current State of AI
- What Do Data Scientists Do?
- Evolving the Data Scientist's Job Description
- Case Study: A/B Testing
- Case Study: Data Cleansing
- Case Study: Machine Learning
- LLMs and This Book
- Key Takeaways
- Further Reading
- Index
- About the Author
- Colophon
System requirements
File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The file format PDF always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). Unfortunately, on the small screens of e-readers or smartphones, PDFs are rather annoying, requiring too much scrolling.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.