
Doing Data Science
Contents
- Intro
- Copyright
- Table of Contents
- Preface
- Motivation
- Origins of the Class
- Origins of the Book
- What to Expect from This Book
- How This Book Is Organized
- How to Read This Book
- How Code Is Used in This Book
- Who This Book Is For
- Prerequisites
- Supplemental Reading
- About the Contributors
- Conventions Used in This Book
- Using Code Examples
- Safari® Books Online
- How to Contact Us
- Acknowledgments
- Chapter 1. Introduction: What Is Data Science?
- Big Data and Data Science Hype
- Getting Past the Hype
- Why Now?
- Datafication
- The Current Landscape (with a Little History)
- Data Science Jobs
- A Data Science Profile
- Thought Experiment: Meta-Definition
- OK, So What Is a Data Scientist, Really?
- In Academia
- In Industry
- Chapter 2. Statistical Inference, Exploratory Data Analysis, and the Data Science Process
- Statistical Thinking in the Age of Big Data
- Statistical Inference
- Populations and Samples
- Populations and Samples of Big Data
- Big Data Can Mean Big Assumptions
- Modeling
- Exploratory Data Analysis
- Philosophy of Exploratory Data Analysis
- Exercise: EDA
- The Data Science Process
- A Data Scientist's Role in This Process
- Thought Experiment: How Would You Simulate Chaos?
- Case Study: RealDirect
- How Does RealDirect Make Money?
- Exercise: RealDirect Data Strategy
- Chapter 3. Algorithms
- Machine Learning Algorithms
- Three Basic Algorithms
- Linear Regression
- k-Nearest Neighbors (k-NN)
- k-means
- Exercise: Basic Machine Learning Algorithms
- Solutions
- Summing It All Up
- Thought Experiment: Automated Statistician
- Chapter 4. Spam Filters, Naive Bayes, and Wrangling
- Thought Experiment: Learning by Example
- Why Won't Linear Regression Work for Filtering Spam?
- How About k-nearest Neighbors?
- Naive Bayes
- Bayes Law
- A Spam Filter for Individual Words
- A Spam Filter That Combines Words: Naive Bayes
- Fancy It Up: Laplace Smoothing
- Comparing Naive Bayes to k-NN
- Sample Code in bash
- Scraping the Web: APIs and Other Tools
- Jake's Exercise: Naive Bayes for Article Classification
- Sample R Code for Dealing with the NYT API
- Chapter 5. Logistic Regression
- Thought Experiments
- Classifiers
- Runtime
- You
- Interpretability
- Scalability
- M6D Logistic Regression Case Study
- Click Models
- The Underlying Math
- Estimating α and β
- Newton's Method
- Stochastic Gradient Descent
- Implementation
- Evaluation
- Media 6 Degrees Exercise
- Sample R Code
- Chapter 6. Time Stamps and Financial Modeling
- Kyle Teague and GetGlue
- Timestamps
- Exploratory Data Analysis (EDA)
- Metrics and New Variables or Features
- What's Next?
- Cathy O'Neil
- Thought Experiment
- Financial Modeling
- In-Sample, Out-of-Sample, and Causality
- Preparing Financial Data
- Log Returns
- Example: The S&P Index
- Working out a Volatility Measurement
- Exponential Downweighting
- The Financial Modeling Feedback Loop
- Why Regression?
- Adding Priors
- A Baby Model
- Exercise: GetGlue and Timestamped Event Data
- Exercise: Financial Data
- Chapter 7. Extracting Meaning from Data
- William Cukierski
- Background: Data Science Competitions
- Background: Crowdsourcing
- The Kaggle Model
- A Single Contestant
- Their Customers
- Thought Experiment: What Are the Ethical Implications of a Robo-Grader?
- Feature Selection
- Example: User Retention
- Filters
- Wrappers
- Embedded Methods: Decision Trees
- Entropy
- The Decision Tree Algorithm
- Handling Continuous Variables in Decision Trees
- Random Forests
- User Retention: Interpretability Versus Predictive Power
- David Huffaker: Google's Hybrid Approach to Social Research
- Moving from Descriptive to Predictive
- Social at Google
- Privacy
- Thought Experiment: What Is the Best Way to Decrease Concern and Increase Understanding and Control?
- Chapter 8. Recommendation Engines: Building a User-Facing Data Product at Scale
- A Real-World Recommendation Engine
- Nearest Neighbor Algorithm Review
- Some Problems with Nearest Neighbors
- Beyond Nearest Neighbor: Machine Learning Classification
- The Dimensionality Problem
- Singular Value Decomposition (SVD)
- Important Properties of SVD
- Principal Component Analysis (PCA)
- Alternating Least Squares
- Fix V and Update U
- Last Thoughts on These Algorithms
- Thought Experiment: Filter Bubbles
- Exercise: Build Your Own Recommendation System
- Sample Code in Python
- Chapter 9. Data Visualization and Fraud Detection
- Data Visualization History
- Gabriel Tarde
- Mark's Thought Experiment
- What Is Data Science, Redux?
- Processing
- Franco Moretti
- A Sample of Data Visualization Projects
- Mark's Data Visualization Projects
- New York Times Lobby: Moveable Type
- Project Cascade: Lives on a Screen
- Cronkite Plaza
- eBay Transactions and Books
- Public Theater Shakespeare Machine
- Goals of These Exhibits
- Data Science and Risk
- About Square
- The Risk Challenge
- The Trouble with Performance Estimation
- Model Building Tips
- Data Visualization at Square
- Ian's Thought Experiment
- Data Visualization for the Rest of Us
- Data Visualization Exercise
- Chapter 10. Social Networks and Data Journalism
- Social Network Analysis at Morningside Analytics
- Case-Attribute Data versus Social Network Data
- Social Network Analysis
- Terminology from Social Networks
- Centrality Measures
- The Industry of Centrality Measures
- Thought Experiment
- Morningside Analytics
- How Visualizations Help Us Find Schools of Fish
- More Background on Social Network Analysis from a Statistical Point of View
- Representations of Networks and Eigenvalue Centrality
- A First Example of Random Graphs: The Erdos-Renyi Model
- A Second Example of Random Graphs: The Exponential Random Graph Model
- Data Journalism
- A Bit of History on Data Journalism
- Writing Technical Journalism: Advice from an Expert
- Chapter 11. Causality
- Correlation Doesn't Imply Causation
- Asking Causal Questions
- Confounders: A Dating Example
- OK Cupid's Attempt
- The Gold Standard: Randomized Clinical Trials
- A/B Tests
- Second Best: Observational Studies
- Simpson's Paradox
- The Rubin Causal Model
- Visualizing Causality
- Definition: The Causal Effect
- Three Pieces of Advice
- Chapter 12. Epidemiology
- Madigan's Background
- Thought Experiment
- Modern Academic Statistics
- Medical Literature and Observational Studies
- Stratification Does Not Solve the Confounder Problem
- What Do People Do About Confounding Things in Practice?
- Is There a Better Way?
- Research Experiment (Observational Medical Outcomes Partnership)
- Closing Thought Experiment
- Chapter 13. Lessons Learned from Data Competitions: Data Leakage and Model Evaluation
- Claudia's Data Scientist Profile
- The Life of a Chief Data Scientist
- On Being a Female Data Scientist
- Data Mining Competitions
- How to Be a Good Modeler
- Data Leakage
- Market Predictions
- Amazon Case Study: Big Spenders
- A Jewelry Sampling Problem
- IBM Customer Targeting
- Breast Cancer Detection
- Pneumonia Prediction
- How to Avoid Leakage
- Evaluating Models
- Accuracy: Meh
- Probabilities Matter, Not 0s and 1s
- Choosing an Algorithm
- A Final Example
- Parting Thoughts
- Chapter 14. Data Engineering: MapReduce, Pregel, and Hadoop
- About David Crawshaw
- Thought Experiment
- MapReduce
- Word Frequency Problem
- Enter MapReduce
- Other Examples of MapReduce
- What Can't MapReduce Do?
- Pregel
- About Josh Wills
- Thought Experiment
- On Being a Data Scientist
- Data Abundance Versus Data Scarcity
- Designing Models
- Economic Interlude: Hadoop
- A Brief Introduction to Hadoop
- Cloudera
- Back to Josh: Workflow
- So How to Get Started with Hadoop?
- Chapter 15. The Students Speak
- Process Thinking
- Naive No Longer
- Helping Hands
- Your Mileage May Vary
- Bridging Tunnels
- Some of Our Work
- Chapter 16. Next-Generation Data Scientists, Hubris, and Ethics
- What Just Happened?
- What Is Data Science (Again)?
- What Are Next-Gen Data Scientists?
- Being Problem Solvers
- Cultivating Soft Skills
- Being Question Askers
- Being an Ethical Data Scientist
- Career Advice
- Index
- About the Authors
System Requirements
File format: ePUB
Copy protection: Adobe DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free Adobe Digital Editions software before downloading (see the e-book help).
- Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app or the PocketBook app before downloading (see the e-book help).
- E-book reader: Bookeen, Kobo, Pocketbook, Sony, Tolino, and many others (not Kindle)
The ePUB file format is well suited to novels and nonfiction, that is, to "flowing" text without complex layout. On e-readers and smartphones, line and page breaks adapt automatically to the small displays.
Adobe DRM is a "hard" form of copy protection. If the necessary requirements are not met, you will unfortunately not be able to open the e-book. You therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend authorizing the reading software with your personal Adobe ID after installing it.
More information is available in our e-book help.