
Responsible Data Science
Description
The increasing popularity of data science has resulted in numerous well-publicized cases of bias, injustice, and discrimination. The widespread deployment of "black box" algorithms that are difficult or impossible to understand and explain, even for their developers, is a primary source of these unanticipated harms, making modern techniques and methods for manipulating large data sets seem sinister, even dangerous. When put in the hands of authoritarian governments, these algorithms have enabled suppression of political dissent and persecution of minorities. To prevent these harms, data scientists everywhere must come to understand how the algorithms that they build and deploy may harm certain groups or be unfair.
Responsible Data Science delivers a comprehensive, practical treatment of how to implement data science solutions in an even-handed and ethical manner that minimizes the risk of undue harm to vulnerable members of society. Both data science practitioners and managers of analytics teams will learn how to:
* Improve model transparency, even for black box models
* Diagnose bias and unfairness within models using multiple metrics
* Audit projects to ensure fairness and minimize the possibility of unintended harm
Perfect for data science practitioners, Responsible Data Science will also earn a spot on the bookshelves of technically inclined managers, software developers, and statisticians.

Persons
PETER BRUCE is the Senior Learning Officer at Elder Research, Inc., author of several best-selling texts on data science, and Founder of the Institute for Statistics Education at Statistics.com, an Elder Research Company.
Content
- Cover
- Title Page
- Copyright Page
- About the Authors
- About the Technical Editor
- Acknowledgments
- Contents at a Glance
- Contents
- Introduction
- What This Book Covers
- Who Will Benefit Most from This Book
- Looking Ahead in This Book
- Special Features
- Code Repository
- Chapter 1 Responsible Data Science
- The Optum Disaster
- Jekyll and Hyde
- Eugenics
- Galton, Pearson, and Fisher
- Ties between Eugenics and Statistics
- Ethical Problems in Data Science Today
- Predictive Models
- From Explaining to Predicting
- Predictive Modeling
- Setting the Stage for Ethical Issues to Arise
- Classic Statistical Models
- Black-Box Methods
- Important Concepts in Predictive Modeling
- Feature Selection
- Model-Centric vs. Data-Centric Models
- Holdout Sample and Cross-Validation
- Overfitting
- Unsupervised Learning
- The Ethical Challenge of Black Boxes
- Two Opposing Forces
- Pressure for More Powerful AI
- Public Resistance and Anxiety
- Summary
- Chapter 2 Background: Modeling and the Black-Box Algorithm
- Assessing Model Performance
- Predicting Class Membership
- The Rare Class Problem
- Lift and Gains
- Area Under the Curve
- AUC vs. Lift (Gains)
- Predicting Numeric Values
- Goodness-of-Fit
- Holdout Sets and Cross-Validation
- Optimization and Loss Functions
- Intrinsically Interpretable Models vs. Black-Box Models
- Ethical Challenges with Interpretable Models
- Black-Box Models
- Ensembles
- Nearest Neighbors
- Clustering
- Association Rules
- Collaborative Filters
- Artificial Neural Nets and Deep Neural Nets
- Problems with Black-Box Predictive Models
- Problems with Unsupervised Algorithms
- Summary
- Chapter 3 The Ways AI Goes Wrong, and the Legal Implications
- AI and Intentional Consequences by Design
- Deepfakes
- Supporting State Surveillance and Suppression
- Behavioral Manipulation
- Automated Testing to Fine-Tune Targeting
- AI and Unintended Consequences
- Healthcare
- Finance
- Law Enforcement
- Technology
- The Legal and Regulatory Landscape around AI
- Ignorance Is No Defense: AI in the Context of Existing Law and Policy
- A Finger in the Dam: Data Rights, Data Privacy, and Consumer Protection Regulations
- Trends in Emerging Law and Policy Related to AI
- Summary
- Part 2 The Responsible Data Science Process
- Chapter 4 The Responsible Data Science Framework
- Why We Keep Building Harmful AI
- Misguided Need for Cutting-Edge Models
- Excessive Focus on Predictive Performance
- Ease of Access and the Curse of Simplicity
- The Common Cause
- The Face Thieves
- An Anatomy of Modeling Harms
- The World: Context Matters for Modeling
- The Data: Representation Is Everything
- The Model: Garbage In, Danger Out
- Model Interpretability: Human Understanding for Superhuman Models
- Efforts Toward a More Responsible Data Science
- Principles Are the Focus
- Nonmaleficence
- Fairness
- Transparency
- Accountability
- Privacy
- Bridging the Gap Between Principles and Practice with the Responsible Data Science (RDS) Framework
- Justification
- Compilation
- Preparation
- Modeling
- Auditing
- Summary
- Chapter 5 Model Interpretability: The What and the Why
- The Sexist Résumé Screener
- The Necessity of Model Interpretability
- Connections Between Predictive Performance and Interpretability
- Uniting (High) Model Performance and Model Interpretability
- Categories of Interpretability Methods
- Global Methods
- Local Methods
- Real-World Successes of Interpretability Methods
- Facilitating Debugging and Audit
- Leveraging the Improved Performance of Black-Box Models
- Acquiring New Knowledge
- Addressing Critiques of Interpretability Methods
- Explanations Generated by Interpretability Methods Are Not Robust
- Explanations Generated by Interpretability Methods Are Low Fidelity
- The Forking Paths of Model Interpretability
- The Four-Measure Baseline
- Building Our Own Credit Scoring Model
- Using Train-Test Splits
- Feature Selection and Feature Engineering
- Baseline Models
- The Importance of Making Your Code Work for Everyone
- Execution Variability
- Addressing Execution Variability with Functionalized Code
- Stochastic Variability
- Addressing Stochastic Variability via Resampling
- Summary
- Part 3 RDS in Practice
- Chapter 6 Beginning a Responsible Data Science Project
- How the Responsible Data Science Framework Addresses the Common Cause
- Datasets Used
- Regression Datasets: Communities and Crime
- Classification Datasets: COMPAS
- Common Elements Across Our Analyses
- Project Structure and Documentation
- Project Structure for the Responsible Data Science Framework: Everything in Its Place
- Documentation: The Responsible Thing to Do
- Beginning a Responsible Data Science Project
- Communities and Crime (Regression)
- Justification
- Compilation
- Identifying Protected Classes
- Preparation: Data Splitting and Feature Engineering
- Datasheets
- COMPAS (Classification)
- Justification
- Identifying Protected Classes
- Preparation
- Summary
- Chapter 7 Auditing a Responsible Data Science Project
- Fairness and Data Science in Practice
- The Many Different Conceptions of Fairness
- Different Forms of Fairness Are Trade-Offs with Each Other
- Quantifying Predictive Fairness Within a Data Science Project
- Mitigating Bias to Improve Fairness
- Preprocessing
- In-processing
- Postprocessing
- Classification Example: COMPAS
- Prework: Code Practices, Modeling, and Auditing
- Justification, Compilation, and Preparation Review
- Modeling
- Auditing
- Per-Group Metrics: Overall
- Per-Group Metrics: Error
- Fairness Metrics
- Interpreting Our Models: Why Are They Unfair?
- Analysis for Different Groups
- Bias Mitigation
- Preprocessing: Oversampling
- Postprocessing: Optimizing Thresholds Automatically
- Postprocessing: Optimizing Thresholds Manually
- Summary
- Chapter 8 Auditing for Neural Networks
- Why Neural Networks Merit Their Own Chapter
- Neural Networks Vary Greatly in Structure
- Neural Networks Treat Features Differently
- Neural Networks Repeat Themselves
- A More Impenetrable Black Box
- Baseline Methods
- Representation Methods
- Distillation Methods
- Intrinsic Methods
- Beginning a Responsible Neural Network Project
- Justification
- Moving Forward
- Compilation
- Tracking Experiments
- Preparation
- Modeling
- Auditing
- Per-Group Metrics: Overall
- Per-Group Metrics: Unusual Definitions of "False Positive"
- Fairness Metrics
- Interpreting Our Models: Why Are They Unfair?
- Bias Mitigation
- Wrap-Up
- Auditing Neural Networks for Natural Language Processing
- Identifying and Addressing Sources of Bias in NLP
- The Real World
- Data
- Models
- Model Interpretability
- Summary
- Chapter 9 Conclusion
- How Can We Do Better?
- The Responsible Data Science Framework
- Doing Better As Managers
- Doing Better As Practitioners
- A Better Future If We Can Keep It
- Index
- EULA
System requirements
File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.