Responsible Data Science

Grant Fleming, Peter C. Bruce (Authors)

Wiley (Publisher)

1st Edition

Published on 21 April 2021

304 pages

E-Book

ePUB with Adobe-DRM

978-1-119-74164-0 (ISBN)

Introduction

In this book, we will review some of the harmful ways artificial intelligence has been used and provide a framework to facilitate the responsible practice of data science. While we will touch upon mitigating legal risks, we will focus primarily on the modeling process itself, especially on how factors overlooked by current modeling practices lead to unintended harms once a model is deployed in a real-world context.

Three core themes will be developed throughout this book:

Any AI algorithm can have a harmful, dark side: once they are applied in the real world, AI algorithms can cause any number of harms. An algorithm designed to help police catch murderers can later be appropriated by totalitarian states to persecute dissidents; an algorithm that expands the availability of financial credit for the vast majority of people may nonetheless intensify bias against minorities.
The dark sides of AI algorithms are created or deepened by current modeling approaches. By focusing only on technical considerations like maximizing predictive performance, data scientists ignore the potential for their models to aggravate biases against certain groups, generate harmful predictions, or otherwise be used by other groups in the future for malicious purposes.
New modeling approaches are needed if we want to use AI more responsibly. If data scientists and their users are going to continue to use AI algorithms to make consequential decisions, then they ought to do so with consideration for a broader range of technical and societal factors than are normally considered.

New U.S. diplomats in training used to be told "not to give unintentional offense." Our primary goal for this book is to tell you a variant of this: that there are a number of specific actionable steps that you, the reader, can begin taking to reduce the risk of causing unintentional harm with your models.

In particular, this book focuses on how to make models more transparent, interpretable, and fair. It will present illustrations and snippets of code in a way that a technically literate manager or executive can understand, without necessarily knowing any programming language.

What This Book Covers

Chapter 1, "Why Data Science Should Be Ethical," provides historical background for the ethical concerns in statistics and an introduction to basic modeling methods. In Chapter 2, "Background: Modeling and the Black-Box Algorithm," we define various types of predictive models and briefly discuss the concepts of model transparency and model interpretability. Chapter 3, "The Ways AI Goes Wrong, and the Legal Implications," reviews the landscape of the types of ethics and fairness issues encountered in the practice of data science (e.g., legal constraints, privacy and data ownership concerns, and algorithms "gone bad") and finishes by distinguishing interpretable models from black-box models. In Chapter 4, "The Responsible Data Science (RDS) Framework," we discuss the desired characteristics of a Responsible Data Science framework, summarize the attempts by other groups at creating one, and combine the lessons learned from these other groups with those presented in the book up until this point to construct our own framework, the aptly named Responsible Data Science (RDS) framework. Chapter 5, "Model Interpretability: The What and the Why," prepares the reader for implementing the RDS framework in later chapters by taking a deeper dive into model interpretability and how it can be achieved for black-box models. We begin setting up a responsible data science project within our framework and performing initial checks on two datasets in Chapter 6, "Beginning a Responsible Data Science Project." In Chapter 7, "Auditing a Responsible Data Science Project," and Chapter 8, "Auditing for Neural Networks," we delve into case studies in auditing conventional machine learning models and deep neural networks for failure scenarios, fairness, and interpretability. Finally, we conclude the book in Chapter 9, "Conclusion," with a look to the future and a call to action.

Who Will Benefit Most from This Book

Much has been written elsewhere about the legal issues relevant to AI; thus, our primary audience is not corporate general counsels. Instead, this book is intended for the following two groups:

Data-literate managers and executives
Business-literate data scientists and analysts

Although the focus placed on responsibility in data science is relatively new, many people have been trained in the myriad wonderful things that AI can accomplish. They have also read in the news about the ethical lapses in some AI projects. These lapses are not surprising, because relatively few data scientists are trained in how to adequately understand and control their AI while maintaining high predictive performance in models. Hence, we aim this book at data science managers and executives and at data science practitioners.

Practitioners will learn of the ways in which their models, intended to provide benefits, can at the same time cause harm. They will learn how to leverage fairness metrics, interpretability methods, and other interventions on their models or datasets to audit those models, identifying and mitigating possible issues prior to deployment or result delivery. Through worked examples, the book guides users in structuring their models to give greater consideration to ethical impacts, while ensuring that best practices are followed and model performance is optimized. This is a key differentiator for our book, as most responsible AI frameworks do not provide specific technical recommendations for fulfilling the principles that they lay out.
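As one illustration of the kind of fairness metric referred to here, the sketch below computes the rate of positive predictions for each group and the largest gap between those rates, a quantity often called the demographic parity difference. It is a minimal sketch in plain Python, assuming binary predictions (1 = favorable outcome) and one group label per record; the function names and the toy data are hypothetical and are not taken from the book's own code.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Return the fraction of positive (1) predictions for each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred  # assumes predictions are 0 or 1
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(predictions, groups):
    """Largest gap in selection rate between any two groups (0.0 means parity)."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: 1 = loan approved, 0 = denied
preds = [1, 1, 0, 1, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(selection_rates(preds, groups))                # {'A': 0.75, 'B': 0.25}
print(demographic_parity_difference(preds, groups))  # 0.5
```

A gap of zero means every group receives favorable predictions at the same rate. On its own it does not establish that a model is fair, since other fairness metrics (for example, those that condition on the true outcome) can disagree with it.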

Managers of data science teams, and managers with any responsibilities in the analytics realm, can use this book to stay alert for the ways in which analytical models can run afoul of ethical practices, and even the law. More importantly, they will learn the language and concepts to engage their analytics teams in the solutions and mitigation steps that we propose. While some code and technical discussion are provided, following them in detail is by no means necessary. The overall presentation in the book is pitched at a level that gives managers who are at least somewhat familiar with analytics the tools to instill responsible best practices for data science in their organizations.

Finally, a word to individual data scientists. You may think that your project has no implications in the ethical realm. The real-world context for deployment may seem innocuous, the modeling task may seem harmless, and the content of this book may not seem relevant to your project. Though the ideas and techniques presented in this book are primarily discussed in the context of ethically fraught models, they are still useful as the basis for best practices in other modeling contexts. After all, there is a great degree of overlap between traditional best practices for modeling and best practices for responsible data science. Doing data science more responsibly, in the manner that we lay out in this book, improves understanding of the relationships between a model and its real-world deployment context, improves transparency and accountability through better guidelines for documentation, and reduces the risk of unanticipated biases creeping into models by providing workflows for model auditing. Plus, who knows when that innocuous-sounding project may later turn out to have a dark side?

Looking Ahead in This Book

The responsible practice of data science covers a lot of ground in different dimensions.

Formal legal and regulatory requirements: Clearly, any company or individual developing or implementing data science solutions will want to stay on the right side of the law. The most famous attempt to regulate AI is the GDPR; it runs over 80 pages and is quite detailed. It was developed to meet the demands of a specific point in time, but there is no guarantee that it will be a useful guide in the future. Things change rapidly in the field of AI, and the GDPR is like a boulder placed in the path of a stream: sooner or later, the stream will find ways around the obstacle. There are already a number of publications on this topic, and our audience is not the corporate general counsel but rather the manager and the data science practitioner. So, while this book will touch on key laws in this area, such as the GDPR, it will not do so in great depth.
Bad actors: In many cases, the pernicious use of AI is neither inadvertent nor the result of a lack of understanding; it is intentional. Deep learning has been put to malicious use by cyber attackers who can digest and analyze multilayered defense mechanisms to quickly determine where weaknesses lie. When those who are responsible for data science development and implementation have malevolent intentions, a lecture on responsibility and a course on ethics will not have much impact. This book will note countermeasures that can have some effect, but dealing with bad actors, like dealing with regulators, is not the primary focus of this book.
AI out of control: In many cases, those deploying AI are responsible parties, obeying the law, and yet their AI has in some sense "escaped their full control" after deployment. Perhaps it has morphed into something that was not initially intended, or perhaps it has triggered effects and reactions that were...