
Smarter Data Science
Description
Organizations can make data science a repeatable, predictable tool that business professionals use to get more value from their data.
Enterprise data and AI projects are often scattershot, underbaked, siloed, and not adaptable to predictable business changes. As a result, the vast majority fail. These expensive quagmires can be avoided, and this book explains precisely how.
Data science is emerging as a hands-on tool for not just data scientists, but business professionals as well. Managers, directors, IT leaders, and analysts must expand their use of data science capabilities for the organization to stay competitive. Smarter Data Science helps them achieve their enterprise-grade data projects and AI goals. It serves as a guide to building a robust and comprehensive information architecture program that enables sustainable and scalable AI deployments.
When an organization manages its data effectively, its data science program becomes a fully scalable function that's both prescriptive and repeatable. With an understanding of data science principles, practitioners are also empowered to lead their organizations in establishing and deploying viable AI. They employ the tools of machine learning, deep learning, and AI to extract greater value from data for the benefit of the enterprise.
By following a ladder framework that promotes prescriptive capabilities, organizations can make data science accessible to a range of team members, democratizing data science throughout the organization. Companies that collect, organize, and analyze data can move forward to additional data science achievements:
- Improving time-to-value with infused AI models for common use cases
- Optimizing knowledge work and business processes
- Utilizing AI-based business intelligence and data visualization
- Establishing a data topology to support general or highly specialized needs
- Successfully completing AI projects in a predictable manner
- Coordinating the use of AI from any compute node, from inner edges to outer edges: cloud, fog, and mist computing
When they climb the ladder presented in this book, businesspeople and data scientists alike will be able to improve and foster repeatable capabilities. They will have the knowledge to maximize their AI and data assets for the benefit of their organizations.


Persons
NEAL FISHMAN is a Distinguished Engineer and CTO of Data-Based Pathology at IBM. He is an IBM-certified Senior IT Architect and Open Group Distinguished Chief Architect.
COLE STRYKER is a journalist based in Los Angeles. He is the author of Epic Win for Anonymous and Hacking the Future.
Content
Foreword for Smarter Data Science
Epigraph
Preamble
Chapter 1 Climbing the AI Ladder
Readying Data for AI
Technology Focus Areas
Taking the Ladder Rung by Rung
Constantly Adapt to Retain Organizational Relevance
Data-Based Reasoning is Part and Parcel in the Modern Business
Toward the AI-Centric Organization
Summary
Chapter 2 Framing Part I: Considerations for Organizations Using AI
Data-Driven Decision-Making
Using Interrogatives to Gain Insight
The Trust Matrix
The Importance of Metrics and Human Insight
Democratizing Data and Data Science
Aye, a Prerequisite: Organizing Data Must Be a Forethought
Preventing Design Pitfalls
Facilitating the Winds of Change: How Organized Data Facilitates Reaction Time
Quae Quaestio (Question Everything)
Summary
Chapter 3 Framing Part II: Considerations for Working with Data and AI
Personalizing the Data Experience for Every User
Context Counts: Choosing the Right Way to Display Data
Ethnography: Improving Understanding Through Specialized Data
Data Governance and Data Quality
The Value of Decomposing Data
Providing Structure Through Data Governance
Curating Data for Training
Additional Considerations for Creating Value
Ontologies: A Means for Encapsulating Knowledge
Fairness, Trust, and Transparency in AI Outcomes
Accessible, Accurate, Curated, and Organized
Summary
Chapter 4 A Look Back on Analytics: More Than One Hammer
Been Here Before: Reviewing the Enterprise Data Warehouse
Drawbacks of the Traditional Data Warehouse
Paradigm Shift
Modern Analytical Environments: The Data Lake
By Contrast
Indigenous Data
Attributes of Difference
Elements of the Data Lake
The New Normal: Big Data is Now Normal Data
Liberation from the Rigidity of a Single Data Model
Streaming Data
Suitable Tools for the Task
Easier Accessibility
Reducing Costs
Scalability
Data Management and Data Governance for AI
Schema-on-Read vs. Schema-on-Write
Summary
Chapter 5 A Look Forward on Analytics: Not Everything Can Be a Nail
A Need for Organization
The Staging Zone
The Raw Zone
The Discovery and Exploration Zone
The Aligned Zone
The Harmonized Zone
The Curated Zone
Data Topologies
Zone Map
Data Pipelines
Data Topography
Expanding, Adding, Moving, and Removing Zones
Enabling the Zones
Ingestion
Data Governance
Data Storage and Retention
Data Processing
Data Access
Management and Monitoring
Metadata
Summary
Chapter 6 Addressing Operational Disciplines on the AI Ladder
A Passage of Time
Create
Stability
Barriers
Complexity
Execute
Ingestion
Visibility
Compliance
Operate
Quality
Reliance
Reusability
The xOps Trifecta: DevOps/MLOps, DataOps, and AIOps
DevOps/MLOps
DataOps
AIOps
Summary
Chapter 7 Maximizing the Use of Your Data: Being Value Driven
Toward a Value Chain
Chaining Through Correlation
Enabling Action
Expanding the Means to Act
Curation
Data Governance
Integrated Data Management
Onboarding
Organizing
Cataloging
Metadata
Preparing
Provisioning
Multi-Tenancy
Summary
Chapter 8 Valuing Data with Statistical Analysis and Enabling Meaningful Access
Deriving Value: Managing Data as an Asset
An Inexact Science
Accessibility to Data: Not All Users are Equal
Providing Self-Service to Data
Access: The Importance of Adding Controls
Ranking Datasets Using a Bottom-Up Approach for Data Governance
How Various Industries Use Data and AI
Benefiting from Statistics
Summary
Chapter 9 Constructing for the Long-Term
The Need to Change Habits: Avoiding Hard-Coding
Overloading
Locked In
Ownership and Decomposition
Design to Avoid Change
Extending the Value of Data Through AI
Polyglot Persistence
Benefiting from Data Literacy
Understanding a Topic
Skillsets
It's All Metadata
The Right Data, in the Right Context, with the Right Interface
Summary
Chapter 10 A Journey's End: An IA for AI
Development Efforts for AI
Essential Elements: Cloud-Based Computing, Data, and Analytics
Intersections: Compute Capacity and Storage Capacity
Analytic Intensity
Interoperability Across the Elements
Data Pipeline Flight Paths: Preflight, Inflight, Postflight
Data Management for the Data Puddle, Data Pond, and Data Lake
Driving Action: Context, Content, and Decision-Makers
Keep It Simple
The Silo is Dead; Long Live the Silo
Taxonomy: Organizing Data Zones
Capabilities for an Open Platform
Summary
Appendix Glossary of Terms
Index
Praise For This Book
The authors have obviously explored the paths toward an efficient information architecture. There is value in learning from their experience. If you have responsibility for or influence over how your organization uses artificial intelligence you will find Smarter Data Science an invaluable read. It is noteworthy that the book is written with a sense of scope that lends to its credibility. So much written about AI technologies today seems to assume a technical vacuum. We are not all working in startups! We have legacy technology that needs to be considered. The authors have created an excellent resource that acknowledges that enterprise context is a nuanced and important problem. The ideas are presented in a logical and clear format that is suitable to the technologist as well as the businessperson.
Christopher Smith, Chief Knowledge Management and Innovation Officer, Sullivan & Cromwell, LLC
It has always been a pleasure to learn from Neal. The stories and examples that urge every business to stay "relevant" served to provide my own source of motivation. The concepts presented in this book helped to resolve issues that I have been having to address. This book teaches almost all aspects of the data industry. The experiences, patterns, and anti-patterns are thoroughly explained. This work provides benefit to a variety of roles, including architects, developers, product owners, and business executives. For organizations exploring AI, this book is the cornerstone to becoming successful.
Harry Xuegang Huang Ph.D., External Consultant, A.P. Moller - Maersk (Denmark)
This is by far one of the best and most refreshing books on AI and data science that I have come across. The authors seek and speak the truth and they penetrate into the core of the challenge most organizations face in finding value in their data: moving focus away from a tendency to connect the winning dots by 'magical' technologies and overly simplified methods. The book is laid out in a well-considered and mature approach that is grounded in deliberation, pragmatism, and respect for information. By following the authors' advice, you will unlock true and long-term value and avoid the many pitfalls that fashionistas and false prophets have come to dominate the narrative in AI.
Jan Gravesen, M.Sc., IBM Distinguished Engineer, Director and Chief Technology Officer, IBM
Most of the books on data analytics and data science focus on tools and techniques of the discipline and do not provide the reader with a complete framework to plan and implement projects that solve business problems and foster competitive advantage. Just because machine learning and new methodologies learn from data and do not require a preconceived model for analysis does not eliminate the need for a robust information management program and required processes. In Smarter Data Science, the authors present a holistic model that emphasizes how critical data and data management are in implementing successful value-driven data analytics and AI solutions. The book presents an elegant and novel approach to data management and explores its various layers and dimensions (from data creation/ownership and governance to quality and trust) as a key component of a well-integrated methodology for value-adding data sciences and AI. The book covers the components of an agile approach to data management and information architecture that fosters business innovation and can adapt to ever changing requirements and priorities. The many examples of recent data challenges facing diverse businesses make the book extremely readable and relevant for practical applications. This is an excellent book for both data officers and data scientists to gain deep insights into the fundamental relationship between data management, analytics, machine learning, and AI.
Ali Farahani, Ph.D., Former Chief Data Officer, County of Los Angeles; Adjunct Associate Professor, USC
There are many different approaches to gaining insights with data given the new advances in technology today. This book encompasses more than the technology that makes AI and machine learning possible, but truly depicts the process and foundation needed to prepare that data to make AI consumable and actionable. I thoroughly enjoyed the section on data governance and the importance of accessible, accurate, curated, and organized data for any sort of analytics consumption. The significance and differences in zones and preparation of data also has some fantastic points that should be highly considered in any sort of analytics project. The authors' ability to describe best practices from a true journey of data within an organization in relation to business needs and information outcomes is spot on. I would highly recommend this book to anyone learning, playing, or working in the wonderful space of Data & AI.
Phil Black, VP of Client Services for Data and AI, TechD
The authors have pieced together data governance, data architecture, data topologies, and data science in a perfect way. Their observations and approach have paved the way towards achieving a flexible and sustainable environment for advanced analytics. I am adopting these techniques in building my own analytics platform for our company.
Svetlana Grigoryeva, Manager Data Services and AI, Shearman and Sterling
This book is a delight to read and provides many thought-provoking ideas. This book is a great resource for data scientists, and everyone who is involved with large scale, enterprise-wide AI initiatives.
Simon Seow, Managing Director, Info Spec Sdn Bhd (Malaysia)
Having worked in IT as a Vice president at MasterCard and as a Global Director at GM, I learned long ago about the importance of finding and listening to the best people. Here, the authors have brought a unique and novel voice that resonates with verve about how to be successful with data science at an enterprise scale. With the explosive growth of big data, computer power, cheap sensor technology, and the awe-inspiring breakthroughs with AI, Smarter Data Science also instills in us that without a solid information architecture, we may fall short in our work with AI.
Glen Birrell, Executive IT Consultant
In the 21st century the ability to use metadata to empower cross-industry ecosystems and exploit a hierarchy of AI algorithms will be essential to maximize stakeholder value. Today's data science processes and systems simply don't offer enough speed, flexibility, quality or context to enable that. Smarter Data Science is a very useful book as it provides concrete steps towards wisdom within those intelligent enterprises.
Richard Hopkins, President, Academy of Technology, IBM (UK)
A must read for everyone who curates, manages, or makes decisions on data. Lifts a lot of the mystery and magical thinking out of "Data Science" to explain why we're underachieving on the promise of AI. Full of practical ideas for improving the practice of information architecture for modern analytical environments using AI or ML. Highly recommended.
Linda Nadeau, Information Architect, Metaphor Consulting LLC
In this book, the authors "unpack" the meaning of data as a natural resource for the modern corporation. Following on Neal's previous book that explored the role of data in enterprise transformation, the authors construct and lead the reader through a holistic approach to drive business value with data science. This book examines data, analytics, and the AI value chain across several industries describing specific use and business cases. This book is a must read for Chief Data Officers as well as accomplished or inspiring data scientists in any industry.
Boris Vishnevsky, Principal, Complex Solutions and Cyber Security, Slalom; Adjunct Professor, TJU
As an architect working with clients on highly complex projects, all of my new projects involve vast amounts of data, distributed sources of data, cloud-based technologies, and data science. This book is invaluable for my real-world enterprise scale practice. The anticipated risks, complexities, and the rewards of infusing AI is laid out in a well-organized manner that is easy to comprehend taking the reader out of the scholastic endeavor of fact-based learning and into the real world of data science. I would highly recommend this book to anyone wanting to be meaningfully involved with data science.
John Aviles, Federal CTO Technical Lead, IBM
I hold over 150 patents and work as a data scientist on creating some of the most complex AI business projects, and this book has been of immense value to me as a field guide. The authors have established the need as to why IA must be part of a systematic maturing approach to AI. I regard this book as a "next generation AI guidebook" that your organization can't afford to be without.
Gandhi Sivakumar, Chief Architect and Master Inventor, IBM (Australia)
A seminal treatment for how enterprises must leverage AI. The authors provide a clear and understandable path forward for using AI across cloud, fog, and mist computing. A must read for any serious data scientist and data manager.
Raul Shneir, Director, Israel National Cyber Directorate (Israel)
As a professor at Wharton who teaches data science I often mention to my students about emerging new analytical tools such as AI that can provide valuable information to business decision makers. I also encourage them to keep abreast of such tools. Smarter Data Science will definitely make my recommended readings list. It articulates clearly how an organization can build a successful...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.