Competing with High Quality Data

Concepts, Tools, and Techniques for Building a Successful Approach to Data Quality

Rajesh Jugulum (Author)

Wiley (Publisher)

Published on 10 March 2014

304 pages

E-Book

ePUB with Adobe-DRM

978-1-118-41649-5 (ISBN)

€86.99 incl. 7% VAT

E-Book Single Licence

Description

Create a competitive advantage with data quality

Data is rapidly becoming the powerhouse of industry, but low-quality data can actually put a company at a disadvantage. To be used effectively, data must accurately reflect the real-world scenario it represents, and it must be in a form that is usable and accessible. Quality data involves asking the right questions, targeting the correct parameters, and having an effective internal management, organization, and access system. It must be relevant, complete, and correct, while falling in line with pervasive regulatory oversight programs.

Competing with High Quality Data: Concepts, Tools, and Techniques for Building a Successful Approach to Data Quality takes a holistic approach to improving data quality, from collection to usage. Author Rajesh Jugulum is globally recognized as a major voice in the data quality arena, with a high-level background in international corporate finance. In the book, Jugulum provides a roadmap to data quality innovation, covering topics such as:

* The four-phase approach to data quality control
* Methodology that produces data sets for different aspects of a business
* Streamlined data quality assessment and issue resolution
* A structured, systematic, disciplined approach to effective data gathering

The book also contains real-world case studies to illustrate how companies across a broad range of sectors have employed data quality systems, whether or not they succeeded, and what lessons were learned. High-quality data increases value throughout the information supply chain, and the benefits extend to the client, employee, and shareholder. Competing with High Quality Data: Concepts, Tools, and Techniques for Building a Successful Approach to Data Quality provides the information and guidance necessary to formulate and activate an effective data quality plan today.

Content

Foreword

Prelude

Preface

Acknowledgments

1 The Importance of Data Quality

1.0 Introduction

1.1 Understanding the Implications of Data Quality

1.2 The Data Management Function

1.3 The Solution Strategy

1.4 Guide to This Book

Section I Building a Data Quality Program

2 The Data Quality Operating Model

2.0 Introduction

2.1 Data Quality Foundational Capabilities

2.1.1 Program Strategy and Governance

2.1.2 Skilled Data Quality Resources

2.1.3 Technology Infrastructure and Metadata

2.1.4 Data Profiling and Analytics

2.1.5 Data Integration

2.1.6 Data Assessment

2.1.7 Issues Resolution (IR)

2.1.8 Data Quality Monitoring and Control

2.2 The Data Quality Methodology

2.2.1 Establish a Data Quality Program

2.2.2 Conduct a Current-State Analysis

2.2.3 Strengthen Data Quality Capability through Data Quality Projects

2.2.4 Monitor the Ongoing Production Environment and Measure Data Quality Improvement Effectiveness

2.2.5 Detailed Discussion on Establishing the Data Quality Program

2.2.6 Assess the Current State of Data Quality

2.3 Conclusions

3 The DAIC Approach

3.0 Introduction

3.1 Six Sigma Methodologies

3.1.1 Development of Six Sigma Methodologies

3.2 DAIC Approach for Data Quality

3.2.1 The Define Phase

3.2.2 The Assess Phase

3.2.3 The Improve Phase

3.2.4 The Control Phase (Monitor and Measure)

3.3 Conclusions

Section II Executing a Data Quality Program

4 Quantification of the Impact of Data Quality

4.0 Introduction

4.1 Building a Data Quality Cost Quantification Framework

4.1.1 The Cost Waterfall

4.1.2 Prioritization Matrix

4.1.3 Remediation and Return on Investment

4.2 A Trading Office Illustrative Example

4.3 Conclusions

5 Statistical Process Control and Its Relevance in Data Quality Monitoring and Reporting

5.0 Introduction

5.1 What Is Statistical Process Control?

5.1.1 Common Causes and Special Causes

5.2 Control Charts

5.2.1 Different Types of Data

5.2.2 Sample and Sample Parameters

5.2.3 Construction of Attribute Control Charts

5.2.4 Construction of Variable Control Charts

5.2.5 Other Control Charts

5.2.6 Multivariate Process Control Charts

5.3 Relevance of Statistical Process Control in Data Quality Monitoring and Reporting

5.4 Conclusions

6 Critical Data Elements: Identification, Validation, and Assessment

6.0 Introduction

6.1 Identification of Critical Data Elements

6.1.1 Data Elements and Critical Data Elements

6.1.2 CDE Rationalization Matrix

6.2 Assessment of Critical Data Elements

6.2.1 Data Quality Dimensions

6.2.2 Data Quality Business Rules

6.2.3 Data Profiling

6.2.4 Measurement of Data Quality Scores

6.2.5 Results Recording and Reporting (Scorecard)

6.3 Conclusions

7 Prioritization of Critical Data Elements (Funnel Approach)

7.0 Introduction

7.1 The Funnel Methodology (Statistical Analysis for CDE Reduction)

7.1.1 Correlation and Regression Analysis for Continuous CDEs

7.1.2 Association Analysis for Discrete CDEs

7.1.3 Signal-to-Noise Ratios Analysis

7.2 Case Study: Basel II

7.2.1 Basel II: CDE Rationalization Matrix

7.2.2 Basel II: Correlation and Regression Analysis

7.2.3 Basel II: Signal-to-Noise (S/N) Ratios

7.3 Conclusions

8 Data Quality Monitoring and Reporting Scorecards

8.0 Introduction

8.1 Development of the DQ Scorecards

8.2 Analytical Framework (ANOVA, SPCs, Thresholds, Heat Maps)

8.2.1 Thresholds and Heat Maps

8.2.2 Analysis of Variance (ANOVA) and SPC Charts

8.3 Application of the Framework

8.4 Conclusions

9 Data Quality Issue Resolution

9.0 Introduction

9.1 Description of the Methodology

9.2 Data Quality Methodology

9.3 Process Quality/Six Sigma Approach

9.4 Case Study: Issue Resolution Process Reengineering

9.5 Conclusions

10 Information System Testing

10.0 Introduction

10.1 Typical System Arrangement

10.1.1 The Role of Orthogonal Arrays

10.2 Method of System Testing

10.2.1 Study of Two-Factor Combinations

10.2.2 Construction of Combination Tables

10.3 MTS Software Testing

10.4 Case Study: A Japanese Software Company

10.5 Case Study: A Finance Company

10.6 Conclusions

11 Statistical Approach for Data Tracing

11.0 Introduction

11.1 Data Tracing Methodology

11.1.1 Statistical Sampling

11.2 Case Study: Tracing

11.2.1 Analysis of Test Cases and CDE Prioritization

11.3 Data Lineage through Data Tracing

11.4 Conclusions

12 Design and Development of Multivariate Diagnostic Systems

12.0 Introduction

12.1 The Mahalanobis-Taguchi Strategy

12.1.1 The Gram-Schmidt Orthogonalization Process

12.2 Stages in MTS

12.3 The Role of Orthogonal Arrays and Signal-to-Noise Ratio in Multivariate Diagnosis

12.3.1 The Role of Orthogonal Arrays

12.3.2 The Role of S/N Ratios in MTS

12.3.3 Types of S/N Ratios

12.3.4 Direction of Abnormals

12.4 A Medical Diagnosis Example

12.5 Case Study: Improving Client Experience

12.5.1 Improvements Made Based on Recommendations from MTS Analysis

12.6 Case Study: Understanding the Behavior Patterns of Defaulting Customers

12.7 Case Study: Marketing

12.7.1 Construction of the Reference Group

12.7.2 Validation of the Scale

12.7.3 Identification of Useful Variables

12.8 Case Study: Gear Motor Assembly

12.8.1 Apparatus

12.8.2 Sensors

12.8.3 High-Resolution Encoder

12.8.4 Life Test

12.8.5 Characterization

12.8.6 Construction of the Reference Group or Mahalanobis Space

12.8.7 Validation of the MTS Scale

12.8.8 Selection of Useful Variables

12.9 Conclusions

13 Data Analytics

13.0 Introduction

13.1 Data and Analytics as Key Resources

13.1.1 Different Types of Analytics

13.1.2 Requirements for Executing Analytics

13.1.3 Process of Executing Analytics

13.2 Data Innovation

13.2.1 Big Data

13.2.2 Big Data Analytics

13.2.3 Big Data Analytics Operating Model

13.2.4 Big Data Analytics Projects: Examples

13.3 Conclusions

14 Building a Data Quality Practices Center

14.0 Introduction

14.1 Building a DQPC

14.2 Conclusions

Appendix A

Equations for Signal-to-Noise (S/N) Ratios

Nondynamic S/N Ratios

Dynamic S/N Ratios

Appendix B

Matrix Theory: Related Topics

What Is a Matrix?

Appendix C

Some Useful Orthogonal Arrays

Two-Level Orthogonal Arrays

Three-Level Orthogonal Arrays

Index of Terms and Symbols

References

Index

Chapter 1
The Importance of Data Quality

1.0 Introduction

In this introductory chapter, we discuss the importance of data quality (DQ), understanding DQ implications, and the requirements for managing the DQ function. This chapter also sets the stage for the discussions in the other chapters of this book that focus on the building and execution of the DQ program. At the end, this chapter provides a guide to this book, with descriptions of the chapters and how they interrelate.

1.1 Understanding the Implications of Data Quality

Dr. Genichi Taguchi, a world-renowned quality engineering expert from Japan, emphasized and established the relationship between poor quality and overall loss. Taguchi (1987) used a quality loss function (QLF) to measure the loss associated with quality characteristics or parameters. The QLF describes the loss that a system suffers as an adjustable characteristic departs from its target: the loss increases as the characteristic y (such as thickness or strength) moves further from the target value m. In other words, any divergence of the quality characteristic from the target incurs a loss. Taguchi regards this loss as a loss to society, and somebody must pay for it. The results of such losses include system breakdowns, company failures, company bankruptcies, and so forth. In this context, everything is considered part of society (customers, organizations, government, etc.).

Figure 1.1 shows how the loss, given by L(y), increases as y deviates from the target m by Δ0 on either side. When y is equal to m, the loss is zero, its minimum. The equation for the loss function can be expressed as follows:

$$L(y) = k\,(y - m)^2 \qquad (1.1)$$

Figure 1.1 Quality Loss Function (QLF)

where k is a cost coefficient, expressed in dollars per squared unit of deviation, that reflects direct costs, indirect costs, warranty costs, reputational costs, loss due to lost customers, and costs associated with rework and rejection. There are prescribed ways to determine the value of k.
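As a minimal sketch of Equation 1.1, the Python functions below evaluate the loss for a given y. They assume one common way of fixing k from a single known point, k = A0/Δ0², where A0 (a name introduced here) is the dollar loss incurred when y deviates from m by Δ0. The thickness target and the $40 cost are hypothetical figures for illustration only.

```python
def loss_coefficient(a0: float, delta0: float) -> float:
    """Calibrate k so that the loss equals a0 at a deviation of delta0.

    Assumption: k = A0 / delta0**2, one common way to fix k from a single
    known (deviation, cost) pair.
    """
    if delta0 <= 0:
        raise ValueError("delta0 must be positive")
    return a0 / delta0 ** 2


def quadratic_loss(y: float, m: float, k: float) -> float:
    """Quality loss L(y) = k * (y - m)**2 (Equation 1.1)."""
    return k * (y - m) ** 2


# Hypothetical figures: target thickness m = 10.0 mm, and a 0.5 mm
# deviation is assumed to cost $40 (rework, warranty, lost goodwill, ...).
k = loss_coefficient(a0=40.0, delta0=0.5)  # 160.0 dollars per mm^2
for y in (10.0, 10.25, 9.5):
    print(f"y = {y:5.2f} mm -> loss = ${quadratic_loss(y, m=10.0, k=k):.2f}")
```

Halving the deviation from 0.5 mm to 0.25 mm cuts the loss to a quarter ($40 to $10). This quadratic growth is what separates the QLF from a pass/fail view in which every value inside the tolerance costs nothing.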

The loss function is usually not symmetrical: it may be steep on one side of the target, or on both sides. Deming (1960) notes that the loss function need not be exact and that the exact function is difficult to obtain. Because most cost calculations are based on estimates or predictions, a close approximation of the function is good enough.
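When the loss rises more steeply on one side, a simple approximation keeps the quadratic shape of Equation 1.1 but uses a separate coefficient on each side of the target. This is sketched below as an assumption, not as the book's formula; the names k_low and k_high and the numbers are hypothetical and, in line with Deming's remark, need only be reasonable estimates.

```python
def asymmetric_loss(y: float, m: float, k_low: float, k_high: float) -> float:
    """Quadratic loss with separate coefficients below and above the target m.

    k_low applies when y < m and k_high when y >= m (the choice at y == m is
    irrelevant because the loss there is zero).
    """
    k = k_low if y < m else k_high
    return k * (y - m) ** 2


# Hypothetical case: falling short of the target is three times as costly
# as overshooting it by the same amount.
for y in (9.5, 10.0, 10.5):
    print(y, asymmetric_loss(y, m=10.0, k_low=300.0, k_high=100.0))
```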

The concept of the loss function aptly applies in the DQ context, especially when we are measuring data quality associated with various data elements such as customer IDs, social security numbers, and account balances. Usually, the data elements are prioritized based on certain criteria, and the quality levels for data elements are measured in terms of percentages (of accuracy, completeness, etc.). The prioritized data elements are referred to as critical data elements (CDEs).
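Measuring CDE quality levels as percentages can be sketched as follows. The records, the choice of customer ID, SSN, and balance as CDEs, and the SSN format rule are all hypothetical. A format check measures conformity to a business rule, not true accuracy, which would require comparison against an authoritative source.

```python
import re
from typing import Callable

# Hypothetical records for three CDEs; None marks a missing value.
records = [
    {"customer_id": "C001", "ssn": "123-45-6789", "balance": 1500.00},
    {"customer_id": "C002", "ssn": None, "balance": 320.50},
    {"customer_id": "C003", "ssn": "987-65-4321", "balance": None},
    {"customer_id": "C004", "ssn": "12-345-6789", "balance": 75.25},
]

# Hypothetical business rule: an SSN must look like ddd-dd-dddd.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def ssn_format_ok(value: str) -> bool:
    """True if the whole value matches the assumed SSN format."""
    return SSN_PATTERN.fullmatch(value) is not None


def completeness_pct(rows: list[dict], field: str) -> float:
    """Percentage of rows in which `field` is populated (not None)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) is not None)
    return 100.0 * filled / len(rows)


def conformity_pct(rows: list[dict], field: str, rule: Callable[[str], bool]) -> float:
    """Percentage of populated values of `field` that satisfy `rule`."""
    values = [row[field] for row in rows if row.get(field) is not None]
    if not values:
        return 0.0
    return 100.0 * sum(1 for v in values if rule(v)) / len(values)


for cde in ("customer_id", "ssn", "balance"):
    print(f"{cde:<12} completeness: {completeness_pct(records, cde):5.1f}%")
print(f"ssn format conformity: {conformity_pct(records, 'ssn', ssn_format_ok):5.1f}%")
```

Here the SSN element scores 75% on completeness and about 66.7% on format conformity, so it would fall short of a hypothetical 95% threshold on both dimensions.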

If the quality levels associated with these CDEs are not at the desired levels, then there is a greater chance of making wrong decisions, which might have adverse impacts on organizations. These adverse impacts may take the form of the losses previously described. Since data quality levels are a “higher-the-better” type of characteristic (because we want to increase the percentage levels), only half of Figure 1.1 is applicable when measuring loss due to poor data quality. Figure 1.2 is a better representation of this situation: it shows how the loss, given by L(y), increases as the quality level falls below the target m by Δ0. In this book, the target value is also referred to as the business specification or threshold.

Figure 1.2 Loss Function for Data Quality Levels (Higher-the-Better Type of Characteristic)

As shown in Figure 1.2, the loss is at its minimum when y reaches m, and it remains at that level even when the quality levels exceed m. Therefore, it may not be necessary to improve CDE quality levels beyond m, as such improvement does not reduce the loss.
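The one-sided loss described for Figure 1.2 can be sketched as follows, assuming the quadratic form of Equation 1.1 applies to the shortfall below the threshold m and the loss is zero at or above it. The threshold, the coefficient k, and the quality levels are hypothetical.

```python
def dq_shortfall_loss(y: float, m: float, k: float) -> float:
    """One-sided loss for a higher-the-better DQ level y (percent) vs. threshold m.

    Below m the loss grows with the square of the shortfall; at or above m it
    stays at its minimum (zero), so exceeding m yields no further reduction.
    """
    shortfall = max(0.0, m - y)
    return k * shortfall ** 2


# Hypothetical CDE: completeness threshold m = 98%, k = $50 per squared
# percentage point of shortfall.
for level in (95.0, 98.0, 99.5):
    print(f"{level:5.1f}% -> loss = ${dq_shortfall_loss(level, m=98.0, k=50.0):,.2f}")
```

Taguchi's classical larger-the-better loss, L(y) = k/y², keeps decreasing as y grows; the clipped form above instead encodes the point made in the text that improving a CDE beyond its threshold does not reduce the loss.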

Losses due to poor quality can take a variety of forms (English, 2009), such as students being denied entry to colleges, customers being denied loans, incorrect prescription of medicines, crashing submarines, and inaccurate nutrition labeling on food products. In the financial industry, consider a customer who is denied a loan on the basis of a bad credit history because the loan application was processed using the wrong social security number. This is a good example of a data quality issue, and it is easy to imagine how such issues can compound, resulting in huge losses to the organizations involved. The Institute of International Finance and McKinsey & Company (2011) cite inadequate information technology (IT) and data architecture to support the management of financial risk as one of the key factors in the global financial crisis that began in 2007. This highlights the importance of data quality and leads us to conclude that the effect of poor data quality on the financial crisis cannot be ignored. During this crisis, many banks, investment companies, and insurance companies lost billions of dollars, and some went bankrupt. The impacts of these events were significant and included economic recession, millions of foreclosures, lost jobs, depletion of retirement funds, and loss of confidence in the industry and in the government.

All the aforementioned impacts can be classified into two categories, as described in Taguchi (1987): losses due to the functional variability of the process and losses due to harmful side effects. Figure 1.3 shows how all the costs in these categories add up.

Figure 1.3 Sources of Societal Losses

In this section, we discussed the importance of data quality and the implications of bad data. It is clear that the impact of bad data is quite significant and that it is important to manage key data resources effectively to minimize overall loss. For this reason, there is a need to establish a dedicated data management function that is responsible for ensuring high data quality levels. Section 1.2 briefly describes the establishment of such a function and its various associated roles.

1.2 The Data Management Function

In some organizations, the data management function is referred to as the chief data office (CDO), and it is responsible for the oversight of various data-related activities. One way of overseeing data-related activities is to separate them into components such as data governance, data strategy, data standards, and data quality. The data governance component is important because it guides all subsequent data-related activities. It includes drivers such as steering committees, program management, project and change management, compliance with organizational requirements, and similar functions. The data strategy component is useful for understanding the data and planning how to use it effectively. The data standards component is responsible for ensuring that the various parties using the data share the same understanding of it across the organization, which it accomplishes by developing standards for data elements and data models. The data quality component is responsible for cleaning the data and making sure that it is fit for its intended purpose, so that it can be used in decision-making activities. This group should work closely with the data strategy component.

Please note that we are presenting one of several possible ways of overseeing the data management function, or CDO. The CDO function should work closely with various functions, business units, and technology groups across the organization to ensure that data is interpreted consistently in all functions of the organization and is fit for its intended purposes. An effective CDO function should demonstrate several key attributes, including the following:

Clear leadership and senior management support
Key data-driven objectives
A visual depiction of target areas for prioritization
A tight integration of CDO objectives with company priorities and objectives
A clear benefit to the company upon execution

As this book focuses on data quality, various chapters provide descriptions of the approaches, frameworks, methods, concepts, tools, and techniques that can be used to satisfy the various DQ requirements, including the following:

Developing a DQ standard operating model (DQOM) so that it can be adopted by all DQ projects
Identifying and prioritizing critical data elements
Establishing a DQ monitoring and controlling scheme
Solving DQ issues and performing root-cause analyses (RCAs)
Defining and deploying data tracing and achieving better data lineage
Quantifying the impact of poor data quality

All of these requirements are necessary to ensure that data is fit for its purpose with a high degree of confidence.

Sections 1.3 and 1.4 explain the solution strategy for DQ problems, as well as the organization of this book, with descriptions of the chapters. The main objective of these chapters is that readers should be able to use the concepts, procedures, and tools discussed...
