
The Data Warehouse Toolkit
Content
Introduction
1 Data Warehousing, Business Intelligence, and Dimensional Modeling Primer
Different Worlds of Data Capture and Data Analysis
Goals of Data Warehousing and Business Intelligence
Dimensional Modeling Introduction
Kimball's DW/BI Architecture
Alternative DW/BI Architectures
Dimensional Modeling Myths
More Reasons to Think Dimensionally
Agile Considerations
Summary
2 Kimball Dimensional Modeling Techniques Overview
Fundamental Concepts
Basic Fact Table Techniques
Basic Dimension Table Techniques
Integration via Conformed Dimensions
Dealing with Slowly Changing Dimension Attributes
Dealing with Dimension Hierarchies
Advanced Fact Table Techniques
Advanced Dimension Techniques
Special Purpose Schemas
3 Retail Sales
Four-Step Dimensional Design Process
Retail Case Study
Dimension Table Details
Retail Schema in Action
Retail Schema Extensibility
Factless Fact Tables
Dimension and Fact Table Keys
Resisting Normalization Urges
Summary
4 Inventory
Value Chain Introduction
Inventory Models
Fact Table Types
Value Chain Integration
Enterprise Data Warehouse Bus Architecture
Conformed Dimensions
Conformed Facts
Summary
5 Procurement
Procurement Case Study
Procurement Transactions and Bus Matrix
Slowly Changing Dimension Basics
Hybrid Slowly Changing Dimension Techniques
Slowly Changing Dimension Recap
Summary
6 Order Management
Order Management Bus Matrix
Order Transactions
Invoice Transactions
Accumulating Snapshot for Order Fulfillment Pipeline
Summary
7 Accounting
Accounting Case Study and Bus Matrix
General Ledger Data
Budgeting Process
Dimension Attribute Hierarchies
Consolidated Fact Tables
Role of OLAP and Packaged Analytic Solutions
Summary
8 Customer Relationship Management
CRM Overview
Customer Dimension Attributes
Bridge Tables for Multivalued Dimensions
Complex Customer Behavior
Customer Data Integration Approaches
Low Latency Reality Check
Summary
9 Human Resources Management
Employee Profile Tracking
Headcount Periodic Snapshot
Bus Matrix for HR Processes
Packaged Analytic Solutions and Data Models
Recursive Employee Hierarchies
Multivalued Skill Keyword Attributes
Survey Questionnaire Data
Summary
10 Financial Services
Banking Case Study and Bus Matrix
Dimension Triage to Avoid Too Few Dimensions
Supertype and Subtype Schemas for Heterogeneous Products
Hot Swappable Dimensions
Summary
11 Telecommunications
Telecommunications Case Study and Bus Matrix
General Design Review Considerations
Design Review Guidelines
Draft Design Exercise Discussion
Remodeling Existing Data Structures
Geographic Location Dimension
Summary
12 Transportation
Airline Case Study and Bus Matrix
Extensions to Other Industries
Combining Correlated Dimensions
More Date and Time Considerations
Localization Recap
Summary
13 Education
University Case Study and Bus Matrix
Accumulating Snapshot Fact Tables
Factless Fact Tables
More Educational Analytic Opportunities
Summary
14 Healthcare
Healthcare Case Study and Bus Matrix
Claims Billing and Payments
Electronic Medical Records
Facility/Equipment Inventory Utilization
Dealing with Retroactive Changes
Summary
15 Electronic Commerce
Clickstream Source Data
Clickstream Dimensional Models
Integrating Clickstream into Web Retailer's Bus Matrix
Profitability Across Channels Including Web
Summary
16 Insurance
Insurance Case Study
Policy Transactions
Premium Periodic Snapshot
More Insurance Case Study Background
Claim Transactions
Claim Accumulating Snapshot
Policy/Claim Consolidated Periodic Snapshot
Factless Accident Events
Common Dimensional Modeling Mistakes to Avoid
Summary
17 Kimball DW/BI Lifecycle Overview
Lifecycle Roadmap
Lifecycle Launch Activities
Lifecycle Technology Track
Lifecycle Data Track
Lifecycle BI Applications Track
Lifecycle Wrap-up Activities
Common Pitfalls to Avoid
Summary
18 Dimensional Modeling Process and Tasks
Modeling Process Overview
Get Organized
Design the Dimensional Model
Summary
19 ETL Subsystems and Techniques
Round Up the Requirements
The 34 Subsystems of ETL
Extracting: Getting Data into the Data Warehouse
Cleaning and Conforming Data
Delivering: Prepare for Presentation
Managing the ETL Environment
Summary
20 ETL System Design and Development Process and Tasks
ETL Process Overview
Develop the ETL Plan
Develop One-Time Historic Load Processing
Develop Incremental ETL Processing
Real-Time Implications
Summary
21 Big Data Analytics
Big Data Overview
Recommended Best Practices for Big Data
Summary
Index
The data warehousing and business intelligence (DW/BI) industry certainly has matured since Ralph Kimball published the first edition of The Data Warehouse Toolkit (Wiley) in 1996. Although large corporate early adopters paved the way, DW/BI has since been embraced by organizations of all sizes. The industry has built thousands of DW/BI systems. The volume of data continues to grow as warehouses are populated with increasingly atomic data and updated with greater frequency. Over the course of our careers, we have seen databases grow from megabytes to gigabytes to terabytes to petabytes, yet the basic challenge of DW/BI systems has remained remarkably constant. Our job is to marshal an organization's data and bring it to business users for their decision making. Collectively, you've delivered on this objective; business professionals everywhere are making better decisions and generating payback on their DW/BI investments.
Since the first edition of The Data Warehouse Toolkit was published, dimensional modeling has been broadly accepted as the dominant technique for DW/BI presentation. Practitioners and pundits alike have recognized that the presentation of data must be grounded in simplicity if it is to stand any chance of success. Simplicity is the fundamental key that allows users to easily understand databases and software to efficiently navigate databases. In many ways, dimensional modeling amounts to holding the fort against assaults on simplicity. By consistently returning to a business-driven perspective and by refusing to compromise on the goals of user understandability and query performance, you establish a coherent design that serves the organization's analytic needs. This dimensionally modeled framework becomes the platform for BI. Based on our experience and the overwhelming feedback from numerous practitioners from companies like your own, we believe that dimensional modeling is absolutely critical to a successful DW/BI initiative.
Dimensional modeling also has emerged as the leading architecture for building integrated DW/BI systems. When you use the conformed dimensions and conformed facts of a set of dimensional models, you have a practical and predictable framework for incrementally building complex DW/BI systems that are inherently distributed.
For all that has changed in our industry, the core dimensional modeling techniques that Ralph Kimball published 17 years ago have withstood the test of time. Concepts such as conformed dimensions, slowly changing dimensions, heterogeneous products, factless fact tables, and the enterprise data warehouse bus matrix continue to be discussed in design workshops around the globe. The original concepts have been embellished and enhanced by new and complementary techniques. We decided to publish this third edition of Kimball's seminal work because we felt that it would be useful to summarize our collective dimensional modeling experience under a single cover. We have each focused exclusively on decision support, data warehousing, and business intelligence for more than three decades. We want to share the dimensional modeling patterns that have emerged repeatedly during the course of our careers. This book is loaded with specific, practical design recommendations based on real-world scenarios.
The goal of this book is to provide a one-stop shop for dimensional modeling techniques. True to its title, it is a toolkit of dimensional design principles and techniques. We address the needs of those just starting in dimensional DW/BI and we describe advanced concepts for those of you who have been at this a while. We believe that this book stands alone in its depth of coverage on the topic of dimensional modeling. It's the definitive guide.
This book is intended for data warehouse and business intelligence designers, implementers, and managers. In addition, business analysts and data stewards who are active participants in a DW/BI initiative will find the content useful.
Even if you're not directly responsible for the dimensional model, we believe it is important for all members of a project team to be comfortable with dimensional modeling concepts. The dimensional model has an impact on most aspects of a DW/BI implementation, beginning with the translation of business requirements, through the extract, transformation, and load (ETL) processes, and finally, to the unveiling of a data warehouse through business intelligence applications. Due to the broad implications, you need to be conversant in dimensional modeling regardless of whether you are responsible primarily for project management, business analysis, data architecture, database design, ETL, BI applications, or education and support. We've written this book so it is accessible to a broad audience.
For those of you who have read the earlier editions of this book, some of the familiar case studies will reappear in this edition; however, they have been updated significantly and fleshed out with richer content, including sample enterprise data warehouse bus matrices for nearly every case study. We have developed vignettes for new subject areas, including big data analytics.
The content in this book is somewhat technical. We primarily discuss dimensional modeling in the context of a relational database, with nuances for online analytical processing (OLAP) cubes noted where appropriate. We presume you have basic knowledge of relational database concepts such as tables, rows, keys, and joins. Given that we will be discussing dimensional models in a nondenominational manner, we won't dive into specific physical design and tuning guidance for any given database management system.
The book is organized around a series of business vignettes or case studies. We believe developing the design techniques by example is an extremely effective approach because it allows us to share very tangible guidance and the benefits of real-world experience. Although not intended to be full-scale application or industry solutions, these examples serve as a framework to discuss the patterns that emerge in dimensional modeling. In our experience, it is often easier to grasp the main elements of a design technique by stepping away from the all-too-familiar complexities of one's own business. Readers of the earlier editions have responded very favorably to this approach.
Be forewarned that we deviate from the case study approach in Chapter 2: Kimball Dimensional Modeling Techniques Overview. Given the broad industry acceptance of the dimensional modeling techniques invented by the Kimball Group, we have consolidated the official listing of our techniques, along with concise descriptions and pointers to more detailed coverage and illustrations of these techniques in subsequent chapters. Although not intended to be read from start to finish like the other chapters, we feel this technique-centric chapter is a useful reference and can even serve as a professional checklist for DW/BI designers.
With the exception of Chapter 2, the other chapters of this book build on one another. We start with basic concepts and introduce more advanced content as the book unfolds. The chapters should be read in order by every reader. For example, it might be difficult to comprehend Chapter 16: Insurance, unless you have read the preceding chapters on retailing, procurement, order management, and customer relationship management.
Those of you who have read the last edition may be tempted to skip the first few chapters. Although some of the early fact and dimension grounding may be familiar turf, we don't want you to sprint too far ahead. You'll miss out on updates to fundamental concepts if you skip ahead too quickly.
This book is laced with tips (like this note), key concept listings, and chapter pointers to make it more useful and easily referenced in the future.
Chapter 1: Data Warehousing, Business Intelligence, and Dimensional Modeling Primer
The book begins with a primer on data warehousing, business intelligence, and dimensional modeling. We explore the components of the overall DW/BI architecture and establish the core vocabulary used during the remainder of the book. Some of the myths and misconceptions about dimensional modeling are dispelled.
Chapter 2: Kimball Dimensional Modeling Techniques Overview
This chapter describes more than 75 dimensional modeling techniques and patterns. This official listing of the Kimball techniques includes forward pointers to subsequent chapters where the techniques are brought to life in case study vignettes.
Chapter 3: Retail Sales
Retailing is the classic example used to illustrate dimensional modeling. We start with the classic because it is one that we all understand. Hopefully, you won't need to think very hard about the industry because we want you to focus on core dimensional modeling concepts instead. We begin by discussing the four-step process for designing dimensional models. We explore dimension tables in depth, including the date dimension that will be reused repeatedly throughout the book. We also discuss degenerate dimensions, snowflaking, and surrogate keys. Even if you're not a retailer, this chapter is required reading because it is chock full of fundamentals.
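As a rough illustration of the vocabulary in the Chapter 3 summary above, here is a minimal sketch of a retail star schema using Python's built-in sqlite3 module. It is not taken from the book: every table name, column name, and data value is a hypothetical chosen for this example. It shows a date dimension and a product dimension keyed by meaningless integer surrogate keys, and a fact table that carries a transaction number as a degenerate dimension (a dimension key with no dimension table of its own).

```python
import sqlite3

# All table names, column names, and rows are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (
    date_key   INTEGER PRIMARY KEY,   -- surrogate key
    full_date  TEXT,
    month_name TEXT
);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,  -- surrogate key (meaningless integer)
    sku         TEXT,                 -- natural key from the source system
    brand       TEXT
);
CREATE TABLE sales_fact (
    date_key        INTEGER REFERENCES date_dim(date_key),
    product_key     INTEGER REFERENCES product_dim(product_key),
    pos_transaction TEXT,             -- degenerate dimension: no table of its own
    quantity        INTEGER,
    sales_amount    REAL
);
""")

conn.executemany(
    "INSERT INTO date_dim VALUES (?, ?, ?)",
    [(1, "2013-01-15", "January"), (2, "2013-02-03", "February")],
)
conn.executemany(
    "INSERT INTO product_dim VALUES (?, ?, ?)",
    [(1, "SKU-100", "Acme"), (2, "SKU-200", "Zenith")],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "T-0001", 2, 10.0),
        (1, 2, "T-0001", 1, 7.5),
        (2, 1, "T-0002", 3, 15.0),
    ],
)


def sales_by_brand_and_month(connection):
    """Join the fact table to both dimensions and sum sales per brand and month."""
    cursor = connection.execute("""
        SELECT p.brand, d.month_name, SUM(f.sales_amount)
        FROM sales_fact AS f
        JOIN date_dim AS d ON d.date_key = f.date_key
        JOIN product_dim AS p ON p.product_key = f.product_key
        GROUP BY p.brand, d.month_name
        ORDER BY p.brand, d.month_name
    """)
    return cursor.fetchall()


result = sales_by_brand_and_month(conn)
for row in result:
    print(row)
```

The query constrains and groups on descriptive dimension attributes (brand, month name) while summing a numeric fact. Keeping brand directly on the product row, rather than normalizing it into a separate brand table, is the flattened alternative to snowflaking.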
Chapter 4: Inventory
We remain within the retail industry for the second case study but turn your attention to another business process. This chapter introduces the enterprise data warehouse bus architecture and the...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePUB works well for novels and non-fiction books – i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.