
Deciphering Data Architectures
Description
Data fabric, data lakehouse, and data mesh have recently appeared as viable alternatives to the modern data warehouse. These new architectures have solid benefits, but they're also surrounded by a lot of hyperbole and confusion. This practical book provides a guided tour of these architectures to help data professionals understand the pros and cons of each. James Serra, big data and data warehousing solution architect at Microsoft, examines common data architecture concepts, including how data warehouses have had to evolve to work with data lake features. You'll learn what data lakehouses can help you achieve, as well as how to distinguish data mesh hype from reality. Best of all, you'll be able to determine the most appropriate data architecture for your needs. With this book, you'll:
- Gain a working understanding of several data architectures
- Learn the strengths and weaknesses of each approach
- Distinguish data architecture theory from reality
- Pick the best architecture for your use case
- Understand the differences between data warehouses and data lakes
- Learn common data architecture concepts to help you build better solutions
- Explore the historical evolution and characteristics of data architectures
- Learn essentials of running an architecture design session, team organization, and project success factors
Free from product discussions, this book will serve as a timeless resource for years to come.

Content
- Cover
- Copyright
- Table of Contents
- Foreword
- Preface
- Conventions Used in This Book
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments
- Part I. Foundation
- Chapter 1. Big Data
- What Is Big Data, and How Can It Help You?
- Data Maturity
- Stage 1: Reactive
- Stage 2: Informative
- Stage 3: Predictive
- Stage 4: Transformative
- Self-Service Business Intelligence
- Summary
- Chapter 2. Types of Data Architectures
- Evolution of Data Architectures
- Relational Data Warehouse
- Data Lake
- Modern Data Warehouse
- Data Fabric
- Data Lakehouse
- Data Mesh
- Summary
- Chapter 3. The Architecture Design Session
- What Is an ADS?
- Why Hold an ADS?
- Before the ADS
- Preparing
- Inviting Participants
- Conducting the ADS
- Introductions
- Discovery
- Whiteboarding
- After the ADS
- Tips for Conducting an ADS
- Summary
- Part II. Common Data Architecture Concepts
- Chapter 4. The Relational Data Warehouse
- What Is a Relational Data Warehouse?
- What a Data Warehouse Is Not
- The Top-Down Approach
- Why Use a Relational Data Warehouse?
- Drawbacks to Using a Relational Data Warehouse
- Populating a Data Warehouse
- How Often to Extract the Data
- Extraction Methods
- How to Determine What Data Has Changed Since the Last Extraction
- The Death of the Relational Data Warehouse Has Been Greatly Exaggerated
- Summary
- Chapter 5. Data Lake
- What Is a Data Lake?
- Why Use a Data Lake?
- Bottom-Up Approach
- Best Practices for Data Lake Design
- Multiple Data Lakes
- Advantages
- Disadvantages
- Summary
- Chapter 6. Data Storage Solutions and Processes
- Data Storage Solutions
- Data Marts
- Operational Data Stores
- Data Hubs
- Data Processes
- Master Data Management
- Data Virtualization and Data Federation
- Data Catalogs
- Data Marketplaces
- Summary
- Chapter 7. Approaches to Design
- Online Transaction Processing Versus Online Analytical Processing
- Operational and Analytical Data
- Symmetric Multiprocessing and Massively Parallel Processing
- Lambda Architecture
- Kappa Architecture
- Polyglot Persistence and Polyglot Data Stores
- Summary
- Chapter 8. Approaches to Data Modeling
- Relational Modeling
- Keys
- Entity-Relationship Diagrams
- Normalization Rules and Forms
- Tracking Changes
- Dimensional Modeling
- Facts, Dimensions, and Keys
- Tracking Changes
- Denormalization
- Common Data Model
- Data Vault
- The Kimball and Inmon Data Warehousing Methodologies
- Inmon's Top-Down Methodology
- Kimball's Bottom-Up Methodology
- Choosing a Methodology
- Hybrid Models
- Methodology Myths
- Summary
- Chapter 9. Approaches to Data Ingestion
- ETL Versus ELT
- Reverse ETL
- Batch Processing Versus Real-Time Processing
- Batch Processing Pros and Cons
- Real-Time Processing Pros and Cons
- Data Governance
- Summary
- Part III. Data Architectures
- Chapter 10. The Modern Data Warehouse
- The MDW Architecture
- Pros and Cons of the MDW Architecture
- Combining the RDW and Data Lake
- Data Lake
- Relational Data Warehouse
- Stepping Stones to the MDW
- EDW Augmentation
- Temporary Data Lake Plus EDW
- All-in-One
- Case Study: Wilson & Gunkerk's Strategic Shift to an MDW
- Challenge
- Solution
- Outcome
- Summary
- Chapter 11. Data Fabric
- The Data Fabric Architecture
- Data Access Policies
- Metadata Catalog
- Master Data Management
- Data Virtualization
- Real-Time Processing
- APIs
- Services
- Products
- Why Transition from an MDW to a Data Fabric Architecture?
- Potential Drawbacks
- Summary
- Chapter 12. Data Lakehouse
- Delta Lake Features
- Performance Improvements
- The Data Lakehouse Architecture
- What If You Skip the Relational Data Warehouse?
- Relational Serving Layer
- Summary
- Chapter 13. Data Mesh Foundation
- A Decentralized Data Architecture
- Data Mesh Hype
- Dehghani's Four Principles of Data Mesh
- Principle #1: Domain Ownership
- Principle #2: Data as a Product
- Principle #3: Self-Serve Data Infrastructure as a Platform
- Principle #4: Federated Computational Governance
- The "Pure" Data Mesh
- Data Domains
- Data Mesh Logical Architecture
- Different Topologies
- Data Mesh Versus Data Fabric
- Use Cases
- Summary
- Chapter 14. Should You Adopt Data Mesh? Myths, Concerns, and the Future
- Myths
- Myth: Using Data Mesh Is a Silver Bullet That Solves All Data Challenges Quickly
- Myth: A Data Mesh Will Replace Your Data Lake and Data Warehouse
- Myth: Data Warehouse Projects Are All Failing, and a Data Mesh Will Solve That Problem
- Myth: Building a Data Mesh Means Decentralizing Absolutely Everything
- Myth: You Can Use Data Virtualization to Create a Data Mesh
- Concerns
- Philosophical and Conceptual Matters
- Combining Data in a Decentralized Environment
- Other Issues of Decentralization
- Complexity
- Duplication
- Feasibility
- People
- Domain-Level Barriers
- Organizational Assessment: Should You Adopt a Data Mesh?
- Recommendations for Implementing a Successful Data Mesh
- The Future of Data Mesh
- Zooming Out: Understanding Data Architectures and Their Applications
- Summary
- Part IV. People, Processes, and Technology
- Chapter 15. People and Processes
- Team Organization: Roles and Responsibilities
- Roles for MDW, Data Fabric, or Data Lakehouse
- Roles for Data Mesh
- Why Projects Fail: Pitfalls and Prevention
- Pitfall: Allowing Executives to Think That BI Is "Easy"
- Pitfall: Using the Wrong Technologies
- Pitfall: Gathering Too Many Business Requirements
- Pitfall: Gathering Too Few Business Requirements
- Pitfall: Presenting Reports Without Validating Their Contents First
- Pitfall: Hiring an Inexperienced Consulting Company
- Pitfall: Hiring a Consulting Company That Outsources Development to Offshore Workers
- Pitfall: Passing Project Ownership Off to Consultants
- Pitfall: Neglecting the Need to Transfer Knowledge Back into the Organization
- Pitfall: Slashing the Budget Midway Through the Project
- Pitfall: Starting with an End Date and Working Backward
- Pitfall: Structuring the Data Warehouse to Reflect the Source Data Rather Than the Business's Needs
- Pitfall: Presenting End Users with a Solution with Slow Response Times or Other Performance Issues
- Pitfall: Overdesigning (or Underdesigning) Your Data Architecture
- Pitfall: Poor Communication Between IT and the Business Domains
- Tips for Success
- Don't Skimp on Your Investment
- Involve Users, Show Them Results, and Get Them Excited
- Add Value to New Reports and Dashboards
- Ask End Users to Build a Prototype
- Find a Project Champion/Sponsor
- Make a Project Plan That Aims for 80% Efficiency
- Summary
- Chapter 16. Technologies
- Choosing a Platform
- Open Source Solutions
- On-Premises Solutions
- Cloud Provider Solutions
- Cloud Service Models
- Major Cloud Providers
- Multi-Cloud Solutions
- Software Frameworks
- Hadoop
- Databricks
- Snowflake
- Summary
- Index
- About the Author
- Colophon
System requirements
File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (Kindle support is limited).
The PDF file format displays every book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe DRM, a "hard" form of copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID as soon as you have installed it.
For more information, see our eBook Help page.