
Data Warehousing and Knowledge Discovery

Content
- Title
- Preface
- Organization
- Table of Contents
- Physical and Conceptual Data Warehouse Models
- ONE: A Predictable and Scalable DW Model
- Introduction
- Related Works
- ONE Storage Model
- Query Processing
- Evaluation
- Conclusions
- References
- The Planning OLAP Model - A Multidimensional Model with Planning Support
- Introduction
- Foundation and Related Work
- Common Planning Functions by Example
- An OLAP Model for Planning
- Basic Planning Operators
- Expressing Typical Planning Functions
- Impact and Conclusion
- References
- Extending the Dimensional Templates Approach to Integrate Complex Multidimensional Design Concepts
- Introduction
- Related Work
- The Dimensional Templates Approach
- Extending the Dimensional Templates Approach
- Time-Related Data
- Many-to-Many Relationships
- Hierarchically Structured Data
- Dealing with Coverage Facts
- Handling Perspective Analysis
- Enhanced DTA's Broadening Scope
- Improving the DTA Generation Algorithm
- The Generation Algorithm Basics
- Further on Satisfied Grain-Goals: Step 3
- The Generation Algorithm: Step 5
- Conclusions
- References
- OLAP Formulations for Supporting Complex Spatial Objects in Data Warehouses
- Introduction
- Related Work
- Modeling Data Cubes with Complex Spatial Data
- Data Model and C3 Constructs
- Spatial OLAP Formulations with the C3 Constructs
- Conclusions and Future Work
- References
- Data Warehousing Design Methodologies and Tools
- Multidimensional Database Design from Document-Centric XML Documents
- Introduction
- Related Works: Design Processes
- Objectives and Contributions
- Overview of the Design Process
- From User Requirement Analysis to a Multidimensional Schema
- Collecting Requirements and Building a Requirement Matrix
- Translating Requirements into a Multidimensional Schema
- Confrontation
- Multidimensional Database Implementation
- Conclusion and Future Works
- References
- Modern Software Engineering Methodologies Meet Data Warehouse Design: 4WD
- Introduction
- From Problems to Goals
- From Goals to Principles
- From Principles to Methodology: 4WD
- Incrementality and Risk-Based Iteration
- Prototyping
- User Involvement
- Component Reuse
- Formal and Light Documentation
- Automated Schema Transformation
- Practical Evidences
- Related Literature and Discussion
- References
- GEM: Requirement-Driven Generation of ETL and Multidimensional Conceptual Designs
- Introduction
- GEM in a Nutshell
- Inputs
- System Architecture
- Output
- Requirement Validation
- Requirement Completion
- Multidimensional Validation
- Operation Identification
- Evaluation
- Related Work
- Conclusions
- References
- ETL Methodologies and Tools
- ETLMR: A Highly Scalable Dimensional ETL Framework Based on MapReduce
- Introduction
- Overview
- Dimension Processing
- One Dimension One Task
- One Dimension All Tasks
- Snowflaked Dimension Processing
- Post-fixing
- Offline Dimensions
- Fact Processing
- Implementation and Evaluation
- Experimental Setup
- Test Data
- Scalability of Proposed Processing Methods
- System Scalability
- Comparison with other Data Warehousing Tools
- Related Work
- Conclusion and Future Work
- References
- Complementing Data in the ETL Process
- Introduction
- Background Knowledge
- Related Work
- A Strategy for Data Imputation during the ETL Process
- Attribute Combination Definition
- Training Set Preparation
- Performance Calculation for Attribute Combinations
- Real Imputation
- ComplETL
- Experiments and Results
- Conclusions
- References
- TTL: A Transformation, Transference and Loading Approach for Active Monitoring
- Introduction
- The TTL Approach
- The Application Domain
- Base of Knowledge
- The Pre-analysis
- Results of Implementation
- Conclusions
- References
- Support for User Involvement in Data Cleaning
- Introduction
- Motivation
- Data Cleaning Graphs
- The Notion of Data Cleaning Graph
- Operational Semantics
- Case Study
- Experiments
- Related Work
- Conclusions
- References
- Data Warehouse Performance and Optimization
- Efficient Processing of Drill-across Queries over Geographic Data Warehouses
- Introduction
- Related Work
- Theoretical Foundation
- Efficient Processing of Drill-across SOLAP Queries
- The Proposed GDW Schema
- Classes of Drill-across SOLAP Queries
- The Proposed DrillAcrossSB Approach
- Performance Evaluation
- Experimental Setup
- Performance Results for Dataset DS1
- Performance Results for Dataset DS2
- Conclusions and Future Work
- References
- The NOX OLAP Query Model: From Algebra to Execution
- Introduction
- Related Work
- Preliminary Material
- Conceptual Model
- Native Language Queries
- The Sidera Architecture
- The Sidera Algebra
- Query Optimization
- Selection
- Projection
- Experimental Results
- Conclusions
- References
- VarDB: High-Performance Warehouse Processing with Massive Ordering and Binary Search
- Introduction
- Related Work
- Relevant Architectural Details
- Introduction to Data Processing Mechanism
- Storage Architecture
- Some Implemented Filters
- Experimental Results
- Bulk Load Times
- Queries Execution without Indexes
- Indexes Creation
- Queries Execution with Indexes
- Conclusion
- References
- Data Warehouse Partitioning Techniques
- Vertical Fragmentation of XML Data Warehouses Using Frequent Path Sets
- Introduction
- Background
- XML Warehouse Model
- Problem Formulation
- Fragmentation Approach
- Fragmentation Process
- Pruning the Set of Path Queries
- Grouping the Paths
- Deriving the Fragmentation Schema
- Populating the Fragments
- Homogeneous Fragment
- Mixed Fragment
- Experimentation
- Workload
- Results and Discussion
- Conclusion
- References
- Implementing Vertical Splitting for Large Scale Multidimensional Datasets and Its Evaluations
- Introduction
- Employing Extendible Arrays
- HOMD Implementation Model
- Chunked HOMD
- Chunking
- Structure of C-HOMD
- Encoding Records into RDT Keys in C-HOMD
- Retrieval of Records
- Handling Unique Key Columns
- Splitting C-HOMD
- Experimental Evaluations
- Retrieval Cost
- Storage Cost
- Construction Cost
- Related Work
- Conclusion
- References
- Analytics over Large Multidimensional Datasets
- Describing Analytical Sessions Using a Multidimensional Algebra
- Introduction
- Motivation and Basic Concepts
- Related Work
- Obtaining the MAC of an SQL Query
- The Multidimensional Algebra
- Formulating an SQL Query as a MAC
- Interpreting the MAC
- Normalizing the MAC
- Bridging NMACs
- Conclusions
- References
- Tagged MapReduce: Efficiently Computing Multi-analytics Using MapReduce
- Introduction
- Background and Related Work
- MapReduce Overview
- Hadoop Framework
- Pig Latin and ASSET Queries
- Related Work
- Motivation
- Implementation
- TaggedMap and TaggedReduce
- Implementation Alternatives
- Transformation to Hadoop's MapReduce
- Optimizations
- Experiments
- Experimental Setting
- Results
- Conclusions and Future Work
- References
- Pattern Mining
- Frequent Pattern Mining from Time-Fading Streams of Uncertain Data
- Introduction
- Background and Related Work
- Mining from Static Databases of Uncertain Data
- Mining from Uncertain Data Streams with Sliding Windows
- Our Proposed Algorithms
- A Naive Algorithm: TUF-Streaming(Naive)
- A Space-Saving Algorithm: TUF-Streaming(Space)
- A Time-Saving Algorithm: TUF-Streaming(Time)
- An Enhancement Algorithm for the Landmark Model: TUF-Streaming(Space&Time)
- Analytical Evaluation
- Experimental Evaluation
- Conclusions
- References
- SPO-Tree: Efficient Single Pass Ordered Incremental Pattern Mining
- Introduction
- Related Work
- Single Pass Ordered Tree (SPO-Tree)
- Experimental Results
- Real-World Datasets
- Synthetic Datasets
- Conclusions and Future Work
- References
- RP-Tree: Rare Pattern Tree Mining
- Introduction
- Related Work
- Rare Pattern Tree Mining
- Basic Concept: Rare Itemsets
- RP-Tree Algorithm
- RP-Tree with Information Gain
- Experimental Results
- Itemset Generation Performance
- Changes in Rule Quality
- Conclusions and Future Work
- References
- Matrix-Based Mining Techniques
- Co-clustering with Augmented Data Matrix
- Introduction
- Related Work
- Problem Definition
- Co-clustering with Augmented Data Matrix Algorithm
- Experiments Result and Evaluation
- Data Description
- Comparison and Evaluation Methods
- Classification Based Evaluation
- Mutual Information Based Evaluation
- Conclusion
- References
- Using Confusion Matrices and Confusion Graphs to Design Ensemble Classification Models from Large Datasets
- Introduction
- Background
- Ensemble Base Model Design from Confusion Matrices
- Confusion Graphs and pVn Base Models
- Combination of Base Model Predictions
- Experimental Methods
- Datasets and Algorithms for the Experiments
- Preliminary Experiments for Confusion Matrix and Confusion Graph Creation
- Methods for Performance Evaluation
- Experimental Results
- Analysis of Base Model Diversity and Competence
- Evaluation of Performance for Discrete Classification
- Evaluation of Performance for Probabilistic Classification
- Conclusions
- References
- Data Mining and Knowledge Discovery Techniques
- Pairwise Similarity Calculation of Information Networks
- Introduction
- Related Work
- Graph Model
- Overview of SimRank
- Extending the Similarity Measure
- Topological Similarity
- Dealing with Loops in the Network
- Experimental Evaluation
- Evaluation Metric
- Experimental Results
- Conclusions
- References
- Feature Selection with Mutual Information for Uncertain Data
- Introduction
- Mutual Information
- Basic Notions
- Estimation
- MI Estimation with Uncertain Data
- Methodology and Experiments
- Methodology
- Experimental Results
- Conclusions
- References
- Time Aware Index for Link Prediction in Social Networks
- Introduction
- Related Research
- Supervised Learning Method for Predicting Links
- Features Used for Link Prediction
- New Index for Time Aware Link Prediction
- Experimental Evaluation
- Experiment Using Facebook Data
- Experiment with Coauthorship Data
- Conclusion and Future Research
- References
- An Efficient Cacheable Secure Scalar Product Protocol for Privacy-Preserving Data Mining
- Introduction
- Preliminaries
- Homomorphic Public-key Cryptosystems
- Semi-honest Model
- Related Work
- Caching Analysis of Proposed SSP Protocols
- Goethals et al.'s SSP Protocol
- Caching Analysis of Goethals et al.'s Protocol
- The Cacheable Secure Scalar Product Protocol
- The Correctness
- Caching Analysis
- Security Analysis
- Complexity Analysis
- Extension to Multi-party Environment
- Empirical Evaluations
- Conclusion
- References
- Data Mining and Knowledge Discovery Applications
- Learning Actions in Complex Software Systems
- Introduction
- Related Work
- Preliminaries
- Definitions
- Time Series Distance Measures
- Methodology
- Data
- Design of the Proposed Prediction Method
- Results and Discussion
- Experiment Setup
- Frequent Patterns
- Rule Evaluation
- Conclusions
- References
- An Envelope-Based Approach to Rotation-Invariant Boundary Image Matching
- Introduction
- Related Work and Existing Algorithms
- Single Envelope Lower Bound and Its Matching Algorithm
- Multi-envelope Lower Bound and Its Matching Algorithm
- Experimental Evaluation
- Experimental Data and Environment
- Experimental Results
- Conclusions
- References
- Finding Fraud in Health Insurance Data with Two-Layer Outlier Detection Approach
- Introduction
- Related Work
- Outlier Score for Single Records
- Statistics per Entity
- Identifying Fraud Sets
- The Fraud Scatter Plot
- Aggregation of Scores for Data Segments
- Results and Analysis
- Strange Behavior in Prescribing Drugs
- Finding Typos
- Patients with High Claim Costs
- Conclusions and Further Research
- References
- Enhancing Activity Recognition in Smart Homes Using Feature Induction
- Introduction
- Related Work
- Model Construction for Activity Recognition
- B&B Structure Learning Assisted HMM for Activity Recognition
- Feature Induction Assisted HMM for Activity Recognition
- Experiments and Results
- Conclusion and Future Work
- References
- Stream, Sensor and Time-Series Mining
- Mining Approximate Frequent Closed Flows over Packet Streams
- Introduction
- Related Work
- Approximate Closed Patterns
- The ACL-Stream Algorithm
- Incremental Maintenance over Packet Streams
- Experiments
- Accuracy Assessment
- The Detection Ability
- Memory Consumption and Throughput
- Conclusion
- References
- Knowledge Acquisition from Sensor Data in an Equine Environment
- Introduction
- Background and Motivation
- Related Research
- Context and Knowledge Representation
- Context Data
- Sensor Representation
- Imposing Context on Sensor Data
- Knowledge Acquisition
- Pre-integration Event Detection
- Post-integration Event Detection
- Experiments and Evaluation
- Conclusions
- References
- Concurrent Semi-supervised Learning of Data Streams
- Introduction
- Related Work
- Proposed Method
- Dynamic Tree Structure
- Online Update
- Concurrent Semi-supervised Learning
- Complexity Analysis
- Experiments and Analysis
- Experimental Setup
- Clustering Evaluation
- Classification Evaluation
- Scalability Evaluation
- Sensitivity Analysis
- Conclusions
- References
- A Bounded Version of Online Boosting on Open-Ended Data Streams
- Introduction
- Theoretical Background
- AdaBoost.M1
- Online Boosting
- Updateable Prequential Boosting
- Empirical Evaluation
- Conclusion
- References
- Moderated VFDT in Stream Mining Using Adaptive Tie Threshold and Incremental Pruning
- Introduction
- Research Background
- Very Fast Decision Tree (VFDT)
- Effects of Tie Breaking in Hoeffding Trees
- Moderated Very Fast Decision Tree (M-VFDT)
- Observation of Hoeffding Bound Fluctuation
- Adaptive Splitting Tie Threshold
- Pruning Mechanisms
- Experiments
- Conclusion and Future Work
- References
- Finding Critical Thresholds for Defining Bursts
- Introduction
- Related Work
- Problem Statement
- Monotonicity of the Coverage Function
- The Divide-and-Conquer Heuristics
- The One-Dimensional Problem
- The Two-Dimensional Problem
- Complexity Analysis
- Evaluation
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF format displays each book page identically on any hardware, which makes it well suited to complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. No technical restrictions prevent illegal distribution. However, a personalised watermark embedded in the eBook can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.