
Scientific and Statistical Database Management

Content
- Title
- Organization
- Table of Contents
- Keynote Address I
- Navigating Oceans of Data
- Introduction
- The User Base
- The CMOP Observatory
- CMOP Interfaces and Tools
- Supporting Ranked Search for Datasets
- General Lessons
- Issues and Challenges
- References
- Uncertain and Probabilistic Data
- Probabilistic Range Monitoring of Streaming Uncertain Positions in GeoSocial Networks
- Introduction
- Related Work
- Managing Uncertain Moving Objects
- Capturing Positional Uncertainty
- Object Locations as Bivariate Gaussian Features
- System Model
- Approximation with Discretized Uncertainty Regions
- Probing Objects through Probabilistic Verifiers
- Towards Approximate Answering with Error Guarantees
- Online Range Monitoring over Streaming Gaussians
- Evaluation Strategy
- Pruning Candidates Using Indicative Minimal Areas
- Optimized Examination of Elementary Boxes
- Experimental Evaluation
- Experimental Setup
- Experimental Results
- Conclusion
- References
- Probabilistic Frequent Pattern Growth for Itemset Mining in Uncertain Databases
- Introduction
- Uncertain Data Model
- Problem Definition
- Contributions
- Related Work
- Probabilistic Frequent-Pattern Tree (ProFP-tree)
- ProFP-Tree Construction
- Construction Analysis
- Extracting Certain and Uncertain Support Probabilities
- Efficient Computation of Probabilistic Frequent Itemsets
- Efficient Computation of Probabilistic Support
- Extracting Conditional ProFP-Trees
- ProFP-Growth Algorithm
- Experimental Evaluation
- Scalability
- Effect of Uncertainty and Certainty
- Effect of minSup
- Conclusion
- References
- Evaluating Trajectory Queries over Imprecise Location Data
- Introduction
- Related Work
- Problem and Preliminaries
- Problem Setting
- u-bisector
- Basic Method
- Solution Framework
- Filtering Phase
- Trajectory Filter
- Segment Filter
- Trajectory Refinement Phase
- Trajectory Refinement
- Pruning Bounds for Three Cases
- Experimental Results
- Setup
- Quality Metric
- Performance Evaluation
- Conclusion
- References
- Efficient Range Queries over Uncertain Strings
- Introduction
- Related Work
- Background on Q-grams and Frequency Distance
- Pruning Techniques
- Probabilistic Q-gram based Filtering
- Frequency-Distance Based Pruning
- Combined Pruning
- Experiments
- Experiment Setup
- Performance Comparison
- Effect of Parameters
- Conclusion
- References
- Continuous Probabilistic Sum Queries in Wireless Sensor Networks with Ranges
- Introduction
- Problem Definition
- Probabilistic Sum Queries in Probabilistic Wireless Sensor Networks Having Discrete Data Distributions
- Energy Efficient Computation of Probabilistic Sum Queries
- Performance Evaluation
- Conclusions
- References
- Parallel and Distributed Data Management
- Partitioning and Multi-core Parallelization of Multi-equation Forecast Models
- Introduction
- Background of Multi-equation Forecast Models
- Partitioning for Multi-equation Forecast Models
- Parallelization of Independent Forecast Models
- Experimental Evaluation
- Parameter Estimation
- Parameter Re-estimation
- Cache Utilization
- Scalability
- Related Work
- Conclusion
- References
- Integrating GPU-Accelerated Sequence Alignment and SNP Detection for Genome Resequencing Analysis
- Introduction
- Background and Related Work
- Sequence Alignment
- SNP Detection
- The Workflow of Genome Resequencing Analysis
- Related Work
- System Implementation
- Analysis on the Traditional Workflow
- System Overview
- Range Partitioning
- Alignment Result Compression
- Evaluation
- Experimental Setup
- Performance Impact of Integration Techniques
- End-to-End Performance Comparison
- Conclusion
- References
- Discovering Representative Skyline Points over Distributed Data
- Introduction
- Related Work
- Preliminaries and Problem Statement
- Distributed Representative Skyline Algorithms
- Distributed Skyline Algorithm (DSA)
- Distributed Skyline Representative Algorithm (DSR)
- Distributed Error-Based Representative Algorithm (DER)
- Experimental Evaluation
- Experiments with Distance-Based Representative
- Experiments with Dominance Representative
- Conclusions
- References
- SkyQuery: An Implementation of a Parallel Probabilistic Join Engine for Cross-Identification of Multiple Astronomical Databases
- Introduction
- Astronomical Surveys
- Astronomical Catalogs
- Coordinate-Based Cross-Identification
- Previous Work
- SQL for Astronomical Data Mining
- SQL Language Extensions for SkyQuery
- Defining the N-way Probabilistic Join
- Implementation Details
- Hardware and Software Setup
- Database Setup
- Query Optimization and Partitioning
- Jobs as Parallel Workflows
- Performance and Scaling Considerations
- Metadata Management and Provenance from Queries
- Summary and Future Work
- References
- Efficient Filtering in Micro-blogging Systems: We Won't Get Flooded Again
- Introduction
- Data Model
- Filter Indexing
- Experiments
- Memory Requirement
- Matching Time
- Related Work
- Conclusion
- References
- Graph Processing
- Regular Path Queries on Large Graphs
- Introduction
- Related Work
- Terms and Definitions
- Answering RPQs Using Rare Labels
- Rare Labels
- Searching the Graph Using Rare Labels
- Determining Rare Labels
- Implementation
- Search Algorithms
- Two-Way Search Complexity
- Parallelization
- Experimental Results
- Graphs and Queries
- Comparing with other Implementations
- Scalability: Graph Size and Density
- Influence of Query Types
- Parallelization
- Conclusion
- References
- Sampling Connected Induced Subgraphs Uniformly at Random
- Introduction
- Related Work
- Algorithms
- Acceptance-Rejection Sampling
- Random Vertex Expansion
- Metropolis-Hastings Sampling
- Neighbour Reservoir Sampling
- Performance Evaluation
- Experimental Setup
- Mixing Time
- Effectiveness
- Efficiency
- Efficiency versus Effectiveness
- Sampling Graph Properties
- Discussion
- Conclusion
- References
- Discovery of Top-k Dense Subgraphs in Dynamic Graph Collections
- Introduction
- Related Work and Contributions
- Dense Subgraphs in Graph Collections
- Preliminaries
- Dense Subgraph Discovery in a Set of Graphs
- Dense Subgraphs in a Stream of Graphs
- Performance Evaluation Study
- Performance of Exact Algorithms
- Trading Accuracy for Speed
- Concluding Remarks
- References
- On the Efficiency of Estimating Penetrating Rank on Large Graphs
- Introduction
- Preliminaries
- Two Forms of P-Rank Solution
- An Algorithm for P-Rank Deterministic Computation
- Probabilistic P-Rank Similarity Estimation
- A Probabilistic P-Rank Model
- A Scalable Algorithm for P-Rank Estimation
- Experimental Evaluation
- Experimental Settings
- Experimental Results
- Related Work
- Conclusion
- References
- Towards Efficient Join Processing over Large RDF Graph Using MapReduce
- Introduction
- Preliminaries
- Problem Definition
- Cost Model
- Query Processing
- Implementations
- Experiments
- Related Work
- Conclusion
- References
- Panel
- Panel on "Data Infrastructures and Data Management Research: Close Relatives or Total Strangers?"
- Mining Multidimensional Data
- Efficient Similarity Search in Very Large String Sets
- Introduction
- Related Work
- Basic Concepts and Definitions
- Similarity Search and Measures
- Tries and NFAs
- State Set Index
- Index Structure
- Algorithms
- Evaluation
- Evaluation of SSI Parameters
- Index Creation Time and Memory Consumption
- Query Answering
- Conclusion
- References
- Substructure Clustering: A Novel Mining Paradigm for Arbitrary Data Types
- Introduction
- Challenges in Substructure Clustering
- Substructure Clustering
- Substructure Definition
- Cluster Definition
- Clustering Definition
- Related Work
- An Algorithm for Subgraph Clustering
- Experiments
- Conclusion
- References
- BT* - An Advanced Algorithm for Anytime Classification
- Introduction
- Related Work
- BT*
- Anytime Bayesian classification
- Parameter Optimization
- Decision Design
- Experiments
- Parameter Optimization
- Decision Design
- Combining Approaches
- Scalability
- Summary
- Conclusion
- References
- Finding the Largest Empty Rectangle Containing Only a Query Point in Large Multidimensional Databases
- Introduction
- Related Work
- Empty Rectangle with Largest Area That Contains only a Query Point
- Basic Definitions
- Obtaining the CERs
- Computing the Rectangle with the Largest Area Containing q
- Experimental Results
- Conclusions
- References
- Sensitivity of Self-tuning Histograms: Query Order Affecting Accuracy and Robustness
- Introduction
- Self-tuning and Its Sensitivity to Learning
- Histogram Structure and Cardinality Estimation
- The Problem with Self-tuning: Sensitivity to Learning
- Histogram Initialization by Subspace Clustering
- Experiments
- Accuracy
- Robustness
- Conclusions
- References
- Provenance and Workflows
- Database Support for Exploring Scientific Workflow Provenance Graphs
- Introduction
- Preliminaries: Provenance Model and Query Language
- Operators for Exploring Workflow Provenance Graphs
- Implementation
- Experimental Results
- Related Work
- Conclusion
- References
- (Re)Use in Public Scientific Workflow Repositories
- Introduction
- Materials and Methods
- Data Sets
- Identifying Shared Workflow Elements
- Results
- Processors
- Dataflows
- Workflows (Top-Level Dataflows)
- Discussion
- Summary
- References
- Aggregating and Disaggregating Flexibility Objects
- Introduction
- Flex-object Databases
- Problem Formulation
- Aggregation and Disaggregation
- N-to-M Aggregation
- Incremental N-to-M Aggregation
- Experimental Evaluation
- Related Work
- Conclusion and Future Work
- References
- Fine-Grained Provenance Inference for a Large Processing Chain with Non-materialized Intermediate Views
- Introduction
- Motivating Scenario
- Proposed Multi-step Provenance Inference
- Overview of the Algorithm
- Documenting Coarse-Grained Provenance
- Backward Computation: Calculating Initial Tuple Boundary
- Forward Computation: Building Provenance Graph
- Evaluation
- Evaluating Criteria and Test cases
- Accuracy
- Precision and Recall
- Related Work
- Conclusion and Future Work
- References
- Automatic Conflict Resolution in a CDSS
- Introduction
- Automated Conflict Resolution in CDSS
- Example Scenario
- Conclusion and Future Work
- References
- Processing Scientific Queries
- Tracking Distributed Aggregates over Time-Based Sliding Windows
- Introduction
- Problem Definitions and Our Results
- The Forward/Backward Framework
- Warm-Up: Basic Counting
- Heavy Hitters
- Quantiles
- Other Functions
- Concluding Remarks
- References
- Hinging Hyperplane Models for Multiple Predicted Variables
- Introduction
- Related Work
- The Hinging Hyperplane Model
- Preliminaries
- The Hinge Finding Algorithm for a Single Output
- Hinge Regression for Multiple Outputs
- Finding the Consensus Separator
- Forcing Continuous Joins
- The Hinge Finding Algorithm for Multiple Outputs
- Experiments
- Conclusion
- References
- Optimizing Notifications of Subscription-Based Forecast Queries
- Introduction
- Foundations of Subscription-Based Forecast Queries
- Forecast-Based Subscriptions
- Processing Model
- Cost Model
- Optimization Problems
- Computation Approaches
- Experimental Evaluation
- Experimental Setting
- Evaluation of Computation Approaches
- Influence of Subscription Parameters
- Computational Costs
- Cost Model Validation
- Related Work
- Conclusion and Future Work
- References
- Minimizing Index Size by Reordering Rows and Columns
- Introduction
- Related Work
- Compressing Bitmap Indexes
- Data Reordering Techniques
- Theoretical Analysis
- Counting k-tuples
- Accidental Chunks
- Asymptotic Case
- Zipfian Data
- Experimental Measurements
- Number of Runs
- FastBit Index Sizes
- Conclusions
- References
- Data Vaults: A Symbiosis between Database Technology and Scientific File Repositories
- Introduction
- Related Work
- Data Vault Requirements
- Data Vault Architecture
- Data Vault for Remote Sensing
- Summary and Future Work
- References
- Keynote II
- Usage Data in Web Search: Benefits and Limitations
- Introduction
- The Third Web Search Revolution
- Search Usage Data
- Benefits
- Today's Entry Barrier?
- Three Conflicting Factors
- Size of Data
- Personalization
- Privacy
- The Wisdom of "Ad Hoc" Crowds
- Final Remarks
- References
- Support for Demanding Applications
- Functional Feature Extraction and Chemical Retrieval
- Introduction
- Related Work
- Overview of Proposed Approach
- Chemical Feature Extraction
- Structural Formula Representation
- Functional Group Identification
- Chemical Functional Group (CFG) Graph
- Chemical Feature Extraction
- Query Retrieval
- Performance Evaluation
- Conclusion
- References
- Scalable Computation of Isochrones with Network Expiration
- Introduction
- Related Work
- Isochrones in Multimodal Networks
- Incremental Network Expansion in Multimodal Networks
- Algorithm MINEX
- Expiration of Vertices
- Properties
- Empirical Evaluation
- Overview
- Memory Consumption
- Multiple Loading of Tuples
- Runtime
- Conclusion and Future Work
- References
- A Dataflow Graph Transformation Language and Query Rewriting System for RDF Ontologies
- Introduction
- Motivating Example
- A Dataflow Language for Transforming RDF Ontologies
- IML Transforming Operations
- IML Query Rewriting and Optimization
- Query Pattern Rule (QPR) Sets
- Query Rewriting Process
- Rewriting Optimizations
- Rule-Based Optimizations
- Performance Optimizations
- Implementation
- Evaluation
- Rule Explosion and Rule Optimizations
- Best Rewritten Query Performance
- Impact of rewriting options
- Related Work
- Conclusions
- References
- Sensitive Label Privacy Protection on Social Network Data
- Introduction
- Related Work
- Problem Definition
- Algorithm
- Algorithm GINN
- Experimental Evaluation
- Data Utility
- Information Loss
- Algorithm Scalability
- Conclusions
- References
- Trading Privacy for Information Loss in the Blink of an Eye
- Introduction
- Anonymity Negotiation over a Full Lattice
- Experiments
- Summary and Pointers for Further Probing
- References
- Demonstration and Poster Papers
- Extracting Hot Spots from Satellite Data
- Introduction
- GEO-Grid
- MODIS
- ASTER
- Detecting Hot Spots
- Hot Spots
- Computing Radiance Temperature
- Threshold Based Method
- Statistics Based Method
- Implementation of Hot Spot Detections
- Evaluation
- Conclusions and Future Work
- References
- A Framework for Enabling Query Rewrites when Analyzing Workflow Records
- Introduction
- Motivation and Basic Concepts
- Query Rewriting and Evaluation
- References
- Towards Enabling Outlier Detection in Large, High Dimensional Data Warehouses
- Introduction
- A Framework for Detecting Outliers in a Data Warehouse
- Experiments and Concluding Remarks
- References
- Multiplexing Trajectories of Moving Objects
- Motivation
- A Multiplexing Framework against Trajectory Streams
- Problem Formulation
- Trajectory Encoding
- Group Detection
- Preliminary Evaluation
- Outlook
- References
- On Optimizing Workflows Using Query Processing Techniques
- Introduction
- Database-Inspired Solutions to Workflow Optimization
- Case Studies
- Conclusions
- References
- Optimizing Flows for Real Time Operations Management
- Modern Analytic Flows
- A Cyber-Physical Flow
- Optimization Techniques and Tradeoffs
- Conclusion
- References
- (In?)Extricable Links between Data and Visualization: Preliminary Results from the VISTAS Project
- Introduction
- VISTAS Project Overview
- VISTAS Architecture and Data Structures
- Conclusions
- References
- FireWatch: G.I.S.-Assisted Wireless Sensor Networks for Forest Fires
- Introduction
- System Architecture
- FireWatch Interface
- Conclusions
- References
- AIMS: A Tool for the View-Based Analysis of Streams of Flight Data
- Introduction
- System Architecture
- View-Based Flight Analysis
- Incremental Stream Analysis
- First Results
- Demonstration
- References
- TARCLOUD: A Cloud-Based Platform to Support miRNA Target Prediction
- Introduction
- The TARCLOUD Solution
- Framework
- Architecture
- User Interface
- Demonstration Scenarios
- References
- SALSA: A Software System for Data Management and Analytics in Field Spectrometry
- Introduction
- Field Spectrometry
- A Robotic Tram System
- A Software System
- Functionalities and Technicalities
- Conclusions
- References
- Incremental DNA Sequence Analysis in the Cloud
- Introduction
- The Stream-as-you-go Approach
- The DNA Sequence Analysis Use Case
- Read Alignment a la Stream-as-you-go
- SNP Detection a la Stream-as-you-go
- Demonstration Details
- References
- AITION: A Scalable Platform for Interactive Data Mining
- AITION Description
- KDD Workflow
- System Architecture
- Demonstration Overview
- Conclusion and Future Work
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.