Database and Expert Systems Applications

Name: Database and Expert Systems Applications | 26th International Conference, DEXA 2015, Valencia, Spain, September 1-4, 2015, Proceedings, Part I
Brand: Springer
Price: 53.49 EUR
Availability: OnlineOnly

26th International Conference, DEXA 2015, Valencia, Spain, September 1-4, 2015, Proceedings, Part I

Qiming Chen, Abdelkader Hameurlain, Farouk Toumani, Roland Wagner, Hendrik Decker (Editors)

Springer (Publisher)

Published on 10 August 2015

XXXI, 578 pages

E-Book

PDF with digital watermarking

978-3-319-22849-5 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Preface
Organization
Organization of the Special Section Globe 2015 (8th International Conference on Data Management in Cloud, Grid and P2P Systems)
Keynote Talks
SQL, NoSQL, and Next Generation Data Stores (Extended Abstract)
Pattern Recognition in Embedded Systems: An Overview
Contents - Part I
Contents - Part II
Keynote Talk
Pattern Recognition in Embedded Systems: An Overview
1 Introduction
2 Pattern Recognition
2.1 Different Signals Used in Pattern Recognition Tasks
2.2 Embedded Systems
2.3 Applications
2.4 Medical Applications
3 Approaches
3.1 Using Off-the-Shelf Architectures
3.2 Using Specialized DSP's
3.3 Using Massively Parallel HW (e.g. GPUs)
3.4 Using Ad-Hoc or Reconfigurable HW
References
Temporal, Spatial and High Dimensional Databases
Restricted Shortest Path in Temporal Graphs
1 Introduction
1.1 Related Work
1.2 Notations and Problem Formulation
1.3 Minimum Penalty Temporal Path
1.4 Organization of the Paper
2 Exact Algorithms Using Dynamic Programming
2.1 Query Time Span Independent Algorithms
2.2 Query Time Span Dependent Algorithm
3 A* Algorithm
3.1 Correctness and Complexity
3.2 Obtaining Estimates
4 Approximation Algorithm
5 Experimental Evaluation
5.1 Settings and Dataset
5.2 Algorithms on Flight Dataset
5.3 A* Algorithm on KONECT Dataset
6 Conclusion
References
An Efficient Distributed Index for Geospatial Databases
1 Introduction
2 Related Work
3 Distributed Spatial Index
3.1 Basis of Distributed Spatial Index
3.2 Index Design
3.3 LCP-Based Region Partition
4 The BGRP Tree
4.1 The BGRP Task
4.2 BGRP Tree Searching
4.3 BGRP Tree Insertion
5 Experimental Evaluation
5.1 Experimental Setup
5.2 Dataset
5.3 Comparison Method
5.4 Results and Discussion
6 Conclusion
References
The xBR+-tree: An Efficient Access Method for Points
1 Introduction
2 Related Work and Motivation
3 The xBR-tree Family
3.1 Internal Nodes
3.2 Leaf Nodes
3.3 Splitting of Internal Nodes
3.4 Tree Building
4 Query Processing Algorithms on the xBR-tree Family
5 Experimentation
6 Conclusions and Future Work
References
Semantic Web and Ontologies
Probabilistic Error Detecting in Numerical Linked Data
1 Introduction
2 Related Work
3 Probabilistic Error Detection in Numerical Attributes
3.1 Probabilistic Model
3.2 Probabilistic Error Detection
4 Implementation
4.1 Numerical Attribute Selecting
4.2 Data Preprocessing
5 Experimental Study
5.1 Datasets
5.2 Effectivity Evaluation Results
5.3 Efficiency Evaluation Results
5.4 Error Analysis
6 Conclusions
References
From General to Specialized Domain: Analyzing Three Crucial Problems of Biomedical Entity Disambiguation
1 Introduction
2 Problem Statement and Modeling
2.1 Identifying Important Properties of a Specialized Disambiguation System
2.2 Modeling the Properties in Context of a Biomedical Disambiguation System
3 Approach
3.1 Entity-Centric and Document-Centric Disambiguation
3.2 Feature Choice
3.3 Federated Entity Disambiguation
4 Data Set
5 Evaluation
5.1 Basic Parameter Settings
5.2 Entity Context and User Data
5.3 Knowledge Base Size and Heterogeneity
5.4 Noisy User Data
6 Related Work
7 Conclusion and Future Work
References
Ontology Matching with Knowledge Rules
1 Introduction
2 Ontology Matching
3 Representation of Domain Knowledge
4 Our New Knowledge-Based Strategy
5 Finding Complex Correspondences
6 Knowledge Aware Ontology Matching
7 Experiments
7.1 NBA
7.2 Census
7.3 OntoFarm
8 Conclusion
References
Modeling, Linked Open Data
Detection of Sequences with Anomalous Behavior in a Workflow Process
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Learning a Workflow Model from Execution Logs
3.2 Detecting Sequences with Abnormal Behavior
4 Experimental Evaluation
4.1 Selection of the Smoothing Constant
4.2 Recognition of Abnormal Behavior Sequences
5 Conclusions
References
An Energy Model for Detecting Community in PPI Networks
1 Introduction
2 An Energy Model
2.1 Energy Between Vertices
2.2 Grouping Vertices into Community
3 Performance Evaluation
References
Filtering Inaccurate Entity Co-references on the Linked Open Data
Abstract
1 Introduction
2 Background
3 Proposed Approach
4 The Components of SCID
4.1 Frequency Count Statistics
4.2 The Category Distribution Function
4.3 The Category Selection Function
5 The SCID Filter
5.1 Algorithm 1 -- Constructing Disambiguation Vectors
5.2 Algorithm 2 -- Detection of Inaccurate Identity Links
6 Experimental Evaluation
7 Conclusion
References
Quality Metrics for Linked Open Data
Abstract
1 Introduction
2 Related Works
3 Our Proposed Approach for Metric Development
3.1 Identifying Quality Deficiencies
3.2 Proposed Metrics
4 Empirical Evaluation
5 Guidelines for Quality Improvement
6 Conclusion and Future Works
References
NoSQL, NewSQL, Data Integration
A Framework of Write Optimization on Read-Optimized Out-of-Core Column-Store Databases
1 Introduction
2 Background of the Column-Store Database
3 OOC Update Optimization
3.1 Timestamped BAT
3.2 Asynchronous Out-of-Core Update
3.3 Deletion Optimization
4 Update on Column-Stores in Map-Reduce
4.1 Update on BAT in Map-Reduce
4.2 Timestamped BAT in Map-Reduce
4.3 Asynchronous Map-Only Update on Column-Stores
4.4 Map-Reduce Selection on TBAT
5 A Write-Optimized Framework
6 Experiment Results
6.1 Tests on Conventional OOC Storage
6.2 Tests on HDFS
7 Conclusion and Future Works
References
Integrating Big Data and Relational Data with a Functional SQL-like Query Language
Abstract
1 Introduction
2 Query Language
2.1 MFR Notation
2.2 Combining SQL and MFR
3 Query Engine
4 Query Rewriting
4.1 Operation Pushdowns
4.2 MFR Rewrite Rules
5 Validation
6 Related Work
7 Conclusion
References
Comparative Performance Evaluation of Relational and NoSQL Databases for Spatial and Mobile Applications
1 Introduction
2 Related Work
3 Comparison Methodology
3.1 Processing Stages
3.2 Dataset
3.3 Data Loading
3.4 Spatial Queries
4 Experimental Evaluation
4.1 Evaluation Setup
4.2 Parameters and Metrics
4.3 Performance Evaluation
4.4 Relative Performance Summary
5 Concluding Remarks
References
Uncertain Data and Inconsistency Tolerance
Query Answering Explanation in Inconsistent Datalog+/- Knowledge Bases
1 Introduction
2 Formal Settings and Problem Statement
2.1 Language Specification
2.2 Problem Statement
2.3 Rule-Based Dung Argumentation Framework Instantiation
3 Argumentative Explanation
3.1 Explaining Query Acceptance
3.2 Explaining Query Failure
4 Algorithms
4.1 Computing Defense Tree
4.2 Computing Strong Proponent and Opponent Sets
4.3 Computing Explanations
5 Discussion and Conclusion
References
PARTY: A Mobile System for Efficiently Assessing the Probability of Extensions in a Debate
1 Introduction
2 Preliminaries
2.1 Abstract Argumentation
2.2 Probabilistic Abstract Argumentation
3 Computing Extensions' Probabilities in Abstract Argumentation
3.1 Computing Pr^cf_F(S), Pr^ad_F(S) and Pr^st_F(S)
3.2 Estimating Pr^co_F(S), Pr^gr_F(S) and Pr^pr_F(S)
4 The PARTY System
5 Experimental Evaluation
6 Related Work
7 Conclusions
References
Uncertain Groupings: Probabilistic Combination of Grouping Data
1 Introduction
1.1 Use Case
1.2 Combining Grouping Data
1.3 Related Work
2 Probabilistic Integration of Grouping Data
2.1 Running Example
2.2 Integration Views
2.3 Formalization
2.4 Integration Views Revisited
3 Evaluation
3.1 Experimental Setup
3.2 Experiments
4 Discussion
5 Conclusions
References
Database System Architecture
Cost-Model Oblivious Database Tuning with Reinforcement Learning
1 Introduction
2 Related Work
3 Problem Definition
4 Adaptive Performance Tuning
4.1 Algorithm Framework
4.2 Reducing the Search Space
4.3 Modified Policy Iteration with Cost Model Learning
5 Case Study: Index Tuning
5.1 Reducing the Search Space
5.2 Defining the Feature Mapping
5.3 Defining the Feature Mapping
6 Performance Evaluation
6.1 Experimental Setup
6.2 Dataset and Workload
6.3 Efficiency
6.4 Effectiveness
7 Conclusion
References
Towards Making Database Systems PCM-Compliant
1 Introduction
2 Problem Framework
3 The Sort Operator
4 The Hash Join Operator
5 The Group-By Operator
5.1 Hash-Based Grouping
5.2 Sort-Based Grouping
6 Simulation Testbed
6.1 Architectural Platform
6.2 Database and Queries
6.3 Performance Metrics
7 Experimental Results
7.1 Operator-Wise Analysis
7.2 Lifetime Analysis
7.3 Validating Write Estimators
8 Query Optimizer Integration
9 Conclusion
References
Workload-Aware Self-Tuning Histograms of String Data
1 Introduction
2 Background
3 Self-Tuning String Histograms
3.1 Preliminaries
3.2 Cardinality Estimation
3.3 Histogram Construction and Refinement
3.4 Bucket Merging
3.5 Discussion
4 Experiments
5 Conclusions
References
Data Mining I
Data Partitioning for Fast Mining of Frequent Itemsets in Massively Distributed Environments
1 Introduction
2 Definitions and Background
3 Parallel Absolute Top down Algorithm
3.1 Impact of Partitioning Data on 2-Jobs Schema
3.2 IBDP: An Overlapping Data Partitioning Strategy
3.3 1-Job Schema: Complete Approach
3.4 Proof of Correctness
4 Experiments
4.1 Experimental Setup
4.2 Real World Datasets
4.3 Runtime and Scalability
4.4 Data Communication and Energy Consumption
5 Related Work
6 Conclusion
References
Does Multilevel Semantic Representation Improve Text Categorization?
1 Introduction
2 Related Work
2.1 LDA and Online LDA
2.2 Topic Based Text Categorization
3 Methodology
3.1 Problem Formulation
3.2 ML-OLDA
3.3 Learning Semantic Space
3.4 Topical Feature Extraction
4 Experiments
4.1 Qualitative Experiment on Wikipedia
4.2 Text Categorization on 20newsgroups Dataset
5 Result and Discussion
6 Conclusion
References
Parallel Canopy Clustering on GPUs
1 Introduction
2 Related Work
2.1 Parallel Clustering
2.2 GPU Clustering
3 GPU Computing
4 Simple Canopy Clustering
4.1 Algorithm
4.2 Implementation
5 Canopy Clustering with Grid Index
5.1 Grid Index
5.2 Implementation
6 Experiments
6.1 Experimental Settings
6.2 Results
7 Conclusions
References
Query Processing and Optimization
Efficient Storage and Query Processing of Large String in Oracle
Abstract
1 Introduction
1.1 Related Work
2 Datatype and Storage
3 Locator vs. Scalar Value
4 Predicate Filter Injection and Operator Evaluation Optimization
5 Index and DML Issues
6 STANDARD_HASH Operator
7 Efficient Memory Management
7.1 Mutable Memory and 32 K Varchar
8 Query and DML Performance
8.1 Experimental Setup
8.2 Experiment I: INSERT
8.3 Experiment II: SELECT
8.4 Experiment III: DELETE
8.5 Experiment IV: Operator Evaluation Optimization
9 Conclusion
Acknowledgements
References
SAM: A Sorting Approach for Optimizing Multijoin Queries
1 Introduction
2 Related Work
3 Sorting Approach to Finding an Optimal Join Order
3.1 A Comparator for Sorting
3.2 SAM's 6-Step Sorting Approach
3.3 SAM's Optimality
3.4 Complexity Analysis
3.5 Selectivity Measurement
4 Experiments
4.1 Plan Time for Choosing a Join Order
4.2 Execution Time for the Chosen Join Order
5 Conclusion and Current Work
References
GPU Acceleration of Set Similarity Joins
1 Introduction
2 Similarity Joins Over Sets
2.1 Set Similarity Joins
2.2 MinHash
3 General-Purpose Processing on Graphics Processing Units
4 GPU Acceleration of Set Similarity Joins
4.1 Preprocessing
4.2 Signature Matrix Computation on GPU
4.3 Similarity Joins on GPU
5 Experiments
5.1 Datasets
5.2 Environment
5.3 Performance Comparison
5.4 Accuracy Evaluation
6 Related Work
7 Conclusions
References
Data Mining II
Parallel Eclat for Opportunistic Mining of Frequent Itemsets
1 Introduction
2 Problem Statement and Preliminaries
2.1 Problem Statement
2.2 Strategies for Mining Frequent Itemsets
2.3 Support Counting and Data Formats
3 Opportunistic Vertical Mining Approach
3.1 Our Hybrid Vertical Format
3.2 Enabling Our Opportunistic Vertical Mining
3.3 Our Search Strategy
4 New Parallel Algorithm Based on MapReduce
4.1 Our Peclat Algorithm
4.2 The mrCountingItems Job
4.3 The mrLargeK Job
4.4 The mrMiningSubtrees Job
5 Experimental Evaluation
5.1 Performance Comparison with Other Algorithms
5.2 Anatomy of Opportunistic Vertical Mining Approach
6 Conclusion
References
Sequential Data Analytics by Means of Seq-SQL Language
1 Introduction
2 Leading Example
2.1 Standard Approach
2.2 Our Approach
3 Related Work
4 Seq-SQL Data Model
5 Seq-SQL Language
6 Seq-SQL Prototype
6.1 Architecture
6.2 Performance Evaluation
7 Conclusions and Future Work
References
Clustering Attributed Multi-graphs with Information Ranking
1 Introduction
2 Related Work
2.1 Distance-Based Clustering
2.2 Model-Based Clustering
3 Problem Definition
4 The Proposed CAMIR Method
4.1 Overview
4.2 Information Ranking
4.3 Generating the Final Clusters
5 Experiments
5.1 Datasets
5.2 Evaluation Protocol
5.3 Evaluation on Synthetic Datasets
5.4 Evaluation on Real-World Datasets
6 Conclusion
References
Indexing and Decision Support Systems
Building Space-Efficient Inverted Indexes on Low-Cardinality Dimensions
1 Introduction
2 The Derived List
3 Candidates Generation
4 Derived Lists Selection
5 Experimental Evaluation
6 Conclusions
References
A Decision Support System for Hotel Facilities Inventory Management
Abstract
1 Introduction
2 Data Analysis
3 Forecasting Algorithms
3.1 First Algorithm (Simple Moving Average)
3.2 Second Algorithm (Order History Based)
3.3 Third Algorithm (Simple Moving Average with Number of the Guests)
3.4 Fourth Algorithm (Order History Based with Number of the Guests)
3.5 Forecasting Pick-up
4 Results
5 Conclusion
6 Future Works
Acknowledgments
References
TopCom: Index for Shortest Distance Query in Directed Graph
1 Introduction
2 Method
2.1 Topological Compression
2.2 Index Generation
2.3 Query Processing
3 Experimental Evaluation
4 Conclusions
References
A Universal Distributed Indexing Scheme for Data Centers with Tree-Like Topologies
1 Introduction
2 Related Work
3 Data Centers with Tree-Like Topologies
4 The U2-Tree
4.1 Local Index Construction
4.2 Potential Indexing Range Assignment
4.3 Publishing Scheme
4.4 Global Index Construction
5 Update and Maintenance
5.1 Index Updating
5.2 Index Tuning
5.3 Fault Tolerance
6 Query Processing
7 Performance Evaluation
8 Conclusion
References
Data Mining III
Improving Diversity Performance of Association Rule Based Recommender Systems
1 Introduction
2 Overview to Compute Diversity of Patterns
2.1 Overview of Diversity
2.2 Approach to Compute the Diversity of Patterns
3 Proposed Approach for Diverse Recommendations
3.1 Association Rule Based RS Approach
3.2 Proposed Approach
4 Experimental Results
4.1 Preparation of Data Set and Methodology
4.2 Results
5 Summary and Conclusions
References
A Prime Number Based Approach for Closed Frequent Itemset Mining in Big Data
1 Introduction
2 Related Work
3 Preliminary Notions
4 Frequent Closed Itemset Mining
4.1 First Step: Prime Number Transformation
4.2 Second Step: Closed Frequent Itemset Mining
5 Experimental Evaluation
6 Conclusion
References
Multilingual Documents Clustering Based on Closed Concepts Mining
1 Introduction
2 Literature Review
3 Multilingual Documents Clustering Approach
3.1 Mathematical Foundations: Key FCA Settings
3.2 Multilingual Closed Concepts Extraction
3.3 Closed Concepts Translation and Disambiguation
3.4 Multilingual Closed Concepts Alignment
4 Experiments and Results
4.1 Description of the Comparable Corpora
4.2 Evaluation Framework
4.3 Experimental Results and Discussion
5 Conclusion
References
Modeling, Extraction, Social Networks
Analyzing the Strength of Co-authorship Ties with Neighborhood Overlap
1 Introduction
2 Related Work
3 Datasets Main Features
4 Characterizing the Strength of Ties
4.1 Neighborhood Overlap Characterization
4.2 Granovetter's Theory Analysis
5 The Impact of the Properties on the Strength of Ties
5.1 Correlation Analysis
5.2 Regression Analysis
6 Conclusions
References
Event Extraction from Unstructured Text Data
1 Introduction
2 Related Work
3 Problem Definition and Approach
3.1 Data Pre-processing
3.2 Event Extraction
4 Experimental Evaluation
4.1 Datasets
4.2 Evaluation of Bigrams
4.3 Evaluation of Removing Stop Words
4.4 Event Extraction from the Enterprise Dataset
4.5 Event Extraction from the Twitter Dataset
5 Conclusion
References
A Cluster-Based Epidemic Model for Retweeting Trend Prediction on Micro-blog
1 Introduction
2 Related Work
2.1 Analysis and Prediction of Retweeting Behaviors
2.2 Retweeting Trend Prediction
3 Characteristics of Tweets' Retweeting on Micro-Blog
4 Problem Definition
5 Analogy Between Tweets Spread and Epidemic Spread
5.1 Subjects
5.2 Influence Factors
5.3 Spread Mechanisms
6 A Cluster-Based SIS Model for Retweeting Trend Prediction
6.1 The Model
6.2 Clustering of Infectious and Susceptible Crowds
6.3 Predicting a Tweet's Retweeting Trend
7 Evaluation
7.1 Set-Up
7.2 Performance of Multiple Retweeting Peaks Prediction
7.3 Performance of Retweeting Coverage Prediction
7.4 Performance of Retweeting Lifetime Prediction
8 Conclusion
References
Author Index
