
Big Data Analytics and Knowledge Discovery
Description
This book constitutes the refereed proceedings of the 17th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2015, held in Valencia, Spain, in September 2015.
The 31 revised full papers presented were carefully reviewed and selected from 90 submissions. The papers are organized in topical sections on similarity measure and clustering; data mining; social computing; heterogeneous networks and data; data warehouses; stream processing; applications of big data analysis; and big data.

Content
- Intro
- Preface
- Organization
- Contents
- Similarity Measure and Clustering
- Determining Query Readiness for Structured Data
- 1 Introduction
- 2 Formalizing Data Readiness Level
- 2.1 Toll Booth Example: Traffic Flow Identification
- 2.2 Data Readiness Level: Intuition and Preliminaries
- 2.3 The Relevance Dimension of DRL
- 2.4 The Completeness Dimension of DRL
- 2.5 Putting It Together: Data Readiness Level Tuples
- 3 Improving Readiness Level of Data for Task at Hand
- 3.1 Taxonomy of DRL-Improving Operators
- 3.2 Illustration Using the Running Example
- 4 Use Case: Marketing via Targeted Mailings
- 5 Related Work
- 6 Conclusion and Future Work
- References
- Efficient Cluster Detection by Ordered Neighborhoods
- 1 Introduction
- 2 Formal Properties
- 2.1 Clustering in Neighborhoods
- 2.2 Ordered Neighborhoods
- 3 Mining the Neighborhoods
- 3.1 Complexity Analysis
- 4 Empirical Evaluation
- 4.1 Experimental Setup
- 4.2 Heterogeneous Datasets
- 4.3 Scalability Results
- 4.4 Real World Datasets
- 5 Conclusion
- References
- Unsupervised Semantic and Syntactic Based Classification of Scientific Citations
- 1 Introduction
- 2 Related Work
- 3 Citation Clustering
- 3.1 Semantic-Based Model
- 3.2 Syntactic-Based Model
- 4 Experiments, Results and Evaluation
- 5 Conclusion and Future Work
- References
- Data Mining
- HI-Tree: Mining High Influence Patterns Using External and Internal Utility Values
- 1 Introduction
- 2 Related Work
- 3 Preliminaries
- 3.1 Online Frequent Itemsets Mining
- 3.2 Influence Factor
- 4 High Influence Tree (HI-Tree)
- 4.1 HI-Tree Structure
- 4.2 HI-Tree Construction
- 4.3 HI-Tree Mining
- 5 Experimental Evaluation
- 5.1 Varying Minimum Threshold
- 5.2 Reduction Ratio
- 5.3 Scalability Test and Rule Validation
- 6 Conclusion and Future Work
- References
- Balancing Tree Size and Accuracy in Fast Mining of Uncertain Frequent Patterns
- 1 Introduction and Related Works
- 2 Background
- 3 Our MUF-tree Structure
- 4 Our MUF-growth Algorithm
- 5 Evaluation Results
- 5.1 Analytical Evaluation
- 5.2 Empirical Evaluation
- 6 Conclusions
- References
- Secure Outsourced Frequent Pattern Mining by Fully Homomorphic Encryption
- 1 Introduction
- 2 Related Works
- 3 Preliminaries
- 3.1 Frequent Pattern Mining Problem
- 3.2 Secure Outsourced Mining by Fully Homomorphic Encryption
- 3.3 A Variant of the Apriori Algorithm
- 4 Secured Protocol Based on FHE for Pattern Mining
- 5 Privacy Preserving Protocol for Pattern Mining
- 5.1 The Notion of -Pattern Uncertainty
- 5.2 Privacy Preserving Protocol for Counting Candidates
- 6 Experimental Evaluation
- 7 Conclusions and Future Work
- References
- Supervised Evaluation of Top-k Itemset Mining Algorithms
- 1 Introduction
- 2 Problem Statement and Algorithms
- 2.1 Notation and Problem Statement
- 2.2 Minimizing Noise (Asso)
- 2.3 Minimizing the Pattern Set Complexity (Hyper+)
- 2.4 Minimizing Multiple Cost Functions (PaNDa+ Framework)
- 3 Evaluation Methodology and Experiments
- 3.1 Parameter Setting of Pattern Mining Algorithms
- 3.2 Supervised Evaluation of Pattern Set
- 3.3 Experimental Results
- 4 Related Work
- 5 Conclusions
- References
- Finding Banded Patterns in Data: The Banded Pattern Mining Algorithm
- 1 Introduction
- 2 Previous Work
- 3 Formalism
- 4 The Banded Pattern Mining Algorithm
- 5 Worked Example
- 6 Evaluation
- 6.1 Effect of Data Set Size
- 6.2 Comparison with BPM, MBA and BC
- 6.3 Large Scale Study
- 7 Conclusion
- References
- Discrimination-Aware Association Rule Mining for Unbiased Data Analytics
- Abstract
- 1 Introduction
- 2 Related Work
- 2.1 Discrimination-Aware Methods
- 2.2 Association Rule Methods
- 3 The Proposed Method DAAR
- 3.1 DCI Measure
- 3.2 Discrimination Score
- 3.3 DAAR Algorithm
- 4 Datasets and Experimental Setup
- 5 Results and Discussion
- 6 Conclusions
- References
- Social Computing
- Big Data Analytics of Social Networks for the Discovery of "Following" Patterns
- 1 Introduction and Related Works
- 2 Background
- 3 Our Data Analytics Solution for Mining "Following" Patterns from Big Social Network Data
- 3.1 "Following" Relationships in Big Social Networks
- 3.2 Discovery of "Following" Patterns
- 3.3 The First Set of Map-Reduce Functions in BigFoP
- 3.4 The Second Set of Map-Reduce Functions in BigFoP
- 3.5 Subsequent Sets of Map-Reduce Functions in BigFoP
- 4 Observations, Evaluation and Discussion
- 5 Conclusions
- References
- Sentiment Extraction from Tweets: Multilingual Challenges
- 1 Introduction
- 2 Related Work
- 3 Data
- 4 Overview of Approach
- 4.1 Preprocessing
- 4.2 Features
- 4.3 Negation Identification and Polarity Reversal
- 4.4 Challenges of the Greek Language
- 5 Experiments
- 5.1 Greek Data
- 5.2 English Data
- 5.3 Time Consumption
- 5.4 Sensitivity Analysis
- 6 Conclusion and Future Work
- References
- TiDE: Template-Independent Discourse Data Extraction
- Abstract
- 1 Introduction
- 2 Related Work
- 3 Preliminaries
- 4 Template-Independent Discourse Data Extraction (TiDE)
- 4.1 Locate Comment Blocks
- 4.2 Extraction of Comments, Discussion Structure and Commenter
- 4.2.1 Discover Comment Text with Discussion Structure
- 4.2.2 Identification of Author Information
- 5 Experiments
- 5.1 Experimental Setup
- 5.2 Comparative Study of Comment Block Discovery Techniques
- 5.3 Effectiveness of TiDE
- 5.4 News Comment Crawler Based on TiDE
- 6 Conclusion
- References
- Heterogeneous Networks and Data
- A New Relevance Measure for Heterogeneous Networks
- Abstract
- 1 Introduction
- 2 Related Work
- 3 A Novel Relevance Measure
- 3.1 Properties of the Proposed Measure
- 4 Illustration
- 5 Experimental Setup and Results
- 5.1 Dataset
- 5.2 Performance Comparison for Clustering
- 5.3 Performance Comparison for Query Task
- 5.4 Time Complexity Analysis
- 6 Conclusion and Future Research Directions
- References
- UFOMQ: An Algorithm for Querying for Similar Individuals in Heterogeneous Ontologies
- 1 Introduction
- 2 Related Work
- 3 Problem Definition
- 4 Unified Fuzzy Ontology Matching (UFOM)
- 5 Query Execution
- 6 Experimental Evaluation
- 7 Conclusion
- References
- Semantics-Based Multidimensional Query Over Sparse Data Marts
- 1 Introduction
- 2 Case Study
- 3 Semantic Multidimensional Model
- 4 Query Rewriting
- 5 Query Completeness
- 5.1 Completeness Procedure
- 5.2 Implementation and Computational Aspects
- 6 Related Work
- 7 Conclusion
- References
- Data Warehouses
- Automatically Tailoring Semantics-Enabled Dimensions for Movement Data Warehouses
- 1 Introduction
- 2 Basic Definitions
- 3 Approach to Tailor Dimensions for MDWs
- 4 Experiments
- 5 Related Work
- 6 Conclusions and Future Work
- References
- Real-Time Snapshot Maintenance with Incremental ETL Pipelines in Data Warehouses
- 1 Introduction
- 2 Related Work
- 3 Consistency Model
- 4 Workload Scheduler
- 5 Incremental ETL Pipeline on Kettle
- 6 Experimental Results
- 7 Conclusion
- References
- Eco-Processing of OLAP Complex Queries
- 1 Introduction
- 2 Related Work
- 3 Our Methodology
- 3.1 What Are the Key Parameters?
- 3.2 CPU and IO Costs for Pipelines
- 3.3 Estimation of IO and CPU Parameters
- 4 Experimental Evaluation
- 5 Conclusion
- References
- Materializing Baseline Views for Deviation Detection Exploratory OLAP
- 1 Introduction
- 2 Related Work
- 3 Using Baseline Materialized Views
- 4 Formal Framework
- 5 Baseline Materialized Views Life-Cycle
- 5.1 Updating Baseline Materialized Views
- 5.2 Statistics Structure and Merging Operation
- 5.3 Materialization Algorithm
- 5.4 Detection and Alerting
- 6 Proof of Concept
- 7 Conclusions and Future Work
- References
- Stream Processing
- Binary Shapelet Transform for Multiclass Time Series Classification
- 1 Introduction
- 2 Shapelet Based Classification
- 3 Classification Technique
- 4 Shapelet Transform Refinements
- 4.1 Binary Shapelets
- 4.2 Changing the Shapelet Evaluation Order
- 5 Results
- 5.1 Accuracy Improvement on Multi-class Problems
- 5.2 Accuracy Comparison to Other Shapelet Methods
- 5.3 Average Case Time Complexity Improvements
- 6 Conclusion
- References
- StreamXM: An Adaptive Partitional Clustering Solution for Evolving Data Streams
- 1 Introduction
- 2 Related Work
- 3 The StreamXM Algorithms
- 3.1 StreamXM with Lloyds
- 3.2 StreamXM
- 4 Experimental Setup
- 4.1 Datasets
- 5 Results
- 5.1 Performance and Quality of StreamXM on Static Datasets
- 5.2 Performance and Quality of StreamXM Run on Evolving Dataset Streams
- 5.3 Performance and Quality of StreamXM on Real-World Dataset
- 6 Conclusions
- References
- Data Stream Mining with Limited Validation Opportunity: Towards Instrument Failure Prediction
- 1 Introduction
- 2 Formalism
- 3 Failure Prediction
- 4 Evaluation
- 4.1 Simulation Environment
- 4.2 Evaluation Metrics
- 4.3 Single Sentinel Attribute Evaluation
- 4.4 Multiple Attribute Evaluation with a Single Sentinel Attribute
- 5 Summary and Conclusions
- References
- Distributed Classification of Data Streams: An Adaptive Technique
- 1 Introduction
- 2 Mining Data Streams via Granularity-Based Approaches
- 3 Classification Methods for Wireless Sensor Networks: The RA-Class Approach
- 4 Experimental Assessment and Analysis
- 5 Related Work
- 6 Conclusions and Future Work
- References
- New Word Detection and Tagging on Chinese Twitter Stream
- 1 Introduction
- 1.1 An Unsupervised Statistical Method for Detecting Out-of-Vocabulary (OOV) Words in Chinese Tweets
- 1.2 A Novel Method to Annotate New Word in Tweets by Tagging
- 2 New Word Detection
- 2.1 Definition of New Word
- 2.2 Word Extraction
- 3 New Word Tagging
- 3.1 Context Cosine Similarity
- 3.2 Choose Tag Word
- 4 Experiment
- 4.1 Dataset Setting
- 4.2 New Word Detection Result
- 4.3 New Word Tagging Result
- 5 Conclusion and Future Work
- References
- Applications of Big Data Analysis
- Text Categorization for Deriving the Application Quality in Enterprises Using Ticketing Systems
- 1 Introduction
- 2 Related Work
- 2.1 Identification of Application Quality via Tickets
- 2.2 Text Categorization of Support Tickets
- 3 Automatic Classification of Ticket Data
- 3.1 Dataset
- 3.2 Methodology
- 3.3 Metrics
- 4 Evaluation
- 4.1 Performance of Machine Learning Algorithms
- 4.2 Impact of Different Algorithms on Business-Relevant Metrics
- 5 Conclusions
- References
- MultiSpot: Spotting Sentiments with Semantic Aware Multilevel Cascaded Analysis
- 1 Introduction
- 2 Related Work
- 3 Multilevel Sentiment Detection
- 3.1 Problem Definition
- 3.2 Proposed Approach
- 4 Experiments and Results
- 4.1 Datasets
- 4.2 MultiSpot Fundamental Processes
- 4.3 MultiSpot Results and Methods Comparisons
- 5 Conclusions and Future Work
- References
- Online Urban Mobility Detection Based on Velocity Features
- 1 Introduction
- 2 Related Work
- 2.1 Model Abstraction
- 2.2 Probabilistic Model Generation
- 3 Online Mobility Model Generation
- 3.1 Architecture Overview
- 3.2 Trajectory Abstraction
- 3.3 Model Composition
- 3.4 General Workflow
- 4 Experiments
- 5 Conclusions and Future Work
- References
- Big Data
- Partition and Conquer: Map/Reduce Way of Substructure Discovery
- 1 Introduction
- 2 Related Work
- 3 Preliminaries and Problem Definition
- 4 Our Approach
- 4.1 Input Graph Representation
- 4.2 Partition Management
- 5 Map/Reduce Based Substructure Discovery in Graphs
- 5.1 M/R Algorithm Using Arbitrary Partitions (dynamicAL-SD)
- 5.2 M/R Algorithm Using Range-Based Partitions (staticAL-SD)
- 6 Experimental Analysis
- 7 Conclusion
- References
- Implementation of Multidimensional Databases with Document-Oriented NoSQL
- Abstract
- 1 Introduction
- 2 State of the Art
- 3 Multidimensional Conceptual Model and OLAP Cube
- 3.1 Conceptual Multidimensional Model
- 3.2 The OLAP Cuboid
- 4 Document-Oriented Modeling of Multidimensional Data Warehouses
- 4.1 Formalism for Document-Oriented Data Models
- 4.2 Document-Oriented Models for Data Warehousing
- 4.3 Mapping from the Conceptual Model
- 5 Experiments
- 5.1 Protocol
- 5.2 Results
- 6 Conclusion
- References
- A Graph-Based Concept Discovery Method for n-Ary Relations
- 1 Introduction
- 2 Background and Related Work
- 3 The Proposed Approach
- 3.1 Graph Representation for n-Ary Relations
- 3.2 Path-Based Concept Discovery Under N-Ary Relations Graph
- 4 Experimental Analysis
- 4.1 Data Sets
- 4.2 Experimental Results
- 5 Conclusion
- References
- Exact Detection of Information Leakage in Database Access Control
- 1 Introduction
- 2 The Problem Statement
- 3 Using Data Exchange for Information-Leak Disclosure
- 3.1 Reviewing Data Exchange
- 3.2 Data Exchange in Information-Leak Disclosure
- 4 View-Verified Data Exchange
- 4.1 The Intuition
- 4.2 The Chase Formalism for View-Verified Data Exchange
- 4.3 View-Verified Data Exchange for Conjunctive Instances
- 5 Related Work
- 6 Conclusion
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format displays each book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. This means that no technical restrictions prevent illegal distribution. However, a personalised watermark is embedded in the eBook; it can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.