Big Data Analytics and Knowledge Discovery

Name: Big Data Analytics and Knowledge Discovery | 19th International Conference, DaWaK 2017, Lyon, France, August 28-31, 2017, Proceedings
Brand: Springer
Price: 53.49 EUR
Availability: OnlineOnly

Ladjel Bellatreche, Sharma Chakravarthy (Editors)

Springer (Publisher)

Published on 11 August 2017

XIV, 488 pages

E-Book

PDF with digital watermarking

978-3-319-64283-3 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Preface
Organization
Contents
New Generation Data Warehouses Design
Evaluation of Data Warehouse Design Methodologies in the Context of Big Data
Abstract
1 Introduction
2 Methodology Classification
3 Metrics for Design Evaluation of Methodologies
3.1 Metrics for Methodology Evaluation
3.2 Metrics for Schema Quality Evaluation
4 Experimental Results
4.1 Methodology Evaluation
4.2 Schema Evaluation
5 Conclusion
References
Optimal Task Ordering in Chain Data Flows: Exploring the Practicality of Non-scalable Solutions
1 Introduction
2 Preliminaries
2.1 Problem Complexity
2.2 Chains in TPC-DI
3 Accurate Algorithms for Linear Execution Plans
3.1 Backtracking
3.2 Dynamic Programming
3.3 Topological Sorting
4 Evaluation of the Time Overhead
5 Related Work
6 Conclusions
References
Exploiting Mathematical Structures of Statistical Measures for Comparison of RDF Data Cubes
1 Introduction
2 Model and Data Representation
3 Structural Comparison of RDF Data Cubes
3.1 Computability and Comparability
3.2 Comparison Functionalities
3.3 Experimentation
4 Conclusion
References
S2D: Shared Distributed Datasets, Storing Shared Data for Multiple and Massive Queries Optimization in a Distributed Data Warehouse
1 Introduction
2 Related Work
3 Overview of Shared Distributed Datasets
3.1 Phase 1: The Logical Representation
3.2 Phase 2: The Physical Representation
4 Experimental Evaluation
4.1 Experimental Setup
4.2 Experimental Results and Discussion
5 Conclusion and Future Work
References
Cloud and NoSQL Databases
Enforcing Privacy in Cloud Databases
1 Introduction
2 Non-cryptographic Methods
2.1 Differential Privacy
2.2 Data Anonymization
2.3 Data Fragmentation
3 Secret Sharing-Based Methods
3.1 Verifiable Secret Sharing
3.2 Order-Preserving Secret Sharing
3.3 Discussion
4 Index-Based Methods
4.1 Bucketization-Based Indexing
4.2 Order-Preserving Indexing
4.3 Searchable Encryption
4.4 Discussion
5 Secure Databases
5.1 CryptDB
5.2 MONOMI
5.3 Multi-valued Order Preserving Encryption (MV-OPE)
5.4 Secure Trusted Hardware
5.5 Discussion
6 Conclusion
6.1 Security
6.2 Query Post-processing
6.3 Storage Overhead
6.4 Computational Overhead
6.5 Wrap-up
References
TARDIS: Optimal Execution of Scientific Workflows in Apache Spark
1 Introduction
2 Problem Definition
3 Background
3.1 Spark
4 TARDIS Engine
4.1 Architecture
4.2 TARDIS Language
4.3 Data Placement
4.4 Scheduling
4.5 Collecting Output Files
5 Experiments
6 Conclusion
References
MDA-Based Approach for NoSQL Databases Modelling
Abstract
1 Introduction
2 Research Problem and Related Work
3 UMLtoNoSQL Approach
3.1 UMLtoGenericModel Transformation
3.2 GenericModeltoPhysicalModel Transformation
4 Experiments
4.1 Implementation
4.2 Evaluation
5 Conclusion and Future Work
References
Advanced Programming Paradigms
MiSeRe-Hadoop: A Large-Scale Robust Sequential Classification Rules Mining Framework
1 Introduction
2 Preliminaries
3 MiSeRe Algorithm
4 MiSeRe Hadoop Algorithm
4.1 Step I:
4.2 Step II:
5 Experiments
6 Conclusion and Future Work
References
An Efficient Map-Reduce Framework to Mine Periodic Frequent Patterns
1 Introduction
2 Background
2.1 Mining Periodic-Frequent Patterns on a Single Machine
2.2 Mining PFPs with Period Summary
2.3 Map-Reduce Framework
2.4 Parallel FP-growth
3 Proposed Approaches
3.1 Parallel Periodic Frequent Pattern Growth (PPF-growth)
3.2 PPF-growth Using Partition Summary
4 Performance Evaluation
5 Conclusion
References
MapReduce-Based Complex Big Data Analytics over Uncertain and Imprecise Social Networks
1 Introduction and Related Work
2 Background: Data Science
3 Mining Complex Big Data in Uncertain and Imprecise Social Networks
3.1 Interdependencies Between Followers and Followees in Complex Big Social Networks
3.2 Discovery of Popular Followees
3.3 The First Set of MapReduce Functions in BigUISN
3.4 The Second Set of MapReduce Functions in BigUISN
3.5 Beyond the Second Set of MapReduce Functions in BigUISN
4 Evaluation, Observations, and Discussion
5 Conclusions and Future Work
References
Non-functional Requirements Satisfaction
A Case for Abstract Cost Models for Distributed Execution of Analytics Operators
1 Introduction
2 Piecewise Linear Model Structure and Training
3 Makespan Model for Sorting
3.1 Round-Time Estimation for Map and Reduce Phase
3.2 Exploiting Model Structure for Optimization
4 Dense Matrix Product
4.1 Makespan Model for Block-Wise Matrix Multiplication
4.2 Optimal Partitioning
5 Experiments
5.1 Basic Setup
5.2 Sorting
5.3 Matrix Multiplication
6 Related Work
7 Conclusions
References
Pre-processing and Indexing Techniques for Constellation Queries in Big Data
1 Introduction
2 Related Works
3 Problem Formulation
4 CQ Processing
4.1 Query Pre-processing
4.2 Query Transformation
4.3 Dataset Pre-processing
5 Experiments
5.1 Query Pre-processing
5.2 PH-tree Versus Quad-Tree
6 Conclusion
References
A Lightweight Elastic Queue Middleware for Distributed Streaming Pipeline
1 Introduction
2 Elastic Queue Middleware
2.1 The Role of EQM in Elastic Streaming Processing Engines
2.2 Implementing EQM Based on HBase
3 Experiments
4 Related Work
5 Conclusion
References
Modeling Data Flow Execution in a Parallel Environment
1 Introduction
1.1 Parallelizing Data Flows
1.2 Assumptions Regarding a Single Multi-core Machine Execution Environment
1.3 Motivation for Devising a New Cost Model
2 Other Related Work
3 Preliminaries
4 Our Cost Model
4.1 A Generalized Cost Model for Response Time
4.2 Models Without Considering the Communication Cost
4.3 Considering Communication Costs
4.4 Considering Partitioned Parallelism
5 Conclusions and Future Work
References
Machine Learning
Accelerating K-Means by Grouping Points Automatically
Abstract
1 Introduction
2 Related Work
3 Proposed Method
3.1 The Framework of Our Algorithm
3.2 Filtering for Clusters of Points
3.3 Fission Step: Grouping Points Automatically
3.4 Filtering for Groups of Points
3.5 Fusion Step: Limiting the Increasing Number of Groups
3.6 Algorithm
4 Experiment and Analysis
4.1 Experiment Design
4.2 Cost Comparison and Relative Speedup
4.3 Separability
4.4 Avoided Distance Calculations
5 Conclusion and Future Work
References
A Machine Learning Trainable Model to Assess the Accuracy of Probabilistic Record Linkage
1 Introduction
2 Related Work
3 Assessing the Accuracy of Record Linkage
4 Machine Learning Algorithms
4.1 Decision Trees
4.2 Gradient Boosted Trees
4.3 Random Forests
4.4 Naïve Bayes
4.5 Linear Support Vector Machine
4.6 Logistic Regression
4.7 Comparative Analysis
5 Proposed Trainable Model
5.1 Pre-processing
5.2 Transformation
5.3 Model Selection
5.4 Model Execution
6 Experimental Results
7 Conclusions and Future Work
References
An Efficient Approach for Instance Selection
1 Introduction
2 Related Works
3 Notations
4 The XLDIS Algorithm
5 Experiments
6 Conclusion
References
Search Result Personalization in Twitter Using Neural Word Embeddings
1 Introduction
2 Related Work
2.1 Twitter Search
2.2 Personalized Twitter Search
3 Our Approach
3.1 User Modeling
3.2 Results Re-ranking
4 Evaluation
4.1 Twitter Lists Based Evaluation
4.2 Hashtags Based Evaluation
5 Conclusions
References
Diverse Selection of Feature Subsets for Ensemble Regression
1 Introduction
2 Related Work
3 Diverse Subset Selection Strategy (DS3)
3.1 Problem Overview
3.2 Solution Overview
3.3 Relevance Based Generation of Initial Candidates
3.4 Multiple Feature Sets Based on Difference and Quality
3.5 Unifying Multiple Subsets by Ensemble Regression
3.6 Time Complexity
4 Experiments
4.1 Synthetic Data Sets
4.2 Real-World Data Sets
4.3 Parameter Analysis
4.4 Iterations
5 Conclusions
References
K-Means Clustering Using Homomorphic Encryption and an Updatable Distance Matrix: Secure Third Party Data Clustering with Limited Data Owner Interaction
1 Introduction
2 Related Work
3 Preliminaries
3.1 K-Means Clustering
3.2 Liu's Homomorphic Encryption Scheme
4 The Updatable Distance Matrix Concept
5 Secure K-Means Clustering Using the UDM Concept
5.1 Data Owner Process
5.2 Third Party Process
6 Evaluation
7 Conclusion
References
Reweighting Forest for Extreme Multi-label Classification
Abstract
1 Introduction
2 Related Work
3 Proposed Method
3.1 Problem Definition and Proposed Framework
3.2 The Reweighting Phase
3.3 The Pretesting Phase
4 Experiments
4.1 Experimental Setup
4.2 Experimental Results
5 Conclusion
References
Social Media and Twitter Analysis
A Relativistic Opinion Mining Approach to Detect Factual or Opinionated News Sources
1 Introduction
2 Related Work
3 Experimental Setup
3.1 Dataset
3.2 Knowledge-Base and Preprocessing
3.3 Sentiment Analysis
4 Experimental Results
5 Conclusion
6 Future Work
References
A Reliability-Based Approach for Influence Maximization Using the Evidence Theory
1 Introduction
2 Related Works
2.1 Influence Maximization Models
2.2 Influence and Theory of Belief Functions
3 Theory of Belief Functions
4 Reliability-Based Influence Maximization
4.1 Influence Characterization
4.2 Estimating Reliability
4.3 Influence Estimation
5 Results and Discussion
6 Conclusion
References
Sentiment Analysis on Twitter to Improve Time Series Contextual Anomaly Detection for Detecting Stock Market Manipulation
1 Introduction
2 Methods
2.1 Sentiment Analysis on Twitter
2.2 Data
2.3 Data Preprocessing
2.4 Modelling
2.5 Feature Selection
2.6 Classifiers
2.7 Classifier Evaluation
2.8 Calculating Polarity for Each Stock
3 Results and Discussion
References
Automatic Segmentation of Big Data of Patent Texts
1 Introduction
2 Related Work
3 Segmentation Guidelines
4 Methods and Evaluations
4.1 Workflow
4.2 Headings Identification
4.3 Meaning of Headings (Semantic of Headings)
4.4 Heuristic Methods
4.5 Big Data Approach
4.6 Implementation
4.7 Evaluation
5 Conclusion
References
Sentiment Analysis and User Influence
Tag Me a Label with Multi-arm: Active Learning for Telugu Sentiment Analysis
1 Introduction
2 Related Work
3 Dataset Generation
3.1 Word Embeddings Generation
3.2 Feature Engineering
4 The Proposed Approach
4.1 Active Learning
4.2 Input to the System
4.3 Query Selection Strategies
4.4 Classification Model for Telugu Sentiment Analysis
5 Experiments and Results
6 Conclusion
6.1 Future Work
References
Belief Temporal Analysis of Expert Users: Case Study Stack Overflow
1 Introduction
2 Related Work
3 Theory of Belief Functions: An Overview
3.1 Particular Belief Functions
3.2 Discounting
3.3 Decision Making
4 Belief Model of Users in Stack Overflow
4.1 Hypothesis
4.2 Definition of Mass Functions
4.3 Data Aggregation and Decision Making
5 Experimental Evaluation and Analysis
5.1 Time Analysis of the Data Set
5.2 Analysis of Users' Behavior over Time
6 Conclusion
References
Leveraging Hierarchy and Community Structure for Determining Influencers in Networks
1 Introduction
1.1 Contributions and Organization
2 Related Work
3 Preliminaries
4 Influence Scoring Using Position, Reachability and Interaction
4.1 Trussness Based Hierarchical Decomposition
4.2 Positional Index
4.3 Reachability Index
4.4 Interaction Index
4.5 Influence Score
5 Experimental Analysis
5.1 Investigation Using SIR Model
5.2 Monotonicity
6 Conclusion
References
Using Social Media for Word-of-Mouth Marketing
1 Introduction
2 Related Work
3 Problem Definition
4 Analysis of Online Social Groups
5 Social Interaction Graph
5.1 Measuring Topical Relevance
6 Finding Influential Users in OSG
7 Reinforced Marketing
8 Evaluations
8.1 Experimental Setup
8.2 Evaluation Metrics
8.3 Effectiveness of Algorithms
8.4 Precision Analysis
8.5 Marketing Across Topics
8.6 Empirical Evaluation
8.7 Temporal Dynamics
9 Conclusion
References
Knowledge Discovery
Knowledge Discovery of Complex Data Using Gaussian Mixture Models
1 Introduction
2 Related Work
2.1 Data Representations
2.2 Similarity Measures
2.3 Indexes
3 Methods
3.1 Gaussian Mixture Models
3.2 Infinite Euclidean Distance for Distributions
4 Experimental Evaluations
4.1 Data Sets
4.2 Query Performance
4.3 Classification on NBA Data
4.4 Clustering on Weather Data
5 Conclusions and Future Work
References
Optimized Mining of Potential Positive and Negative Association Rules
1 Introduction and Motivations
2 Preliminary Concepts
3 OM2PNR Algorithm
3.1 Optimization of the Research the Frequent Patterns
3.2 Optimization of the Course of Research the Potential Rules
4 Experimental Resultants
5 Conclusion
References
Extracting Non-redundant Correlated Purchase Behaviors by Utility Measure
1 Introduction
2 Preliminaries and Problem Statement
3 Proposed CoHUIM Algorithm for Mining CoHUIs
3.1 Properties of the CoHUI
3.2 Reducing Database Size Using Projection Mechanism
3.3 Proposed Sorted Downward Closure Property
3.4 Procedure of the Projection-Based CoHUIM Algorithm
4 Experimental Results
4.1 Dataset and Experimental Setup
4.2 Pattern Analysis
4.3 Runtime Analysis
5 Conclusions
References
Data Flow Management and Optimization
Detecting Feature Interactions in Agricultural Trade Data Using a Deep Neural Network
1 Introduction and Motivation
2 Related Research
3 Deep Belief Network Components
3.1 Parameter Initialisation and Optimisation
3.2 Architectural Configuration
4 An Approach to Interpreting Deep Representations
5 Experiments
5.1 Setup
5.2 Results and Analysis
6 Conclusions and Future Work
References
Air Quality Monitoring System and Benchmarking
1 Introduction
2 System Design and Implementation
3 Benchmarking
3.1 Experimental Settings
3.2 Benchmarking Methods
3.3 Benchmarking Results
4 Related Work
5 Conclusions and Future Work
References
Electric Vehicle Charging Station Deployment for Minimizing Construction Cost
1 Introduction
2 Information Extraction
2.1 Idle Trip
2.2 Charging Demand Model
2.3 Impact of Traffic Condition
2.4 Estate Price Model
3 Construction Cost Optimization
4 Evaluation
4.1 Data Set
4.2 Baselines
4.3 Evaluation Metrics
4.4 Experiment Settings and Evaluation Results
5 Related Work
6 Conclusion
References
Author Index
