Big Data Analytics and Knowledge Discovery

Name: Big Data Analytics and Knowledge Discovery | 26th International Conference, DaWaK 2024, Naples, Italy, August 26-28, 2024, Proceedings
Brand: Springer
Price: 80.24 EUR
Availability: OnlineOnly

Robert Wrembel, Silvia Chiusano, Gabriele Kotsis, A. Min Tjoa, Ismail Khalil (Editors)

Springer (Publisher)

Published on 17 August 2024

XXII, 402 pages

E-Book

PDF with digital watermarking

978-3-031-68323-7 (ISBN)

€80.24 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Preface
Organization
Abstracts of Keynote Talks
Multimodal Deep Learning in Medical Imaging
Digital Humanism as an Enabler for a Holistic Socio-Technical Approach to the Latest Developments in Computer Science and Artificial Intelligence
Deep Entity Processing in the Era of Large Language Models: Challenges and Opportunities
Contents
Modeling and Design
LiteSelect: A Lightweight Adaptive Learning Algorithm for Online Index Selection
1 Introduction
2 The Online Index Selection Problem
3 LiteSelect: A Lightweight Online Index Tuner
3.1 Algorithm LiteSelect
3.2 Fine Tuning LiteSelect
4 Experimental Evaluation
4.1 Experimental Setup
4.2 Parameter Impact Analysis
4.3 Index Tuning Performance Comparison
5 Related Work
6 Conclusion
References
IDAGEmb: An Incremental Data Alignment Based on Graph Embedding
1 Introduction
2 Background
2.1 Existing Data Alignment Approaches
2.2 Graph Embedding in Representation Learning
2.3 Discussion
3 Methodology
3.1 Research Design
3.2 Preliminaries
3.3 Adopted Algorithm for IDAGEmb
4 Experiments and Results
4.1 Experiment Configuration
4.2 Experiment #1: Embedding Method Selection
4.3 Experiment #2: Comparison with Static Methods (Effectiveness and Efficiency)
4.4 Experiment #3: Model Sensitivity to Data Order Variation
5 Conclusion and Outlook
References
Learning Paradigms and Modelling Methodologies for Digital Twins in Process Industry
1 Introduction and Motivation
1.1 Research Questions (RQs)
1.2 Structure of Review
2 Literature Search Strategy
2.1 Quality Assessment Checks
2.2 Selection of Primary Studies
2.3 Data Synthesis and Analysis Approach
3 Reporting the Review
3.1 Overview of All Studies
3.2 Overview of All Primary Studies
4 Evaluating the Research Questions
5 Discussion and Conclusion
References
Entity Matching and Similarity
MultiMatch: Low-Resource Generalized Entity Matching Using Task-Conditioned Hyperadapters in Multitask Learning
1 Introduction
2 Background
2.1 Problem Formulation
2.2 Entity Matching with Single-task Objective Models
2.3 Fully Fine-tuning Methods
2.4 Parameter-Efficient Fine-tuning Methods
2.5 Entity Matching with Parameter-Efficient Multi-task Models
3 MultiMatch Training
4 Experiments
5 Analysis
5.1 Single Versus Multiple Objective Models
5.2 Task Ablation Experiments
6 Conclusions and Future Work
References
Embedding-Based Data Matching for Disparate Data Sources
1 Context and Main Issues
2 Proposed Framework
2.1 Problem Statement
2.2 Overview
3 Experiments
3.1 RQ1. Effectiveness and Stability
3.2 RQ2. Ablation
4 Conclusion
References
Subtree Similarity Search Based on Structure and Text
1 Introduction
2 Problem Definition
3 Related Works
3.1 Tree Edit Distance
3.2 Lower Bounds of Tree Edit Distance
3.3 Upper Bounds of Tree Edit Distance
3.4 Subtree Similarity Search
3.5 Other Related Problems
4 Preliminaries
5 Proposed Method
6 Experiments
6.1 Dataset
6.2 Methods
6.3 Effect of the Recall
6.4 Effect of the Document Size
6.5 Effect of the Query Size
6.6 Accuracy
7 Conclusion
References
Classification
Towards Hybrid Embedded Feature Selection and Classification Approach with Slim-TSF
1 Introduction
2 Related Work
3 Methodology
4 Experimental Evaluations
4.1 Data Collection
4.2 Experimental Settings
4.3 Bootstrapping
4.4 Remarks
5 Conclusions
References
Evaluation of High Sparsity Strategies for Efficient Binary Classification
1 Introduction
2 Related Work
3 Materials and Methods
4 Results and Discussion
5 Conclusions and Future Work
References
Incremental SMOTE with Control Coefficient for Classifiers in Data Starved Medical Applications
1 Introduction
2 Related Work
3 Method
3.1 An Incremental Synthetic Data Generation System
4 Experiments
4.1 Datasets and Experiments Setup
4.2 Statistical Analysis
4.3 Performance Evaluation on Classifiers
5 Conclusions
References
Exploring Evaluation Metrics for Binary Classification in Data Analysis: the Worthiness Benchmark Concept
1 Introduction and Related Research
2 Methodology
3 Discussion and Conclusion
References
Machine Learning Methods and Applications
Exploring Causal Chain Identification: Comprehensive Insights from Text and Knowledge Graphs
1 Introduction
2 Related Work
3 Methodology
3.1 In-Chain Domain Knowledge
3.2 CK-CEVAE
3.3 Chained Prediction Unit
4 Experiments
4.1 Chains Acquisition
4.2 Domain Detection Model
4.3 Models Configurations
4.4 Overall Analysis
4.5 Ablation Study
5 Case Study: Understanding Semantic Continuity in Knowledge Graphs
6 Discussion
7 Conclusion
References
Towards Regional Explanations with Validity Domains for Local Explanations
1 Introduction
2 Related Work
2.1 Explanation Methods
2.2 Explanation Evaluation Metrics
2.3 Validity Domain of Models
3 Toy Example
4 Our Proposal
4.1 Validity Domain
4.2 Model Summary
4.3 Evaluation Metrics
5 Experiments
5.1 Protocol
5.2 Evaluation of Methods
5.3 Model Summary
5.4 Sensitivity Analysis
6 Discussion and Limits
7 Conclusion and Perspectives
References
Analyzing a Decade of Evolution: Trends in Natural Language Processing
1 Introduction
2 Methodology
2.1 PDF Parsing
3 Results
4 Conclusion
5 Limitations
References
Improving Serendipity for Collaborative Metric Learning Based on Mutual Proximity
1 Introduction
2 Background
2.1 Serendipity
2.2 Collaborative Metric Learning (CML)
2.3 Mutual Proximity (MP)
2.4 Advantages and Originality of the Proposed Method
3 Methodology
3.1 Learning Embeddings
3.2 Searching Embedding Space and Recommending Items
4 Experiments
4.1 Datasets
4.2 Metrics
4.3 Results
5 Conclusions and Discussion
References
Ada2vec: Adaptive Representation Learning for Large-Scale Dynamic Heterogeneous Networks
1 Introduction
2 Related Work
3 Problem Definition
4 The Ada2vec Framework
4.1 Part 1 Dynamic
4.2 Part 2 Heterogeneity
4.3 Part 3 Change
5 Experimental Evaluations
5.1 Data
5.2 Benchmarks
5.3 Classification
5.4 Clustering
5.5 Performance Analysis
6 Conclusion and Future Work
References
Differentially-Private Neural Network Training with Private Features and Public Labels
1 Introduction
2 Background
2.1 Differential Privacy
2.2 DP-SGD
3 Related Work
4 Proposed Approach
4.1 Sanitization Layer
4.2 Bounding Sensitivity and Adding Noise
4.3 Design Choices and Tradeoffs
5 Experimental Evaluation
5.1 Experimental Settings
5.2 Results
6 Conclusion
References
Time Series
Series2Graph++: Distributed Detection of Correlation Anomalies in Multivariate Time Series
1 Introduction
2 Related Work
3 Series2Graph++
4 Experiments
5 Conclusion
References
Anomaly Detection from Time Series Under Uncertainty
1 Introduction
2 Related Work
3 Proposed Approach
4 Experiments
4.1 Uncertainty Quantification Evaluation
4.2 Model Performance
5 Conclusion
References
Comparison of Measures for Characterizing the Difficulty of Time Series Classification
1 Introduction
2 Methodology
2.1 Data and Models
2.2 Complexity Measures
3 Analysis
3.1 Correlation Analysis
3.2 Relationships Between the Complexity Measures
4 Conclusion
References
Dynamic Time Warping for Phase Recognition in Tribological Sensor Data
1 Introduction
2 Related Work
3 Method
3.1 Dynamic Time Warping (DTW)
3.2 Tribological Use Case
3.3 Experiments
4 Results
4.1 Classification of the Whole Wear Phases
4.2 Partial Classification of the Wear Phases
5 Conclusion
References
Data Repositories
Putting Co-Design-Supporting Data Lakes to the Test: An Evaluation on AEC Case Studies
1 Motivation: Data Management in AEC
2 ArchIBALD Architecture Development and Definition
2.1 Requirement Analysis
2.2 Design of the ArchIBALD Architecture
3 Scenario-Based Case Studies: Context and Overview
3.1 The livMatS Biomimetic Shell
3.2 Co-Design of Robotic Prefabrication
3.3 Co-Design of End-Effectors for On-Site Assembly
3.4 Co-Design of On-Site Planning and Execution
4 Evaluation
4.1 Case Study 1: Co-Design of Robotic Prefabrication
4.2 Case Study 2: Co-Design of End-Effectors
4.3 Case Study 3: Co-Design of On-Site Planning and Execution
5 Conclusion
References
Creating and Querying Data Cubes in Python Using PyCube
1 Introduction
2 Related Work
3 Preliminaries
4 Use Case
4.1 Initializing PyCube
4.2 Analyzing the Data in the View
5 Populating the View
5.1 Generating the SQL Query
5.2 Converting Result Sets to Dataframes
6 Experiments
6.1 Experimental Setup
6.2 Data Retrieval Speeds
6.3 Memory Usage
6.4 Code Comparison
7 Conclusion and Future Work
References
An E-Commerce Benchmark for Evaluating Performance Trade-Offs in Document Stores
1 Introduction
2 Benchmark Design
2.1 E-Commerce Application
2.2 Data Models and Benchmark Queries
2.3 Benchmark Implementation
3 Conclusion
References
Optimization
Effective Reward Schemes for Tardiness Optimization
1 Introduction
2 Related Work
3 Technical Problem Statement
4 Reward Function
5 Experimental Results
References
A Novel Technique for Query Plan Representation Based on Graph Neural Nets
1 Introduction
2 Problem Statement
3 Related Work
4 Model Architecture
4.1 Feature Encoding
4.2 Bidirectional GNN for Query Plan Tree
5 Experimental Study
5.1 Experimental Setup
5.2 Existing Tree Model Cost Estimation Performance
5.3 GNN-Based Tree Model Cost Estimation Performance
5.4 Plan Selection Performance and Analysis
6 Conclusions and Future Work
References
FairMC: Fair-Markov Chain Rank Aggregation Methods
1 Introduction
2 Markov Chain Methods for Rank Aggregation
3 FairMC
4 Experiments
4.1 Performance Evaluation
5 Conclusions
References
LSiX: A Scheme for Efficient Multiple Continuous Window Aggregation Over Streams
1 Introduction
2 Related Work
3 Proposed Method: Longest-Shortest-Window-Based Indexing (LSiX)
4 Experiment
4.1 Data and Evaluations
4.2 Varying the Window Size
5 Conclusion
References
Applications
QPAVE: A Multi-task Question Answering Approach for Fine-Grained Product Attribute Value Extraction
1 Introduction
2 Related Work
3 QPAVE
3.1 Problem Definition
3.2 Model Overview
3.3 Question Answering
3.4 Adaptive Decoder
3.5 Category Classifier
3.6 Masked Language Modelling
4 Experimental Setup
5 Experimental Results
5.1 Results on All and Selected Attributes
5.2 Results of Discovering New Attributes
5.3 Ablation Study
6 Conclusion
References
Open-Source Drift Detection Tools in Action: Insights from Two Use Cases
1 Introduction
2 Architecture
3 Comparative Analysis
4 Conclusion
A Study on Database Intrusion Detection Based on Query Execution Plans
1 Introduction
2 QEP-Based Detection of Anomalous SQL Queries
3 Experimental Evaluation
4 Related Work
5 Conclusions and Future Work
References
Visual Transformers Meet Convolutional Neural Networks: Providing Context for Convolution Layers in Semantic Segmentation of Remote Sensing Photovoltaic Imaging
1 Introduction
2 Related Work
3 Materials and Methods
3.1 Dataset
3.2 Semantic Segmentation Models
3.3 Training Phase and Performance Evaluation
4 Results
5 Discussion
6 Conclusion
References
Data Quality and Applications
NADA: NMF-Based Anomaly Detection in Adjacency-Matrices for Industrial Machine Log-Files
1 Introduction
2 Related Work
3 Method
3.1 Preparation of Event-Logs
3.2 NADA Using Event-Logs
4 Experiments
4.1 Experimental Setup and NADA Settings
4.2 Scenarios Description
4.3 Results
5 Discussion and Conclusion
References
Trusting Fair Data: Leveraging Quality in Fairness-Driven Data Removal Techniques
1 Introduction and Related Work
2 Enhancing Trust in Fair Data
2.1 Fairness
2.2 Coverage
2.3 Data Loss
3 Optimization Objectives
3.1 Multi-Objective Optimization
3.2 Single-Objective Optimization
4 Evaluation
4.1 Hyperparameter Optimization
4.2 Bias Mitigation and Classification Performance
5 Conclusion
References
"The Absence of Evidence is Not the Evidence of Absence": Fact Verification via Information Retrieval-Based In-Context Learning
1 Introduction
2 In-Context Learning for Claim Validity
3 Evaluation
4 Conclusions and Future Work
References
Discovering Relationships Among Properties in Wikidata Knowledge Graph
1 Introduction
2 Preliminaries and Motivation
3 Discovering Relationships Among Properties
4 Empirical Evaluation
5 Related Work
6 Conclusions and Future Work
References
Using a Spatial Grid Model to Interpret Players Movement in Field Sports
1 Introduction
2 Related Research
3 Methodology
4 Results
4.1 Spatial Mapping of the Pitch
4.2 TWG Generation
4.3 Analysis of Areas of Activity
5 Conclusion and Future Work
References
Author Index
