
Big Data Analytics and Knowledge Discovery
Description
This book constitutes the proceedings of the 26th International Conference on Big Data Analytics and Knowledge Discovery, DaWaK 2024, which took place in Naples, Italy, during August 26-28, 2024.
The 16 full and 20 short papers included in this book were carefully reviewed and selected from 83 submissions. They are organized in topical sections as follows: modeling and design; entity matching and similarity; classification; machine learning methods and applications; time series; data repositories; optimization; and data quality and applications.

Content
- Intro
- Preface
- Organization
- Abstracts of Keynote Talks
- Multimodal Deep Learning in Medical Imaging
- Digital Humanism as an Enabler for a Holistic Socio-Technical Approach to the Latest Developments in Computer Science and Artificial Intelligence
- Deep Entity Processing in the Era of Large Language Models: Challenges and Opportunities
- Contents
- Modeling and Design
- LiteSelect: A Lightweight Adaptive Learning Algorithm for Online Index Selection
- 1 Introduction
- 2 The Online Index Selection Problem
- 3 LiteSelect: A Lightweight Online Index Tuner
- 3.1 Algorithm LiteSelect
- 3.2 Fine Tuning LiteSelect
- 4 Experimental Evaluation
- 4.1 Experimental Setup
- 4.2 Parameter Impact Analysis
- 4.3 Index Tuning Performance Comparison
- 5 Related Work
- 6 Conclusion
- References
- IDAGEmb: An Incremental Data Alignment Based on Graph Embedding
- 1 Introduction
- 2 Background
- 2.1 Existing Data Alignment Approaches
- 2.2 Graph Embedding in Representation Learning
- 2.3 Discussion
- 3 Methodology
- 3.1 Research Design
- 3.2 Preliminaries
- 3.3 Adopted Algorithm for IDAGEmb
- 4 Experiments and Results
- 4.1 Experiment Configuration
- 4.2 Experiment #1: Embedding Method Selection
- 4.3 Experiment #2: Comparison with Static Methods (Effectiveness and Efficiency)
- 4.4 Experiment #3: Model Sensitivity to Data Order Variation
- 5 Conclusion and Outlook
- References
- Learning Paradigms and Modelling Methodologies for Digital Twins in Process Industry
- 1 Introduction and Motivation
- 1.1 Research Questions (RQs)
- 1.2 Structure of Review
- 2 Literature Search Strategy
- 2.1 Quality Assessment Checks
- 2.2 Selection of Primary Studies
- 2.3 Data Synthesis and Analysis Approach
- 3 Reporting the Review
- 3.1 Overview of All Studies
- 3.2 Overview of All Primary Studies
- 4 Evaluating the Research Questions
- 5 Discussion and Conclusion
- References
- Entity Matching and Similarity
- MultiMatch: Low-Resource Generalized Entity Matching Using Task-Conditioned Hyperadapters in Multitask Learning
- 1 Introduction
- 2 Background
- 2.1 Problem Formulation
- 2.2 Entity Matching with Single-task Objective Models
- 2.3 Fully Fine-tuning Methods
- 2.4 Parameter-Efficient Fine-tuning Methods
- 2.5 Entity Matching with Parameter-Efficient Multi-task Models
- 3 MultiMatch Training
- 4 Experiments
- 5 Analysis
- 5.1 Single Versus Multiple Objective Models
- 5.2 Task Ablation Experiments
- 6 Conclusions and Future Work
- References
- Embedding-Based Data Matching for Disparate Data Sources
- 1 Context and Main Issues
- 2 Proposed Framework
- 2.1 Problem Statement
- 2.2 Overview
- 3 Experiments
- 3.1 RQ1. Effectiveness and Stability
- 3.2 RQ2. Ablation
- 4 Conclusion
- References
- Subtree Similarity Search Based on Structure and Text
- 1 Introduction
- 2 Problem Definition
- 3 Related Works
- 3.1 Tree Edit Distance
- 3.2 Lower Bounds of Tree Edit Distance
- 3.3 Upper Bounds of Tree Edit Distance
- 3.4 Subtree Similarity Search
- 3.5 Other Related Problems
- 4 Preliminaries
- 5 Proposed Method
- 6 Experiments
- 6.1 Dataset
- 6.2 Methods
- 6.3 Effect of the Recall
- 6.4 Effect of the Document Size
- 6.5 Effect of the Query Size
- 6.6 Accuracy
- 7 Conclusion
- References
- Classification
- Towards Hybrid Embedded Feature Selection and Classification Approach with Slim-TSF
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 4 Experimental Evaluations
- 4.1 Data Collection
- 4.2 Experimental Settings
- 4.3 Bootstrapping
- 4.4 Remarks
- 5 Conclusions
- References
- Evaluation of High Sparsity Strategies for Efficient Binary Classification
- 1 Introduction
- 2 Related Work
- 3 Materials and Methods
- 4 Results and Discussion
- 5 Conclusions and Future Work
- References
- Incremental SMOTE with Control Coefficient for Classifiers in Data Starved Medical Applications
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 An Incremental Synthetic Data Generation System
- 4 Experiments
- 4.1 Datasets and Experiments Setup
- 4.2 Statistical Analysis
- 4.3 Performance Evaluation on Classifiers
- 5 Conclusions
- References
- Exploring Evaluation Metrics for Binary Classification in Data Analysis: the Worthiness Benchmark Concept
- 1 Introduction and Related Research
- 2 Methodology
- 3 Discussion and Conclusion
- References
- Machine Learning Methods and Applications
- Exploring Causal Chain Identification: Comprehensive Insights from Text and Knowledge Graphs
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 In-Chain Domain Knowledge
- 3.2 CK-CEVAE
- 3.3 Chained Prediction Unit
- 4 Experiments
- 4.1 Chains Acquisition
- 4.2 Domain Detection Model
- 4.3 Models Configurations
- 4.4 Overall Analysis
- 4.5 Ablation Study
- 5 Case Study: Understanding Semantic Continuity in Knowledge Graphs
- 6 Discussion
- 7 Conclusion
- References
- Towards Regional Explanations with Validity Domains for Local Explanations
- 1 Introduction
- 2 Related Work
- 2.1 Explanation Methods
- 2.2 Explanation Evaluation Metrics
- 2.3 Validity Domain of Models
- 3 Toy Example
- 4 Our Proposal
- 4.1 Validity Domain
- 4.2 Model Summary
- 4.3 Evaluation Metrics
- 5 Experiments
- 5.1 Protocol
- 5.2 Evaluation of Methods
- 5.3 Model Summary
- 5.4 Sensitivity Analysis
- 6 Discussion and Limits
- 7 Conclusion and Perspectives
- References
- Analyzing a Decade of Evolution: Trends in Natural Language Processing
- 1 Introduction
- 2 Methodology
- 2.1 PDF Parsing
- 3 Results
- 4 Conclusion
- 5 Limitations
- References
- Improving Serendipity for Collaborative Metric Learning Based on Mutual Proximity
- 1 Introduction
- 2 Background
- 2.1 Serendipity
- 2.2 Collaborative Metric Learning (CML)
- 2.3 Mutual Proximity (MP)
- 2.4 Advantages and Originality of the Proposed Method
- 3 Methodology
- 3.1 Learning Embeddings
- 3.2 Searching Embedding Space and Recommending Items
- 4 Experiments
- 4.1 Datasets
- 4.2 Metrics
- 4.3 Results
- 5 Conclusions and Discussion
- References
- Ada2vec: Adaptive Representation Learning for Large-Scale Dynamic Heterogeneous Networks
- 1 Introduction
- 2 Related Work
- 3 Problem Definition
- 4 The Ada2vec Framework
- 4.1 Part 1 Dynamic
- 4.2 Part 2 Heterogeneity
- 4.3 Part 3 Change
- 5 Experimental Evaluations
- 5.1 Data
- 5.2 Benchmarks
- 5.3 Classification
- 5.4 Clustering
- 5.5 Performance Analysis
- 6 Conclusion and Future Work
- References
- Differentially-Private Neural Network Training with Private Features and Public Labels
- 1 Introduction
- 2 Background
- 2.1 Differential Privacy
- 2.2 DP-SGD
- 3 Related Work
- 4 Proposed Approach
- 4.1 Sanitization Layer
- 4.2 Bounding Sensitivity and Adding Noise
- 4.3 Design Choices and Tradeoffs
- 5 Experimental Evaluation
- 5.1 Experimental Settings
- 5.2 Results
- 6 Conclusion
- References
- Time Series
- Series2Graph++: Distributed Detection of Correlation Anomalies in Multivariate Time Series
- 1 Introduction
- 2 Related Work
- 3 Series2Graph++
- 4 Experiments
- 5 Conclusion
- References
- Anomaly Detection from Time Series Under Uncertainty
- 1 Introduction
- 2 Related Work
- 3 Proposed Approach
- 4 Experiments
- 4.1 Uncertainty Quantification Evaluation
- 4.2 Model Performance
- 5 Conclusion
- References
- Comparison of Measures for Characterizing the Difficulty of Time Series Classification
- 1 Introduction
- 2 Methodology
- 2.1 Data and Models
- 2.2 Complexity Measures
- 3 Analysis
- 3.1 Correlation Analysis
- 3.2 Relationships Between the Complexity Measures
- 4 Conclusion
- References
- Dynamic Time Warping for Phase Recognition in Tribological Sensor Data
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Dynamic Time Warping (DTW)
- 3.2 Tribological Use Case
- 3.3 Experiments
- 4 Results
- 4.1 Classification of the Whole Wear Phases
- 4.2 Partial Classification of the Wear Phases
- 5 Conclusion
- References
- Data Repositories
- Putting Co-Design-Supporting Data Lakes to the Test: An Evaluation on AEC Case Studies
- 1 Motivation: Data Management in AEC
- 2 ArchIBALD Architecture Development and Definition
- 2.1 Requirement Analysis
- 2.2 Design of the ArchIBALD Architecture
- 3 Scenario-Based Case Studies: Context and Overview
- 3.1 The livMatS Biomimetic Shell
- 3.2 Co-Design of Robotic Prefabrication
- 3.3 Co-Design of End-Effectors for On-Site Assembly
- 3.4 Co-Design of On-Site Planning and Execution
- 4 Evaluation
- 4.1 Case Study 1: Co-Design of Robotic Prefabrication
- 4.2 Case Study 2: Co-Design of End-Effectors
- 4.3 Case Study 3: Co-Design of On-Site Planning and Execution
- 5 Conclusion
- References
- Creating and Querying Data Cubes in Python Using PyCube
- 1 Introduction
- 2 Related Work
- 3 Preliminaries
- 4 Use Case
- 4.1 Initializing PyCube
- 4.2 Analyzing the Data in the View
- 5 Populating the View
- 5.1 Generating the SQL Query
- 5.2 Converting Result Sets to Dataframes
- 6 Experiments
- 6.1 Experimental Setup
- 6.2 Data Retrieval Speeds
- 6.3 Memory Usage
- 6.4 Code Comparison
- 7 Conclusion and Future Work
- References
- An E-Commerce Benchmark for Evaluating Performance Trade-Offs in Document Stores
- 1 Introduction
- 2 Benchmark Design
- 2.1 E-Commerce Application
- 2.2 Data Models and Benchmark Queries
- 2.3 Benchmark Implementation
- 3 Conclusion
- References
- Optimization
- Effective Reward Schemes for Tardiness Optimization
- 1 Introduction
- 2 Related Work
- 3 Technical Problem Statement
- 4 Reward Function
- 5 Experimental Results
- References
- A Novel Technique for Query Plan Representation Based on Graph Neural Nets
- 1 Introduction
- 2 Problem Statement
- 3 Related Work
- 4 Model Architecture
- 4.1 Feature Encoding
- 4.2 Bidirectional GNN for Query Plan Tree
- 5 Experimental Study
- 5.1 Experimental Setup
- 5.2 Existing Tree Model Cost Estimation Performance
- 5.3 GNN-Based Tree Model Cost Estimation Performance
- 5.4 Plan Selection Performance and Analysis
- 6 Conclusions and Future Work
- References
- FairMC Fair-Markov Chain Rank Aggregation Methods
- 1 Introduction
- 2 Markov Chain Methods for Rank Aggregation
- 3 FairMC
- 4 Experiments
- 4.1 Performance Evaluation
- 5 Conclusions
- References
- LSiX: A Scheme for Efficient Multiple Continuous Window Aggregation Over Streams
- 1 Introduction
- 2 Related Work
- 3 Proposed Method: Longest-Shortest-Window-Based Indexing (LSiX)
- 4 Experiment
- 4.1 Data and Evaluations
- 4.2 Varying the Window Size
- 5 Conclusion
- References
- Applications
- QPAVE: A Multi-task Question Answering Approach for Fine-Grained Product Attribute Value Extraction
- 1 Introduction
- 2 Related Work
- 3 QPAVE
- 3.1 Problem Definition
- 3.2 Model Overview
- 3.3 Question Answering
- 3.4 Adaptive Decoder
- 3.5 Category Classifier
- 3.6 Masked Language Modelling
- 4 Experimental Setup
- 5 Experimental Results
- 5.1 Results on All and Selected Attributes
- 5.2 Results of Discovering New Attributes
- 5.3 Ablation Study
- 6 Conclusion
- References
- Open-Source Drift Detection Tools in Action: Insights from Two Use Cases
- 1 Introduction
- 2 Architecture
- 3 Comparative Analysis
- 4 Conclusion
- A Study on Database Intrusion Detection Based on Query Execution Plans
- 1 Introduction
- 2 QEP-Based Detection of Anomalous SQL Queries
- 3 Experimental Evaluation
- 4 Related Work
- 5 Conclusions and Future Work
- References
- Visual Transformers Meet Convolutional Neural Networks: Providing Context for Convolution Layers in Semantic Segmentation of Remote Sensing Photovoltaic Imaging
- 1 Introduction
- 2 Related Work
- 3 Materials and Methods
- 3.1 Dataset
- 3.2 Semantic Segmentation Models
- 3.3 Training Phase and Performance Evaluation
- 4 Results
- 5 Discussion
- 6 Conclusion
- References
- Data Quality and Applications
- NADA: NMF-Based Anomaly Detection in Adjacency-Matrices for Industrial Machine Log-Files
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Preparation of Event-Logs
- 3.2 NADA Using Event-Logs
- 4 Experiments
- 4.1 Experimental Setup and NADA Settings
- 4.2 Scenarios Description
- 4.3 Results
- 5 Discussion and Conclusion
- References
- Trusting Fair Data: Leveraging Quality in Fairness-Driven Data Removal Techniques
- 1 Introduction and Related Work
- 2 Enhancing Trust in Fair Data
- 2.1 Fairness
- 2.2 Coverage
- 2.3 Data Loss
- 3 Optimization Objectives
- 3.1 Multi-Objective Optimization
- 3.2 Single-Objective Optimization
- 4 Evaluation
- 4.1 Hyperparameter Optimization
- 4.2 Bias Mitigation and Classification Performance
- 5 Conclusion
- References
- "The Absence of Evidence is Not the Evidence of Absence": Fact Verification via Information Retrieval-Based In-Context Learning
- 1 Introduction
- 2 In-Context Learning for Claim Validity
- 3 Evaluation
- 4 Conclusions and Future Work
- References
- Discovering Relationships Among Properties in Wikidata Knowledge Graph
- 1 Introduction
- 2 Preliminaries and Motivation
- 3 Discovering Relationships Among Properties
- 4 Empirical Evaluation
- 5 Related Work
- 6 Conclusions and Future Work
- References
- Using a Spatial Grid Model to Interpret Players Movement in Field Sports
- 1 Introduction
- 2 Related Research
- 3 Methodology
- 4 Results
- 4.1 Spatial Mapping of the Pitch
- 4.2 TWG Generation
- 4.3 Analysis of Areas of Activity
- 5 Conclusion and Future Work
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because the fixed page layout requires a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.