
Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track
Description
This multi-volume set, LNAI 14941 to LNAI 14950, constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2024, held in Vilnius, Lithuania, in September 2024.
The papers presented in these proceedings are from the following three conference tracks:
Research Track: The 202 full papers from this track were carefully reviewed and selected from 826 submissions. They appear in Parts I through VIII.
Demo Track: The 14 papers from this track were selected from 30 submissions. They appear in Part VIII.
Applied Data Science Track: The 56 full papers from this track were carefully reviewed and selected from 224 submissions. They appear in Parts IX and X.

Content
- Intro
- Preface
- Organization
- Invited Talks Abstracts
- The Dynamics of Memorization and Unlearning
- The Emerging Science of Benchmarks
- Enhancing User Experience with AI-Powered Search and Recommendations at Spotify
- How to Utilize (and Generate) Player Tracking Data in Sport
- Resource-Aware Machine Learning: A User-Oriented Approach
- Contents - Part X
- Applied Data Science Track
- MT-HCCAR: Multi-task Deep Learning with Hierarchical Classification and Attention-Based Regression for Cloud Property Retrieval
- 1 Introduction
- 2 Related Work
- 3 Problem Statement and Data Simulation
- 3.1 Radiative Transfer Simulation
- 3.2 Cloud Property Retrieval
- 4 MT-HCCAR Model
- 4.1 Encoder-Decoder Sub-Network
- 4.2 Hierarchical Classification (HC) Sub-Network
- 4.3 Classification Assisted Regression Sub-Network Based on Cross Attention Mechanism (CAR)
- 4.4 Model Training of MT-HCCAR
- 5 Experiments
- 5.1 Experiment Setup
- 5.2 Evaluation Metrics
- 5.3 Comparison with Baseline Models
- 5.4 Ablation Study
- 5.5 Earth Science Evaluation
- 6 Conclusions
- References
- Machine Learning Based Tool for Automated Sperm Cell Tracking and Sperm Bundle Detection
- 1 Introduction
- 2 Background
- 2.1 Computer-Assisted Sperm Analysis Systems
- 3 Methodology
- 3.1 Sperm Cell Detection
- 3.2 Path Reconstruction
- 3.3 The Kalman Filter Implementation
- 3.4 Bounding Box Classification
- 3.5 Final Analysis
- 4 Results
- 5 Conclusion
- References
- DISCO: An End-to-End Bandit Framework for Personalised Discount Allocation
- 1 Introduction
- 2 Problem Formulation
- 3 Disco Architecture
- 3.1 Action Feature Representation
- 3.2 Context Feature Representation
- 3.3 Reward Prediction: Bayesian Log-Linear Regression
- 3.4 Optimisation of Discount Code Allocation
- 4 Experiments
- 4.1 Information Sharing and Price Elasticity with RBF Encoding
- 4.2 Reward Prediction Model
- 4.3 Active Learning with Global Constraints
- 5 Online A/B Test
- 6 Concluding Discussion
- References
- Advancing Solar Flare Prediction Using Deep Learning with Active Region Patches
- 1 Introduction
- 2 Related Work
- 3 Data and Model
- 4 Experimental Evaluation
- 4.1 Experimental Settings
- 4.2 Evaluation
- 4.3 Discussion
- 5 Conclusion and Future Work
- References
- Exceptional Subitizing Patterns: Exploring Mathematical Abilities of Finnish Primary School Children with Piecewise Linear Regression
- 1 Introduction
- 2 The FUnctional Numerical Assessment Study
- 3 Background
- 3.1 Segmented Linear Regression
- 3.2 Connections to Existing SD/EMM Approaches
- 4 Our Proposed Flattening Approach
- 4.1 Domain-Specific Aggregations Functions
- 5 Our Proposed Target Model
- 6 Experiments
- 6.1 Results Experiment 1
- 6.2 Results Experiment 2
- 7 Discussion and Conclusion
- References
- Intent Enhanced Self-supervised Hypergraph Learning for Session-Based Recommendation
- 1 Introduction
- 2 Related Work
- 2.1 Traditional Methods
- 2.2 Deep Learning-Based Methods
- 2.3 Self-supervised Learning
- 3 Preliminaries
- 3.1 Problem Statement
- 3.2 Hypergraph
- 4 Methodology
- 4.1 Hypergraph Construction
- 4.2 Hypergraph Convolutional Neural Network
- 4.3 Session Representation Learning
- 4.4 Recommendation Generation
- 4.5 Enhancing SBR with Self-supervised Learning Task
- 4.6 Model Optimization
- 5 Experiments
- 5.1 Experimental Setup
- 5.2 Overall Performance (Q1)
- 5.3 Ablation Study (Q2)
- 5.4 Hyperparameters Analysis (Q3)
- 6 Conclusion
- References
- Missing Data Imputation: Do Advanced ML/DL Techniques Outperform Traditional Approaches?
- 1 Introduction
- 2 Related Work
- 3 Background
- 4 Imputation Methods
- 4.1 Statistical Methods
- 4.2 Machine Learning Methods
- 4.3 Deep Learning Methods
- 5 Experiments
- 5.1 Datasets and Experimental Setting
- 5.2 Experimental Results
- 6 Discussion
- 7 Conclusions and Future Directions
- References
- Evaluating Vision Transformer Models for Visual Quality Control in Industrial Manufacturing
- 1 Introduction
- 2 Related Work
- 2.1 Vision Backbone
- 2.2 Anomaly Detection and Localization
- 3 Experimental Setup
- 3.1 Backbone Architectures
- 3.2 Anomaly Detection Architectures
- 3.3 Datasets
- 3.4 Implementation Details
- 3.5 Metrics
- 3.6 Experiments
- 4 Results and Discussion
- 4.1 Comparison with the VT-ADL and FastFlow Models
- 4.2 Comparison of GMMs and NF Models
- 4.3 Performance of the Backbones
- 4.4 Considerations and Limitations for Practical Application
- 5 Conclusion
- References
- GraphRPM: Risk Pattern Mining on Industrial Large Attributed Graphs
- 1 Introduction
- 2 Problem Formulation
- 3 Methods
- 3.1 Potential Subgraph Enumeration
- 3.2 Two-Staged Pattern Mining
- 3.3 Pattern Risk Assessment
- 4 Experiments
- 5 Deployment
- 6 Conclusion
- References
- Solving a Real-World Optimization Problem Using Proximal Policy Optimization with Curriculum Learning and Reward Engineering
- 1 Introduction
- 2 Related Work
- 3 Real Environment and Problem Description
- 4 Reinforcement Learning Problem Formulation
- 4.1 State Space
- 4.2 Action Space
- 4.3 Environment Dynamics
- 5 Reward Tuning
- 5.1 Simple Gaussian Reward
- 5.2 Custom Reward
- 5.3 Precision Reward
- 6 Methodology
- 7 Experimental Evaluation
- 7.1 Experimental Setup
- 7.2 Results
- 7.3 Discussion
- 8 Conclusion
- References
- Spatial-Temporal PDE Networks for Traffic Flow Forecasting
- 1 Introduction
- 2 Related Work
- 3 Preliminaries
- 3.1 Problem Formulation
- 3.2 Network Architectures
- 4 Methods
- 4.1 Vanilla PDE for Traffic Flow
- 4.2 Discrete-Time PDE Solutions
- 4.3 Graph PDE Layer
- 4.4 Integrating PDE Layer with GNNs
- 5 Experiments
- 5.1 Datasets and Baselines
- 5.2 Performance Evaluation
- 5.3 Model Analysis
- 5.4 Case Study
- 6 Conclusion
- References
- Symbolic Prompt Tuning Completes the App Promotion Graph
- 1 Introduction
- 2 Background and Related Work
- 2.1 Definitions
- 2.2 Collected App Promotion Dataset
- 2.3 APHG Completion Task
- 2.4 Related Work
- 3 SymPrompt
- 3.1 Embedding-Based Symbolic Prompts
- 3.2 Metapath-Based Symbolic Prompts
- 3.3 Combined Input Tokens
- 4 Experiment
- 4.1 Setup
- 4.2 Performance on App Promotion HGC
- 4.3 Component Analysis
- 4.4 Random Permutation on Model Learning
- 5 Conclusion
- References
- Boosting Protein Language Models with Negative Sample Mining
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Negative Sampling
- 3.2 Negative Mining in Cross Attention Space
- 3.3 Inference Phase of NM-Transformer
- 4 Experiments
- 4.1 Experimental Settings
- 4.2 Main Results
- 4.3 Interpretability of NM-Transformer, Case Study
- 5 Conclusion
- References
- MedSyn: LLM-Based Synthetic Medical Text Generation Framework
- 1 Introduction
- 2 Related Work
- 2.1 Medical Knowledge Graphs
- 2.2 LLMs in Medical Domain
- 3 Method
- 3.1 Medical Knowledge Graph
- 3.2 Instruction-Following Dataset
- 3.3 Fine-Tuning
- 3.4 Generation Task
- 3.5 Symptoms Sampling
- 3.6 Synthetic Dataset
- 4 Experiments
- 4.1 Datasets and Tasks
- 4.2 Models
- 4.3 Evaluation
- 4.4 Results
- 4.5 Human Assessment
- 5 Discussion
- 6 Conclusion
- References
- A Crystal Knowledge-Enhanced Pre-training Framework for Crystal Property Estimation
- 1 Introduction
- 2 Related Work
- 3 Preliminaries
- 4 Methodology
- 4.1 Framework Overview
- 4.2 Reconstruction Under Mutually Exclusive Masked Views
- 4.3 Multi-graph Attention Module
- 4.4 Crystal Knowledge Enhanced Module
- 4.5 Optimization Objectives
- 5 Experiments
- 5.1 Dataset Description
- 5.2 CROP Configurations
- 5.3 Experimental Results
- 5.4 Ablation Studies
- 5.5 Parameter Sensitivity Analysis
- 6 Conclusion
- References
- Multiplex Community Detection for Resilient Electrical Segmentation Enabling Management of an Increasingly Complex Power Grid
- 1 Introduction
- 2 Related Work
- 3 Modeling of Resilient Electrical Segmentation
- 3.1 Formulation of Multiplex Graph Flattening
- 3.2 Optimization Methods
- 4 Experiments
- 4.1 Resilient Segmentation Pipeline
- 4.2 Simulation
- 4.3 Electrical Application: Security Analysis
- 5 Conclusion
- References
- Bandits for Sponsored Search Auctions Under Unknown Valuation Model: Case Study in E-Commerce Advertising
- 1 Introduction
- 2 Related Work
- 3 Problem Formulation
- 3.1 Learning in Sponsored Search Auctions
- 4 BatchEXP3: Algorithm for Learning in SSA
- 5 Deployment
- 5.1 Bidding System Architecture
- 5.2 Live Test Design and Unfolding
- 6 Experimental Results and Discussion
- 6.1 Group Level Analysis
- 6.2 Risk of Decreasing Costs
- 7 From Practice Back to Theory: Additional Insights
- 8 Conclusion
- References
- Unbiased Recommendation Through Invariant Representation Learning
- 1 Introduction
- 2 Related Work
- 2.1 Unbiased Recommendation
- 2.2 Causal Inference and Invariant Risk Minimization
- 3 Problem Formulation
- 4 Causal Analysis of Recommendation Bias
- 5 Causal Invariant Recommendation Model
- 5.1 Recommendation Model
- 5.2 Invariant Representation Learning
- 5.3 Data Partition Learning
- 6 Experiments
- 6.1 Experimental Settings
- 6.2 Main Results
- 6.3 Ablation Study
- 7 Conclusion
- References
- Enhancing Multi-objective Optimisation Through Machine Learning-Supported Multiphysics Simulation
- 1 Introduction
- 2 Related Work
- 2.1 Surrogate Modelling
- 2.2 Multiphysics Optimisation and Data Extension
- 3 Machine Learning Supported Optimisation Strategy
- 3.1 Data Acquisition
- 3.2 Surrogate Models
- 3.3 Interpretable Surrogate Modelling (xAI Module)
- 3.4 Multiobjective Optimisation and Validation
- 4 Experimental Design
- 4.1 Use Case 1: Motor Dataset
- 4.2 Use Case 2: U-Bend Dataset
- 5 Results
- 5.1 Prediction Performance of Surrogate Models
- 5.2 Identifying Critical Features and Relevant Dependencies
- 5.3 Evaluation of Solution Candidates in the Multiobjective Optimisation Task
- 5.4 Validation of Solution Candidates in the Multiobjective Optimisation
- 6 Conclusion
- References
- DistALANER: Distantly Supervised Active Learning Augmented Named Entity Recognition in the Open Source Software Ecosystem
- 1 Introduction
- 2 Related Work
- 3 Dataset
- 4 Source Details
- 5 Preliminaries
- 6 Methodology
- 6.1 Stage 1: Construction and Matching of Dictionary
- 6.2 Stage 2: Entity Distillation and Dictionary Expansion
- 6.3 Stage 3: The NER Model
- 7 Heuristics
- 8 Experimental Setup
- 9 Results
- 9.1 Progressive Learning
- 9.2 Motivation of Different Split of HOn
- 10 Issues in Finding Distant Labels Using LLMs
- 11 Additional Experiments for Task-Based Evaluation
- 11.1 Relation Extraction
- 12 Error Analysis
- 13 Conclusion
- References
- DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis
- 1 Introduction
- 2 Related Work
- 2.1 Diffusion Models
- 2.2 Diffusion-Based Video Synthesis
- 3 Methodology
- 3.1 Preliminaries
- 3.2 Latent In-Iteration Deflickering
- 3.3 Patch Blending Algorithm
- 4 Experiments
- 4.1 Experimental Settings
- 4.2 Quantitive Comparison
- 4.3 Ablation Study
- 5 More Pipelines for Video Synthesis Applications
- 5.1 Image-Guided Video Stylization
- 5.2 Video Restoring
- 5.3 3D Rendering
- 6 Industrial Application
- 7 Conclusion and Future Work
- References
- Offline Imitation of Badminton Player Behavior via Experiential Contexts and Brownian Motion
- 1 Introduction
- 2 Related Work
- 2.1 Inverse Reinforcement Learning
- 2.2 Offline Imitation Learning via Behavior Cloning
- 3 Preliminaries
- 3.1 Badminton: A Typical Example of a Turn-Based Sport
- 3.2 The Contextual Markov Decision Process
- 3.3 Problem Formulation
- 4 Methodology
- 4.1 Experiential Context Selector (ECS)
- 4.2 Latent Geometric Brownian Motion (LGBM)
- 4.3 Action Projection Layer
- 4.4 Loss Function
- 5 Experiments
- 5.1 Experimental Setup
- 5.2 Quantitative Results (RQ1)
- 5.3 Length Distribution Difference (RQ2)
- 5.4 Win Rate Difference (RQ3)
- 5.5 Case Studies (RQ4)
- 6 Conclusion
- References
- Fast and Adaptive Questionnaires for Voting Advice Applications
- 1 Introduction
- 2 Methods
- 2.1 Data
- 2.2 Spatial Models
- 2.3 Selection Methods
- 2.4 Evaluation Metrics
- 3 Results
- 3.1 Selecting the Spatial Model
- 3.2 Optimizing the Questionnaire
- 4 Conclusions
- References
- Job Title Prediction as a Dual Task of Expertise Prediction in Open Source Software
- 1 Introduction
- 2 Related Work
- 3 The TOSE Dataset
- 3.1 Raw Data Collection
- 3.2 API Expertise Sequence Construction
- 3.3 Job Title Sequence Construction
- 3.4 Sequence Alignment
- 4 The DualJE Model
- 4.1 The Primal Task: Expertise to Job Titles
- 4.2 The Dual Task: Job Titles to Expertise
- 4.3 Model Training
- 5 Performance Evaluation
- 5.1 Data Configuration
- 5.2 Baseline Models
- 5.3 Hyperparameters and Evaluation Metrics
- 5.4 RQ1: DualJE vs. Baseline Models
- 5.5 RQ2: DualJE vs. Ablation Models
- 5.6 RQ3: Hyperparameter Tuning
- 6 Discussion and Conclusion
- References
- LLMs in the Loop: Leveraging Large Language Model Annotations for Active Learning in Low-Resource Languages
- 1 Introduction
- 2 LLMs in the Loop
- 3 Experiments
- 3.1 Foundation Model Selection
- 3.2 Effect of Prompt Design and Querying LLMs in Batches
- 3.3 Data Contamination
- 3.4 Active Learning
- 4 Conclusion
- References
- Multi-spectral Gradient Residual Network for Haze Removal in Multi-sensor Remote Sensing Imagery
- 1 Introduction
- 2 Related Work
- 2.1 Haze Removal with Priors and Assumptions
- 2.2 Haze Removal with Deep Learning
- 3 Methodology
- 3.1 Problem Formulation - Input and Output
- 3.2 Model Architecture
- 3.3 Loss Function
- 4 Experiments
- 4.1 Dataset
- 4.2 Experimental Setup
- 5 Results
- 5.1 Quantitative Results
- 6 Conclusions
- References
- ExTea: An Evolutionary Algorithm-Based Approach for Enhancing Explainability in Time-Series Models
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Problem Definition and Individual Coding
- 3.2 Population Generation
- 3.3 Fitness Function Design
- 3.4 Growth
- 3.5 Crossover and Mutation
- 3.6 Explanation
- 4 Experiment
- 4.1 Experiment Setup
- 4.2 Evaluation
- 5 Conclusion
- References
- BiCAE - A Bimodal Convolutional Autoencoder for Seed Purity Testing
- 1 Introduction
- 2 Related Work
- 2.1 Computer Vision for Seed Analysis
- 2.2 Multimodal Autoencoders
- 3 Methodology
- 3.1 Unimodal Baselines
- 3.2 Bimodal Convolutional Autoencoder (BiCAE)
- 4 Experiments
- 4.1 Data
- 4.2 Training Setting
- 4.3 Results
- 5 Discussion
- 5.1 Model Comparison
- 5.2 Societal Impact
- 6 Conclusion and Outlook
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. There are no technical restrictions that prevent illegal distribution, but a personalised watermark embedded in the eBook can identify the purchaser in the event of misuse and serve as evidence for legal purposes.
For more information, see our eBook Help page.