
Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track
Description
The multi-volume set LNAI 12975 through 12979 constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2021, held September 13-17, 2021. The conference was originally planned to take place in Bilbao, Spain, but was moved online due to the COVID-19 pandemic.
The 210 full papers presented in these proceedings were carefully reviewed and selected from a total of 869 submissions.
The volumes are organized in topical sections as follows:
Research Track:
Part I: Online learning; reinforcement learning; time series, streams, and sequence models; transfer and multi-task learning; semi-supervised and few-shot learning; learning algorithms and applications.
Part II: Generative models; algorithms and learning theory; graphs and networks; interpretation, explainability, transparency, safety.
Part III: Generative models; search and optimization; supervised learning; text mining and natural language processing; image processing, computer vision and visual analytics.
Applied Data Science Track:
Part IV: Anomaly detection and malware; spatio-temporal data; e-commerce and finance; healthcare and medical applications (including Covid); mobility and transportation.
Part V: Automating machine learning, optimization, and feature engineering; machine learning based simulations and knowledge discovery; recommender systems and behavior modeling; natural language processing; remote sensing, image and video processing; social media.

Content
- Intro
- Preface
- Organization
- Contents - Part IV
- Anomaly Detection and Malware
- Anomaly Detection: How to Artificially Increase Your F1-Score with a Biased Evaluation Protocol
- 1 Introduction
- 2 Related Work
- 3 Issues When Using F1-Score and AVPR Metrics
- 3.1 Formalism and Problem Statement
- 3.2 Definition of the Metrics
- 3.3 Evaluation Protocols: Theory vs Practice
- 3.4 Metrics Sensitivity to the Contamination Rate of the Test Set
- 3.5 How to Artificially Increase Your F1-Score and AVPR
- 3.6 F1-Score Cannot Compare Datasets Difficulty
- 4 Call for Action
- 4.1 Use AUC
- 4.2 Do Not Waste Anomalous Samples
- 5 Conclusion
- References
- Mining Anomalies in Subspaces of High-Dimensional Time Series for Financial Transactional Data
- 1 Introduction
- 2 Related Work
- 3 Definitions and Notation
- 4 System Architecture
- 4.1 Subspace Searching Module
- 4.2 Discord Mining Module
- 4.3 Discussion
- 5 Evaluation
- 5.1 Alternative Approaches
- 5.2 Synthetic Data
- 5.3 Real-World Transactional Data
- 6 Conclusion
- References
- AIMED-RL: Exploring Adversarial Malware Examples with Reinforcement Learning
- 1 Introduction
- 2 Related Work
- 2.1 Reinforcement Learning
- 2.2 Further Approaches
- 3 AIMED-RL
- 3.1 Framework and Notation
- 3.2 Experimental Setting
- 3.3 Environment
- 4 Experimental Results
- 4.1 Diversity of Perturbations
- 4.2 Evasion Rate
- 5 Availability
- 6 Conclusion
- References
- Learning Explainable Representations of Malware Behavior
- 1 Introduction
- 2 Related Work
- 3 Problem Setting and Operating Environment
- 3.1 Network Events
- 3.2 Identification of Threats
- 3.3 Data Collection and Quantitative Analysis
- 4 Models
- 4.1 Architectures
- 4.2 Unsupervised Pre-training
- 5 Experiments
- 5.1 Hyperparameter Optimization
- 5.2 Malware-Classification Performance
- 5.3 Indicators of Compromise
- 6 Conclusion
- References
- Strategic Mitigation Against Wireless Attacks on Autonomous Platoons
- 1 Introduction
- 1.1 Related Work
- 2 Message Falsification Attacks Against Platoons
- 2.1 Vehicular Platoon Control Policy
- 2.2 Attack Model
- 2.3 Attack Detection Algorithm
- 3 Security Game-Based Mitigation Framework
- 3.1 Numerical Example
- 4 Simulation Setup
- 5 Simulation Results and Discussion
- 5.1 Realistic Driving Scenario
- 6 Conclusion
- References
- DeFraudNet: An End-to-End Weak Supervision Framework to Detect Fraud in Online Food Delivery
- 1 Introduction
- 2 Related Work
- 3 The Framework: DeFraudNet
- 3.1 Problem Definition
- 3.2 Fraud Detection Pipeline
- 4 Data and Feature Processing
- 4.1 Dataset
- 4.2 Feature Engineering
- 5 Label Generation
- 5.1 Generating Noisy Labels Using LFs
- 5.2 Snorkel Generative Model
- 5.3 Class-Specific Autoencoders for Denoising
- 6 Discriminator Models
- 6.1 Multi Layer Perceptron
- 6.2 LSTM Sequence Model
- 7 Deployment and Serving Infrastructure
- 8 Ablation Experiments
- 8.1 Setup and Baseline
- 8.2 Experiments
- 9 Conclusion
- References
- Spatio-Temporal Data
- Time Series Forecasting with Gaussian Processes Needs Priors
- 1 Introduction
- 2 Gaussian Processes
- 2.1 Kernel Compositions
- 2.2 The Composition
- 2.3 Training Strategy
- 2.4 MAP Estimation
- 2.5 Forecasting
- 3 Experiments
- 4 Dealing with Multiple Seasonalities
- 5 Code and Replicability
- 6 Conclusions
- References
- Task Embedding Temporal Convolution Networks for Transfer Learning Problems in Renewable Power Time Series Forecast
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Definition of MTL, TL, and Zero-Shot Learning
- 3.2 Proposed Method
- 4 Experimental Evaluation of the Task-Temporal Convolution Network
- 4.1 GermanSolarFarm and EuropeWindFarm Dataset
- 4.2 Evaluation Measures
- 4.3 MTL Experiment
- 4.4 Zero-Shot Learning Experiment
- 4.5 Inductive TL Experiment
- 5 Conclusion and Future Work
- References
- Generating Multi-type Temporal Sequences to Mitigate Class-Imbalanced Problem
- 1 Introduction
- 2 Related Work
- 2.1 GAN for Sequence Data
- 2.2 RL for GANs with Sequences of Discrete Tokens
- 2.3 Gumbel-Softmax Distribution for GANs with Sequences of Discrete Tokens
- 3 Methodology
- 3.1 Definitions
- 3.2 RL and Policy Improvement to Train GAN
- 3.3 An Approximation with Gumbel-Softmax Distribution
- 4 Data Experiments
- 4.1 Synthetic Dataset
- 4.2 Evaluation Metric
- 4.3 Experiment Setup
- 4.4 Experiment Results
- 5 Conclusions
- References
- Recognizing Skeleton-Based Hand Gestures by a Spatio-Temporal Network
- 1 Introduction
- 2 Related Work
- 2.1 Hand Pose and Gesture Representation
- 2.2 Hand Gesture Recognition
- 3 Problem Formulation
- 3.1 Definition
- 3.2 Embedding Representation for Skeletal Data
- 4 Our Model
- 4.1 Spatio-Temporal Feature Encoder
- 4.2 Attention Scorer
- 4.3 Network-Based Classifier
- 5 Experiments
- 5.1 Datasets and Preprocessing
- 5.2 Experimental Set-Ups and Baselines
- 5.3 Comparison Results on Publicly-Available Datasets
- 5.4 Comparisons Results on TaiChi2021
- 5.5 Ablation Study
- 6 Conclusion
- References
- E-commerce and Finance
- Smurf-Based Anti-money Laundering in Time-Evolving Transaction Networks
- 1 Introduction
- 2 Related Work
- 3 Dataset Description
- 4 Extraction of Smurf-Like Motifs from Transaction Graph
- 4.1 Proposed Pipeline
- 4.2 Results
- 5 Conclusion
- References
- Spatio-Temporal Multi-graph Networks for Demand Forecasting in Online Marketplaces
- 1 Introduction
- 2 Prior Work
- 3 Proposed Method
- 3.1 Problem Formulation
- 3.2 Graph Construction
- 3.3 Graph Neural Networks
- 3.4 Sequential Model
- 4 Experimental Results
- 4.1 Implementation Details
- 4.2 Comparison with Baseline
- 4.3 Demand Forecasting for Multi-seller Products and Cold Start Offers
- 5 Conclusion
- References
- The Limit Order Book Recreation Model (LOBRM): An Extended Analysis
- 1 Introduction
- 2 Background and Related Work
- 2.1 The Limit Order Book (LOB)
- 2.2 Generating Synthetic LOB Data
- 3 Model Formulation
- 3.1 Motivation
- 3.2 Problem Description
- 3.3 Formalized Workflow of LOBRM
- 4 Experiment and Empirical Analysis
- 4.1 Data Preprocessing
- 4.2 Model Comparison
- 4.3 Ablation Study
- 4.4 Superiority of Sparse Encoding for TAQ
- 4.5 Is the Model Well-Trained?
- 5 Conclusion
- References
- Taking over the Stock Market: Adversarial Perturbations Against Algorithmic Traders
- 1 Introduction
- 2 Background
- 2.1 Algorithmic Trading
- 2.2 Adversarial Learning
- 3 Problem Description
- 3.1 Trading Setup
- 3.2 Threat Model
- 4 Proposed Attack
- 5 Evaluation Setup
- 5.1 Dataset
- 5.2 Feature Extraction
- 5.3 Models
- 5.4 Evaluation
- 6 White-Box Attack
- 7 Black-Box Attack
- 8 Mitigation
- 9 Conclusions
- References
- Continuous-Action Reinforcement Learning for Portfolio Allocation of a Life Insurance Company
- 1 Introduction
- 2 Problem Definition
- 2.1 Formalization
- 2.2 Implementation Details
- 2.3 Optimization Problem
- 3 Solution
- 3.1 Structural and Parametric Constraints
- 4 Experimental Evaluation
- 4.1 Three Assets Scenario
- 4.2 Six Assets Scenario
- 5 Related Work
- 6 Conclusions
- References
- XRR: Explainable Risk Ranking for Financial Reports
- 1 Introduction
- 2 Methodology
- 2.1 Definitions and Problem Formulation
- 2.2 Post-event Return Volatility
- 2.3 Multilevel Explanation Structure
- 2.4 Pairwise Deep Ranking
- 3 Experiments
- 3.1 Data Description
- 3.2 Experimental Settings
- 3.3 Pre-trained Word Embedding
- 3.4 Compared Methods
- 3.5 Experimental Results
- 3.6 Fine-Grained Analysis
- 3.7 Different Risk Measure Analysis
- 4 Discussions on Explainability
- 4.1 Financial Sentiment Terms Analysis
- 4.2 Financial Sentiment Sentences Analysis
- 5 Conclusion
- References
- Healthcare and Medical Applications (including Covid)
- Self-disclosure on Twitter During the COVID-19 Pandemic: A Network Perspective
- 1 Introduction
- 2 Dataset
- 3 Self-disclosure Measurements
- 3.1 Measurement Scale
- 3.2 Manual Annotations
- 3.3 Label Generation
- 4 Analysis
- 4.1 Self-disclosure Assortativity in Twitter Reply Networks
- 4.2 Persistent Groups and Self-disclosure
- 4.3 Characterizing Sensitive Disclosures in Temporally Persistent Social Connections
- 5 Discussion
- 6 Related Work
- 7 Conclusion
- References
- COVID Edge-Net: Automated COVID-19 Lung Lesion Edge Detection in Chest CT Images
- 1 Introduction
- 2 Related Works
- 2.1 COVID-19 Segmentation
- 2.2 Edge Detection
- 3 Methodology
- 3.1 Task Definition
- 3.2 Overview of COVID Edge-Net
- 3.3 The Edge Detection Backbone
- 3.4 Multi-scale Residual Dual Attention (MSRDA) Module
- 3.5 Canny Operator Module
- 3.6 Global Loss Function
- 4 Experiments and Discussions
- 4.1 Experimental Settings
- 4.2 Comparison with State-of-the-Arts
- 4.3 Ablation Study
- 4.4 Additional Experiments
- 5 Conclusions
- References
- Improving Ambulance Dispatching with Machine Learning and Simulation
- 1 Introduction
- 2 Related Work
- 3 The Data Set: Historic Dispatch Decisions
- 3.1 Feature Engineering
- 4 Capturing the Dispatch Policy with a Decision Tree
- 4.1 Performance Analysis of the Learned Decision Tree and Policy
- 4.2 The Penalty-Based Closest-Idle Policy
- 5 Current Policy as a Basis for Improvement
- 5.1 Evaluating Potential Enhancements Using Simulation
- 5.2 Performance of the Improved Policy
- 6 Conclusion
- References
- Countrywide Origin-Destination Matrix Prediction and Its Application for COVID-19
- 1 Introduction
- 2 Related Work
- 2.1 Crowd and Traffic Flow Prediction
- 2.2 Mobility-Based COVID-19 Simulation
- 3 Problem Definition
- 4 OD Matrix Prediction Model
- 4.1 Overview
- 4.2 Origin-Destination Convolution (OD-Conv)
- 4.3 Origin-Destination Convolutional Recurrent Unit (ODCRU)
- 4.4 Dynamic Graph Constructor (DGC)
- 5 OD Matrix Based Epidemic Simulation Model
- 6 Experiment
- 6.1 Data
- 6.2 Setting
- 6.3 Evaluation on OD Matrix Prediction
- 6.4 Evaluation on COVID-19 Simulation
- 7 Conclusion
- References
- Single Model for Influenza Forecasting of Multiple Countries by Multi-task Learning
- 1 Introduction
- 2 Datasets
- 3 Methods for Finding Search Queries
- 4 Building a Flu Forecasting Model for Multiple Countries
- 4.1 Problem Formulation
- 4.2 Model Structure
- 4.3 Extension to Multi-task Model
- 5 Experiments and Results
- 5.1 Experimental Settings
- 5.2 Comparative Models
- 5.3 Results
- 6 Discussions
- 6.1 Multi-model Performance for Other Countries
- 6.2 Comparison of Models Without and with Search Queries
- 6.3 Analysis of the Methods to Find Search Queries
- 7 Conclusions
- References
- Automatic Acoustic Mosquito Tagging with Bayesian Neural Networks
- 1 Introduction
- 2 Background
- 2.1 Mosquito Control Efforts
- 2.2 Acoustic Machine Learning
- 3 Methods
- 3.1 HumBug Pipeline
- 3.2 Bayesian Neural Networks
- 4 Model Configuration
- 5 Results
- 5.1 Validation Performance
- 5.2 Automatically Labelling Field Data with Uncertainty Metrics
- 6 Conclusion
- A Appendix
- References
- Multitask Recalibrated Aggregation Network for Medical Code Prediction
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Input Layer
- 3.2 Bidirectional GRU Layer
- 3.3 Recalibrated Aggregation Module
- 3.4 Attention Classification Layers
- 3.5 Multitask Training
- 4 Experiments
- 4.1 Datasets
- 4.2 Settings
- 4.3 Baselines
- 4.4 Results
- 4.5 Ablation Study
- 4.6 A Detailed Analysis of the Properties of the RAM
- 5 Conclusion
- References
- Open Data Science to Fight COVID-19: Winning the 500k XPRIZE Pandemic Response Challenge
- 1 Introduction
- 2 Related Work
- 3 Data
- 4 Predictors of COVID-19 Cases
- 4.1 Notation
- 4.2 SIR Epidemiological Model
- 4.3 Baseline or Standard Predictor
- 4.4 ValenciaIA4COVID (V4C) Predictor
- 5 Prescriptor of Intervention Policies
- 5.1 Modeling the NPI - COVID-19 Cases Space
- 5.2 Prescriptors
- 5.3 Intervention Policy Definition
- 6 Experimental Results
- 6.1 Predictor
- 6.2 Speed and Resource Use
- 6.3 Prescriptor
- 7 Conclusions and Future Work
- References
- Mobility and Transportation
- Getting Your Package to the Right Place: Supervised Machine Learning for Geolocation
- 1 Introduction
- 2 Supervised Geolocation by Ranking
- 2.1 Candidate Filtering and Generation
- 2.2 Feature Vectors
- 2.3 Base Classifiers and Implementation
- 3 Experiments
- 3.1 Datasets
- 3.2 Loss vs. Business Objective
- 3.3 How Does It Perform Against Baselines?
- 3.4 How Is the Tail Affected by Model Capacity?
- 3.5 Lesion Studies and RankNet Comparison
- 4 Discussion
- 4.1 Real-World Offline Evaluations
- 4.2 Real-World Online Evaluations
- 4.3 Limitations
- 5 Related Work
- 6 Conclusion and Future Work
- References
- Machine Learning Guided Optimization for Demand Responsive Transport Systems
- 1 Introduction
- 1.1 Context
- 1.2 Motivation
- 1.3 Contribution
- 2 Related Work
- 3 Machine Learning Guided Optimization
- 4 MLGO Applied to DRT Systems
- 4.1 Model and Notations
- 4.2 Generation of Feasible Solutions
- 4.3 Simulation Framework
- 4.4 Surrogate Model
- 4.5 Offline Optimization Framework
- 5 Experiments
- 5.1 Choice of a Machine Learning Model for the Optimization
- 5.2 Computational Results
- 5.3 Optimization Results
- 6 Conclusion
- References
- OBELISC: Oscillator-Based Modelling and Control Using Efficient Neural Learning for Intelligent Road Traffic Signal Calculation
- 1 Introduction
- 2 Materials and Methods
- 2.1 Oscillator-Based Modelling of Traffic Dynamics
- 2.2 Robust Control of the Oscillator-Based Networked Dynamics
- 2.3 Representation, Learning, and Dynamics in Neural Networks
- 3 Experiments and Results
- 4 Discussion
- 5 Conclusions
- References
- VAMBC: A Variational Approach for Mobility Behavior Clustering
- 1 Introduction
- 2 Related Work
- 3 Preprocessing and Problem Definition
- 4 The VAMBC Model
- 4.1 Decomposing Hidden Variables
- 4.2 Training Objectives and Neural Layers
- 4.3 Network Design
- 4.4 Relationship to VAE and Gaussian-Mixture VAE
- 5 Experiments
- 5.1 Environment and Experiment Settings
- 5.2 Quantitative Analysis
- 5.3 Ablation Study
- 5.4 The Training Progress of VAMBC
- 6 Conclusion
- References
- Multi-agent Deep Reinforcement Learning with Spatio-Temporal Feature Fusion for Traffic Signal Control
- 1 Introduction
- 2 Related Works
- 3 Problem Definition
- 4 Method
- 4.1 Spatio-Temporal Input Embedding
- 4.2 Spatio-Temporal Feature Fusion
- 4.3 Q-Value Prediction
- 5 Experiment
- 5.1 Datasets
- 5.2 Baseline Methods
- 5.3 Performance Metrics and Parameter Settings
- 5.4 Comparison with Baseline Methods
- 5.5 Effect of Spatio-Temporal Feature Fusion Components
- 6 Conclusion
- References
- Monte Carlo Search Algorithms for Network Traffic Engineering
- 1 Introduction
- 2 Problem Formulation
- 3 Monte Carlo Search on Routing Problem
- 3.1 Monte Carlo Search
- 3.2 Modeling with Monte Carlo Search
- 3.3 Improvement
- 4 Experimental Results
- 4.1 Dataset
- 4.2 Comparison of the Monte Carlo Algorithms
- 4.3 Impact of the Metric Space
- 4.4 Comparison
- 4.5 Random Dense Graphs
- 5 Conclusion
- References
- Energy and Emission Prediction for Mixed-Vehicle Transit Fleets Using Multi-task and Inductive Transfer Learning
- 1 Introduction
- 2 Model
- 2.1 Predicting Energy Consumed and Emissions
- 2.2 Preliminaries and Model Formulation
- 3 Approach
- 3.1 Mapping Vehicle Trajectories to Route Segments
- 3.2 Generating Samples
- 3.3 Learning
- 4 Experiments and Results
- 4.1 Hyperparameter Tuning and Baseline Models
- 4.2 Multi-task Model Evaluation
- 4.3 Inductive Transfer Learning Evaluation
- 4.4 Discussion
- 5 Conclusion
- References
- CQNet: A Clustering-Based Quadruplet Network for Decentralized Application Classification via Encrypted Traffic
- 1 Introduction
- 2 Related Work
- 2.1 Web Application Classification
- 2.2 Mobile Application Classification
- 2.3 Decentralized Applications Classification
- 3 Preliminaries
- 3.1 DApps Background
- 3.2 Problem Definition
- 3.3 Limitation of Existing Methods
- 4 CQNet
- 4.1 FE-set (RQ1)
- 4.2 The Proposed Quadruplet Network (RQ2)
- 5 Performance Evaluation
- 5.1 Dataset Collection
- 5.2 Experiments Settings
- 5.3 Hyperparameters of CQNet
- 5.4 Performance Comparison
- 5.5 Ablation Studies
- 6 Conclusion
- References
- SPOT: A Framework for Selection of Prototypes Using Optimal Transport
- 1 Introduction
- 2 Background
- 2.1 Optimal Transport (OT)
- 2.2 Prototype Selection
- 2.3 Submodularity
- 3 SPOT Framework
- 3.1 SPOT Problem Formulation
- 3.2 Equivalent Reduced Representations of SPOT Objective
- 3.3 SPOT Optimization Algorithms
- 3.4 k-Medoids as a Special Case of SPOT
- 4 Related Works and Discussion
- 5 Experiments
- 5.1 Prototype Selection Within Same Domain
- 5.2 Prototype Selection from Different Domains
- 6 Conclusion
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF format displays each book page identically on any hardware, which makes it suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.