Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track

Name: Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track | European Conference, ECML PKDD 2021, Bilbao, Spain, September 13-17, 2021, Proceedings, Part IV
Brand: Springer
Price: 85.59 EUR
Availability: OnlineOnly

Yuxiao Dong, Nicolas Kourtellis, Barbara Hammer, Jose A. Lozano (Editors)

Springer (Publisher)

Published on 9 September 2021

XXXIV, 554 pages

E-Book

PDF with digital watermarking

978-3-030-86514-6 (ISBN)

€85.59 incl. 7% VAT

E-Book Single Licence

Available for download

Description

The multi-volume set LNAI 12975 to 12979 constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2021, held on September 13-17, 2021. The conference was originally planned to take place in Bilbao, Spain, but was moved online due to the COVID-19 pandemic.

The 210 full papers presented in these proceedings were carefully reviewed and selected from a total of 869 submissions.

The volumes are organized in topical sections as follows:

Research Track:

Part I: Online learning; reinforcement learning; time series, streams, and sequence models; transfer and multi-task learning; semi-supervised and few-shot learning; learning algorithms and applications.

Part II: Generative models; algorithms and learning theory; graphs and networks; interpretation, explainability, transparency, safety.

Part III: Generative models; search and optimization; supervised learning; text mining and natural language processing; image processing, computer vision and visual analytics.

Applied Data Science Track:

Part IV: Anomaly detection and malware; spatio-temporal data; e-commerce and finance; healthcare and medical applications (including Covid); mobility and transportation.

Part V: Automating machine learning, optimization, and feature engineering; machine learning based simulations and knowledge discovery; recommender systems and behavior modeling; natural language processing; remote sensing, image and video processing; social media.

Content

Intro
Preface
Organization
Contents - Part IV
Anomaly Detection and Malware
Anomaly Detection: How to Artificially Increase Your F1-Score with a Biased Evaluation Protocol
1 Introduction
2 Related Work
3 Issues When Using F1-Score and AVPR Metrics
3.1 Formalism and Problem Statement
3.2 Definition of the Metrics
3.3 Evaluation Protocols: Theory vs Practice
3.4 Metrics Sensitivity to the Contamination Rate of the Test Set
3.5 How to Artificially Increase Your F1-Score and AVPR
3.6 F1-Score Cannot Compare Datasets Difficulty
4 Call for Action
4.1 Use AUC
4.2 Do Not Waste Anomalous Samples
5 Conclusion
References
Mining Anomalies in Subspaces of High-Dimensional Time Series for Financial Transactional Data
1 Introduction
2 Related Work
3 Definitions and Notation
4 System Architecture
4.1 Subspace Searching Module
4.2 Discord Mining Module
4.3 Discussion
5 Evaluation
5.1 Alternative Approaches
5.2 Synthetic Data
5.3 Real-World Transactional Data
6 Conclusion
References
AIMED-RL: Exploring Adversarial Malware Examples with Reinforcement Learning
1 Introduction
2 Related Work
2.1 Reinforcement Learning
2.2 Further Approaches
3 AIMED-RL
3.1 Framework and Notation
3.2 Experimental Setting
3.3 Environment
4 Experimental Results
4.1 Diversity of Perturbations
4.2 Evasion Rate
5 Availability
6 Conclusion
References
Learning Explainable Representations of Malware Behavior
1 Introduction
2 Related Work
3 Problem Setting and Operating Environment
3.1 Network Events
3.2 Identification of Threats
3.3 Data Collection and Quantitative Analysis
4 Models
4.1 Architectures
4.2 Unsupervised Pre-training
5 Experiments
5.1 Hyperparameter Optimization
5.2 Malware-Classification Performance
5.3 Indicators of Compromise
6 Conclusion
References
Strategic Mitigation Against Wireless Attacks on Autonomous Platoons
1 Introduction
1.1 Related Work
2 Message Falsification Attacks Against Platoons
2.1 Vehicular Platoon Control Policy
2.2 Attack Model
2.3 Attack Detection Algorithm
3 Security Game-Based Mitigation Framework
3.1 Numerical Example
4 Simulation Setup
5 Simulation Results and Discussion
5.1 Realistic Driving Scenario
6 Conclusion
References
DeFraudNet: An End-to-End Weak Supervision Framework to Detect Fraud in Online Food Delivery
1 Introduction
2 Related Work
3 The Framework: DeFraudNet
3.1 Problem Definition
3.2 Fraud Detection Pipeline
4 Data and Feature Processing
4.1 Dataset
4.2 Feature Engineering
5 Label Generation
5.1 Generating Noisy Labels Using LFs
5.2 Snorkel Generative Model
5.3 Class-Specific Autoencoders for Denoising
6 Discriminator Models
6.1 Multi Layer Perceptron
6.2 LSTM Sequence Model
7 Deployment and Serving Infrastructure
8 Ablation Experiments
8.1 Setup and Baseline
8.2 Experiments
9 Conclusion
References
Spatio-Temporal Data
Time Series Forecasting with Gaussian Processes Needs Priors
1 Introduction
2 Gaussian Processes
2.1 Kernel Compositions
2.2 The Composition
2.3 Training Strategy
2.4 MAP Estimation
2.5 Forecasting
3 Experiments
4 Dealing with Multiple Seasonalities
5 Code and Replicability
6 Conclusions
References
Task Embedding Temporal Convolution Networks for Transfer Learning Problems in Renewable Power Time Series Forecast
1 Introduction
2 Related Work
3 Proposed Method
3.1 Definition of MTL, TL, and Zero-Shot Learning
3.2 Proposed Method
4 Experimental Evaluation of the Task-Temporal Convolution Network
4.1 GermanSolarFarm and EuropeWindFarm Dataset
4.2 Evaluation Measures
4.3 MTL Experiment
4.4 Zero-Shot Learning Experiment
4.5 Inductive TL Experiment
5 Conclusion and Future Work
References
Generating Multi-type Temporal Sequences to Mitigate Class-Imbalanced Problem
1 Introduction
2 Related Work
2.1 GAN for Sequence Data
2.2 RL for GANs with Sequences of Discrete Tokens
2.3 Gumbel-Softmax Distribution for GANs with Sequences of Discrete Tokens
3 Methodology
3.1 Definitions
3.2 RL and Policy Improvement to Train GAN
3.3 An Approximation with Gumbel-Softmax Distribution
4 Data Experiments
4.1 Synthetic Dataset
4.2 Evaluation Metric
4.3 Experiment Setup
4.4 Experiment Results
5 Conclusions
References
Recognizing Skeleton-Based Hand Gestures by a Spatio-Temporal Network
1 Introduction
2 Related Work
2.1 Hand Pose and Gesture Representation
2.2 Hand Gesture Recognition
3 Problem Formulation
3.1 Definition
3.2 Embedding Representation for Skeletal Data
4 Our Model
4.1 Spatio-Temporal Feature Encoder
4.2 Attention Scorer
4.3 Network-Based Classifier
5 Experiments
5.1 Datasets and Preprocessing
5.2 Experimental Set-Ups and Baselines
5.3 Comparison Results on Publicly-Available Datasets
5.4 Comparison Results on TaiChi2021
5.5 Ablation Study
6 Conclusion
References
E-commerce and Finance
Smurf-Based Anti-money Laundering in Time-Evolving Transaction Networks
1 Introduction
2 Related Work
3 Dataset Description
4 Extraction of Smurf-Like Motifs from Transaction Graph
4.1 Proposed Pipeline
4.2 Results
5 Conclusion
References
Spatio-Temporal Multi-graph Networks for Demand Forecasting in Online Marketplaces
1 Introduction
2 Prior Work
3 Proposed Method
3.1 Problem Formulation
3.2 Graph Construction
3.3 Graph Neural Networks
3.4 Sequential Model
4 Experimental Results
4.1 Implementation Details
4.2 Comparison with Baseline
4.3 Demand Forecasting for Multi-seller Products and Cold Start Offers
5 Conclusion
References
The Limit Order Book Recreation Model (LOBRM): An Extended Analysis
1 Introduction
2 Background and Related Work
2.1 The Limit Order Book (LOB)
2.2 Generating Synthetic LOB Data
3 Model Formulation
3.1 Motivation
3.2 Problem Description
3.3 Formalized Workflow of LOBRM
4 Experiment and Empirical Analysis
4.1 Data Preprocessing
4.2 Model Comparison
4.3 Ablation Study
4.4 Superiority of Sparse Encoding for TAQ
4.5 Is the Model Well-Trained?
5 Conclusion
References
Taking over the Stock Market: Adversarial Perturbations Against Algorithmic Traders
1 Introduction
2 Background
2.1 Algorithmic Trading
2.2 Adversarial Learning
3 Problem Description
3.1 Trading Setup
3.2 Threat Model
4 Proposed Attack
5 Evaluation Setup
5.1 Dataset
5.2 Feature Extraction
5.3 Models
5.4 Evaluation
6 White-Box Attack
7 Black-Box Attack
8 Mitigation
9 Conclusions
References
Continuous-Action Reinforcement Learning for Portfolio Allocation of a Life Insurance Company
1 Introduction
2 Problem Definition
2.1 Formalization
2.2 Implementation Details
2.3 Optimization Problem
3 Solution
3.1 Structural and Parametric Constraints
4 Experimental Evaluation
4.1 Three Assets Scenario
4.2 Six Assets Scenario
5 Related Work
6 Conclusions
References
XRR: Explainable Risk Ranking for Financial Reports
1 Introduction
2 Methodology
2.1 Definitions and Problem Formulation
2.2 Post-event Return Volatility
2.3 Multilevel Explanation Structure
2.4 Pairwise Deep Ranking
3 Experiments
3.1 Data Description
3.2 Experimental Settings
3.3 Pre-trained Word Embedding
3.4 Compared Methods
3.5 Experimental Results
3.6 Fine-Grained Analysis
3.7 Different Risk Measure Analysis
4 Discussions on Explainability
4.1 Financial Sentiment Terms Analysis
4.2 Financial Sentiment Sentences Analysis
5 Conclusion
References
Healthcare and Medical Applications (including Covid)
Self-disclosure on Twitter During the COVID-19 Pandemic: A Network Perspective
1 Introduction
2 Dataset
3 Self-disclosure Measurements
3.1 Measurement Scale
3.2 Manual Annotations
3.3 Label Generation
4 Analysis
4.1 Self-disclosure Assortativity in Twitter Reply Networks
4.2 Persistent Groups and Self-disclosure
4.3 Characterizing Sensitive Disclosures in Temporally Persistent Social Connections
5 Discussion
6 Related Work
7 Conclusion
References
COVID Edge-Net: Automated COVID-19 Lung Lesion Edge Detection in Chest CT Images
1 Introduction
2 Related Works
2.1 COVID-19 Segmentation
2.2 Edge Detection
3 Methodology
3.1 Task Definition
3.2 Overview of COVID Edge-Net
3.3 The Edge Detection Backbone
3.4 Multi-scale Residual Dual Attention (MSRDA) Module
3.5 Canny Operator Module
3.6 Global Loss Function
4 Experiments and Discussions
4.1 Experimental Settings
4.2 Comparison with State-of-the-Arts
4.3 Ablation Study
4.4 Additional Experiments
5 Conclusions
References
Improving Ambulance Dispatching with Machine Learning and Simulation
1 Introduction
2 Related Work
3 The Data Set: Historic Dispatch Decisions
3.1 Feature Engineering
4 Capturing the Dispatch Policy with a Decision Tree
4.1 Performance Analysis of the Learned Decision Tree and Policy
4.2 The Penalty-Based Closest-Idle Policy
5 Current Policy as a Basis for Improvement
5.1 Evaluating Potential Enhancements Using Simulation
5.2 Performance of the Improved Policy
6 Conclusion
References
Countrywide Origin-Destination Matrix Prediction and Its Application for COVID-19
1 Introduction
2 Related Work
2.1 Crowd and Traffic Flow Prediction
2.2 Mobility-Based COVID-19 Simulation
3 Problem Definition
4 OD Matrix Prediction Model
4.1 Overview
4.2 Origin-Destination Convolution (OD-Conv)
4.3 Origin-Destination Convolutional Recurrent Unit (ODCRU)
4.4 Dynamic Graph Constructor (DGC)
5 OD Matrix Based Epidemic Simulation Model
6 Experiment
6.1 Data
6.2 Setting
6.3 Evaluation on OD Matrix Prediction
6.4 Evaluation on COVID-19 Simulation
7 Conclusion
References
Single Model for Influenza Forecasting of Multiple Countries by Multi-task Learning
1 Introduction
2 Datasets
3 Methods for Finding Search Queries
4 Building a Flu Forecasting Model for Multiple Countries
4.1 Problem Formulation
4.2 Model Structure
4.3 Extension to Multi-task Model
5 Experiments and Results
5.1 Experimental Settings
5.2 Comparative Models
5.3 Results
6 Discussions
6.1 Multi-model Performance for Other Countries
6.2 Comparison of Models Without and with Search Queries
6.3 Analysis of the Methods to Find Search Queries
7 Conclusions
References
Automatic Acoustic Mosquito Tagging with Bayesian Neural Networks
1 Introduction
2 Background
2.1 Mosquito Control Efforts
2.2 Acoustic Machine Learning
3 Methods
3.1 HumBug Pipeline
3.2 Bayesian Neural Networks
4 Model Configuration
5 Results
5.1 Validation Performance
5.2 Automatically Labelling Field Data with Uncertainty Metrics
6 Conclusion
A Appendix
References
Multitask Recalibrated Aggregation Network for Medical Code Prediction
1 Introduction
2 Related Work
3 Method
3.1 Input Layer
3.2 Bidirectional GRU Layer
3.3 Recalibrated Aggregation Module
3.4 Attention Classification Layers
3.5 Multitask Training
4 Experiments
4.1 Datasets
4.2 Settings
4.3 Baselines
4.4 Results
4.5 Ablation Study
4.6 A Detailed Analysis of the Properties of the RAM
5 Conclusion
References
Open Data Science to Fight COVID-19: Winning the 500k XPRIZE Pandemic Response Challenge
1 Introduction
2 Related Work
3 Data
4 Predictors of COVID-19 Cases
4.1 Notation
4.2 SIR Epidemiological Model
4.3 Baseline or Standard Predictor
4.4 ValenciaIA4COVID (V4C) Predictor
5 Prescriptor of Intervention Policies
5.1 Modeling the NPI - COVID-19 Cases Space
5.2 Prescriptors
5.3 Intervention Policy Definition
6 Experimental Results
6.1 Predictor
6.2 Speed and Resource Use
6.3 Prescriptor
7 Conclusions and Future Work
References
Mobility and Transportation
Getting Your Package to the Right Place: Supervised Machine Learning for Geolocation
1 Introduction
2 Supervised Geolocation by Ranking
2.1 Candidate Filtering and Generation
2.2 Feature Vectors
2.3 Base Classifiers and Implementation
3 Experiments
3.1 Datasets
3.2 Loss vs. Business Objective
3.3 How Does It Perform Against Baselines?
3.4 How Is the Tail Affected by Model Capacity?
3.5 Lesion Studies and RankNet Comparison
4 Discussion
4.1 Real-World Offline Evaluations
4.2 Real-World Online Evaluations
4.3 Limitations
5 Related Work
6 Conclusion and Future Work
References
Machine Learning Guided Optimization for Demand Responsive Transport Systems
1 Introduction
1.1 Context
1.2 Motivation
1.3 Contribution
2 Related Work
3 Machine Learning Guided Optimization
4 MLGO Applied to DRT Systems
4.1 Model and Notations
4.2 Generation of Feasible Solutions
4.3 Simulation Framework
4.4 Surrogate Model
4.5 Offline Optimization Framework
5 Experiments
5.1 Choice of a Machine Learning Model for the Optimization
5.2 Computational Results
5.3 Optimization Results
6 Conclusion
References
OBELISC: Oscillator-Based Modelling and Control Using Efficient Neural Learning for Intelligent Road Traffic Signal Calculation
1 Introduction
2 Materials and Methods
2.1 Oscillator-Based Modelling of Traffic Dynamics
2.2 Robust Control of the Oscillator-Based Networked Dynamics
2.3 Representation, Learning, and Dynamics in Neural Networks
3 Experiments and Results
4 Discussion
5 Conclusions
References
VAMBC: A Variational Approach for Mobility Behavior Clustering
1 Introduction
2 Related Work
3 Preprocessing and Problem Definition
4 The VAMBC Model
4.1 Decomposing Hidden Variables
4.2 Training Objectives and Neural Layers
4.3 Network Design
4.4 Relationship to VAE and Gaussian-Mixture VAE
5 Experiments
5.1 Environment and Experiment Settings
5.2 Quantitative Analysis
5.3 Ablation Study
5.4 The Training Progress of VAMBC
6 Conclusion
References
Multi-agent Deep Reinforcement Learning with Spatio-Temporal Feature Fusion for Traffic Signal Control
1 Introduction
2 Related Works
3 Problem Definition
4 Method
4.1 Spatio-Temporal Input Embedding
4.2 Spatio-Temporal Feature Fusion
4.3 Q-Value Prediction
5 Experiment
5.1 Datasets
5.2 Baseline Methods
5.3 Performance Metrics and Parameter Settings
5.4 Comparison with Baseline Methods
5.5 Effect of Spatio-Temporal Feature Fusion Components
6 Conclusion
References
Monte Carlo Search Algorithms for Network Traffic Engineering
1 Introduction
2 Problem Formulation
3 Monte Carlo Search on Routing Problem
3.1 Monte Carlo Search
3.2 Modeling with Monte Carlo Search
3.3 Improvement
4 Experimental Results
4.1 Dataset
4.2 Comparison of the Monte Carlo Algorithms
4.3 Impact of the Metric Space
4.4 Comparison
4.5 Random Dense Graphs
5 Conclusion
References
Energy and Emission Prediction for Mixed-Vehicle Transit Fleets Using Multi-task and Inductive Transfer Learning
1 Introduction
2 Model
2.1 Predicting Energy Consumed and Emissions
2.2 Preliminaries and Model Formulation
3 Approach
3.1 Mapping Vehicle Trajectories to Route Segments
3.2 Generating Samples
3.3 Learning
4 Experiments and Results
4.1 Hyperparameter Tuning and Baseline Models
4.2 Multi-task Model Evaluation
4.3 Inductive Transfer Learning Evaluation
4.4 Discussion
5 Conclusion
References
CQNet: A Clustering-Based Quadruplet Network for Decentralized Application Classification via Encrypted Traffic
1 Introduction
2 Related Work
2.1 Web Application Classification
2.2 Mobile Application Classification
2.3 Decentralized Applications Classification
3 Preliminaries
3.1 DApps Background
3.2 Problem Definition
3.3 Limitation of Existing Methods
4 CQNet
4.1 FE-set (RQ1)
4.2 The Proposed Quadruplet Network (RQ2)
5 Performance Evaluation
5.1 Dataset Collection
5.2 Experiments Settings
5.3 Hyperparameters of CQNet
5.4 Performance Comparison
5.5 Ablation Studies
6 Conclusion
References
SPOT: A Framework for Selection of Prototypes Using Optimal Transport
1 Introduction
2 Background
2.1 Optimal Transport (OT)
2.2 Prototype Selection
2.3 Submodularity
3 SPOT Framework
3.1 SPOT Problem Formulation
3.2 Equivalent Reduced Representations of SPOT Objective
3.3 SPOT Optimization Algorithms
3.4 k-Medoids as a Special Case of SPOT
4 Related Works and Discussion
5 Experiments
5.1 Prototype Selection Within Same Domain
5.2 Prototype Selection from Different Domains
6 Conclusion
References
Author Index
