Machine Learning and Knowledge Discovery in Databases

Name: Machine Learning and Knowledge Discovery in Databases | European Conference, ECML PKDD 2015, Porto, Portugal, September 7-11, 2015, Proceedings, Part I
Brand: Springer
Price: 53.49 EUR
Availability: OnlineOnly

European Conference, ECML PKDD 2015, Porto, Portugal, September 7-11, 2015, Proceedings, Part I

Annalisa Appice, Pedro Pereira Rodrigues, Vítor Santos Costa, Carlos Soares, João Gama, Alípio Jorge (Editors)

Springer (Publisher)

Published on 28 August 2015

LVIII, 709 pages

E-Book

PDF with digital watermarking

978-3-319-23528-8 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Preface
Organization
Abstracts of Invited Talks
Towards Declarative, Domain-Oriented Data Analysis
Sum-Product Networks: Deep Models with Tractable Inference
Mining Online Networks and Communities
Learning to Acquire Knowledge in a Smart Grid Environment
Untangling the Web's Invisible Net
Towards a Digital Time Machine Fueled by Big Data and Social Mining
Abstracts of Journal Track Articles
Contents - Part I
Contents - Part II
Contents - Part III
Research Track Classification, Regression and Supervised Learning
Data Split Strategies for Evolving Predictive Models
1 Introduction
2 Data Splits for Model Fitting, Selection, and Assessment
3 Issues with Evolving Models
4 Data Splits for Evolving Models
4.1 Parallel Dump Workflow
4.2 Serial Waterfall Workflow
4.3 Hybrid Workflow
5 Bias Due to Test Set Reuse
6 Illustration on Synthetic Data
7 Case Study: Paraphrase Detection
8 Related Work
9 Conclusions
A Appendix: Bias Due to Test Set Reuse
References
Discriminative Interpolation for Classification of Functional Data
1 Introduction
2 Function Representations and Wavelets
3 Related Work
4 Classification by Discriminative Interpolation
4.1 Training Formulation
4.2 Testing Formulation
5 Experiments
6 Conclusion
References
Fast Label Embeddings via Randomized Linear Algebra
1 Introduction
1.1 Contributions
2 Algorithm Derivation
2.1 Notation
2.2 Background
2.3 Rank-Constrained Estimation and Embedding
2.4 Rembrandt
3 Related Work
4 Experiments
4.1 ALOI
4.2 ODP
4.3 LSHTC
5 Discussion
References
Maximum Entropy Linear Manifold for Learning Discriminative Low-Dimensional Representation
1 Introduction
2 General Idea
3 Theory
4 Closed form Solution for Objective and its Gradient
5 Experiments
6 Conclusions
References
Novel Decompositions of Proper Scoring Rules for Classification: Score Adjustment as Precursor to Calibration
1 Introduction
2 Proper Scoring Rules
2.1 Scoring Rules
2.2 Divergence, Entropy and Properness
2.3 Expected Loss and Empirical Loss
3 Decompositions with Ideal Scores and Calibrated Scores
3.1 Ideal Scores Q and the Decomposition L=EL+IL
3.2 Calibrated Scores C and the Decomposition L=CL+RL
4 Adjusted Scores A and the Decomposition L=AL+PL
4.1 Adjustment
4.2 The Right Adjustment Procedure Guarantees Decreased Loss
5 Decomposition Theorems and Terminology
5.1 Decompositions with S,C,Q,Y
5.2 Decompositions with S,A,C,Q,Y and Terminology
6 Algorithms and Experiments
7 Related Work
8 Conclusions
References
Parameter Learning of Bayesian Network Classifiers Under Computational Constraints
1 Introduction
2 Related Work
3 Background and Notation
4 Algorithms for Online Learning of Reduced-Precision Parameters
4.1 Learning Maximum Likelihood Parameters
4.2 Learning Maximum Margin Parameters
5 Experiments
5.1 Datasets
5.2 Results
6 Discussions
References
Predicting Unseen Labels Using Label Hierarchies in Large-Scale Multi-label Learning
1 Introduction
2 Multi-label Classification
3 Model Description
3.1 Joint Space Embeddings
3.2 Learning with Hierarchical Structures Over Labels
3.3 Efficient Gradients Computation
3.4 Label Ranking to Binary Predictions
4 Experimental Setup
5 Experimental Results
5.1 Learning All Labels Together
5.2 Learning to Predict Unseen Labels
6 Pretrained Label Embeddings as Good Initial Guess
6.1 Understanding Label Embeddings
6.2 Results
7 Conclusions
Regression with Linear Factored Functions
1 Introduction
1.1 Kernel Regression
1.2 Factored Basis Functions
2 Regression
3 Linear Factored Functions
3.1 Function Class
3.2 Constraints
3.3 Regularization
3.4 Optimization
4 Empirical Evaluation
4.1 Demonstration
4.2 Evaluation
5 Discussion
Appendix A LFF Definition and Properties
Appendix B Inner Loop Derivation
Appendix C Proofs of the Propositions
References
Ridge Regression, Hubness, and Zero-Shot Learning
1 Introduction
1.1 Background
1.2 Research Objective and Contributions
2 Zero-Shot Learning as a Regression Problem
3 Hubness Phenomenon and the Variance of Data
4 Hubness in Regression-Based Zero-Shot Learning
4.1 Shrinkage of Projected Objects
4.2 Influence of Shrinkage on Nearest Neighbor Search
4.3 Additional Argument for Placing Target Objects Closer to the Origin
4.4 Summary of the Proposed Approach
5 Related Work
6 Experiments
6.1 Experimental Setups
6.2 Task Descriptions and Datasets
6.3 Experimental Results
7 Conclusion
References
Solving Prediction Games with Parallel Batch Gradient Descent
1 Introduction
2 Problem Setting and Data Transformation Model
3 Analysis of Equilibrium Points
3.1 Existence of Equilibrium Points
3.2 Uniqueness of Equilibrium Points
4 Finding the Unique Equilibrium Point Efficiently
4.1 Inexact Line Search
4.2 Arrow-Hurwicz-Uzawa Method
4.3 Parallelized Methods
5 Experimental Results
5.1 Reference Methods
5.2 Performance of the Parameterized Transformation Model
5.3 Optimization Algorithms
5.4 Parallelized Models
6 Conclusion
References
Structured Regularizer for Neural Higher-Order Sequence Models
1 Introduction
2 Related Work
3 Higher-Order Conditional Random Fields
3.1 Parameter Learning
3.2 Forward Algorithm for 2nd-Order CRFs
4 Structured Regularizer
5 Experiments
5.1 TIMIT Data Set
5.2 Experimental Setup
5.3 Labeling Results Using Only MLP Networks
5.4 Labeling Results Using LC-CRFs with Linear or Neural Higher-Order Factors
6 Conclusion
References
Versatile Decision Trees for Learning Over Multiple Contexts
1 Introduction
2 Dataset Shift
3 Versatile Decision Trees
3.1 Constructing Splits Using Percentiles
3.2 Adapting for Output Shifts
3.3 Versatile Model for Decision Trees
4 Experimental Results
4.1 Generating Synthetic Shifts
4.2 Results of the Synthetic Shifts
4.3 Results on Non-synthetic Shifts
5 Conclusion
References
When is Undersampling Effective in Unbalanced Classification Tasks?
1 Introduction
2 The Warping Effect of Undersampling on the Posterior Probability
3 The Interaction Between Warping and Variance of the Estimator
4 Experimental Validation
4.1 Synthetic Datasets
4.2 Real Datasets
5 Conclusion
References
Clustering and Unsupervised Learning
A Kernel-Learning Approach to Semi-supervised Clustering with Relative Distance Comparisons
1 Introduction
2 Related Work
3 Kernel Learning with Relative Distances
3.1 Basic Definitions
3.2 Relative Distance Constraints
3.3 Extension to a Kernel Space
3.4 Log Determinant Divergence for Kernel Learning
3.5 Problem Definition
4 Semi-supervised Kernel Learning
4.1 Bregman Projections for Constrained Optimization
4.2 Semi-supervised Kernel Learning with Relative Comparisons
Selecting the Bandwidth Parameter.
Semi-Supervised Kernel Learning with Relative Comparisons.
Clustering Method.
5 Experimental Results
5.1 Datasets
5.2 Relative Constraints vs. Pairwise Constraints
5.3 Multi-resolution Analysis
5.4 Generalization Performance
5.5 Effect of Equality Constraints
6 Conclusion
References
Bayesian Active Clustering with Pairwise Constraints
1 Introduction
2 Problem Statement
3 Bayesian Active Clustering
3.1 The Bayesian Clustering Model
Marginalization of Cluster Labels.
3.2 Active Query Selection
Selection Criteria.
Computing the Selection Objectives.
3.3 The Sequential MCMC Sampling of W
3.4 Find the MAP Solution
4 Experiments
4.1 Dataset and Setup
4.2 Effectiveness of the Proposed Clustering Model
4.3 Effectiveness of the Overall Active Clustering Model
4.4 Analysis of the Acyclic Graph Restriction
5 Related Work
6 Conclusion
References
ConDist: A Context-Driven Categorical Distance Measure
1 Introduction
2 Related Work
3 The Distance Measure ConDist
3.1 Definition of ConDist
3.2 Attribute Distance d_X
3.3 Attribute Weighting Function w_X
3.4 Correlation, Context and Impact
3.5 Heterogeneous Data Sets
4 Experiments
4.1 Evaluation Methodology
4.2 Experiment 1 - Context Attribute Selection
4.3 Experiment 2 - Comparison in the Context of Classification
4.4 Experiment 3 - Comparison in the Context of Clustering
5 Discussion
5.1 Experiment 1 - Context Attribute Selection
5.2 Experiment 2 - Comparison in the Context of Classification
5.3 Experiment 3 - Comparison in the Context of Clustering
6 Summary
References
Discovering Opinion Spammer Groups by Network Footprints
1 Introduction
2 Measuring Network Footprints
2.1 Neighbor Diversity of Nodes
2.2 Self-Similarity in Real-World Graphs
2.3 NFS Measure
3 Detecting Spammer Groups
4 Evaluation
4.1 Performance of NFS on Synthetic Data
4.2 Performance of GroupStrainer on Synthetic Data
4.3 Results on Real-World Data
5 Related Work
6 Conclusion
References
Gamma Process Poisson Factorization for Joint Modeling of Network and Documents
1 Introduction
2 Background and Related Work
2.1 Negative Binomial Distribution
2.2 Gamma Process
2.3 Network Modeling, Topic Modeling and Count Matrix Factorization
3 Joint Gamma Process Poisson Factorization (J-GPPF)
3.1 Inference via Gibbs Sampling
3.2 Special Cases: Network Only GPPF (N-GPPF) and Corpus Only GPPF (C-GPPF)
3.3 Computation Complexity
4 Experimental Results
4.1 Experiments with Synthetic Data
4.2 Experiments with Real World Data
5 Conclusion
References
Generalization in Unsupervised Learning
1 Introduction
1.1 Preliminaries and Setup
2 A General Learning Framework
2.1 Generalization and Stability
3 Empirical Generalization Analysis
3.1 Estimating n From a Finite Data Set
3.2 The Trend of n and The Stability Line
3.3 Comparing Two Algorithms: A1 vs. A2
4 Empirical Validation on Real Data Sets
4.1 Generalization Assessment of k-Means Clustering
4.2 Generalization Assessment of PCA, LEM, and LLE
5 Concluding Remarks
References
Multiple Incomplete Views Clustering via Weighted Nonnegative Matrix Factorization with L2,1 Regularization
1 Introduction
2 Problem Formulation and Backgrounds
2.1 Problem Formulation
2.2 Weighted Nonnegative Matrix Factorization
3 Multi-Incomplete-View Clustering
3.1 Objective Function of MIC
3.2 Optimization
Fixing {U(i)} and {V(i)}, minimize O over U*.
Fixing U*, minimize O over {U(i)} and {V(i)}.
4 Experiments and Results
4.1 Comparison Methods
4.2 Dataset
4.3 Results
4.4 Parameter Study
4.5 Convergence Study
5 Related Work
6 Conclusion
References
Solving a Hard Cutting Stock Problem by Machine Learning and Optimisation
1 Introduction
2 Cutting Stock Problems
3 Problem Formalization
4 ILP Model for the CSAWCSP
5 Machine Learning Approach for the CSAWCSP
5.1 Distribution Learning
5.2 Generating Uniformly Distributed Random Vectors
5.3 k-Medoids Clustering
6 Empirical Study
7 Conclusions and Future Work
References
Data Preprocessing
Markov Blanket Discovery in Positive-Unlabelled and Semi-supervised Data
1 Introduction
2 Background: Markov Blanket
2.1 Markov Blanket Discovery Algorithms
2.2 Testing Conditional Independence in Categorical Data
2.3 Suggested Approach for Semi-supervised MB Discovery
3 Background: Partially-Labelled Data
3.1 Positive-Unlabelled Data
3.2 Semi-supervised Data
4 Markov Blanket Discovery in Positive-Unlabelled Data
4.1 Testing Conditional Independence in PU Data
4.2 Evaluation of Markov Blanket Discovery in PU Data
5 Markov Blanket Discovery in Semi-supervised Data
5.1 Testing Conditional Independence in Semi-supervised Data
5.2 Incorporating Prior Knowledge on Markov Blanket Discovery
6 Exploring our Framework Under Class Prior Change - When and how the Unlabelled Data Help
7 Conclusions and Future Work
A Generation of Network Data and Experimental Protocol
References
Multi-view Semantic Learning for Data Representation
1 Introduction
2 Related Work
2.1 Common Notations
2.2 NMF-Based Latent Subspace Learning
3 Multi-view Semantic Learning
3.1 Matrix Factorization with Multi-view Data
3.2 Graph Embedding for Multi-view Semantic Learning
3.3 Sparseness Constraint
3.4 Objective Function of MvSL
4 Optimization
4.1 Optimizing {U^(v)}_{v=1}^H
4.2 Optimizing V
5 Experiment
5.1 Data Set
5.2 Baselines
5.3 Evaluation Metric
5.4 Experiment Results
5.5 Parameter Sensitive Analysis
6 Conclusion
References
Unsupervised Feature Analysis with Class Margin Optimization
1 Introduction
2 Notations and Definitions
3 Proposed Method
4 Optimization
5 Experiments
5.1 Experiment Setup
5.2 Experimental Results
5.3 Studies on Parameter Sensitivity and Convergence
6 Conclusion
References
Data Streams and Online Learning
Ageing-Based Multinomial Naive Bayes Classifiers Over Opinionated Data Streams
1 Introduction
2 Related Work
3 Basic Concepts
3.1 Basic Model: Multinomial Naive Bayes
4 Ageing-Based Multinomial Naive Bayes
4.1 Ageing-Based MNB Model
4.2 Ageing-Based MNB Classification
4.3 Aggressive Fading MNB Alternative
5 Experiments
5.1 Data and Concept Changes
5.2 Evaluation Methods and Evaluation Measures
5.3 Classifier Performance
5.4 Impact of the Fading Factor on the New Algorithms
5.5 The Effect of Temporal Granularity and How to Set
6 Conclusions and Outlook
References
Drift Detection Using Stream Volatility
1 Introduction
2 Related Work
3 Preliminaries
4 Our Concept and Design
4.1 Predictive Approach
4.2 Online Adaptation Approach
4.3 Application onto ADWIN
5 Experimental Evaluation
5.1 False Positive Test
5.2 True Positive Test
5.3 False Negative Test
5.4 Real-World Data: Power Supply Dataset
5.5 Case Study: Incremental Classifier
6 Conclusion and Future Work
References
Early Classification of Time Series as a Non Myopic Sequential Decision Making Problem
1 Introduction
2 A Generic Framework and Positions of Related Works
3 A Formal Analysis and a Naïve Approach
4 The Proposed Approach
5 Implementation
6 Experiments
6.1 Controlled Experiments
6.2 Experiments on a Real Data Set
7 Conclusion and Future Works
References
Ising Bandits with Side Information
1 Introduction
2 Background and Preliminaries
2.1 Semi-supervised Graph Classifier Complexity
2.2 Ising Model at Low Temperature
2.3 Multi-Armed Bandit Problem (MAB)
2.4 Formulation
3 Maximum Flow Computation
3.1 Playing Ising Bandits
4 Experiments
4.1 Dataset Description
4.2 Synthetic Dataset
4.3 Graph Generation from Datasets
4.4 Evaluation Criteria
4.5 Results
5 Conclusion
References
Refined Algorithms for Infinitely Many-Armed Bandits with Deterministic Rewards
1 Introduction
2 Model Formulation and Lower Bound
3 Optimal Sample Size
4 Extensions
4.1 Anytime Algorithm
4.2 Non-retainable Arms
5 Experiments
5.1 Retainable Arms
5.2 Anytime Algorithm
5.3 Non-Retainable Arms
6 Conclusion and Discussion
References
Deep Learning
An Empirical Investigation of Minimum Probability Flow Learning Under Different Connectivity Patterns
1 Introduction
2 Restricted Boltzmann Machines
3 Minimum Probability Flow
3.1 Dynamics of the Model
3.2 Form of the Transition Matrix
4 Probability Flow Rates
4.1 1-bit Flip Connections
4.2 Factorized Minimum Probability Flow
4.3 Persistent Minimum Probability Flow
5 Experiments
5.1 MNIST - Exact Log Likelihood
5.2 MNIST - Estimating Log Likelihood
5.3 Caltech 101 Silhouettes - Estimating Log Likelihood
6 Conclusion
References
A Minimum Probability Flow
A.1 Dynamics of The Model
Difference Target Propagation
1 Introduction
2 Target Propagation
2.1 Formulating Targets
2.2 How to Assign a Proper Target to Each Layer
2.3 Difference Target Propagation
2.4 Training an Auto-Encoder with Difference Target Propagation
3 Experiments
3.1 Deterministic Feedforward Deep Networks
3.2 Networks with Discretized Transmission Between Units
3.3 Stochastic Networks
3.4 Auto-Encoder
4 Conclusion
References
A Proof of Theorem 1
B Proof of Theorem 2
Online Learning of Deep Hybrid Architectures for Semi-supervised Categorization
1 Introduction
2 Related Work
3 Deep Hybrid Architectures
3.1 The Stacked Boltzmann Experts Network (SBEN)
3.2 Hybrid Stacked Denoising Auto-Encoders (HSDA)
3.3 Ensembling of Layer-Wise Experts
4 Experimental Results
4.1 Finite Dataset Learning Performance
4.2 Incremental Learning Performance
5 Conclusions
References
Scoring and Classifying with Gated Auto-Encoders
1 Introduction
2 Gated Auto-Encoders
3 Gated Auto-Encoder Scoring
3.1 Vector Field Representation
3.2 Scoring the GAE
4 Relationship to Restricted Boltzmann Machines
4.1 Gated Auto-Encoder and Factored Gated Conditional Restricted Boltzmann Machines
4.2 Mean-Covariance Auto-Encoder and Mean-Covariance Restricted Boltzmann Machines
5 Classification with Gated Auto-Encoders
5.1 Classification Using Class-Specific Gated Auto-Encoders
5.2 Multi-label Classification via Optimization in Label Space
6 Conclusion
References
Sign Constrained Rectifier Networks with Applications to Pattern Decompositions
1 Introduction
2 The Categories of Separable Pattern Sets
3 Binary Classification with Rectifier Networks
4 Single-Hidden-Layer Sign Constrained Rectifier Networks
5 Two-Hidden-Layer Sign Constrained Rectifier Networks
6 Discussion
References
Aggregation Under Bias: Rényi Divergence Aggregation and Its Implementation via Machine Learning Markets
1 Introduction
2 Background
3 Problem Statement
4 Weighted Divergence Aggregation
4.1 Weighted Rényi Divergence Aggregation
Properties.
5 Maximum Entropy Arguments
Interim Summary.
6 Implementation
7 Experiments
Task 1: Aggregation on Simulated Data.
Task 2: Aggregation on Chords from Bach Chorales.
Task 3: Aggregation on Kaggle Competition.
Results.
8 Machine Learning Markets and Rényi Divergence Aggregation
9 Discussion
References
Distance and Metric Learning
Higher Order Fused Regularization for Supervised Learning with Grouped Parameters
1 Introduction
2 Regularized Supervised Learning
3 Higher Order Fused Regularizer
3.1 Review of Submodular Functions and Robust P^n Potential
3.2 Definition of HOF Penalty
4 Optimization
4.1 Proximity Operator via Minimum-Norm-Point Problem
4.2 Network Flow Algorithm
5 Related Work
6 Experiments
6.1 Synthetic Data
6.2 Real-World Data
7 Conclusion
References
Joint Semi-supervised Similarity Learning for Linear Classification
1 Introduction
2 Related Work
3 Joint Metric and Classifier Learning
4 Generalization Bound for Joint Similarity Learning
5 Experiments
5.1 Experimental Setting
5.2 Experimental Results
6 Conclusion
References
Learning Compact and Effective Distance Metrics with Diversity Regularization
1 Introduction
2 Related Works
3 Diversify Distance Metric Learning
3.1 A Latent Space Modeling View of DML
3.2 Diversify DML
3.3 Optimization
4 Experiments
4.1 Datasets
4.2 Experimental Settings
4.3 Retrieval
4.4 Clustering
4.5 Classification
4.6 Sensitivity to Parameters
5 Conclusions
References
Scalable Metric Learning for Co-Embedding
1 Introduction
2 Metric Learning
3 Co-embedding as Metric Learning
4 Algorithm
5 Empirical Computational Complexity
6 Case Study: Multi-label Classification
7 Case Study: Tagging via Tensor Completion
8 Conclusion
References
A An Auxiliary Lemma
Large Scale Learning and Big Data
Adaptive Stochastic Primal-Dual Coordinate Descent for Separable Saddle Point Problems
1 Introduction
2 Primal-dual Framework for Convex-Concave Saddle Point Problems
3 Adaptive Stochastic Primal-Dual Coordinate Descent
3.1 Convergence Analysis
3.2 More Comparison with SDPC
4 Empirical Results
4.1 Ridge Regression
4.2 Binary Classification on Real-world Datasets
5 Conclusion and Future Work
References
Hash Function Learning via Codewords
1 Introduction
2 Formulation
3 Learning Algorithm
4 Insights to Generalization Performance
5 Experiments
5.1 Supervised Hash Learning Results
5.2 Transductive Hash Learning Results
6 Conclusions
References
HierCost: Improving Large Scale Hierarchical Classification with Cost Sensitive Learning
1 Introduction
2 Definitions and Notations
3 Motivation and Related Work
4 Methods
4.1 Cost Calculations
4.2 Optimization
4.3 Dealing with Hierarchical Multi-label Classification
5 Experimental Evaluations
5.1 Datasets
5.2 Evaluation Metrics
5.3 Experimental Details
5.4 Methods for Comparison
5.5 Results
6 Conclusions
References
Large Scale Optimization with Proximal Stochastic Newton-Type Gradient Descent
1 Introduction and Problem Statement
1.1 Notations and Assumptions
2 The PROXTONE Method
2.1 The Regularized Quadratic Model in Algorithm 2
2.2 The Hessian Approximation
3 Convergence Analysis
4 Numerical Experiments
5 Conclusions
A Proof of Theorem 1
B Proof of Theorem 2
References
Erratum to: Bayesian Active Clustering with Pairwise Constraints
Erratum to: Scalable Metric Learning for Co-Embedding
Author Index
Erratum to: Predicting Unseen Labels Using Label Hierarchies in Large-Scale Multi-label Learning