Machine Learning and Knowledge Discovery in Databases

Name: Machine Learning and Knowledge Discovery in Databases | European Conference, ECML PKDD 2011, Athens, Greece, September 5-9, 2011, Proceedings, Part I
Brand: Springer
Price: 53.49 EUR
Availability: OnlineOnly

European Conference, ECML PKDD 2011, Athens, Greece, September 5-9, 2011, Proceedings, Part I

Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, Michalis Vazirgiannis (Editors)

Springer (Publisher)

Published on 6 September 2011

XXX, 649 pages

E-Book

PDF with digital watermarking

978-3-642-23780-5 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Title
Preface
Organization
Table of Contents
Invited Talks (Abstracts)
Enriching Education through Data Mining
References
Human Dynamics: From Human Mobility to Predictability
Embracing Uncertainty: Applied Machine Learning Comes of Age
Highly Dimensional Problems in Computational Advertising
Learning from Constraints
Permutation Structure in 0-1 Data
Industrial Invited Talks (Abstracts)
Reading Customers Needs and Expectations with Analytics
Algorithms and Challenges on the GeoWeb
Data Science and Machine Learning at Scale
Smart Cities: How Data Mining and Optimization Can Shape Future Cities
Regular Papers
Preference-Based Policy Learning
Introduction
Related Work
Preference-Based Policy Learning
PPL Overview and Notations
The Behavioral Representation
Policy Return Estimate Learning
New Policy Generation
Initialization
Convergence Study
Experimental Validation
Experiment Goals and Experimental Setting
2D RiverSwim
Reaching the End of the Maze
Synchronized Exploration
Conclusion and Perspectives
References
Constraint Selection for Semi-supervised Topological Clustering
Introduction
Integrating Constraints in SOM
Constraints
The Topological Constrained Algorithm
Constraint Selection
Informativeness
Coherence
Hard Selection
Soft Selection
Experimental Results
Evaluation of the Proposed Approach
Results of Constraint Selection
Results of Selection on FCPS Data Sets
Results of Selection on Leukemia Data Set
Visualization
Conclusion
References
Is There a Best Quality Metric for Graph Clusters?
Introduction
Related Work
Quality Metrics
Graph Definitions
Modularity
Silhouette Index
Conductance
Coverage
Performance
Clustering Algorithms
Markov Clustering
Bisecting K-means
Spectral Clustering
Normalized Cut
Experiments
Methodology
Results
Conclusion
References
Adaptive Boosting for Transfer Learning Using Dynamic Updates
Introduction
Boosting-Based Transfer Learning
Proposed Algorithm
Algorithm Description
Theoretical Analysis of the Algorithm
Empirical Analysis
"Weight Drift" and "Correction Factor" (Theorems 1, 2, 3, 5)
Rate of Convergence (Theorem 2)
Sum of Source Weights (Theorem 5, Axiom 1)
Experimental Results on Real-World Datasets
Experiment Setup
Real-World Datasets
Experimental Results
Discussion and Extensions
Conclusion
References
Peer and Authority Pressure in Information-Propagation Models
Introduction
Related Work
Peer and Authority Models
Methodology
Measuring Social Influence
Randomization Test
Experimental Results
Datasets and Implementation
Gain of Authority Integration
Analyzing the MemeTracker Dataset
Analyzing the Bibsonomy Dataset
Experiments on Synthetic Data
Conclusions
References
Constrained Logistic Regression for Discriminative Pattern Mining
Introduction
Existing Methods
Need for Constrained Models
Related Work
Our Contributions
Preliminaries
Logistic Regression
Supervised Distribution Difference
Proposed Algorithm
Experimental Results
Results on Synthetic Datasets
The Comparison of the Distance Measure
The Sensitivity of the Distance Measure
Conclusion
References
α-Clusterable Sets
Introduction
Background Material
Kleinberg's Axioms
Window Density Function
Proposed Theoretical Framework
Experimental Framework
Experimental Results
Investigate the Effect of the Parameter
Comparing Proposed Algorithm against Well-Known Clustering Algorithms
Investigate the Scalability of the Algorithm
Conclusions
References
Privacy Preserving Semi-supervised Learning for Labeled Graphs
Introduction
Privacy in Labeled Graphs
Labeled Graph
Matrix Partitioning Model
Graph Privacy Model
Our Approach
Problem Statement
Decentralized Label Propagation
Cryptographic Tools
The Main Protocol
Privacy Preserving Label Propagation
Security of the Protocol
Output Privacy of Label Propagation
Expansion to Directed Graphs
Experimental Analysis
Privacy-Accuracy Trade-Off
Computational Efficiency
Conclusion
References
Novel Fusion Methods for Pattern Recognition
Introduction
Multiple Kernel Learning
Classifier Fusion with Non-Linear Constraints
Multiclass Classifier Fusion with Non-Linear Constraints
Extended Stacking
Experiments and Discussion
Pascal VOC 2007
Flower 17
Flower 102
Caltech101
Conclusions
References
A Spectral Learning Algorithm for Finite State Transducers
Introduction
Probabilistic Finite State Transducers
A Spectral Learning Algorithm
Recovering the Original FST Parameters
Theoretical Analysis
Learning Model
Results
Proofs
Synthetic Experiments
Experiments on Transliteration
Conclusions
References
An Analysis of Probabilistic Methods for Top-N Recommendation in Collaborative Filtering
Introduction
Evaluating Recommendations: A Review
Collaborative Filtering in a Probabilistic Framework
Modeling Preference Data
Item Ranking
Evaluation
Predicted Rating
Item Selection and Relevance
Discussion
Conclusion and Future Works
References
Learning Good Edit Similarities with Generalization Guarantees
Introduction
Notations and Related Work
Learning with Good Similarity Functions
String Edit Similarity Learning
Learning (ε, γ, τ)-Good Edit Similarity Functions
An Exponential-Based Edit Similarity Function
Learning the Edit Costs: Problem Formulation
Theoretical Guarantees
Discussion on the Matching Function
Experimental Results
Conclusion and Future Work
References
Constrained Laplacian Score for Semi-supervised Feature Selection
Introduction and Motivation
Related Work
Laplacian Score
Constraint Score
Constrained Laplacian Score
Spectral Graph Based Formulation
SOM Algorithm
Results
Data Sets and Methods
Validation of Feature Selection
Comparison of the Feature Selection Quality
Results on Gene Expression Data Sets
Results on Face-Image Data Sets
Conclusion
References
COSNet: A Cost Sensitive Neural Network for Semi-supervised Learning in Graphs
Introduction
Semi-supervised Learning in Graphs
Hopfield Networks
Learning Issues in Hopfield Networks
Sub-network Property
COSNet
Generating a Temporary Solution
Finding the Optimal Parameters
Network Dynamics
COSNet Covers Hopfield Networks Learning Issues
Results and Discussion
Experimental Set-Up
Results
Conclusions
References
Regularized Sparse Kernel Slow Feature Analysis
Introduction
Slow Feature Analysis
Kernel SFA
The RSK-SFA Algorithm
Sparse Subset Selection
Matching Pursuit for Online MAH
Empirical Validation
Benchmark Data Sets
Algorithm Performance
Sparsity
Discussion
References
A Selecting-the-Best Method for Budgeted Model Selection
Introduction
Expected Greedy Reward for K=2 Alternatives
Extension to K>2 Alternatives
The Clark Approximation
Experiments
Conclusion
References
A Robust Ranking Methodology Based on Diverse Calibration of AdaBoost
Introduction
Related Work
Definition of the Ranking Problem
The Calibration of Multi-class Classification Models
Regression Based Pointwise Calibration
Class Probability Calibration and Its Implementation
Ensemble of Ensembles
DCG Bound for Class Probability Estimation
Experiments
Comparison to Standard Learning to Rank Methods
The Diversity of CPC Outputs
Conclusions
References
Active Learning of Model Parameters for Influence Maximization
Introduction
Preliminaries and Motivation
The Linear Threshold Model
Sensitivity of Model Parameters
Active Model Parameter Learning for Influence Maximization
The Weighted Sampling Algorithm
Experimental Evaluation
Network Datasets
Experimental Setting
Experimental Results
Related Work
Conclusions
References
Sampling Table Configurations for the Hierarchical Poisson-Dirichlet Process
Introduction
The Hierarchical Poisson-Dirichlet Process
Related Methods
New Table Representation of the HPDP
Block Gibbs Sampling Algorithm
Block Gibbs Sampler
Constraint Analysis
Application to the HDP-LDA Model
Experiments
Experiment Setup and Evaluation Criteria
Parameter Setting
Perplexities
Convergence Speed
Conclusion
References
Preference-Based Policy Iteration: Leveraging Preference Learning for Reinforcement Learning
Introduction
Approximate Policy Iteration
Preference-Based Reinforcement Learning
Label Ranking
Preference-Based Approximate Policy Iteration
Case Study I: Exploiting Action Preferences
Application Domains
Experimental Setup
Complete State Evaluations
Partial State Evaluations
Case Study II: Learning from Qualitative Feedback
Cancer Clinical Trials Domain
A Preference-Based Approach
Experimental Setup and Results
Conclusions
References
Learning Recommendations in Social Media Systems by Weighting Multiple Relations
Introduction
Prior Art
Relational Graph
Weight Learning
Evaluation
Conclusion
References
Clustering Rankings in the Fourier Domain
Introduction
Background and Preliminaries
Setup and First Notations
The Fourier Transform on S_n
Sparse Fourier Representations of Rank Data
Spectral Feature Selection and Sparse Clustering
Numerical Experiments
Conclusion
References
PerTurbo: A New Classification Algorithm Based on the Spectrum Perturbations of the Laplace-Beltrami Operator
Introduction
State of the Art
A New Classification Method
A Kernel Machine View on the Perturbation Measure
Perturbation Measure and Regularization Techniques
Active Learning
Experimental Assessment
Classification Performances
Active Learning Evaluation
Conclusion
References
Datum-Wise Classification: A Sequential Approach to Sparsity
Introduction
Datum-Wise Sparse Classifiers
Datum-Wise Sparsity
Datum-Wise Sparse Sequential Classification
Reward Maximization and Loss Minimization
Inference and Approximated Decision Processes
Learning
Preventing Overfitting in the Sequential Model
Complexity Analysis
Experiments
Results
Related Work
Conclusion
References
Manifold Coarse Graining for Online Semi-supervised Learning
Introduction
Basics and Notations
Spectral View of Label Propagation
Manifold Coarse Graining
Exact Coarse Graining
Approximate Coarse Graining
Preserving Manifold Structure
Experiments
Eigenvector Preservation
Online Classification
Manifold Structure Preservation
Outlier Robustness
Conclusion
References
Learning from Partially Annotated Sequences
Introduction
Related Work
Preliminaries
Transductive Loss-Augmented Perceptrons
The Structured Perceptron
Loss-Augmented Perceptrons
Transductive Perceptrons for Partially Labeled Data
Empirical Results
English CoNLL 2003
Wikipedia - Mono-Lingual Experiment
Wikipedia - Cross-Lingual Experiment
Execution Time
Conclusion
References
The Minimum Transfer Cost Principle for Model-Order Selection
Introduction
Minimum Transfer Costs
Notational Preliminaries
Minimum Transfer Costs
The Easy Case: Gaussian Mixture Models
Model Order Selection for Truncated SVD
Image Denoising with Rank-Limited SVD
Denoising Boolean Matrices with SVD
Minimum Transfer Costs for Boolean Matrix Factorization
Minimum Transfer Costs for Non-factorial Models
Transfer Costs for k-means Clustering
Related Work
Conclusion
References
A Geometric Approach to Find Nondominated Policies to Imprecise Reward MDPs
Introduction
Theoretic Background
Markov Decision Process
Imprecise Reward MDP
Reward Functions Based on Features
The πWitness Algorithm
Witness Reward Functions
Efficient Small Subset of Policies
The πWitnessBound Algorithm
A Geometric Approach to Find Nondominated Policies
Space of Feature Vector
Finding Nondominated Policies
Normal Vectors and Reward Constraints
Initial Polytope
The πHull Algorithm
Complexity Analysis
Experiments
Conclusion
References
Label Noise-Tolerant Hidden Markov Models for Segmentation: Application to ECGs
Introduction
Related Work
Hidden Markov Models for Segmentation
Hidden Markov Models
Algorithms for Inference
A Label Noise-Tolerant Algorithm
Label Noise Modelling
Finding the HMM Parameters with a Label Noise Model
ECG Segmentation
ECG Signals
State of the Art
Experimental Settings
Experimental Results
Noise-Free Results
Results with Horizontal Noise
Results with Uniform Noise
Conclusion
References
Building Sparse Support Vector Machines for Multi-Instance Classification
Introduction
"Label-Mean" Formulation for MI Classification
Sparse SVM for MI Classification
Model
Optimization Strategy
Algorithm and Extension
Experimental Results
Synthetic Data Examples
Results on Real-World Data
Conclusions
References
Lagrange Dual Decomposition for Finite Horizon Markov Decision Processes
Markov Decision Processes
Non-stationary Policies
Stationary Policies
Dual Decomposition
Naive Dual Decomposition
Dynamic Dual Decomposition
The Slave Problem
The Master Problem
Algorithm Overview
Experiments
Chain Problem
Mountain Car Problem
Puddle World
Discussion
References
Unsupervised Modeling of Partially Observable Environments
Introduction
Topological Temporal Hebbian Self-Organizing Map
Network Activation
Learning
Temporal Network for Transitions
Network Activation
Learning
Aging the TNT
Varying the Parameters
Experiments
Setup
Results
Discussion
References
Tracking Concept Change with Incremental Boosting by Minimization of the Evolving Exponential Loss
Introduction
Preliminaries
Incremental Boosting (IBoost)
IBoost Flowchart
IBoost for Concept Change
Experiments
Data Sets
Algorithms
Results
Conclusion
References
Fast and Memory-Efficient Discovery of the Top-k Relevant Subgroups in a Reduced Candidate Space
Introduction
Preliminaries
Subgroup Discovery
Optimistic Estimate Pruning
The Theory of Relevance
Closure Operators and Their Connection to Relevance
Relevant Subgroup Discovery
An Illustrative Example
Existing Approaches, Challenges and Pitfalls
An Iterative Deepening Approach
A Relevance Check Based on the Top-k Subgroups Visited
The Algorithm
Complexity
Experimental Results
Implementation and Setup
Comparison with CPosSd
Comparison with Other Subgroup Miners
Conclusions
References
Linear Discriminant Dimensionality Reduction
Introduction
Notations
A Review of LDA and Fisher Score
Linear Discriminant Analysis
Fisher Score for Feature Selection
Linear Discriminant Dimensionality Reduction
Proximal Gradient Descent
Accelerated Proximal Gradient Descent
Related Work
Experiments
Data Sets
Parameter Settings
Recognition Results
Projection Matrices
Selected Features
Sensitivity to the Regularization Parameter
Conclusion
References
DB-CSC: A Density-Based Approach for Subspace Clustering in Graphs with Feature Vectors
Introduction
Related Work
A Density-Based Clustering Model for Combined Data
Cluster Model for a Single Subspace
Overall Subspace Clustering Model
The DB-CSC Algorithm
Finding Clusters in a Single Subspace
Finding Clusters in Different Subspaces
Experimental Evaluation
Conclusion
References
Learning the Parameters of Probabilistic Logic Programs from Interpretations
Introduction
Probabilistic Logic Programming Concepts
Learning from Interpretations
Full Observability
Partial Observability
The LFI-ProbLog Algorithm
Computing the BDD for an Interpretation
Automated Theory Splitting
Calculate Expected Counts
Experiments
WebKB
Smokers
Related Work
Conclusions
References
Feature Selection Stability Assessment Based on the Jensen-Shannon Divergence
Introduction
Problem Formulation
Feature Selection and Ranking
Similarity Measures
The Stability for a Set of Lists
Stability Based on the Jensen-Shannon Divergence
Extension to Partial Ranked Lists
Extension to Top-k Lists
Empirical Study
Illustration on Artificial Outcomes
Evaluation on a Spectral Dataset
Conclusions
References
Mining Actionable Partial Orders in Collections of Sequences
Introduction
Overview of the Method
Related Work and Contributions
Foundations
Reference Model of a Collection of Sequences
Probability of a Serial and Parallel Pattern
Partially Ordered Sets of Items
Mining Actionable Partial Orders
Expected Frequency of a Poset
Algorithm for Computing the Probability of a Poset
Pruning Non-significant and Redundant Patterns
Algorithm for Mining Actionable Partial Orders
Experiments
Conclusions
References
A Game Theoretic Framework for Data Privacy Preservation in Recommender Systems
Introduction
Related Work
Our Contribution
Model and Problem Definition
Ratings and Recommendation
Privacy Metric
Recommendation Quality
Problem Formulation
The Case of a Hybrid Recommendation System
Model Specifics
Game Theoretic Analysis
Special Case: N=2 Users
Numerical Results
Conclusion
References
Author Index
