Advances in Knowledge Discovery and Data Mining, Part I

Name: Advances in Knowledge Discovery and Data Mining, Part I | 16th Pacific-Asia Conference, PAKDD 2012, Kuala Lumpur, Malaysia, May 29-June 1, 2012, Proceedings, Part I
Brand: Springer
Price: 53.49 EUR
Availability: Online only

Pang-Ning Tan, Sanjay Chawla, Chin Kuan Ho, James Bailey (Editors)

Springer (Publisher)

Published on 10 May 2012

XXIII, 619 pages

E-Book

PDF with digital watermarking

978-3-642-30217-6 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Title Page
Preface
Organization
Table of Contents - Part I
Supervised Learning: Active, Ensemble, Rare-Class and Online
Time-Evolving Relational Classification and Ensemble Methods
Introduction
Related Work
Temporal-Relational Classification Framework
Temporal Granularity
Temporal Influence: Links, Attributes, Nodes
Temporal-Relational Classifiers
Temporal Ensemble Methods
Methodology
Datasets
Temporal Models
Empirical Results
Single Models
Temporal-Ensemble Models
Conclusion
References
Active Learning for Hierarchical Text Classification
Introduction
A Novel Multi-oracle Setting
A New Framework of Hierarchical Active Learning
Unlabeled Pool Building Policy
Leveraging Oracle Answers
Experimental Configuration
Datasets
Performance Measure
Active Learning Setup
Empirical Study
Standard Hierarchical Active Learner
Leveraging Positive Examples in Hierarchy
Leveraging Negative Examples in Hierarchy
Conclusion
References
TeamSkill Evolved: Mixed Classification Schemes for Team-Based Multi-player Games
Introduction
Related Work
Proposed Approaches
TeamSkill-AllK-Ev-OL1
TeamSkill-AllK-Ev-OL2
TeamSkill-AllK-Ev-OL3
Using Game-Specific Data during Classification
TeamSkill-AllK-EVGen
TeamSkill-AllK-EVMixed
Evaluation
Dataset
Overall Results
Results over Time
Online Classification Variants
Discussion
Conclusions
References
A Novel Weighted Ensemble Technique for Time Series Forecasting
Introduction
Forecasts Combination Methods
The Proposed Ensemble Technique
Mathematical Description
Optimization of the Combination Weights
Approach for Weights Determination
Three Time Series Forecasting Models
Autoregressive Integrated Moving Average (ARIMA)
Artificial Neural Networks (ANNs)
Elman Artificial Neural Networks (EANNs)
Experiments and Discussions
Conclusions
References
Techniques for Efficient Learning without Search
Introduction
The AnDE Family of Algorithms
AODE
AnDE
Optimising Memory Consumption
Optimising Testing Time
Evaluation
Test Environment
Optimised Memory Consumption
Optimised Testing
The Evaluation of A3DE
A3DE Performance on Large Datasets
Conclusions
References
An Aggressive Margin-Based Algorithm for Incremental Learning
Introduction
Online Passive-Aggressive Algorithm
Incremental Passive-Aggressive Learning Algorithm
Experiments
Conclusion
References
Two-View Online Learning
Introduction
Related Work
Two-View Online Passive Aggressive Learning
Problem Setting
Relationship between Views
Two-View Passive Aggressive Algorithm
Performance Evaluation
View Difference Comparison
Ads Dataset
Product Review Dataset
WebKB Course Dataset
Conclusion and Open Problems
References
A Generic Classifier-Ensemble Approach for Biomedical Named Entity Recognition
Introduction
The Generic Genetic Classifier-Ensemble Approach
Feature Set and SVM Based Classifier
Generic Genetic Classifier-Ensemble Algorithm
Experiments and Results
Conclusion and Future Work
References
Neighborhood Random Classification
Introduction
Basic Concepts
Notations
Neighborhood Structure
Neighborhood Classifiers
Partition by Neighborhood Graphs
Ensemble Method Classifier Based on Neighborhood
Sampling Procedures
Aggregating Function
Evaluation
Implementation of RNC
Other Methods
The Test
Computational Analysis
Conclusion and Further Work
References
SRF: A Framework for the Study of Classifier Behavior under Training Set Mislabeling Noise
Introduction
Background and Related Work
The Sigmoid Rule Framework
Sigmoid Rule Framework (SRF) Dimensions
Comparing Algorithms
Experimental Evaluation
Using SRF
Statistical Analysis
Conclusions
References
Building Decision Trees for the Multi-class Imbalance Problem
Introduction
Methods
Decomposition Techniques
Decision Trees
Analysis of the Splitting Criteria
Experiments
Configuration
Statistical Tests
Results
Related Work
Conclusion and Discussion
References
Scalable Random Forests for Massive Data
Introduction
Related Work
Scalable Random Forest Algorithm
Breadth-First Random Forest Construction
Scalable Random Forest Algorithm
Mapper, Reducer and Controller
Experiments
Data Sets
Experiment Settings
Performance Results
Scalability
Conclusions
References
Hybrid Random Forests: Advantages of Mixed Trees in Classifying Text Data
Introduction
Hybrid Random Forests
Framework for Building Hybrid Random Forest
Decision Tree Algorithms
Algorithm
Evaluation Methods
Experiments
Datasets
Test Accuracy Improvement
Performance Comparisons with Other Text Classification Methods
Conclusion and Future Work
References
Learning Tree Structure of Label Dependency for Multi-label Learning
Introduction
Related Work
The Concept of Multi-label Learning
Learning a Tree Structure of Labels
Experiment Design and Analysis
The Description of Datasets
Evaluation Criteria
Algorithms and Settings
Experimental Results and Analysis
Conclusion
References
Multiple Instance Learning for Group Record Linkage
Introduction
Related Work
Group Linkage Using Multiple Instance Learning
Instance Selection and Classifier Learning
Instance Classification
Group Record Linkage
Experiments and Evaluation
Synthetic Data Results
Historical Census Data Results
Conclusion
References
Incremental Set Recommendation Based on Class Differences
Introduction
Definition
Set Recommendation Based on Class Differences
Example
ZDD and VSOP
Set Recommendation with ZDD Structure
Experiments
Performance Evaluation
Example: Internet Shopping Advertising
Example: AOL Search Logs
Summary and Future Work
References
Active Learning for Cross Language Text Categorization
Introduction
Related Work
Active Learning for CLTC
Cross Language Text Categorization
Apply Active Learning to CLTC
Double Viewed Active Learning
Two Views of the Problem
Double Viewed Active Learning
Evaluation
Experimental Setup
Results and Discussions
Conclusions and Future Work
References
Evasion Attack of Multi-class Linear Classifiers
Introduction
Problem Setup
Multi-class Linear Classifier
Attack of Adversary
Adversarial Cost
Disguised Instances
Theory of Evasion Attack
Algorithm for Approximating ε-IMAC
Experiments
Spam Disguising
Face Camouflage
Conclusions
References
Foundation of Mining Class-Imbalanced Data
Introduction
Upper Bounds
Error Rate on a Particular Class
Cost-Weighted Error
Empirical Results with Specific Learner
Datasets and Settings
Experimental Design and Results
Conclusions
References
Active Learning with c-Certainty
Introduction
Previous Work
c-Certainty Labeling
BMO (Best-Multiple-Oracle) with c-Certainty
Selecting the Best Oracle
Active Learning Process of BMO
Experiments
Results on Faithful Oracles
Results on Unfaithful Oracles
Conclusion
References
A Term Association Translation Model for Naive Bayes Text Classification
Introduction
Related Work
Terminology
Naive Bayes Classifier
Language Models for Information Retrieval
The Term Association Translation Models
Language Models for Text Classification
Translation Model Estimation Using Joint Probability Model
Translation Model Estimation Based on Mutual Information
Experiments
Corpora
Performance Measure
Experimental Results
Conclusion and Future Work
References
A Double-Ensemble Approach for Classifying Skewed Data Streams
Introduction
Background and Motivations
Performance Metrics
Classification Methods for Skewed Data
Classification Methods for Streaming Data
Motivations
Proposed Method
Framework of the Method
Multi-objective Optimization
Reliability Estimation
Experimental Evaluation
Datasets
Experimental Protocol
Results
Conclusions
References
Generating Balanced Classifier-Independent Training Samples from Unlabeled Data
Introduction
Related Work
Generating Balanced Training Data
Overview
Semi-supervised Clustering
Determine the Optimal Number of Samples from Each Cluster
Leveraging Domain Knowledge
Maximum Entropy Sampling
Experiments and Evaluation
Evaluation Setup
Comparison of Class Distribution in Training Samples
Comparison of Classification Performance
Impact of Domain Knowledge
Conclusion
References
Nyström Approximate Model Selection for LSSVM
Introduction
Least Squares Support Vector Machine
Approximating LSSVM Using Nyström Method
Error Analysis
Approximate Model Selection for LSSVM
Experiments
Experimental Scheme
Effectiveness
Conclusion
References
Exploiting Label Dependency for Hierarchical Multi-label Classification
Introduction
Our Contributions
Related Work
HiBLADE Algorithm
Training Scheme
Extending the Features
Label Correlation
Experimental Details
Evaluation Metrics
Experimental Results and Discussion
Conclusion
References
Diversity Analysis on Boosting Nominal Concepts
Introduction
Boosting of CNC
Nominal Concepts
Learning Concept Based Classifiers
Classifier Diversity
Experimental Study
Conclusions
References
Extreme Value Prediction for Zero-Inflated Data
Introduction
Related Work
Preliminaries
Generalized Linear Model (GLM) and 2-Step GLM (GLM-C)
Zero-Inflated Poisson Regression (ZIP)
Quantile Linear Regression (QR) and 2-Step QR (QR-C)
Framework for Integrated Classification and Regression
Integrated Classifier and Regression for Extreme Values (ICRE)
Experimental Evaluation
Data
Experimental Setup
Baseline Algorithm
Evaluation Criteria
Experimental Results
Conclusions
References
Learning to Diversify Expert Finding with Subtopics
Introduction
Problem Definition
Model Framework
Overview
Topic Model Initialization
DivLearn: Learning to Diversify Expert Finding with Subtopics
Model Learning
Experiment
Experiment Setup
Performance Comparison
Analysis and Discussion
Related Work
Conclusion
References
An Associative Classifier for Uncertain Datasets
Introduction
Related Work
UAC Algorithm
Rule Extraction
Rule Filtering
Rule Selection
Experiments and Results
Conclusion
References
Unsupervised Learning: Clustering, Probabilistic Modeling
Neighborhood-Based Smoothing of External Cluster Validity Measures
Introduction
Preliminaries
Neighborhood-Based Smoothing of Validity Measures
Extension of Set-Based Cluster Validity Measures
Extension of Pairwise-Based Cluster Validity Indices
Weighting Function
Optimal Smoothing Radius
Evaluation of the Smoothed Validity Measures
Settings of Clustering and Neighborhood Relation
Datasets
Effect of Smoothing Radius - Finding the Optimal Radius
Effect of Prototype Number
Effect of Class Overlap
Real-World Data
Conclusion
References
Sequential Entity Group Topic Model for Getting Topic Flows of Entity Groups within One Document
Introduction
Terminology
Related Work
Sequential Entity Group Topic Model
Entity Group Topic Model
Sequential Entity Group Topic Model
Experiments
The Size of Power-Set of Entity Groups
Topic Discovery
Entity Prediction
Entity Pair Prediction
Entity Group Prediction
Topic Flow
Conclusion
References
Topological Comparisons of Proximity Measures
Introduction
Proximity Measures and Preordonnance
Proximity Measures
Preorder Equivalence
Topological Equivalence
Topological Graphs
Similarity between Proximity Measures in Topological Frameworks
Relationship between Topological and Preordonnance Equivalences
Theoretical Results
Empirical Comparisons
Conclusion
References
Quad-tuple PLSA: Incorporating Entity and Its Rating in Aspect Identification
Introduction
Problem Definition and Preliminary Knowledge
Problem Definition
Structured PLSA
QPLSA and EM Solution
QPLSA
Deriving the EM Solution
Incorporating Aspect Prior
Aspect Identification
Experiments
Data Sets
Implementation Details
Experimental Results
Related Work
Conclusion
References
Clustering-Based k-Anonymity
Introduction
Motivation
Fundamental Definitions
Basic Concept
K-means Clustering
Problem Definition
Clustering-Based Generalization Algorithm
Extension to l-Diversity
Related Work
Empirical Evaluation
Privacy Level K
QI-Attributes Dimensionality d
Cardinality of Data Set n
Efficiency
Conclusion
References
Unsupervised Ensemble Learning for Mining Top-n Outliers
Introduction
Methodologies
Framework and Notions of Ensemble Learning
Score-Based Aggregation Approach (SAG)
Order-Based Aggregation Approach (OAG)
Inference and Algorithm for OAG
Experiments
Aggregation on Real Data
Robustness of Two Aggregation Methods
Conclusions
References
Towards Personalized Context-Aware Recommendation by Mining Context Logs through Topic Models
Introduction
Related Work
Preliminary
Mining Common Context-Aware Preferences through Topic Models
Experiments
Data Set
Benchmark Methods
Evaluation Metrics
Overall Results of Recommendation
Robustness Analysis
Case Study
Concluding Remarks
References
Mining of Temporal Coherent Subspace Clusters in Multivariate Time Series Databases
Introduction
Related Work
A Model for Effective Subspace Clustering of Multivariate Time Series Data
Time Series Subspace Cluster Definition
Clustering Model: Redundancy Avoidance
Efficient Computation
Experiments
Evaluation w.r.t. Effectiveness
Evaluation w.r.t. Efficiency
Conclusion
References
A Vertex Similarity Probability Model for Finding Network Community Structure
Introduction
Vertex Similarity in Finding Community Structure
Common Neighbor Index in Unipartite Network
Common Neighbor Index in Bipartite Network
A VSP Model for Finding Community Structure
Experimental Results
Finding Community Structure in Unipartite Network
Finding Community Structure in Bipartite Network
Conclusion
References
Hybrid-ε-greedy for Mobile Context-Aware Recommender System
Introduction
Background
Following the Evolution of User's Interests
Managing the User's Situation
The Proposed MCRS Algorithm
Terminology and Notations
The Bandit Algorithm
The Proposed Hybrid-ε-greedy Algorithm
Experimental Evaluation
Experimental Datasets
Finding the Optimal B Threshold Value
Experimental Datasets
Results for ε Variation
Valuate Sparse Data
Conclusion
References
Unsupervised Multi-label Text Classification Using a World Knowledge Ontology
Introduction
Related Work
Unsupervised Multi-label Text Classification
World Ontology
Document Features
Initial Classification
Generalised Classification
Implementation
Evaluation
Results and Discussions
Conclusions
References
Semantic Social Network Analysis with Text Corpora
Introduction
Document-Entity-Topic Model
Dirichlet Prior on Document-Entity Distribution
Generative Process of DET Model
Learning the DET Model from Data
Gibbs Sampling
The Posterior on ? , Ø and F
Experiment Result
Perplexity Comparison between AT and DET
Semantic Social Network Analysis with DET
Conclusions
References
Visualizing Clusters in Parallel Coordinates for Visual Knowledge Discovery
Introduction
Related Work
Dimension Ordering for Knowledge Discovery
Inter-cluster and Intra-cluster Crossings
Optimization with Hamiltonian Path
Empirical Study on Real Datasets
Shaping Clusters against Visual Clutters by an Energy Reduction Model
Conclusion and Future Work
References
Feature Enriched Nonparametric Bayesian Co-clustering
Introduction
Related Work
Background: Dirichlet Process
Feature Enriched Dirichlet Process Co-clustering
Inference
Experimental Evaluation
Experimental Methodology and Feature Information
Results
Conclusion
References
Shape-Based Clustering for Time Series Data
Introduction
Background
K-means Clustering
Dynamic Time Warping (DTW) Distance Measure
Global Constraint
Related Work
Shape-Based Clustering for Time Series (SCTS)
Experiments and Results
Conclusion
References
Privacy-Preserving EM Algorithm for Clustering on Social Network
Introduction
Related Work
Preliminaries
Probabilistic Mixture Model
Utilities of Privacy-Preserving Data Mining
Problem Statement
Assumptions
Private Variables and Public Variables
Secure Summation Protocols on Networks
Local Secure Summation Protocol
Global Secure Summation Protocol
Private Clustering on Networks
Private E-step
Private M-step
Performance
Experiments
Accuracy
Efficiency
Conclusion
References
Named Entity Recognition and Identification for Finding the Owner of a Home Page
Introduction
Related Work
Named Entity Recognition
Finding Named Entities of Web Page Owner
Entity Selection Framework
Baseline Entity Selection
Graph-Based Entity Selection
Learning to Select Named Entities
Experimental Results
Named Entity Recognition Evaluation
Evaluation and Experimental Results
Conclusions
References
Clustering and Understanding Documents via Discrimination Information Maximization
Introduction
Motivation and Related Work
CDIM - Our Document Clustering Method
Problem Statement
Clustering Objective Function
Term Discrimination Information
Relatedness of Terms to Clusters
Document Discrimination Information
Algorithm
Experimental Setup
Data Sets
Comparison Methods
Clustering Validation Measures
Results and Discussion
Clustering Quality
Cluster Understanding and Visualization
Conclusion and Future Work
References
A Semi-supervised Incremental Clustering Algorithm for Streaming Data
Introduction
Related Work
A Model for Constraint-Based Clustering on Streaming Data
Modeling a Stream of Constraints
Cost of Assigning Data Items to a Cluster
A Clustering Model for Streaming Data
Incremental Clustering towards a Constraint-Stream
Experimental Evaluation
Conclusions
References
Unsupervised Sparse Matrix Co-clustering for Marketing and Sales Intelligence
Introduction
Related Work
Overview of Our Approach
Preliminaries
Graph Partitioning
The Algorithm
Recursive Spectral Bi-partitioning
Eigengap-Based Termination
Discovering Off-Diagonal Clusters
Recommendations
Experiments
Comparison with other Techniques
Compression-Based Evaluation
Business Intelligence on Real Datasets
Conclusion
References
Expectation-Maximization Collaborative Filtering with Explicit and Implicit Feedback
Introduction
Preliminary
Formalization
Collaborative Filtering
Co-rating
Explicit and Implicit Feedback
Matrix Factorization with Explicit Feedback
Enhance Explicit Feedback with Implicit Feedback
Expectation-Maximization Collaborative Filtering
Experiments
Conclusion and Future Work
References
Author Index
