New Frontiers in Applied Data Mining

Name: New Frontiers in Applied Data Mining | PAKDD 2011 International Workshops, Shenzhen, China, May 24-27, 2011, Revised Selected Papers
Brand: Springer
Price: 53.49 EUR
Availability: Online only

Longbing Cao, Joshua Zhexue Huang, James Bailey, Yun Sing Koh, Jun Luo (Editors)

Springer (Publisher)

Published on 21 February 2012

XXX, 508 pages

E-Book

PDF with digital watermarking

978-3-642-28320-8 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Title page
Preface
Organization
Table of Contents
International Workshop on Behavior Informatics (BI 2011)
Evaluating the Regularity of Human Behavior from Mobile Phone Usage Logs
Introduction
Related Work
Data Preprocessing
Measures
Duplication Types
Accumulated Entropy
Evaluation of Survey Data
Conclusion
References
Explicit and Implicit User Preferences in Online Dating
Introduction
Domain Overview
User Preferences
Explicit User Preferences
Implicit User Preferences
Are the User Preferences Good Predictors of the Success of User Interactions?
Explicit User Preferences
Implicit User Preferences
Using User Preferences in Recommender Systems
Hybrid Content-Collaborative Reciprocal Recommender
Ranking Methods
Experimental Evaluation
Results and Discussion
Conclusions
References
Blogger-Link-Topic Model for Blog Mining
Introduction
Models for Blog Mining
Blogger-Link-Topic (BLT) Model
Blog Classification Framework
Experiments and Results
Blogger-Link-Topic Results
Blog Classification Results
Results on Co-occurrence of BLT and AT Models
Conclusion
References
A Random Indexing Approach for Web User Clustering and Web Prefetching
Introduction
Random Indexing (RI)
Random Indexing Based Web User Clustering
Data Preprocessing
User Modelling Based on Random Indexing
Single User Pattern Clustering
Clustering Validity Measures
Experiments
Preprocessing of Data Source
Parameter Setting Investigations
Common User Profile Creation
Prefetching for User Groups
Conclusions
References
Emotional Reactions to Real-World Events in Social Networks
Introduction
Sentiment Index and Event Indicators
Sentiment Index for Event Detection
Event Indicators
Mood-Based Burst Detection and Bursty Event Extraction
Bursty Event Detection
Experimental Results
Conclusion
References
Constructing Personal Knowledge Base: Automatic Key-Phrase Extraction from Multiple-Domain Web Pages
Introduction
System Framework
Preprocessor
Candidate Phrase Extractor
Feature Calculation
Refinement
Correlation Matrix Generator
Term Ranking
Semantic Graph Constructor
Learning Mechanism
Evaluation and Experiments
Datasets and Measures
Experiment 1: Evaluating Personal Knowledge Base
Experiment 2: Comparing with KEA
Conclusions
References
Discovering Valuable User Behavior Patterns in Mobile Commerce Environments
Introduction
Related Work
Preliminaries and Definitions
Proposed Method: UMSPL
Experimental Evaluations
Conclusions
References
A Novel Method for Community Detection in Complex Network Using New Representation for Communities
Introduction
Related Work
Proposed Method
Partitioning Vertex
Degree Entropy
The Method
Experiments
Zachary Karate Club
Java Compile-Time Dependency
HEP Literature and Stanford Web Graph
Conclusion
References
Link Prediction on Evolving Data Using Tensor Factorization
Introduction
Tensor Decomposition
Link Prediction
Evaluation
Conclusion and Future Work
References
Permutation Anonymization: Improving Anatomy for Privacy Preservation in Data Publication
Introduction
Motivations
Preliminaries
Basic Notations
Permutation Anonymization
Preserving Correlation
Problem Definition
Generalization Algorithm
The Partitioning Step
The Populating Step
Discussions and Related Work
Experiments
Accuracy
Efficiency
Conclusion
References
Efficient Mining Top-k Regular-Frequent Itemset Using Compressed Tidsets
Introduction
Top-k Regular-Frequent Itemsets Mining
TR-CT: Top-k Regular-Frequent Itemsets Mining Based on Compressed Tidsets
Compressed Tidset Representation
Top-k List Structure
TR-CT Algorithm Description
An Example
Performance Evaluation
Test Environment and Datasets
Execution Time
Space Usage
Conclusion
References
A Method of Similarity Measure and Visualization for Long Time Series Using Binary Patterns
Introduction
Background and Related Work
Binary Patterns Based Similarity and Visualization
Empirical Evaluation
Hierarchical Clustering
Visual Effects
Comparison of Computation Cost
Conclusions
References
A BIRCH-Based Clustering Method for Large Time Series Databases
Introduction
Background and Related Work
Dimensionality Reduction Using Multi-resolution Transforms
Related Work on Time Series Clustering
CF Tree and BIRCH Algorithm
The Proposed Approach - Combination of a Multi-resolution Transform and BIRCH
How to Determine the Appropriate Scale of a Multi-resolution Transform
Clustering with k-Means or I-k-Means in Phase 3
Experimental Evaluation
Clustering Quality Evaluation Criteria
Data Description
Experimental Results
Conclusion
References
Visualizing Cluster Structures and Their Changes over Time by Two-Step Application of Self-Organizing Maps
Introduction
Related Work
The Proposed Method
Batch Map
Two-Step Application of Self-Organizing Maps
Assigning Angles and Colors to Clusters
Visualization by Colors and Angles
Example: Visualization of Clusters in News Articles
Target Dataset
Keyword Extraction and Matrix Representation
Dimension Reduction by Random Projection and LSI
Final Matrix and Distance Function
Visualization Results
Conclusions and Future Work
References
Analysis of Cluster Migrations Using Self-Organizing Maps
Introduction
Self-Organizing Maps
Related Work
Visualizing Migrations
Transforming 2D Maps to 1D Maps
Visualizing Migrations
Analyzing Attribute Interestingness of Migrants
Application to the WDI datasets
Conclusion
Future Work
References
Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models Workshop (QIMIE 2011)
ClasSi: Measuring Ranking Quality in the Presence of Object Classes with Similarity Information
Introduction and Related Work
Ranking Quality Measures for Objects in Classes
Preliminaries: Measuring Ranking Quality
Class Similarity Ranking Correlation Coefficient ClasSi
Properties of ClasSi
ClasSi on Prefixes of Rankings
Examples
Conclusion
References
The Instance Easiness of Supervised Learning for Cluster Validity
Introduction
Instance Easiness
Generic Definitions
Instance Easiness for Supervised Learning
Illustration
The Clustering-Quality Measure
Discussion
Summary
References
A New Efficient and Unbiased Approach for Clustering Quality Evaluation
Introduction
Unsupervised Recall Precision F-Measure Indexes
Overall Clustering Quality Estimation
Cluster Labeling and Content Validation
Experimentation and Results
Overall Analysis of the Results
Quality Indexes Validation
Conclusion
References
A Structure Preserving Flat Data Format Representation for Tree-Structured Data
Introduction
Background of the Problem
Proposed Tree-Structured to Flat Data Conversion
Experimental Evaluation
Conclusion and Future Work
References
A Fusion of Algorithms in Near Duplicate Document Detection
Introduction
Major Algorithms in Duplicate Document Detection
Shingling, Super Shingling, Min-wise Independent Permutation Algorithms
I-Match, Multiple Random Lexicons Based I-Match Algorithms
Random Projection, Simhash Algorithms
Model Enhancements
Shingling Based Simhash Algorithm
Multiple Random Lexicons Based Simhash Algorithm
Experiments
Shingling Based Simhash Algorithm
Multiple Random Lexicons Based Simhash Algorithm
Conclusions
References
Searching Interesting Association Rules Based on Evolutionary Computation
Introduction
Related Work
Measuring the Similarity
Measuring the Similarity
Mining Association Rules by Genetic Network Programming
Simulations
Conclusions
References
An Efficient Approach to Mine Periodic-Frequent Patterns in Transactional Databases
Introduction
Background
Periodic-Frequent Pattern Model
Rare Item Problem
Minimum Constraints Model of Periodic-Frequent Pattern
Proposed Model
MaxCPF-Tree: Design, Construction and Mining
Structure of MaxCPF-Tree
Constructing MaxCPF-Tree
Mining of MaxCPF-Tree
Experimental Results
Experiment 1
Experiment 2
Conclusion
References
Algorithms to Discover Complete Frequent Episodes in Sequences
Introduction
Overview of Frequent Episodes Mining Framework
Principle for Discovering Minimum Occurrence of Serial Episode
Algorithms
An Apriori-Like Algorithm for Serial Episodes
An FP-Growth-Like Algorithm for Non-overlapped Serial Episodes with Gapmax
Experiments
Comparison of Algorithms Ap-epi and Minepi
Performance of NOE-WinMiner with Gapmax
Conclusions
References
Certainty upon Empirical Distributions
Introduction
Contributions
The Cardinality Scaling of Knowledge
Uncertainty about Unseen Events
A Measure of Certainty
Disjoint Dependent Events
Entropy Based Measures
Empirical Validation
Conclusions
References
Workshop on Biologically Inspired Techniques for Data Mining (BDM 2011)
A Measure Oriented Training Scheme for Imbalanced Classification Problems
Introduction
Techniques for Imbalanced Problems
Sampling Methods
Performance Measures
Measure Oriented Training Scheme
Experiment
Specification
Data Preprocessing
Results
Analysis
Conclusion
References
An SVM-Based Approach to Discover MicroRNA Precursors in Plant Genomes
Introduction
Preliminary Knowledge
Support Vector Machine
Related Work
MiR-PD Approach
Segmentation
Filter
Classification
Experimental Study
Data Source
The Filter Efficiency
Performance of SVM Features
SVM Training
Classification Testing
Cross Species Testing
Conclusion
References
Towards Recommender System Using Particle Swarm Optimization Based Web Usage Clustering
Introduction
Related Work
Swarm Intelligence
Particle Swarm Based Clustering
EPSO-Clustering
HPSO-Clustering
PSO Based Web Usage Clustering
HPSO Based Outlier Detection
JAVA API Usage Log and Recommender System
Conclusion and Future Work
References
Weighted Association Rule Mining Using Particle Swarm Optimization
Introduction
Related Work
Related Concepts: Particle Swarm Optimization (PSO)
Weighted Association Rule Mining Using Particle Swarm Optimization (WARM SWARM)
Weighting Function
Weighted Association Rule Mining Using Particle Swarm Optimization (WARM SWARM)
Experimental Results
Synthetic Datasets
Real-World Datasets
Conclusions and Future Work
References
An Unsupervised Feature Selection Framework Based on Clustering
Introduction
Related Work
Basic Concept on Clustering
Preliminaries
Single-Pass Clustering Algorithm
Unsupervised Feature Selection Framework
Feature Importance Measure Scores
Feature Selection Framework
Empirical Results
Experimental Setup
Results on UCI Datasets
Conclusions and Future Work
References
Workshop on Advances and Issues in Traditional Chinese Medicine Clinical Data Mining (AI-TCM 2011)
Discovery of Regularities in the Use of Herbs in Traditional Chinese Medicine Prescriptions
Introduction
Latent Tree Models
The Clinical Data
The Results
Concluding Remarks
References
COW: A Co-evolving Memetic Wrapper for Herb-Herb Interaction Analysis in TCM Informatics
Introduction
The TCM Insomnia Dataset
Prior Work
Multifactor Dimensionality Reduction and Hierarchical Core Sub-networks
Feature Selection via Genetic Algorithms
A Closer Look at COW
Local Search in COW
Experimental Evaluation
Discussion
Conclusion and Future Work
References
Selecting an Appropriate Interestingness Measure to Evaluate the Correlation between Syndrome Elements and Symptoms
Introduction
Related Objective Interestingness Measures
Hypothesis for Choosing an Interestingness Measure
Selection of an Interestingness Measure Based on the Hypothesis
Dataset
Selection of Samples of Different Syndrome Elements
Calculate the Interestingness of the Syndrome Elements and Symptoms
Calculating the Distance between Han and Re
Calculating the Distance of the Same Syndrome Elements
Comparison of Different Interestingness Measures
Confirmation of the Best Interestingness Measure
Confirmation by Subjective Interestingness Analysis
Confirmation by Computational Complexity
Conclusion
References
The Impact of Feature Representation to the Biclustering of Symptoms-Herbs in TCM
Introduction
Symptom-Herb Biclustering Algorithm
General Representation Pattern for Symptom-Herb Biclustering Algorithm
Symptom-Herb Biclustering Algorithm
Dataset and Pre-processing
Experiments
Effective Count
Binary Value
Relative Success Ratio
Modified Relative Success Ratio
Results and Discussion
Results
Discussion of Biclusters in TCM Field
Conclusion and Future Work
References
Second Workshop on Data Mining for Healthcare Management (DMHM 2011)
Usage of Mobile Phones for Personalized Healthcare Solutions
Introduction
Related Work
System Functional Architecture
Mobile Phone-Based Healthcare Scenarios
System Implementation
Applications
Conclusion
References
Robust Learning of Mixture Models and Its Application on Trial Pruning for EEG Signal Analysis
Introduction
Deterministic Annealing for Robust Learning
Mixture Models, Trimmed Likelihood Estimator and FAST-TLE
Deterministic Annealing Outlier Detection
Experiments
Synthetic Data Sets
Real World Data Sets
EEG Data Set
Conclusion
References
An Integrated Approach to Multi-criteria-Based Health Care Facility Location Planning
Introduction
An Approach to Planning Health Care Facility Locations
Preventive Health Care Facility Location Planning Approach
Accessibility Estimation and Location Criteria
A Multi-Criteria Preventive Health Care Facility Location Model
Algorithm
Experiments
Computational Experiments on Synthetic Datasets
A Real Application
Conclusions
References
Medicinal Property Knowledge Extraction from Herbal Documents for Supporting Question Answering System
Introduction
Related Work
Problems of Medicinal Property Knowledge Extraction
Object Identification Problems
Medicinal Property Identification Problem
Medicinal Property Boundary Determination Problems
A Framework for Medicinal Property Knowledge Extraction
Corpus Preparation
Medicinal Property Learning
Medicinal Property Knowledge Extraction
Evaluation and Conclusion
References
First PAKDD Doctoral Symposium on Data Mining (DSDM 2011)
Age Estimation Using Bayesian Process
Introduction
Age Estimation Using Gaussian Process
Learning Model Parameters
Make Prediction
Discussion
Age Estimation Using t Process
Learning Model Parameters
Make Prediction
Discussion
Experiment
Experimental Setting
Experimental Results
Conclusion
References
Significant Node Identification in Social Networks
Introduction
Preliminaries
System Model of DblpNET
Related Works
Top-k Author Ranking in Co-author Network
Extracted Features
Ranking Algorithm DblpRank
Complexity Analysis
Experimental Evaluation
Author Ranking of DblpNET
Efficiency Issue of DblpRank
Discussion and Future Work
Conclusion
References
Improving Bagging Performance through Multi-algorithm Ensembles
Introduction
Diversity in Combinations of Heterogeneous Classifiers
Relationship between Diversity and Correlation in Ensembles
Bagging with Multi-algorithm Ensembles
Experimental Results and Findings
Impact of Using Heterogeneous Algorithms on Diversity
Simulations for Diversity and Correlation in Ensembles
Comparison of Bagging with Multi-algorithm Ensembles to Other Ensemble Methods
Conclusions and Future Research Directions
References
Mining Tourist Preferences with Twice-Learning
Introduction
Twice-Learning Framework
The Algorithm
Mining Practice
Data
General Mining Process
Results of Applicability Verification and Performance Evaluation
Discovering Important Variables
Obtaining Decision Rules
Conclusions
References
Towards Cost-Sensitive Learning for Real-World Applications
Introduction to Cost-Sensitive Learning
Unequal Costs
Formulation
Evaluation
Learning Methods
Towards Real-World Applications
Extending Rescaling to Multi-class Problems
Analysis
Rescale_new
Handling Imprecise Costs
Learning with Cost Intervals
Learning with Cost Distributions
Conclusion
References
Author Index
