
Advances in Knowledge Discovery and Data Mining, Part I

Content
- Title Page
- Preface
- Organization
- Table of Contents - Part I
- Supervised Learning: Active, Ensemble, Rare-Class and Online
- Time-Evolving Relational Classification and Ensemble Methods
- Introduction
- Related Work
- Temporal-Relational Classification Framework
- Temporal Granularity
- Temporal Influence: Links, Attributes, Nodes
- Temporal-Relational Classifiers
- Temporal Ensemble Methods
- Methodology
- Datasets
- Temporal Models
- Empirical Results
- Single Models
- Temporal-Ensemble Models
- Conclusion
- References
- Active Learning for Hierarchical Text Classification
- Introduction
- A Novel Multi-oracle Setting
- A New Framework of Hierarchical Active Learning
- Unlabeled Pool Building Policy
- Leveraging Oracle Answers
- Experimental Configuration
- Datasets
- Performance Measure
- Active Learning Setup
- Empirical Study
- Standard Hierarchical Active Learner
- Leveraging Positive Examples in Hierarchy
- Leveraging Negative Examples in Hierarchy
- Conclusion
- References
- TeamSkill Evolved: Mixed Classification Schemes for Team-Based Multi-player Games
- Introduction
- Related Work
- Proposed Approaches
- TeamSkill-AllK-Ev-OL1
- TeamSkill-AllK-Ev-OL2
- TeamSkill-AllK-Ev-OL3
- Using Game-Specific Data during Classification
- TeamSkill-AllK-EVGen
- TeamSkill-AllK-EVMixed
- Evaluation
- Dataset
- Overall Results
- Results over Time
- Online Classification Variants
- Discussion
- Conclusions
- References
- A Novel Weighted Ensemble Technique for Time Series Forecasting
- Introduction
- Forecasts Combination Methods
- The Proposed Ensemble Technique
- Mathematical Description
- Optimization of the Combination Weights
- Approach for Weights Determination
- Three Time Series Forecasting Models
- Autoregressive Integrated Moving Average (ARIMA)
- Artificial Neural Networks (ANNs)
- Elman Artificial Neural Networks (EANNs)
- Experiments and Discussions
- Conclusions
- References
- Techniques for Efficient Learning without Search
- Introduction
- The AnDE Family of Algorithms
- AODE
- AnDE
- Optimising Memory Consumption
- Optimising Testing Time
- Evaluation
- Test Environment
- Optimised Memory Consumption
- Optimised Testing
- The Evaluation of A3DE
- A3DE Performance on Large Datasets
- Conclusions
- References
- An Aggressive Margin-Based Algorithm for Incremental Learning
- Introduction
- Online Passive-Aggressive Algorithm
- Incremental Passive-Aggressive Learning Algorithm
- Experiments
- Conclusion
- References
- Two-View Online Learning
- Introduction
- Related Work
- Two-View Online Passive Aggressive Learning
- Problem Setting
- Relationship between Views
- Two-View Passive Aggressive Algorithm
- Performance Evaluation
- View Difference Comparison
- Ads Dataset
- Product Review Dataset
- WebKB Course Dataset
- Conclusion and Open Problems
- References
- A Generic Classifier-Ensemble Approach for Biomedical Named Entity Recognition
- Introduction
- The Generic Genetic Classifier-Ensemble Approach
- Feature Set and SVM Based Classifier
- Generic Genetic Classifier-Ensemble Algorithm
- Experiments and Results
- Conclusion and Future Work
- References
- Neighborhood Random Classification
- Introduction
- Basic Concepts
- Notations
- Neighborhood Structure
- Neighborhood Classifiers
- Partition by Neighborhood Graphs
- Ensemble Method Classifier Based on Neighborhood
- Sampling Procedures
- Aggregating Function
- Evaluation
- Implementation of RNC
- Other Methods
- The Test
- Computational Analysis
- Conclusion and Further Work
- References
- SRF: A Framework for the Study of Classifier Behavior under Training Set Mislabeling Noise
- Introduction
- Background and Related Work
- The Sigmoid Rule Framework
- Sigmoid Rule Framework (SRF) Dimensions
- Comparing Algorithms
- Experimental Evaluation
- Using SRF
- Statistical Analysis
- Conclusions
- References
- Building Decision Trees for the Multi-class Imbalance Problem
- Introduction
- Methods
- Decomposition Techniques
- Decision Trees
- Analysis of the Splitting Criteria
- Experiments
- Configuration
- Statistical Tests
- Results
- Related Work
- Conclusion and Discussion
- References
- Scalable Random Forests for Massive Data
- Introduction
- Related Work
- Scalable Random Forest Algorithm
- Breadth-First Random Forest Construction
- Scalable Random Forest Algorithm
- Mapper, Reducer and Controller
- Experiments
- Data Sets
- Experiment Settings
- Performance Results
- Scalability
- Conclusions
- References
- Hybrid Random Forests: Advantages of Mixed Trees in Classifying Text Data
- Introduction
- Hybrid Random Forests
- Framework for Building Hybrid Random Forest
- Decision Tree Algorithms
- Algorithm
- Evaluation Methods
- Experiments
- Datasets
- Test Accuracy Improvement
- Performance Comparisons of other Text Classification Method
- Conclusion and Future Work
- References
- Learning Tree Structure of Label Dependency for Multi-label Learning
- Introduction
- Related Work
- The Concept of Multi-label Learning
- Learning a Tree Structure of Labels
- Experiment Design and Analysis
- The Description of Datasets
- Evaluation Criteria
- Algorithms and Settings
- Experimental Results and Analysis
- Conclusion
- References
- Multiple Instance Learning for Group Record Linkage
- Introduction
- Related Work
- Group Linkage Using Multiple Instance Learning
- Instance Selection and Classifier Learning
- Instance Classification
- Group Record Linkage
- Experiments and Evaluation
- Synthetic Data Results
- Historical Census Data Results
- Conclusion
- References
- Incremental Set Recommendation Based on Class Differences
- Introduction
- Definition
- Set Recommendation Based on Class Differences
- Example
- ZDD and VSOP
- Set Recommendation with ZDD Structure
- Experiments
- Performance Evaluation
- Example: Internet Shopping Advertising
- Example: AOL Search Logs
- Summary and Future Works
- References
- Active Learning for Cross Language Text Categorization
- Introduction
- Related Work
- Active Learning for CLTC
- Cross Language Text Categorization
- Apply Active Learning to CLTC
- Double Viewed Active Learning
- Two Views of the Problem
- Double Viewed Active Learning
- Evaluation
- Experimental Setup
- Results and Discussions
- Conclusions and Future Works
- References
- Evasion Attack of Multi-class Linear Classifiers
- Introduction
- Problem Setup
- Multi-class Linear Classifier
- Attack of Adversary
- Adversarial Cost
- Disguised Instances
- Theory of Evasion Attack
- Algorithm for Approximating ε-IMAC
- Experiments
- Spam Disguising
- Face Camouflage
- Conclusions
- References
- Foundation of Mining Class-Imbalanced Data
- Introduction
- Upper Bounds
- Error Rate on a Particular Class
- Cost-Weighted Error
- Empirical Results with Specific Learner
- Datasets and Settings
- Experimental Design and Results
- Conclusions
- References
- Active Learning with c-Certainty
- Introduction
- Previous Works
- c-Certainty Labeling
- BMO (Best-Multiple-Oracle) with c-Certainty
- Selecting the Best Oracle
- Active Learning Process of BMO
- Experiments
- Results on Faithful Oracles
- Results on Unfaithful Oracles
- Conclusion
- References
- A Term Association Translation Model for Naive Bayes Text Classification
- Introduction
- Related Work
- Terminology
- Naive Bayes Classifier
- Language Models for Information Retrieval
- The Term Association Translation Models
- Language Models for Text Classification
- Translation Model Estimation Using Joint Probability Model
- Translation Model Estimation Based on Mutual Information
- Experiments
- Corpora
- Performance Measure
- Experimental Results
- Conclusion and Future Work
- References
- A Double-Ensemble Approach for Classifying Skewed Data Streams
- Introduction
- Background and Motivations
- Performance Metrics
- Classification Methods for Skewed Data
- Classification Methods for Streaming Data
- Motivations
- Proposed Method
- Framework of the Method
- Multi-objective Optimization
- Reliability Estimation
- Experimental Evaluation
- Datasets
- Experimental Protocol
- Results
- Conclusions
- References
- Generating Balanced Classifier-Independent Training Samples from Unlabeled Data
- Introduction
- Related Work
- Generating Balanced Training Data
- Overview
- Semi-supervised Clustering
- Determine the Optimal Number of Samples from Each Cluster
- Leveraging Domain Knowledge
- Maximum Entropy Sampling
- Experiments and Evaluation
- Evaluation Setup
- Comparison of Class Distribution in Training Samples
- Comparison of Classification Performance
- Impact of Domain Knowledge
- Conclusion
- References
- Nyström Approximate Model Selection for LSSVM
- Introduction
- Least Squares Support Vector Machine
- Approximating LSSVM Using Nyström Method
- Error Analysis
- Approximate Model Selection for LSSVM
- Experiments
- Experimental Scheme
- Effectiveness
- Conclusion
- References
- Exploiting Label Dependency for Hierarchical Multi-label Classification
- Introduction
- Our Contributions
- Related Work
- HiBLADE Algorithm
- Training Scheme
- Extending the Features
- Label Correlation
- Experimental Details
- Evaluation Metrics
- Experimental Results and Discussion
- Conclusion
- References
- Diversity Analysis on Boosting Nominal Concepts
- Introduction
- Boosting of CNC
- Nominal Concepts
- Learning Concept Based Classifiers
- Classifier Diversity
- Experimental Study
- Conclusions
- References
- Extreme Value Prediction for Zero-Inflated Data
- Introduction
- Related Work
- Preliminaries
- Generalized Linear Model (GLM) and 2-Step GLM (GLM-C)
- Zero Inflated Poisson Regression (ZIP)
- Quantile Linear Regression (QR) and 2-Step QR (QR-C)
- Framework for Integrated Classification and Regression
- Integrated Classifier and Regression for Extreme Values(ICRE)
- Experimental Evaluation
- Data
- Experimental Setup
- Baseline Algorithm
- Evaluation Criteria
- Experimental Results
- Conclusions
- References
- Learning to Diversify Expert Finding with Subtopics
- Introduction
- Problem Definition
- Model Framework
- Overview
- Topic Model Initialization
- DivLearn: Learning to Diversify Expert Finding with Subtopics
- Model Learning
- Experiment
- Experiment Setup
- Performance Comparison
- Analysis and Discussion
- Related Work
- Conclusion
- References
- An Associative Classifier for Uncertain Datasets
- Introduction
- Related Works
- UAC Algorithm
- Rule Extraction
- Rule Filtering
- Rule Selection
- Experiments and Results
- Conclusion
- References
- Unsupervised Learning: Clustering, Probabilistic Modeling
- Neighborhood-Based Smoothing of External Cluster Validity Measures
- Introduction
- Preliminaries
- Neighborhood-Based Smoothing of Validity Measures
- Extension of Set-Based Cluster Validity Measures
- Extension of Pairwise-Based Cluster Validity Indices
- Weighting Function
- Optimal Smoothing Radius
- Evaluation of the Smoothed Validity Measures
- Settings of Clustering and Neighborhood Relation
- Datasets
- Effect of Smoothing Radius - Finding the Optimal Radius
- Effect of Prototype Number
- Effect of Class Overlap
- Real-World Data
- Conclusion
- References
- Sequential Entity Group Topic Model for Getting Topic Flows of Entity Groups within One Document
- Introduction
- Terminology
- Related Work
- Sequential Entity Group Topic Model
- Entity Group Topic Model
- Sequential Entity Group Topic Model
- Experiments
- The Size of Power-Set of Entity Groups
- Topic Discovery
- Entity Prediction
- Entity Pair Prediction
- Entity Group Prediction
- Topic Flow
- Conclusion
- References
- Topological Comparisons of Proximity Measures
- Introduction
- Proximity Measures and Preordonnance
- Proximity Measures
- Preorder Equivalence
- Topological Equivalence
- Topological Graphs
- Similarity between Proximity Measures in Topological Frameworks
- Relationship between Topological and Preordonnance Equivalences
- Theoretical Results
- Empirical Comparisons
- Conclusion
- References
- Quad-tuple PLSA: Incorporating Entity and Its Rating in Aspect Identification
- Introduction
- Problem Definition and Preliminary Knowledge
- Problem Definition
- Structured PLSA
- QPLSA and EM Solution
- QPLSA
- Deriving the EM Solution
- Incorporating Aspect Prior
- Aspect Identification
- Experiments
- Data Sets
- Implementation Details
- Experimental Results
- Related Work
- Conclusion
- References
- Clustering-Based $k$-Anonymity
- Introduction
- Motivation
- Fundamental Definitions
- Basic Concept
- $K$-means Clustering
- Problem Definition
- Clustering-Based Generalization Algorithm
- Extension to l-Diversity
- Related Work
- Empirical Evaluation
- Privacy Level $K$
- QI-Attributes Dimensionality $d$
- Cardinality of Data Set $n$
- Efficiency
- Conclusion
- References
- Unsupervised Ensemble Learning for Mining Top-n Outliers
- Introduction
- Methodologies
- Framework and Notions of Ensemble Learning
- Score-Based Aggregation Approach (SAG)
- Order-Based Aggregation Approach (OAG)
- Inference and Algorithm for OAG
- Experiments
- Aggregation on Real Data
- Robustness of Two Aggregation Methods
- Conclusions
- References
- Towards Personalized Context-Aware Recommendation by Mining Context Logs through Topic Models
- Introduction
- Related Work
- Preliminary
- Mining Common Context-Aware Preferences through Topic Models
- Experiments
- Data Set
- Benchmark Methods
- Evaluation Metrics
- Overall Results of Recommendation
- Robustness Analysis
- Case Study
- Concluding Remarks
- References
- Mining of Temporal Coherent Subspace Clusters in Multivariate Time Series Databases
- Introduction
- Related Work
- A Model for Effective Subspace Clustering of Multivariate Time Series Data
- Time Series Subspace Cluster Definition
- Clustering Model: Redundancy Avoidance
- Efficient Computation
- Experiments
- Evaluation w.r.t. Effectiveness
- Evaluation w.r.t. Efficiency
- Conclusion
- References
- A Vertex Similarity Probability Model for Finding Network Community Structure
- Introduction
- Vertex Similarity in Finding Community Structure
- Common Neighbor Index in Unipartite Network
- Common Neighbor Index in Bipartite Network
- A VSP Model for Finding Community Structure
- Experimental Results
- Finding Community Structure in Unipartite Network
- Finding Community Structure in Bipartite Network
- Conclusion
- References
- Hybrid-ε-greedy for Mobile Context-Aware Recommender System
- Introduction
- Background
- Following the Evolution of User's Interests
- Managing the User's Situation
- The Proposed MCRS Algorithm
- Terminology and Notations
- The Bandit Algorithm
- The Proposed Hybrid-ε-greedy Algorithm
- Experimental Evaluation
- Experimental Datasets
- Finding the Optimal B Threshold Value
- Experimental Datasets
- Results for ε Variation
- Valuate Sparse Data
- Conclusion
- References
- Unsupervised Multi-label Text Classification Using a World Knowledge Ontology
- Introduction
- Related Work
- Unsupervised Multi-label Text Classification
- World Ontology
- Document Features
- Initial Classification
- Generalised Classification
- Implementation
- Evaluation
- Results and Discussions
- Conclusions
- References
- Semantic Social Network Analysis with Text Corpora
- Introduction
- Document-Entity-Topic Model
- Dirichlet Priori on Document-Entity Distribution
- Generative Process of DET Model
- Learning the DET Model from Data
- Gibbs Sampling
- The Posterior on ? , Ø and F
- Experiment Result
- Perplexity Comparison between AT and DET
- Semantic Social Network Analysis with DET
- Conclusions
- References
- Visualizing Clusters in Parallel Coordinates for Visual Knowledge Discovery
- Introduction
- Related Works
- Dimension Ordering for Knowledge Discovery
- Inter-cluster and Intra-cluster Crossings
- Optimization with Hamiltonian Path
- Empirical Study on Real Datasets
- Shaping Clusters against Visual Clutters by an Energy Reduction Model
- Conclusion and Future Work
- References
- Feature Enriched Nonparametric Bayesian Co-clustering
- Introduction
- Related Work
- Background: Dirichlet Process
- Feature Enriched Dirichlet Process Co-clustering
- Inference
- Experimental Evaluation
- Experimental Methodology and Feature Information
- Results
- Conclusion
- References
- Shape-Based Clustering for Time Series Data
- Introduction
- Background
- $K$-means Clustering
- Dynamic Time Warping (DTW) Distance Measure
- Global Constraint
- Related Work
- Shape-Based Clustering for Time Series (SCTS)
- Experiments and Results
- Conclusion
- References
- Privacy-Preserving EM Algorithm for Clustering on Social Network
- Introduction
- Related Work
- Preliminaries
- Probabilistic Mixture Model
- Utilities of Privacy-Preserving Data Mining
- Problem Statement
- Assumptions
- Private Variables and Public Variables
- Secure Summation Protocols on Networks
- Local Secure Summation Protocol
- Global Secure Summation Protocol
- Private Clustering on Networks
- Private E-step
- Private M-step
- Performance
- Experiments
- Accuracy
- Efficiency
- Conclusion
- References
- Named Entity Recognition and Identification for Finding the Owner of a Home Page
- Introduction
- Related Work
- Named Entity Recognition
- Finding Named Entities of Web Page Owner
- Entity Selection Framework
- Baseline Entity Selection
- Graph-Based Entity Selection
- Learning to Select Named Entities
- Experimental Results
- Named Entity Recognition Evaluation
- Evaluation and Experimental Results
- Conclusions
- References
- Clustering and Understanding Documents via Discrimination Information Maximization
- Introduction
- Motivation and Related Work
- CDIM - Our Document Clustering Method
- Problem Statement
- Clustering Objective Function
- Term Discrimination Information
- Relatedness of Terms to Clusters
- Document Discrimination Information
- Algorithm
- Experimental Setup
- Data Sets
- Comparison Methods
- Clustering Validation Measures
- Results and Discussion
- Clustering Quality
- Cluster Understanding and Visualization
- Conclusion and Future Work
- References
- A Semi-supervised Incremental Clustering Algorithm for Streaming Data
- Introduction
- Related Work
- A Model for Constraint-Based Clustering on Streaming Data
- Modeling a Stream of Constraints
- Cost of Assigning Data Items to a Cluster
- A Clustering Model for Streaming Data
- Incremental Clustering towards a Constraint-Stream
- Experimental Evaluation
- Conclusions
- References
- Unsupervised Sparse Matrix Co-clustering for Marketing and Sales Intelligence
- Introduction
- Related Work
- Overview of Our Approach
- Preliminaries
- Graph Partitioning
- The Algorithm
- Recursive Spectral Bi-partitioning
- Eigengap-Based Termination
- Discovering Off-Diagonal Clusters
- Recommendations
- Experiments
- Comparison with other Techniques
- Compression-Based Evaluation
- Business Intelligence on Real Datasets
- Conclusion
- References
- Expectation-Maximization Collaborative Filtering with Explicit and Implicit Feedback
- Introduction
- Preliminary
- Formalization
- Collaborative Filtering
- Co-rating
- Explicit and Implicit Feedback
- Matrix Factorization with Explicit Feedback
- Enhance Explicit Feedback with Implicit Feedback
- Expectation-Maximization Collaborative Filtering
- Experiments
- Conclusion and Future Works
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format displays each book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are cumbersome to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. This means that no technical restrictions prevent illegal distribution. However, a personalised watermark is embedded in the eBook; in the event of misuse it can be used to identify the purchaser and to provide evidence for legal purposes.
For more information, see our eBook Help page.