
Advances in Knowledge Discovery and Data Mining
Description

Content
- Title
- Preface
- Organization
- Table of Contents
- Feature Extraction
- An Instance Selection Algorithm Based on Reverse Nearest Neighbor
- Introduction
- Related Works
- Incremental Algorithms
- Decremental Algorithms
- The RNNR Algorithm
- The Choosing Strategy of RNNR-AL0 (Absorption, Larger Than ZERO)
- The Choosing Strategy of RNNR-AL1 (Absorption, Larger Than ONE)
- The Choosing Strategy of RNNR-L1 (Selecting All, Larger Than ONE)
- Experiments
- Conclusions and Future Work
- References
- A Game Theoretic Approach for Feature Clustering and Its Application to Feature Selection
- Introduction
- Related Work
- Coalitional Games Preliminaries
- NSP Computation via Integer Linear Program (ILP)
- Feature Clustering via Nash Stable Partition
- Feature Selection Approaches
- Handling of Large Feature Set Size
- Equivalence between a k-NSP and Minimum k-Cut
- Hierarchical Feature Clustering
- Experiments
- References
- Feature Selection Strategy in Text Classification
- Introduction
- Feature Selection
- An Overview
- Analysis
- Modeling
- Experiment
- Algorithms for Comparison
- Result and Discussion
- Related Work
- Summary and Conclusion
- References
- Unsupervised Feature Weighting Based on Local Feature Relatedness
- Introduction
- Preliminary
- Document Representation
- Relatedness Measure
- Feature Weighting
- Feature Weighting Based on Syntactic Information
- Feature Weighting Based on Local Feature Relatedness (LFR)
- Combination of Syntactic and Semantic Factors
- Related Work
- Feature Weighting Based on Global Feature Relatedness (GFR)
- Document Similarity Based on Inter-document Feature Relatedness (IFR)
- Experiments
- Datasets
- Methodology
- Evaluation Metrics
- Experiment Results
- Computational Complexity
- Conclusions
- References
- An Effective Feature Selection Method for Text Categorization
- Introduction
- Reviews of Feature Selection Methods in Text Categorization
- Definitions
- Three Baseline FS Methods
- Analysis of the Traditional Feature Selection Methods in Text Categorization
- Optimal Feature Selection for KNN
- K-Nearest Neighbor Classification (KNN) for Text Categorization
- Effective Feature Selection Criterion for KNN
- An Illustrating Example
- Experiments
- Datasets
- Classifiers
- Performance Measurement
- Experimental Results
- Discussion of Results
- Conclusion
- References
- Machine Learning
- A Subpath Kernel for Rooted Unordered Trees
- Introduction
- A Linear-Time Kernel for Rooted Unordered Trees
- A New Tree Kernel Based on Tree Subpaths
- Subpath Set
- A Subpath Tree Kernel
- An Efficient Algorithm for the Subpath Tree Kernel
- Experiments
- An XML Classification Dataset
- Glycan Classification Datasets
- Comparison of Execution Times of the Proposed Kernel and the Linear-Time Tree Kernel
- Related Work
- Conclusion
- References
- Classification Probabilistic PCA with Application in Domain Adaptation
- Introduction
- Classification Probabilistic PCA (CPPCA)
- Probabilistic PCA (PPCA) Revisited
- Classification Probabilistic PCA (CPPCA)
- EM Learning for CPPCA
- Experimental Results
- Product Review Adaptation Experiment
- Conclusions
- References
- Probabilistic Matrix Factorization Leveraging Contexts for Unsupervised Relation Extraction
- Introduction
- Related Work
- Unsupervised Relation Extraction
- Relation Discovery
- Feature Extraction
- CL-PMF for Dimension Reduction
- Experiments
- Annotated Corpus
- Evaluation
- Methods
- Results and Discussion
- Parameters for CL-PMF
- Conclusion
- References
- The Unsymmetrical-Style Co-training
- Introduction
- Preliminaries
- Co-training
- Co-EM
- Multiple-Learner
- Unsymmetrical Co-training
- Experiments
- Conclusion
- References
- Balance Support Vector Machines Locally Using the Structural Similarity Kernel
- Introduction
- Bibliographic Coupling Based Structural Similarity
- From Global to Local Balance
- From Hub Scores to Signed Authority Scores
- Related Research
- Experiments
- Setup
- Results
- Discussions of Limitations
- Concluding Remarks
- References
- Using Classifier-Based Nominal Imputation to Improve Machine Learning
- Introduction
- Framework
- Imputation for Nominal Data
- Classifier-Based Nominal Imputation
- Using CNI to Improve Classification Performance of Machine Learned Classifiers
- Experimental Design and Results
- Evaluation of CNI Imputation Algorithms
- The Impact of Nominal Imputers on the Classification Performance for Instance-Based Learning Algorithms
- The Impact of Nominal Imputers on Other Machine Learned Classifiers
- Conclusions
- References
- A Bayesian Framework for Learning Shared and Individual Subspaces from Multiple Data Sources
- Introduction
- Bayesian Shared Subspace Learning (BSSL)
- Bayesian Representation
- Gibbs Inference
- Subspace Dimensionality and Complexity Analysis
- Social Media Applications
- BSSL Based Social Media Retrieval
- BSSL Based Cross-Social Media Retrieval
- Experiments
- Dataset
- Subspace Learning and Parameter Setting
- Experiment 1: Social Media Retrieval Using Auxiliary Sources
- Experiment 2: Cross Media Retrieval
- Conclusion
- References
- Are Tensor Decomposition Solutions Unique? On the Global Convergence HOSVD and ParaFac Algorithms
- Introduction
- Tensor Decomposition
- High Order SVD (HOSVD)
- ParaFac Decomposition
- Unique Solution
- A Natural Starting Point for W: The T1 Decomposition and the PCA Solution
- Initialization
- Run Statistics and Validation
- Eigenvalue Distributions
- Datasets
- Image Randomization
- Main Results
- Eigenvalue-Based Uniqueness Prediction
- Theoretical Analysis
- Summary
- References
- Improved Spectral Hashing
- Introduction
- Spectral Hashing
- Formulation and Algorithm
- Discussion
- Proposed Methods
- SH with Probability Transform
- Generalized SH
- Toward a More Efficient Code
- Experiments
- Datasets and Evaluation Measures
- Experimental Results
- Computational Cost
- Conclusion and Future Works
- References
- Clustering
- High-Order Co-clustering Text Data on Semantics-Based Representation Model
- Introduction
- High-Order Representation Structure
- High-Order Co-clustering Method
- Experimental Results and Discussion
- Dataset
- Clustering Results and Discussion
- Conclusions and Future Work
- References
- The Role of Hubness in Clustering High-Dimensional Data
- Introduction
- Related Work
- The Hubness Phenomenon
- The Emergence of Hubs
- Relation of Hubs to Data Clusters
- Hub-Based Clustering
- Deterministic Approach
- Probabilistic Approach
- Experiments and Evaluation
- Synthetic Data: Gaussian Mixtures
- Clustering in the Presence of High Noise Levels
- Experiments on Real-World Data
- Conclusions and Future Work
- References
- Spatial Entropy-Based Clustering for Mining Data with Spatial Correlation
- Introduction
- Related Work
- Spatial Entropy-Based Clustering
- Spatial Entropy
- Using Spatial Entropy in Spatial Clustering
- A Spatial Entropy-Based Spatial Clustering Method
- Experiments
- Conclusions
- References
- Self-adjust Local Connectivity Analysis for Spectral Clustering
- Introduction
- Methodology
- Local Connectivity-Based Scaling
- Eigenvector Selection
- Experimental Evaluation
- Conclusions
- References
- An Effective Density-Based Hierarchical Clustering Technique to Identify Coherent Patterns from Gene Expression Data
- Introduction
- GeneClusTree
- Regulation Information Extraction
- Performance Evaluation
- Conclusions and Future Work
- References
- Nonlinear Discriminative Embedding for Clustering via Spectral Regularization
- Introduction
- Spectral Regularization on Manifold
- The Proposed Method
- Problem Formulation and Its Solution
- Convergence Analysis
- Experimental Results
- Data Sets
- Clustering Evaluation and Parameter Selection
- Clustering Performance
- Clustering Performance vs. Parameters
- Comparison on the Embeddings
- Conclusions
- References
- An Adaptive Fuzzy k-Nearest Neighbor Method Based on Parallel Particle Swarm Optimization for Bankruptcy Prediction
- Introduction
- Background Materials
- Fuzzy k-Nearest Neighbor Algorithm (FKNN)
- Time Variant Particle Swarm Optimization (TVPSO)
- Proposed PTVPSO-FKNN Prediction Model
- TVPSO-FKNN Model Based on the Serial PSO Algorithm
- Parallel Implementation of the TVPSO-FKNN Model on the Multi-core Platform (PTVPSO-FKNN)
- Experimental Design
- Data Description
- Experimental Setup
- Measure for Performance Evaluation
- Experimental Results and Discussion
- Experiment I: Classification in the Whole Original Feature Space
- Experiment II: Classification Using the PTVPSO-FKNN Model with Feature Selection
- Experiment III: Comparison between the Parallel TVPSO-FKNN Model and the Serial One
- Conclusions
- References
- Semi-supervised Parameter-Free Divisive Hierarchical Clustering of Categorical Data
- Introduction
- Instance-Level Constraints
- The Algorithm
- Initialization
- Refinement
- Alleviation of Cannot-Link Violation
- Experimental Results
- Data Sets
- Results and Discussion
- Conclusions
- References
- Classification
- Identifying Hidden Contexts in Classification
- Introduction
- Problem Set-Up
- Positioning within Related Work
- Three Techniques for Identifying Hidden Contexts
- Experimental Evaluation
- Evaluation Criteria
- Datasets and Experimental Protocol
- Results
- Case Study
- Conclusion
- References
- Cross-Lingual Sentiment Classification via Bi-view Non-negative Matrix Tri-Factorization
- Introduction
- Related Work
- Problem Setting
- Bi-view Non-negative Matrix Tri-Factorization
- Basic Idea
- Mathematical Formulation and Brief Analysis
- Experiments
- Datasets
- Baselines
- Overall Comparison Results
- Influence of Parameters
- Conclusion and Future Work
- References
- A Sequential Dynamic Multi-class Model and Recursive Filtering by Variational Bayesian Methods
- Introduction
- Sequential Dynamic Multi-class Model
- Recursive Filtering by Variational Bayes
- Variational Bayes Approximation
- Summary
- Variational Predictive Distributions of New Inputs
- Experiment Results
- Synthetic Problem
- Four-Class Motor Imagery EEG Data for the BCI-Competition 2005
- Waveform Data Set from the UCI Machine Learning Repository
- Conclusions
- References
- Random Ensemble Decision Trees for Learning Concept-Drifting Data Streams
- Introduction
- Related Work
- The EDTC Algorithm
- Experiments
- Conclusion
- References
- Collaborative Data Cleaning for Sentiment Classification with Noisy Training Corpus
- Introduction
- Related Work
- Sentiment Classification
- Data Cleaning
- Problem Statement
- The Data Cleaning Algorithms
- Overview
- Self-cleaning
- Co-cleaning
- Tri-cleaning
- Empirical Evaluation
- Evaluation Setup
- Evaluation Results
- Conclusion and Future Work
- References
- Pattern Mining
- Using Constraints to Generate and Explore Higher Order Discriminative Patterns
- Introduction
- Discriminative Patterns
- Definitions
- Previous Work in Discriminative Pattern Mining
- Defining Higher Order Patterns with Constraints
- A More General Approach
- Experimental Results
- Formal Analysis
- Conclusion and Future Work
- References
- Mining Maximal Co-located Event Sets
- Introduction
- Problem Statement and Related Work
- Basic Concepts of Spatial Co-location Mining
- Problem Statement
- Related Work
- Algorithm
- Preprocess
- Candidate Generation
- Candidate Pruning
- Candidate Instance Filtering
- Algorithm and Analysis
- Experimental Results
- Conclusion
- References
- Pattern Mining for a Two-Stage Information Filtering System
- Introduction
- Related Work
- Rough Set-Based Topic Filtering
- Discovery of R-Patterns
- Rough Threshold Model
- Pattern Taxonomy Mining
- Experiments
- Discussion
- Conclusions
- References
- Efficiently Retrieving Longest Common Route Patterns of Moving Objects By Summarizing Turning Regions
- Introduction
- Problem Definition
- Mining Algorithm for Longest Common Route Patterns Based on Turning Regions
- Discovering Turning Regions
- Retrieving Longest Common Route Patterns
- Performance Evaluations
- Optimal eps and DL angle
- Efficiency and Accuracy
- Conclusions
- References
- Automatic Assignment of Item Weights for Pattern Mining on Data Streams
- Introduction
- Background and Related Work
- Valency Model
- Our Weight Adaptation Methodology
- Data Structure: Inverted Index Matrix
- Distance Function
- Evaluation
- Precision
- Execution Time
- Evaluating Drift
- Real World Dataset: Accident
- Conclusion
- References
- Prediction
- Predicting Private Company Exits Using Qualitative Data
- Introduction
- Data Extraction and Representation
- Social Network Ranking
- Mapping Companies to N-tuples
- Missing Entries
- Model Development and Results
- Resampling and Cross-Validation
- Results
- Discussion and Conclusion
- References
- A Rule-Based Method for Customer Churn Prediction in Telecommunication Services
- Introduction
- Related Work
- Algorithm of CRL
- Basic Concepts
- Rule Learning
- Pruning Rules
- Classification
- Experiments and Discussion
- Evaluation
- Discussion
- Conclusion and Future Works
- References
- Text Mining
- Adaptive and Effective Keyword Search for XML
- Introduction and Motivation
- Result Model
- Algorithms
- Matrix Algorithm
- Content-Information-First (CIF) Algorithm
- Structure-Information-First (SIF) Algorithm
- Experiments
- Conclusion
- References
- Steering Time-Dependent Estimation of Posteriors with Hyperparameter Indexing in Bayesian Topic Models
- Introduction
- Previous Works
- Method
- Model Construction
- Posterior Inference
- Experiments
- Datasets
- Settings
- Evaluation Measure
- Preliminary Experiments
- Main Experiment
- Conclusion
- References
- Constrained LDA for Grouping Product Features in Opinion Mining
- Introduction
- Related Work
- The Proposed Algorithm
- Introduction to LDA
- Constrained-LDA
- Constraint Extraction
- Experimental Evaluation
- Data Sets
- Gold Standard
- Evaluation Measure
- Compared with LDA
- Comparing with mLSA
- Influence of Parameters
- Conclusions
- References
- Semantic Dependent Word Pairs Generative Model for Fine-Grained Product Feature Mining
- Introduction
- Data Representation and Problem Definition
- Problem Definition
- Semantic Dependent Word Pair Generative Model
- Inference and Parameter Estimation
- Latent Variable Inference
- Parameter Estimation
- Hyper-parameter Estimation
- Evaluation
- Perplexity
- Average Cluster Entropy
- Normalized Mutual Information Index
- Experiments
- Conclusion
- References
- Grammatical Dependency-Based Relations for Term Weighting in Text Classification
- Introduction
- The Proposed Term Weighting Framework
- Relation Extraction
- Graph Construction: Constructing, Weighting and Ranking Graph
- Constructing Graph
- Weighting Graph
- Ranking Graph
- Applying Graph-Based Document Representation to Text Classification
- Proposed Term Class Dependence (TCD)
- Proposed Hybrid Term Weighting Methods Based on TCD
- Experiments
- Classifier and Data Sets
- Performance and Discussion
- Conclusion and Future Work
- References
- XML Documents Clustering Using a Tensor Space Model
- Introduction
- Related Work
- The Proposed XCT Method
- Problem Definition and Preliminaries
- Generation of Structure Features for TSM
- Generation of Content Features for TSM
- The TSM Representation, Decomposition and Clustering
- Experiments and Discussion
- Datasets
- Experimental Design
- Evaluation Measures
- Empirical Analysis
- Conclusion
- References
- An Efficient Pre-processing Method to Identify Logical Components from PDF Documents
- Introduction
- Related Works
- The Sparse-Line Property
- Machine Learning Methods
- Component Boundary Detection
- Experiments and Results
- Data Set
- Performance of Sparse Line Detection
- Performance of Noise Line Removal
- Table/Equation Boundary Detection
- Conclusions
- References
- Combining Proper Name-Coreference with Conditional Random Fields for Semi-supervised Named Entity Recognition in Vietnamese Text
- Introduction
- Related Works
- Conditional Random Field
- Named Entity Recognition in Vietnamese Text
- Characteristics of Vietnamese Proper Names
- Semi-supervised Learning Algorithm
- Experiments and Discussion
- Conclusions
- References
- Topic Analysis of Web User Behavior Using LDA Model on Proxy Logs
- Introduction
- LDA Formulation
- Cross-Hierarchical Directory Matching
- LDA-Based Topic Modeling
- Symbolizing URLs from Proxy Log
- Description of Proxy Log
- Basic Idea of Labeling Words to User Session
- Cross-Hierarchical Directory Matching
- Experiments and Results
- Data Sets and Evaluation Settings
- Evaluation Metrics
- Optimality Analysis of LDA Model
- Visualizing 24 Topics and Student Characterization
- Conclusion
- References
- SizeSpotSigs: An Effective Deduplicate Algorithm Considering the Size of Page Content
- Introduction
- Contribution
- Related Work
- Relation between Noise-Content Ratio and Similarity
- Concepts and Notation
- Theoretical Analysis
- AF_SpotSigs and SizeSpotSigs Algorithm
- Experiment
- Data Set
- Choice of Stopwords
- AF_SpotSigs vs. SpotSigs
- SizeSpotSigs over SpotSigs and AF_SpotSigs
- Conclusions and Future Works
- References
- Knowledge Transfer across Multilingual Corpora via Latent Topics
- Introduction
- Problem Definition
- Our Approach
- Latent Dirichlet Allocation
- Bilingual Latent Dirichlet Allocation
- Cross-Lingual Document Classification
- Experimental Results
- Experimental Setup
- Perplexity
- Classification Accuracy
- Topic Smoothing
- Related Work
- Conclusion
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are cumbersome to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, a personalised watermark is embedded in the eBook; it can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.