
New Frontiers in Applied Data Mining
Content
- Title page
- Preface
- Organization
- Table of Contents
- International Workshop on Behavior Informatics (BI 2011)
- Evaluating the Regularity of Human Behavior from Mobile Phone Usage Logs
- Introduction
- Related Work
- Data Preprocessing
- Measures
- Duplication Types
- Accumulated Entropy
- Evaluation of Survey Data
- Conclusion
- References
- Explicit and Implicit User Preferences in Online Dating
- Introduction
- Domain Overview
- User Preferences
- Explicit User Preferences
- Implicit User Preferences
- Are the User Preferences Good Predictors of the Success of User Interactions?
- Explicit User Preferences
- Implicit User Preferences
- Using User Preferences in Recommender Systems
- Hybrid Content-Collaborative Reciprocal Recommender
- Ranking Methods
- Experimental Evaluation
- Results and Discussion
- Conclusions
- References
- Blogger-Link-Topic Model for Blog Mining
- Introduction
- Models for Blog Mining
- Blogger-Link-Topic (BLT) Model
- Blog Classification Framework
- Experiments and Results
- Blogger-Link-Topic Results
- Blog Classification Results
- Results on Co-occurrence of BLT and AT Models
- Conclusion
- References
- A Random Indexing Approach for Web User Clustering and Web Prefetching
- Introduction
- Random Indexing (RI)
- Random Indexing Based Web User Clustering
- Data Preprocessing
- User Modelling Based on Random Indexing
- Single User Pattern Clustering
- Clustering Validity Measures
- Experiments
- Preprocessing of Data Source
- Parameter Setting Investigations
- Common User Profile Creation
- Prefetching for User Groups
- Conclusions
- References
- Emotional Reactions to Real-World Events in Social Networks
- Introduction
- Sentiment Index and Event Indicators
- Sentiment Index for Event Detection
- Event Indicators
- Mood-Based Burst Detection and Bursty Event Extraction
- Bursty Event Detection
- Experimental Results
- Conclusion
- References
- Constructing Personal Knowledge Base: Automatic Key-Phrase Extraction from Multiple-Domain Web Pages
- Introduction
- System Framework
- Preprocessor
- Candidate Phrase Extractor
- Feature Calculation
- Refinement
- Correlation Matrix Generator
- Term Ranking
- Semantic Graph Constructor
- Learning Mechanism
- Evaluation and Experiments
- Datasets and Measures
- Experiment 1: Evaluating Personal Knowledge Base
- Experiment 2: Comparing with KEA
- Conclusions
- References
- Discovering Valuable User Behavior Patterns in Mobile Commerce Environments
- Introduction
- Related Work
- Preliminaries and Definitions
- Proposed Method: UMSPL
- Experimental Evaluations
- Conclusions
- References
- A Novel Method for Community Detection in Complex Network Using New Representation for Communities
- Introduction
- Related Work
- Proposed Method
- Partitioning Vertex
- Degree Entropy
- The Method
- Experiments
- Zachary Karate Club
- Java Compile-Time Dependency
- HEP Literature and Stanford Web Graph
- Conclusion
- References
- Link Prediction on Evolving Data Using Tensor Factorization
- Introduction
- Tensor Decomposition
- Link Prediction
- Evaluation
- Conclusion and Future Work
- References
- Permutation Anonymization: Improving Anatomy for Privacy Preservation in Data Publication
- Introduction
- Motivations
- Preliminaries
- Basic Notations
- Permutation Anonymization
- Preserving Correlation
- Problem Definition
- Generalization Algorithm
- The Partitioning Step
- The Populating Step
- Discussions and Related Work
- Experiments
- Accuracy
- Efficiency
- Conclusion
- References
- Efficient Mining Top-k Regular-Frequent Itemset Using Compressed Tidsets
- Introduction
- Top-k Regular-Frequent Itemsets Mining
- TR-CT: Top-k Regular-Frequent Itemsets Mining Based on Compressed Tidsets
- Compressed Tidset Representation
- Top-k List Structure
- TR-CT Algorithm Description
- An Example
- Performance Evaluation
- Test Environment and Datasets
- Execution Time
- Space Usage
- Conclusion
- References
- A Method of Similarity Measure and Visualization for Long Time Series Using Binary Patterns
- Introduction
- Background and Related Work
- Binary Patterns Based Similarity and Visualization
- Empirical Evaluation
- Hierarchical Clustering
- Visual Effects
- Comparison of Computation Cost
- Conclusions
- References
- A BIRCH-Based Clustering Method for Large Time Series Databases
- Introduction
- Background and Related Work
- Dimensionality Reduction Using Multi-resolution Transforms
- Related Work on Time Series Clustering
- CF Tree and BIRCH Algorithm
- The Proposed Approach - Combination of a Multi-resolution Transform and BIRCH
- How to Determine the Appropriate Scale of a Multi-resolution Transform
- Clustering with k-Means or I-k-Means in Phase 3
- Experimental Evaluation
- Clustering Quality Evaluation Criteria
- Data Description
- Experimental Results
- Conclusion
- References
- Visualizing Cluster Structures and Their Changes over Time by Two-Step Application of Self-Organizing Maps
- Introduction
- Related Work
- The Proposed Method
- Batch Map
- Two-Step Application of Self-Organizing Maps
- Assigning Angles and Colors to Clusters
- Visualization by Colors and Angles
- Example: Visualization of Clusters in News Articles
- Target Dataset
- Keyword Extraction and Matrix Representation
- Dimension Reduction by Random Projection and LSI
- Final Matrix and Distance Function
- Visualization Results
- Conclusions and Future Work
- References
- Analysis of Cluster Migrations Using Self-Organizing Maps
- Introduction
- Self-Organizing Maps
- Related Work
- Visualizing Migrations
- Transforming 2D Maps to 1D Maps
- Visualizing Migrations
- Analyzing Attribute Interestingness of Migrants
- Application to the WDI Datasets
- Conclusion
- Future Work
- References
- Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models Workshop (QIMIE 2011)
- ClasSi: Measuring Ranking Quality in the Presence of Object Classes with Similarity Information
- Introduction and Related Work
- Ranking Quality Measures for Objects in Classes
- Preliminaries: Measuring Ranking Quality
- Class Similarity Ranking Correlation Coefficient ClasSi
- Properties of ClasSi
- ClasSi on Prefixes of Rankings
- Examples
- Conclusion
- References
- The Instance Easiness of Supervised Learning for Cluster Validity
- Introduction
- Instance Easiness
- Generic Definitions
- Instance Easiness for Supervised Learning
- Illustration
- The Clustering-Quality Measure
- Discussion
- Summary
- References
- A New Efficient and Unbiased Approach for Clustering Quality Evaluation
- Introduction
- Unsupervised Recall Precision F-Measure Indexes
- Overall Clustering Quality Estimation
- Cluster Labeling and Content Validation
- Experimentation and Results
- Overall Analysis of the Results
- Quality Indexes Validation
- Conclusion
- References
- A Structure Preserving Flat Data Format Representation for Tree-Structured Data
- Introduction
- Background of the Problem
- Proposed Tree-Structured to Flat Data Conversion
- Experimental Evaluation
- Conclusion and Future Work
- References
- A Fusion of Algorithms in Near Duplicate Document Detection
- Introduction
- Major Algorithms in Duplicate Document Detection
- Shingling, Super Shingling, Min-wise Independent Permutation Algorithms
- I-Match, Multiple Random Lexicons Based I-Match Algorithms
- Random Projection, Simhash Algorithms
- Model Enhancements
- Shingling Based Simhash Algorithm
- Multiple Random Lexicons Based Simhash Algorithm
- Experiments
- Shingling Based Simhash Algorithm
- Multiple Random Lexicons Based Simhash Algorithm
- Conclusions
- References
- Searching Interesting Association Rules Based on Evolutionary Computation
- Introduction
- Related Work
- Measuring the Similarity
- Measuring the Similarity
- Mining Association Rules by Genetic Network Programming
- Simulations
- Conclusions
- References
- An Efficient Approach to Mine Periodic-Frequent Patterns in Transactional Databases
- Introduction
- Background
- Periodic-Frequent Pattern Model
- Rare Item Problem
- Minimum Constraints Model of Periodic-Frequent Pattern
- Proposed Model
- MaxCPF-Tree: Design, Construction and Mining
- Structure of MaxCPF-Tree
- Constructing MaxCPF-Tree
- Mining of MaxCPF-Tree
- Experimental Results
- Experiment 1
- Experiment 2
- Conclusion
- References
- Algorithms to Discover Complete Frequent Episodes in Sequences
- Introduction
- Overview of Frequent Episodes Mining Framework
- Principle for Discovering Minimum Occurrence of Serial Episode
- Algorithms
- An Apriori-Like Algorithm for Serial Episodes
- An FP-Growth-Like Algorithm for Non-overlapped Serial Episodes with Gapmax
- Experiments
- Comparison of Algorithms Ap-epi and Minepi
- Performance of NOE-WinMiner with Gapmax
- Conclusions
- References
- Certainty upon Empirical Distributions
- Introduction
- Contributions
- The Cardinality Scaling of Knowledge
- Uncertainty about Unseen Events
- A Measure of Certainty
- Disjoint Dependent Events
- Entropy Based Measures
- Empirical Validation
- Conclusions
- References
- Workshop on Biologically Inspired Techniques for Data Mining (BDM 2011)
- A Measure Oriented Training Scheme for Imbalanced Classification Problems
- Introduction
- Techniques for Imbalanced Problems
- Sampling Methods
- Performance Measures
- Measure Oriented Training Scheme
- Experiment
- Specification
- Data Preprocessing
- Results
- Analysis
- Conclusion
- References
- An SVM-Based Approach to Discover MicroRNA Precursors in Plant Genomes
- Introduction
- Preliminary Knowledge
- Support Vector Machine
- Related Work
- MiR-PD Approach
- Segmentation
- Filter
- Classification
- Experimental Study
- Data Source
- The Filter Efficiency
- Performance of SVM Features
- SVM Training
- Classification Testing
- Cross Species Testing
- Conclusion
- References
- Towards Recommender System Using Particle Swarm Optimization Based Web Usage Clustering
- Introduction
- Related Work
- Swarm Intelligence
- Particle Swarm Based Clustering
- EPSO-Clustering
- HPSO-Clustering
- PSO Based Web Usage Clustering
- HPSO Based Outlier Detection
- JAVA API Usage Log and Recommender System
- Conclusion and Future Work
- References
- Weighted Association Rule Mining Using Particle Swarm Optimization
- Introduction
- Related Work
- Related Concepts: Particle Swarm Optimization (PSO)
- Weighted Association Rule Mining Using Particle Swarm Optimization (WARM SWARM)
- Weighting Function
- Weighted Association Rule Mining Using Particle Swarm Optimization (WARM SWARM)
- Experimental Results
- Synthetic Datasets
- Real-World Datasets
- Conclusions and Future Work
- References
- An Unsupervised Feature Selection Framework Based on Clustering
- Introduction
- Related Work
- Basic Concept on Clustering
- Preliminaries
- Single-Pass Clustering Algorithm
- Unsupervised Feature Selection Framework
- Feature Importance Measure Scores
- Feature Selection Framework
- Empirical Results
- Experimental Setup
- Results on UCI Datasets
- Conclusions and Future Work
- References
- Workshop on Advances and Issues in Traditional Chinese Medicine Clinical Data Mining (AI-TCM 2011)
- Discovery of Regularities in the Use of Herbs in Traditional Chinese Medicine Prescriptions
- Introduction
- Latent Tree Models
- The Clinical Data
- The Results
- Concluding Remarks
- References
- COW: A Co-evolving Memetic Wrapper for Herb-Herb Interaction Analysis in TCM Informatics
- Introduction
- The TCM Insomnia Dataset
- Prior Work
- Multifactor Dimensionality Reduction and Hierarchical Core Sub-networks
- Feature Selection via Genetic Algorithms
- A Closer Look at COW
- Local Search in COW
- Experimental Evaluation
- Discussion
- Conclusion and Future Work
- References
- Selecting an Appropriate Interestingness Measure to Evaluate the Correlation between Syndrome Elements and Symptoms
- Introduction
- Related Objective Interestingness Measures
- Hypothesis for Choosing an Interestingness Measure
- Selection of an Interestingness Measure Based on the Hypothesis
- Dataset
- Selection of Samples of Different Syndrome Elements
- Calculate the Interestingness of the Syndrome Elements and Symptoms
- Calculating the Distance between Han and Re
- Calculating the Distance of the Same Syndrome Elements
- Comparison of Different Interestingness Measures
- Confirmation of the Best Interestingness Measure
- Confirmation by Subjective Interestingness Analysis
- Confirmation by Computational Complexity
- Conclusion
- References
- The Impact of Feature Representation to the Biclustering of Symptoms-Herbs in TCM
- Introduction
- Symptom-Herb Biclustering Algorithm
- General Representation Pattern for Symptom-Herb Biclustering Algorithm
- Symptom-Herb Biclustering Algorithm
- Dataset and Pre-processing
- Experiments
- Effective Count
- Binary Value
- Relative Success Ratio
- Modified Relative Success Ratio
- Results and Discussion
- Results
- Discussion of Biclusters in TCM Field
- Conclusion and Future Work
- References
- Second Workshop on Data Mining for Healthcare Management (DMHM 2011)
- Usage of Mobile Phones for Personalized Healthcare Solutions
- Introduction
- Related Work
- System Functional Architecture
- Mobile Phone-Based Healthcare Scenarios
- System Implementation
- Applications
- Conclusion
- References
- Robust Learning of Mixture Models and Its Application on Trial Pruning for EEG Signal Analysis
- Introduction
- Deterministic Annealing for Robust Learning
- Mixture Models, Trimmed Likelihood Estimator and FAST-TLE
- Deterministic Annealing Outlier Detection
- Experiments
- Synthetic Data Sets
- Real World Data Sets
- EEG Data Set
- Conclusion
- References
- An Integrated Approach to Multi-criteria-Based Health Care Facility Location Planning
- Introduction
- An Approach to Planning Health Care Facility Locations
- Preventive Health Care Facility Location Planning Approach
- Accessibility Estimation and Location Criteria
- A Multi-Criteria Preventive Health Care Facility Location Model
- Algorithm
- Experiments
- Computational Experiments on Synthetic Datasets
- A Real Application
- Conclusions
- References
- Medicinal Property Knowledge Extraction from Herbal Documents for Supporting Question Answering System
- Introduction
- Related Work
- Problems of Medicinal Property Knowledge Extraction
- Object Identification Problems
- Medicinal Property Identification Problem
- Medicinal Property Boundary Determination Problems
- A Framework for Medicinal Property Knowledge Extraction
- Corpus Preparation
- Medicinal Property Learning
- Medicinal Property Knowledge Extraction
- Evaluation and Conclusion
- References
- First PAKDD Doctoral Symposium on Data Mining (DSDM 2011)
- Age Estimation Using Bayesian Process
- Introduction
- Age Estimation Using Gaussian Process
- Learning Model Parameters
- Make Prediction
- Discussion
- Age Estimation Using t Process
- Learning Model Parameters
- Make Prediction
- Discussion
- Experiment
- Experimental Setting
- Experimental Results
- Conclusion
- References
- Significant Node Identification in Social Networks
- Introduction
- Preliminaries
- System Model of DblpNET
- Related Works
- Top-k Author Ranking in Co-author Network
- Extracted Features
- Ranking Algorithm DblpRank
- Complexity Analysis
- Experimental Evaluation
- Author Ranking of DblpNET
- Efficiency Issue of DblpRank
- Discussion and Future Work
- Conclusion
- References
- Improving Bagging Performance through Multi-algorithm Ensembles
- Introduction
- Diversity in Combinations of Heterogeneous Classifiers
- Relationship between Diversity and Correlation in Ensembles
- Bagging with Multi-algorithm Ensembles
- Experimental Results and Findings
- Impact of Using Heterogeneous Algorithms on Diversity
- Simulations for Diversity and Correlation in Ensembles
- Comparison of Bagging with Multi-algorithm Ensembles to Other Ensemble Methods
- Conclusions and Future Research Directions
- References
- Mining Tourist Preferences with Twice-Learning
- Introduction
- Twice-Learning Framework
- The Algorithm
- Mining Practice
- Data
- General Mining Process
- Results of Applicability Verification and Performance Evaluation
- Discovering Important Variables
- Obtaining Decision Rules
- Conclusions
- References
- Towards Cost-Sensitive Learning for Real-World Applications
- Introduction to Cost-Sensitive Learning
- Unequal Costs
- Formulation
- Evaluation
- Learning Methods
- Towards Real-World Applications
- Extending Rescaling to Multi-class Problems
- Analysis
- Rescalenew
- Handling Imprecise Costs
- Learning with Cost Intervals
- Learning with Cost Distributions
- Conclusion
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.