
String Processing and Information Retrieval
Description
Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.
More details
Other editions
Additional editions

Content
- Title page
- Preface
- Organization
- Table of Contents
- Constructing Strings at the Nano Scale via Staged Self-assembly
- Discounted Cumulative Gain and User Decision Models
- Introduction
- Discounted Cumulative Gain
- Deterministic Click User Model
- Probabilistic Click User Model
- Expected Utility
- Parameters Estimation
- NumericalExperiments
- Discussion
- References
- Cross-Lingual Text Fragment Alignment Using Divergence from Randomness
- Introduction
- Related Work
- Text Fragment Alignment
- Similarity Measures and Divergence from Randomness
- Extraction of Fragments
- Experimental Study
- Document-Summary Association
- Text Fragment Alignment Evaluation
- Conclusion and Future Work
- References
- Enhancing Document Snippets Using Temporal Information
- Introduction
- Temporal Order
- Enhancing Snippets with Temporal Information
- Experimental Results
- Conclusions and Outlook
- References
- Spaced Seeds Design Using Perfect Rulers
- Introduction
- Notation
- From Rulers to Spaced Seeds
- Lower Bounds on the Minimum Pattern Length m*P
- Wichmann Rulers
- Restricted vs. Unrestricted Rulers
- Conclusions
- References
- Weighted Shortest Common Supersequence
- Introduction
- Motivation
- Weighted Sequences
- Related Work
- Shortest Common Weighted Supersequence
- Shortest Common Weighted Supersequence with Two Thresholds
- Conclusions and Open Problems
- References
- Approximate Regular Expression Matching with Multi-strings
- Introduction
- Notations and Definitions
- Notation
- Definitions
- Thompson's Automaton
- Approximate Regular Expression Matching
- Incremental String Comparisons
- Bille and Thorup's Algorithm
- A New O(kdn) Algorithm
- Preprocessing
- Matching Algorithm
- References
- Persistency in Suffix Trees with Applications to String Interval Problems
- Introduction
- Definitions and Preliminaries
- Some Persistent Data Structures
- Problems
- The Persistent Suffix Tree
- Using Pointer Arrays
- Using Balanced Search Trees
- The General Framework
- Snapshots
- Using Persistent Arrays
- Renaming
- Renaming for Any Size Pattern
- Applications
- PRI-Report
- PRI-Count
- Substring Select
- References
- Approximate Point Set Pattern Matching with Lp-Norm
- Introduction
- Problem Definitions
- Lp-APSPM for Arbitrary p&
- Lp-APSPM for p Approaching Infinity
- Computing the DP-Table with RMQ
- Determining the Queried Intervals
- Proof of Lemma 3
- Concluding Remarks
- References
- Detecting Health Events on the Social Web to Enable Epidemic Intelligence
- Introduction
- Related Work
- Unsupervised Public Health Event Detection
- Named Entity Feature Representation
- Feature Analysis
- Detecting Public Health Events
- Experiments and Evaluations
- Experiment I: Feature Pruning
- Experiment II: Selection of k
- Experiment III: Cluster Quality
- Experiment IV: Detected Public Health Events
- Experiment V: Efficiency Comparison
- Conclusions and Future Work
- References
- A Learned Approach for Ranking News in Real-Time Using the Blogosphere
- Introduction
- News Story Ranking
- Learning to Rank News Stories
- Experimental Setup
- Results
- Story Ranking Performance
- Strongest Story Ranking Features
- Story Ranking Components
- Predictable vs. Unpredictable Events
- Conclusions
- References
- Attribute Retrieval from Relational Web Tables
- Introduction
- Related Work
- Attribute Retrieval
- Relational Tables and Headers
- Attribute Line Filter
- Relevance
- Experimental Setup
- Evaluation Results
- Filtering
- Attribute Retrieval
- Conclusions
- References
- Query-Sets++: A Scalable Approach for Modeling Web Sites
- Introduction
- Related Work
- General Concepts
- Generic Web Site Vectorization
- Query-Based Feature Spaces for Web Sites
- Model Evaluation and Results
- Conclusions
- References
- Indexing with Gaps
- Introduction
- Our Results
- Problem Definitions and Preliminaries
- Preliminary Definitions and Notations
- G-Bounded Queries with k Gaps
- Speeding Up the Search
- Better Tradeoffs
- Final Speedup of the Query
- References
- Fast Computation of a String Duplication History under No-Breakpoint-Reuse (Extended Abstract)
- Introduction
- Motivation and Related Work
- Relationship to Breakpoint Graphs
- Candidate Duplications
- Finding a Candidate Duplication in Linear Time
- Quasi-Linear Algorithm for History Reconstruction
- Conclusion and Open Problems
- References
- Near Real-Time Suffix Tree Construction via the Fringe Marked Ancestor Problem
- Introduction
- The Fringe Marked Ancestor Problem
- Suffix Trees and Suffix Links
- Right-to-Left Construction
- Left-to-Right and Bi-directional Construction
- References
- Approximations and Partial Solutions for the Consensus Sequence Problem
- Introduction and Related Work
- An Approximation Algorithm for the Consensus Problem
- Integer Programs
- Linear Program Relaxation
- Empirical Results
- References
- Fixed Block Compression Boosting in FM-Indexes
- Introduction
- Basic Algorithmic Machinery
- Compression Boosting
- Fixed Block Compression Boosting
- Experimental Results
- Concluding Remarks
- References
- Space Efficient Wavelet Tree Construction
- Introduction
- Encoding Scheme
- Partitioning
- Merging
- Extension to Generalized Wavelet Trees
- Construction by Permuting Bits
- Overview of Permutations
- Chopping the Most Significant Bits
- Partitioning the Truncated Letters
- Overall Requirements
- Experiments
- References
- Computing the Longest Common Prefix Array Based on the Burrows-Wheeler Transform
- Introduction
- Related Work
- Wavelet Tree
- A LACA Based on the BWT
- Experimental Results
- Other Applications
- References
- A Succinct Index for Hypertext
- Introduction
- Preliminaries
- Compressed Suffix Arrays
- Orthogonal Range Query Structures
- Hypertext
- Construction of the Hypertext Index
- Indexing Node Text
- Storing Graph Topology
- Auxiliary Data Structures
- Pattern Matching in the Hypertext Index
- Preprocessing the Pattern
- Matching within a Node
- Matching across a Single Edge
- Matching across Multiple Edges
- Considering Path Constraints in Hypertext
- Conclusions
- References
- When Was It Written? Automatically Determining Publication Dates
- Introduction
- State of the Art
- Methodology
- Corpus Description
- Corpus Pre-processing
- Evaluation Score
- Description of the Methods
- Chronological Methods
- Named Entities
- Neologisms and Archaisms
- French Spelling Reforms
- Intermediate Conclusion
- Classification Methods
- Cosine Similarity-Based Classification
- Support Vector Machines (SVM)
- Scoring Combination
- Results
- Results for Classification Methods
- Conclusions and Future Work
- References
- A New Approach for Verifying URL Uniqueness in Web Crawlers
- Introduction
- Related Work
- Crawler Architecture
- The Baseline Algorithm (DRUM)
- Algorithm VEUNIQ
- Comparison between Methods
- Computational Cost Model
- A Crawling Simulation
- Conclusions
- References
- External Query Reformulation for Text-Based Image Retrieval
- Introduction
- Background and Related Work
- Definition Documents Based Relevance Feedback
- Pseudo Relevance Feedback
- Identifying Definition Documents by Key-Term Title Matching
- Feedback Term Weighting
- Experimental Setup and Results
- Experimental Setup
- Evaluation on Definition Documents
- Parameters Setting
- Comparing DRF with PRF
- Comparing DRF with Feedback from DDs
- Discussion
- Conclusion and Future Work
- References
- A Knowledge-Based Semantic Kernel for Text Classification
- Introduction
- Semantics in Text Mining and Information Retrieval
- Semantic Relatedness and the Omiotis Measure
- Omiotis-Based Semantic Kernel
- Semantic Smoothing Matrix and Semantic Kernel Design
- Computational Aspects
- Empirical Evaluation
- Conclusions and Future Work
- References
- Compressed Text Indexing with Wildcards
- Introduction
- Preliminaries
- Bit Vectors with Rank/Select
- Suffix Trees and Suffix Arrays
- Compressed Text Indexes
- Compressed Index for Dictionary Matching
- Orthogonal Range Reporting
- Sparse Suffix Trees
- Matching with Wildcards in Compressed Text
- Type-1 Matching
- Type-2 Matching
- Type-3 Matching
- References
- Fast q-gram Mining on SLP Compressed Strings
- Introduction
- Preliminaries
- Straight Line Programs
- Suffix Arrays and LCP Arrays
- Algorithm
- Computing q-gram Frequencies on Uncompressed Strings
- Computing q-gram Frequencies on SLP
- Applications and Extensions
- q-gram Spectrum Kernel
- Optimal Substring Patterns of Length q
- Different Lengths
- Computational Experiments
- Fibonacci Strings
- Pizza and Chili Corpus
- Conclusion
- References
- Succinct Gapped Suffix Arrays
- Introduction
- Definitions
- Computing the Gapped LF Function in Succinct Space
- References
- Finding Frequent Elements in Compressed 2D Arrays and Strings
- Introduction
- Two-Dimensional Arrays
- Improvements for Strings
- References
- On Suffix Extensions in Suffix Trees
- Introduction
- Suffix Trees and Ukkonen's Algorithm
- The Locations of Implicit Nodes
- Maintaining Implicit Nodes Efficiently
- Conclusions
- References
- COCA Filters: Co-occurrence Aware Bloom Filters
- Introduction
- Previous Work and Background
- COCA Filters
- Experimental Results
- Conclusion and Future Work
- References
- On-Line Construction of Position Heaps
- Introduction
- Definition of Position Heap
- Properties of Position Heap
- On-Line Construction Algorithm
- Augmented Position Heap
- Concluding Remarks
- References
- Computing All Subtree Repeats in Ordered Ranked Trees
- Introduction
- Preliminaries
- Properties of Ranked Trees in Postfix Notation
- Algorithms
- Preprocessing
- Finding Subtree Repeats
- Conclusion
- References
- Sparse Spatial Selection for Novelty-Based Search Result Diversification
- Introduction
- Background and Related Work
- Search Result Diversification
- Search in Metric Spaces
- Sparse Spatial Selection Diversification
- Experimental Setup
- Experimental Results
- Conclusions
- References
- Candidate Document Retrieval for Web-Scale Text Reuse Detection
- Introduction
- User over Ranking
- Related Work
- Notation and Basic Definitions
- Baseline: Maximal Termset Query Formulation
- Heuristic Search Strategy
- Experimental Analysis
- Number of Submitted Queries
- Candidate Document Quality
- Conclusion and Outlook
- References
- A Multi-faceted Approach to Query Intent Classification
- Introduction
- Related Work
- Experimental Design
- Predicting Individual Facets
- Combining Multiple Facets
- Genre-Objective Combination
- Genre-Task-Topic Combination
- Conclusions
- References
- Navigating the User Query Space
- Introduction
- Background and Related Research
- Experimental Analysis
- Sampling the User Query Space
- Correlation of User Extracted Queries
- Usability of Predictors for Query Selection
- Conclusion
- References
- Improved Compressed Indexes for Full-Text Document Retrieval
- Introduction and Related Work
- Document Listing with Frequencies
- Top-k Document Retrieval
- Range Color Listing with Frequencies
- Faster Top-k Retrieval
- Lowering the lgD Factor to lgk
- Computing Arbitrary Frequencies
- Using Mmphfs for Top-k Retrieval
- Bounding the Number of Valid Candidates
- Top-k Most Important Document Retrieval
- Final Remarks
- References
- ESP-Index: A Compressed Index Based on Edit-Sensitive Parsing
- Introduction
- Pattern Matching on ESP
- Edit-Sensitive Parsing (ESP)
- Pattern Embedding Problem
- Algorithms and Data Structures
- Finding Evidence of Pattern
- Finding Pattern Occurrence
- Data Structures
- Experiments
- Discussion
- References
- Compressed Indexes for Aligned Pattern Matching
- Introduction and Related Work
- Preliminaries
- Suffix Trees and Suffix Arrays
- Sparse Suffix Trees
- Orthogonal Range Reporting
- A Simple Framework
- Interleaving Technique
- A Compressed Index for Long Query Patterns
- Hardness of the Problem
- A Compressed Index for All Query Patterns
- Query Answering
- Concluding Remarks
- References
- Reference Sequence Construction for Relative Compression of Genomes
- Introduction
- Reference Sequence Construction
- Experimental Results
- Concluding Remarks
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (only limited: Kindle).
The file format PDF always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). Unfortunately, on the small screens of e-readers or smartphones, PDFs are rather annoying, requiring too much scrolling.
This eBook uses Watermark-DRM, a „soft” copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.