
String Processing and Information Retrieval
Description
Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.
More details
Other editions
Additional editions

Content
- Intro
- Preface
- Organization
- Invited Talks
- Computational Problems in Mining Urban Data
- Journey into Evaluation: From Retrieval Effectiveness to User Engagement
- Encodings = (Data Structures) - (Data)
- Contents
- Faster Exact Search Using Document Clustering
- 1 Introduction
- 2 Preliminaries
- 3 Clustering Based Search
- 3.1 Developing an Objective Function
- 3.2 The Clustering Algorithm
- 3.3 Query Algorithms
- 4 Implementation Details
- 5 Experiments
- 6 Conclusions
- References
- Fast Online Lempel-Ziv Factorization in Compressed Space
- 1 Introduction and Related Work
- 2 Notation
- 3 Fast Online LZ-Factorization in Compressed Space
- 3.1 Dynamic FM Index
- 3.2 Main Algorithm
- 4 Conclusions
- References
- Adaptive Computation of the Swap-Insert Correction Distance
- 1 Introduction
- 2 Background
- 3 Algorithm
- 3.1 High Level Description
- 3.2 Recursive Computation of DIST(i,j,c)
- 3.3 Complexity Analysis
- 4 Discussion
- 4.1 Implicit Results
- 4.2 Perspectives
- References
- Transforming XML Streams with References
- 1 Introduction
- 2 Streams with References
- 3 Top-Down Tree Transducer to Stream Processor Translation
- 4 Conclusion
- Efficient Term Set Prediction Using the Bell-Wigner Inequality
- 1 Introduction
- 2 Methodology
- 3 Validation
- 4 Related Work
- 5 Concluding Remarks
- References
- On Prefix/Suffix-Square Free Words
- 1 Introduction
- 2 Preliminaries
- 3 Prefix/Suffix-Square Free Words
- 4 Duplication and Related Factorisation Problems
- 5 Conclusions
- References
- Temporal Analysis of CHAVE Collection
- 1 Introduction
- 2 The CHAVE Collection
- 3 Documents and Segments
- 4 Topics and Their Relevant Documents
- 5 Experiments with Temporal Query Expansion
- 6 Conclusions
- References
- DeShaTo: Describing the Shape of Cumulative Topic Distributions to Rank Retrieval Systems Without Relevance Judgments
- 1 Introduction
- 2 Describing the Shape of Cumulative Topic Distributions
- 3 Experiments
- 3.1 Data Sets Description
- 3.2 Empirical Validation of the Hypothesis
- 3.3 Parameter Tuning
- 3.4 Ranking Retrieval Systems Results
- 4 Conclusion
- References
- Induced Sorting Suffixes in External Memory with Better Design and Less Space
- 1 Introduction
- 2 Preliminaries
- 2.1 Preceding Cache Item and Preceding Cache Array
- 2.2 Priority Queue in STXXL
- 3 The Algorithms
- 3.1 Algorithmic Framework
- 3.2 Retrieving a Preceding Character
- 3.3 Naming Substrings
- 3.4 Further Improvements
- 4 Performance Evaluation
- 5 Conclusions
- References
- Efficient Algorithms for Longest Closed Factor Array
- 1 Introduction
- 2 Preliminaries
- 3 Reconstruction Algorithm
- 3.1 Uniqueness of Reconstruction
- 3.2 Efficient Reconstruction
- 4 LCF Array Computation and Verification
- 5 Final Remarks
- References
- A Compact RDF Store Using Suffix Arrays
- 1 Introduction
- 2 Basic Concepts
- 2.1 State of the Art: K2Triples
- 2.2 Compressed Suffix Arrays
- 3 RDFCSA: A Compressed Suffix Array for RDF
- 3.1 Structure
- 3.2 Supporting Basic Graph Pattern Queries in RDFCSA
- 4 Experimental Evaluation
- 5 Conclusions and Future Work
- Chaining Fragments in Sequences: To Sweep or Not (Extended Abstract)
- 1 Introduction
- 2 Preliminaries
- 3 An Hybrid Algorithm
- References
- A Faster Algorithm for Computing Maximal -gapped Repeats in a String
- 1 Introduction
- 2 Preliminaries
- 3 O(2 n + occ)-time Algorithm by Kolpakov et al.
- 4 Computing FMGRmid(w) in O(n + occmid) Time
- 4.1 When Copies are the Longest Suffix u0 w.r.t. j'
- 4.2 When u is a Proper Suffix of the Longest Suffix u0 w.r.t. j'
- 4.3 Finding Maximal -gapped Repeats Starting at k = di
- 4.4 Complexity Analysis of Our Algorithm
- Selective Labeling and Incomplete Label Mitigation for Low-Cost Evaluation
- 1 Introduction
- 2 Technical Background and Related Work
- 2.1 Selective Labeling
- 2.2 Incomplete Label Mitigation
- 3 Selecting Representative Documents to Label
- 4 Experimental Evaluation
- 4.1 Datasets
- 4.2 Methods
- 4.3 Approximation of System Ranking and MAP Scores
- 4.4 Selective Labeling under Label Prediction
- 5 Conclusion
- References
- Relative Select
- 1 Introduction
- 2 Design
- 3 Experiments
- References
- A de Bruijn Graphs
- Temporal Query Classification at Different Granularities
- 1 Introduction
- 2 Related Work
- 3 Preliminaries
- 4 Temporal Class Taxonomy
- 5 Bayesian Analysis
- 6 Feature Design
- 7 Experimental Evaluation
- 7.1 Datasets
- 7.2 Setup
- 7.3 Experimental Results
- 8 Conclusion and Future Work
- References
- Prefix and Suffix Reversals on Strings
- 1 Introduction and Notations
- 2 A Detour by Prefix Reversals
- 3 Grouping Strings by Prefix and Suffix Reversals
- 4 Sorting Strings by Prefix and Suffix Reversals
- 5 Rearranging Strings by Prefix and Suffix Reversals
- 6 Conclusion and Open Questions
- References
- Filtration Algorithms for Approximate Order-Preserving Matching
- 1 Introduction
- 2 Preliminaries
- 3 Previous Solution
- 4 Our Solutions
- 5 Analysis
- 6 Experiments
- 7 Concluding Remarks
- Fishing in Read Collections: Memory Efficient Indexing for Sequence Assembly
- 1 Introduction
- 2 CR-Index Overview
- 3 Representing k-guide Superstring and Auxiliary Data Structures
- 4 Finding the k-guide Superstring
- 5 Experiments
- 6 Conclusion
- References
- How Big is that Genome? Estimating Genome Size and Coverage from k-mer Abundance Spectra
- 1 Introduction
- 2 Models and Algorithms
- 3 Experimental Evaluation
- 4 Conclusion
- References
- Assessing the Efficiency of Suffix Stripping Approaches for Portuguese Stemming
- 1 Introduction
- 2 Related Work
- 3 Suffix Stripping Approaches
- 3.1 The List-Based Approach
- 3.2 The Hash-Based Approach
- 3.3 The Automata-Based Approach
- 3.4 Complexity Analysis
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Experimental Results
- 5 Conclusion
- References
- Space-Efficient Detection of Unusual Words
- 1 Introduction
- 2 Preliminaries
- 2.1 Strings
- 2.2 Enumerating Maximal Repeats and Minimal Rare Words
- 3 Computing the Border of all Right-Maximal Substrings
- 4 Detecting Unusual Words in Small Space
- References
- Parallel Construction of Succinct Representations of Suffix Tree Topologies
- 1 Introduction
- 2 Preliminaries and Related Work
- 3 Sequential Construction of the Enhanced BPR
- 4 Parallel Construction of the Enhanced BPR
- 5 A Heuristic Parallel Algorithm
- 6 Implementation and Experimental Results
- References
- Computing the Longest Unbordered Substring
- 1 Introduction
- 2 Preliminaries
- 2.1 Borders and Periods
- 2.2 Suffix Trees and Auxiliary Data Structures
- 3 Algorithm A
- 4 Algorithm B
- 4.1 Candidates
- 4.2 Long Border Check
- 4.3 Pseudocode and the Bounds
- References
- Online Self-Indexed Grammar Compression
- 1 Introduction
- 2 Preliminaries
- 2.1 Basic Notations
- 2.2 Straight-Line Program (SLP)
- 2.3 Phrase Dictionary and Reverse Dictionary
- 2.4 Succinct Data Structures
- 3 Edit Sensitive Parsing (ESP) and Fully-Online LCA (FOLCA)
- 4 Index Structure of OESP-Index
- 4.1 Dynamic Wavelet Tree (DWT)
- 4.2 Complexity for Building OESP-Index
- 4.3 Query Search and Substring Extraction
- 5 Experiments
- 6 Conclusion
- References
- Tight Bound for the Number of Distinct Palindromes in a Tree
- 1 Introduction
- 2 Preliminaries
- 3 Palindromes in Spine-Trees
- 4 Palindromes in General Deterministic Double Trees
- 5 Main Result
- References
- Beyond the Runs Theorem
- 1 Introduction
- 2 Runs and Lyndon Roots
- 3 Idle Positions
- 3.1 Idle Positions that are Resistant to Extensions
- 3.2 Idle Positions that are Resistant to Left Extensions
- 4 Computer Search
- 5 Upper Bound for Finite Words
- 6 Conclusion
- References
- Sampling the Suffix Array with Minimizers
- 1 Introduction
- 2 Our Algorithm
- 2.1 The Idea
- 2.2 Parameter Selection
- 2.3 Faster Verification with Previous Minimizer Position
- 2.4 Faster Verification with Text Symbol Sketches
- 2.5 SamSAMi-hash
- 3 Sparse Suffix Sorting in O(n) Time and O(n') Space
- 4 Experimental Results
- 5 Conclusions and Future Work
- References
- Longest Common Prefix with Mismatches
- 1 Introduction
- 2 Notation and Problem Definition
- 3 Auxiliary Data Structures
- 4 A Simple Algorithm for Computing PLCP1
- 5 A Greedy Algorithm for PLCP1
- 6 Computation of 1-mappability
- References
- Evaluating Geographical Knowledge Re-Ranking, Linguistic Processing and Query Expansion Techniques for Geographical Information Retrieval
- 1 Introduction
- 2 Related Work
- 3 System Description
- 3.1 Textual and Geographical Indexing
- 3.2 Geographical Information Retrieval
- Linguistic and Geographical Knowledge Processing of the Topics.
- Textual Document Retrieval.
- Geographical Document Retrieval.
- Geographical Knowledge Re-Ranking.
- 4 Experiments
- 5 Results
- 6 Conclusions
- References
- Improved Practical Compact Dynamic Tries
- 1 Introduction
- 2 Preliminaries
- 3 m-Bonsai
- 3.1 Overview
- 3.2 Representing the Displacement Array
- 3.3 Alternate Representation of the Displacement Array
- 4 Experimental Evaluation
- 4.1 Implementation
- 4.2 Experimental Analysis
- 5 Conclusion
- References
- ShRkC: Shard Rank Cutoff Prediction for Selective Search
- 1 Introduction
- 2 Related Work
- 3 Shard Rank Cutoff Prediction (ShRkC)
- 4 ShRkC Estimator Features
- 4.1 Shard Ranking Based Features
- 4.2 CSI Based Features
- 4.3 Collection Statistics Based Features
- 5 Experimental Methodology
- 6 Results and Discussion
- 7 Feature Set Ablation
- 8 Conclusions
- References
- Range LCP Queries Revisited
- 1 Introduction
- 2 Preliminaries: Suffix Arrays, Trees, LCA, and LCP
- 3 Data Structure for General k
- 4 A Space Efficient Framework
- 4.1 The Framework
- 5 Data Structure for ILCP1 Queries
- 5.1 Structures for HLCP1 Queries
- 5.2 Query Time
- 5.3 Structure Di
- 5.4 Structure E(u)
- 5.5 Query Answering
- 6 The Hardness Results
- References
- Feasibility of Word Difficulty Prediction
- 1 Introduction
- 2 Background
- 3 A Machine Learning Solution
- 3.1 Dataset
- 3.2 Feature Selection
- 3.3 Algorithm and Results
- 4 Finding Similar Words
- 4.1 Algorithms
- 4.2 Experimental Comparison
- 5 Conclusions
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (only limited: Kindle).
The file format PDF always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). Unfortunately, on the small screens of e-readers or smartphones, PDFs are rather annoying, requiring too much scrolling.
This eBook uses Watermark-DRM, a „soft” copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.