String Processing and Information Retrieval

Name: String Processing and Information Retrieval | 22nd International Symposium, SPIRE 2015, London, UK, September 1-4, 2015, Proceedings
Brand: Springer
Price: 53.49 EUR
Availability: OnlineOnly

22nd International Symposium, SPIRE 2015, London, UK, September 1-4, 2015, Proceedings

Costas Iliopoulos Simon Puglisi Emine Yilmaz(Editor)

Springer (Publisher)

Published on 4. September 2015

XVII, 376 pages

E-Book

PDF with digital watermarking

System requirements

978-3-319-23826-5 (ISBN)

€53.49incl. 7% vat

System requirements

for PDF with digital watermarking

E-Book Single Licence

Available for download

Description

More details

Other editions

Content

Intro
Preface
Organization
Invited Talks
Computational Problems in Mining Urban Data
Journey into Evaluation: From Retrieval Effectiveness to User Engagement
Encodings = (Data Structures) - (Data)
Contents
Faster Exact Search Using Document Clustering
1 Introduction
2 Preliminaries
3 Clustering Based Search
3.1 Developing an Objective Function
3.2 The Clustering Algorithm
3.3 Query Algorithms
4 Implementation Details
5 Experiments
6 Conclusions
References
Fast Online Lempel-Ziv Factorization in Compressed Space
1 Introduction and Related Work
2 Notation
3 Fast Online LZ-Factorization in Compressed Space
3.1 Dynamic FM Index
3.2 Main Algorithm
4 Conclusions
References
Adaptive Computation of the Swap-Insert Correction Distance
1 Introduction
2 Background
3 Algorithm
3.1 High Level Description
3.2 Recursive Computation of DIST(i,j,c)
3.3 Complexity Analysis
4 Discussion
4.1 Implicit Results
4.2 Perspectives
References
Transforming XML Streams with References
1 Introduction
2 Streams with References
3 Top-Down Tree Transducer to Stream Processor Translation
4 Conclusion
Efficient Term Set Prediction Using the Bell-Wigner Inequality
1 Introduction
2 Methodology
3 Validation
4 Related Work
5 Concluding Remarks
References
On Prefix/Suffix-Square Free Words
1 Introduction
2 Preliminaries
3 Prefix/Suffix-Square Free Words
4 Duplication and Related Factorisation Problems
5 Conclusions
References
Temporal Analysis of CHAVE Collection
1 Introduction
2 The CHAVE Collection
3 Documents and Segments
4 Topics and Their Relevant Documents
5 Experiments with Temporal Query Expansion
6 Conclusions
References
DeShaTo: Describing the Shape of Cumulative Topic Distributions to Rank Retrieval Systems Without Relevance Judgments
1 Introduction
2 Describing the Shape of Cumulative Topic Distributions
3 Experiments
3.1 Data Sets Description
3.2 Empirical Validation of the Hypothesis
3.3 Parameter Tuning
3.4 Ranking Retrieval Systems Results
4 Conclusion
References
Induced Sorting Suffixes in External Memory with Better Design and Less Space
1 Introduction
2 Preliminaries
2.1 Preceding Cache Item and Preceding Cache Array
2.2 Priority Queue in STXXL
3 The Algorithms
3.1 Algorithmic Framework
3.2 Retrieving a Preceding Character
3.3 Naming Substrings
3.4 Further Improvements
4 Performance Evaluation
5 Conclusions
References
Efficient Algorithms for Longest Closed Factor Array
1 Introduction
2 Preliminaries
3 Reconstruction Algorithm
3.1 Uniqueness of Reconstruction
3.2 Efficient Reconstruction
4 LCF Array Computation and Verification
5 Final Remarks
References
A Compact RDF Store Using Suffix Arrays
1 Introduction
2 Basic Concepts
2.1 State of the Art: K2Triples
2.2 Compressed Suffix Arrays
3 RDFCSA: A Compressed Suffix Array for RDF
3.1 Structure
3.2 Supporting Basic Graph Pattern Queries in RDFCSA
4 Experimental Evaluation
5 Conclusions and Future Work
Chaining Fragments in Sequences: To Sweep or Not (Extended Abstract)
1 Introduction
2 Preliminaries
3 An Hybrid Algorithm
References
A Faster Algorithm for Computing Maximal -gapped Repeats in a String
1 Introduction
2 Preliminaries
3 O(2 n + occ)-time Algorithm by Kolpakov et al.
4 Computing FMGRmid(w) in O(n + occmid) Time
4.1 When Copies are the Longest Suffix u0 w.r.t. j'
4.2 When u is a Proper Suffix of the Longest Suffix u0 w.r.t. j'
4.3 Finding Maximal -gapped Repeats Starting at k = di
4.4 Complexity Analysis of Our Algorithm
Selective Labeling and Incomplete Label Mitigation for Low-Cost Evaluation
1 Introduction
2 Technical Background and Related Work
2.1 Selective Labeling
2.2 Incomplete Label Mitigation
3 Selecting Representative Documents to Label
4 Experimental Evaluation
4.1 Datasets
4.2 Methods
4.3 Approximation of System Ranking and MAP Scores
4.4 Selective Labeling under Label Prediction
5 Conclusion
References
Relative Select
1 Introduction
2 Design
3 Experiments
References
A de Bruijn Graphs
Temporal Query Classification at Different Granularities
1 Introduction
2 Related Work
3 Preliminaries
4 Temporal Class Taxonomy
5 Bayesian Analysis
6 Feature Design
7 Experimental Evaluation
7.1 Datasets
7.2 Setup
7.3 Experimental Results
8 Conclusion and Future Work
References
Prefix and Suffix Reversals on Strings
1 Introduction and Notations
2 A Detour by Prefix Reversals
3 Grouping Strings by Prefix and Suffix Reversals
4 Sorting Strings by Prefix and Suffix Reversals
5 Rearranging Strings by Prefix and Suffix Reversals
6 Conclusion and Open Questions
References
Filtration Algorithms for Approximate Order-Preserving Matching
1 Introduction
2 Preliminaries
3 Previous Solution
4 Our Solutions
5 Analysis
6 Experiments
7 Concluding Remarks
Fishing in Read Collections: Memory Efficient Indexing for Sequence Assembly
1 Introduction
2 CR-Index Overview
3 Representing k-guide Superstring and Auxiliary Data Structures
4 Finding the k-guide Superstring
5 Experiments
6 Conclusion
References
How Big is that Genome? Estimating Genome Size and Coverage from k-mer Abundance Spectra
1 Introduction
2 Models and Algorithms
3 Experimental Evaluation
4 Conclusion
References
Assessing the Efficiency of Suffix Stripping Approaches for Portuguese Stemming
1 Introduction
2 Related Work
3 Suffix Stripping Approaches
3.1 The List-Based Approach
3.2 The Hash-Based Approach
3.3 The Automata-Based Approach
3.4 Complexity Analysis
4 Experiments
4.1 Experimental Setup
4.2 Experimental Results
5 Conclusion
References
Space-Efficient Detection of Unusual Words
1 Introduction
2 Preliminaries
2.1 Strings
2.2 Enumerating Maximal Repeats and Minimal Rare Words
3 Computing the Border of all Right-Maximal Substrings
4 Detecting Unusual Words in Small Space
References
Parallel Construction of Succinct Representations of Suffix Tree Topologies
1 Introduction
2 Preliminaries and Related Work
3 Sequential Construction of the Enhanced BPR
4 Parallel Construction of the Enhanced BPR
5 A Heuristic Parallel Algorithm
6 Implementation and Experimental Results
References
Computing the Longest Unbordered Substring
1 Introduction
2 Preliminaries
2.1 Borders and Periods
2.2 Suffix Trees and Auxiliary Data Structures
3 Algorithm A
4 Algorithm B
4.1 Candidates
4.2 Long Border Check
4.3 Pseudocode and the Bounds
References
Online Self-Indexed Grammar Compression
1 Introduction
2 Preliminaries
2.1 Basic Notations
2.2 Straight-Line Program (SLP)
2.3 Phrase Dictionary and Reverse Dictionary
2.4 Succinct Data Structures
3 Edit Sensitive Parsing (ESP) and Fully-Online LCA (FOLCA)
4 Index Structure of OESP-Index
4.1 Dynamic Wavelet Tree (DWT)
4.2 Complexity for Building OESP-Index
4.3 Query Search and Substring Extraction
5 Experiments
6 Conclusion
References
Tight Bound for the Number of Distinct Palindromes in a Tree
1 Introduction
2 Preliminaries
3 Palindromes in Spine-Trees
4 Palindromes in General Deterministic Double Trees
5 Main Result
References
Beyond the Runs Theorem
1 Introduction
2 Runs and Lyndon Roots
3 Idle Positions
3.1 Idle Positions that are Resistant to Extensions
3.2 Idle Positions that are Resistant to Left Extensions
4 Computer Search
5 Upper Bound for Finite Words
6 Conclusion
References
Sampling the Suffix Array with Minimizers
1 Introduction
2 Our Algorithm
2.1 The Idea
2.2 Parameter Selection
2.3 Faster Verification with Previous Minimizer Position
2.4 Faster Verification with Text Symbol Sketches
2.5 SamSAMi-hash
3 Sparse Suffix Sorting in O(n) Time and O(n') Space
4 Experimental Results
5 Conclusions and Future Work
References
Longest Common Prefix with Mismatches
1 Introduction
2 Notation and Problem Definition
3 Auxiliary Data Structures
4 A Simple Algorithm for Computing PLCP1
5 A Greedy Algorithm for PLCP1
6 Computation of 1-mappability
References
Evaluating Geographical Knowledge Re-Ranking, Linguistic Processing and Query Expansion Techniques for Geographical Information Retrieval
1 Introduction
2 Related Work
3 System Description
3.1 Textual and Geographical Indexing
3.2 Geographical Information Retrieval
Linguistic and Geographical Knowledge Processing of the Topics.
Textual Document Retrieval.
Geographical Document Retrieval.
Geographical Knowledge Re-Ranking.
4 Experiments
5 Results
6 Conclusions
References
Improved Practical Compact Dynamic Tries
1 Introduction
2 Preliminaries
3 m-Bonsai
3.1 Overview
3.2 Representing the Displacement Array
3.3 Alternate Representation of the Displacement Array
4 Experimental Evaluation
4.1 Implementation
4.2 Experimental Analysis
5 Conclusion
References
ShRkC: Shard Rank Cutoff Prediction for Selective Search
1 Introduction
2 Related Work
3 Shard Rank Cutoff Prediction (ShRkC)
4 ShRkC Estimator Features
4.1 Shard Ranking Based Features
4.2 CSI Based Features
4.3 Collection Statistics Based Features
5 Experimental Methodology
6 Results and Discussion
7 Feature Set Ablation
8 Conclusions
References
Range LCP Queries Revisited
1 Introduction
2 Preliminaries: Suffix Arrays, Trees, LCA, and LCP
3 Data Structure for General k
4 A Space Efficient Framework
4.1 The Framework
5 Data Structure for ILCP1 Queries
5.1 Structures for HLCP1 Queries
5.2 Query Time
5.3 Structure Di
5.4 Structure E(u)
5.5 Query Answering
6 The Hardness Results
References
Feasibility of Word Difficulty Prediction
1 Introduction
2 Background
3 A Machine Learning Solution
3.1 Dataset
3.2 Feature Selection
3.3 Algorithm and Results
4 Finding Similar Words
4.1 Algorithms
4.2 Experimental Comparison
5 Conclusions
References
Author Index

System requirements

Save as PDF Copy link into clipboard

Schweitzer Fachinformationen

String Processing and Information Retrieval

Description

More details

Other editions

Additional editions

Content

System requirements