String Processing and Information Retrieval

Name: String Processing and Information Retrieval | 18th International Symposium, SPIRE 2011, Pisa, Italy, October 17-21, 2011, Proceedings
Brand: Springer
Price: 53.49 EUR
Availability: OnlineOnly

18th International Symposium, SPIRE 2011, Pisa, Italy, October 17-21, 2011, Proceedings

Roberto Grossi Fabrizio Silvestri Fabrizio Sebastiani(Editor)

Springer (Publisher)

Published on 2. October 2011

XIV, 428 pages

E-Book

PDF with digital watermarking

System requirements

978-3-642-24583-1 (ISBN)

€53.49incl. 7% vat

System requirements

for PDF with digital watermarking

E-Book Single Licence

Available for download

Description

More details

Other editions

Content

Title page
Preface
Organization
Table of Contents
Constructing Strings at the Nano Scale via Staged Self-assembly
Discounted Cumulative Gain and User Decision Models
Introduction
Discounted Cumulative Gain
Deterministic Click User Model
Probabilistic Click User Model
Expected Utility
Parameters Estimation
NumericalExperiments
Discussion
References
Cross-Lingual Text Fragment Alignment Using Divergence from Randomness
Introduction
Related Work
Text Fragment Alignment
Similarity Measures and Divergence from Randomness
Extraction of Fragments
Experimental Study
Document-Summary Association
Text Fragment Alignment Evaluation
Conclusion and Future Work
References
Enhancing Document Snippets Using Temporal Information
Introduction
Temporal Order
Enhancing Snippets with Temporal Information
Experimental Results
Conclusions and Outlook
References
Spaced Seeds Design Using Perfect Rulers
Introduction
Notation
From Rulers to Spaced Seeds
Lower Bounds on the Minimum Pattern Length m*P
Wichmann Rulers
Restricted vs. Unrestricted Rulers
Conclusions
References
Weighted Shortest Common Supersequence
Introduction
Motivation
Weighted Sequences
Related Work
Shortest Common Weighted Supersequence
Shortest Common Weighted Supersequence with Two Thresholds
Conclusions and Open Problems
References
Approximate Regular Expression Matching with Multi-strings
Introduction
Notations and Definitions
Notation
Definitions
Thompson's Automaton
Approximate Regular Expression Matching
Incremental String Comparisons
Bille and Thorup's Algorithm
A New O(kdn) Algorithm
Preprocessing
Matching Algorithm
References
Persistency in Suffix Trees with Applications to String Interval Problems
Introduction
Definitions and Preliminaries
Some Persistent Data Structures
Problems
The Persistent Suffix Tree
Using Pointer Arrays
Using Balanced Search Trees
The General Framework
Snapshots
Using Persistent Arrays
Renaming
Renaming for Any Size Pattern
Applications
PRI-Report
PRI-Count
Substring Select
References
Approximate Point Set Pattern Matching with Lp-Norm
Introduction
Problem Definitions
Lp-APSPM for Arbitrary p&
Lp-APSPM for p Approaching Infinity
Computing the DP-Table with RMQ
Determining the Queried Intervals
Proof of Lemma 3
Concluding Remarks
References
Detecting Health Events on the Social Web to Enable Epidemic Intelligence
Introduction
Related Work
Unsupervised Public Health Event Detection
Named Entity Feature Representation
Feature Analysis
Detecting Public Health Events
Experiments and Evaluations
Experiment I: Feature Pruning
Experiment II: Selection of k
Experiment III: Cluster Quality
Experiment IV: Detected Public Health Events
Experiment V: Efficiency Comparison
Conclusions and Future Work
References
A Learned Approach for Ranking News in Real-Time Using the Blogosphere
Introduction
News Story Ranking
Learning to Rank News Stories
Experimental Setup
Results
Story Ranking Performance
Strongest Story Ranking Features
Story Ranking Components
Predictable vs. Unpredictable Events
Conclusions
References
Attribute Retrieval from Relational Web Tables
Introduction
Related Work
Attribute Retrieval
Relational Tables and Headers
Attribute Line Filter
Relevance
Experimental Setup
Evaluation Results
Filtering
Attribute Retrieval
Conclusions
References
Query-Sets++: A Scalable Approach for Modeling Web Sites
Introduction
Related Work
General Concepts
Generic Web Site Vectorization
Query-Based Feature Spaces for Web Sites
Model Evaluation and Results
Conclusions
References
Indexing with Gaps
Introduction
Our Results
Problem Definitions and Preliminaries
Preliminary Definitions and Notations
G-Bounded Queries with k Gaps
Speeding Up the Search
Better Tradeoffs
Final Speedup of the Query
References
Fast Computation of a String Duplication History under No-Breakpoint-Reuse (Extended Abstract)
Introduction
Motivation and Related Work
Relationship to Breakpoint Graphs
Candidate Duplications
Finding a Candidate Duplication in Linear Time
Quasi-Linear Algorithm for History Reconstruction
Conclusion and Open Problems
References
Near Real-Time Suffix Tree Construction via the Fringe Marked Ancestor Problem
Introduction
The Fringe Marked Ancestor Problem
Suffix Trees and Suffix Links
Right-to-Left Construction
Left-to-Right and Bi-directional Construction
References
Approximations and Partial Solutions for the Consensus Sequence Problem
Introduction and Related Work
An Approximation Algorithm for the Consensus Problem
Integer Programs
Linear Program Relaxation
Empirical Results
References
Fixed Block Compression Boosting in FM-Indexes
Introduction
Basic Algorithmic Machinery
Compression Boosting
Fixed Block Compression Boosting
Experimental Results
Concluding Remarks
References
Space Efficient Wavelet Tree Construction
Introduction
Encoding Scheme
Partitioning
Merging
Extension to Generalized Wavelet Trees
Construction by Permuting Bits
Overview of Permutations
Chopping the Most Significant Bits
Partitioning the Truncated Letters
Overall Requirements
Experiments
References
Computing the Longest Common Prefix Array Based on the Burrows-Wheeler Transform
Introduction
Related Work
Wavelet Tree
A LACA Based on the BWT
Experimental Results
Other Applications
References
A Succinct Index for Hypertext
Introduction
Preliminaries
Compressed Suffix Arrays
Orthogonal Range Query Structures
Hypertext
Construction of the Hypertext Index
Indexing Node Text
Storing Graph Topology
Auxiliary Data Structures
Pattern Matching in the Hypertext Index
Preprocessing the Pattern
Matching within a Node
Matching across a Single Edge
Matching across Multiple Edges
Considering Path Constraints in Hypertext
Conclusions
References
When Was It Written? Automatically Determining Publication Dates
Introduction
State of the Art
Methodology
Corpus Description
Corpus Pre-processing
Evaluation Score
Description of the Methods
Chronological Methods
Named Entities
Neologisms and Archaisms
French Spelling Reforms
Intermediate Conclusion
Classification Methods
Cosine Similarity-Based Classification
Support Vector Machines (SVM)
Scoring Combination
Results
Results for Classification Methods
Conclusions and Future Work
References
A New Approach for Verifying URL Uniqueness in Web Crawlers
Introduction
Related Work
Crawler Architecture
The Baseline Algorithm (DRUM)
Algorithm VEUNIQ
Comparison between Methods
Computational Cost Model
A Crawling Simulation
Conclusions
References
External Query Reformulation for Text-Based Image Retrieval
Introduction
Background and Related Work
Definition Documents Based Relevance Feedback
Pseudo Relevance Feedback
Identifying Definition Documents by Key-Term Title Matching
Feedback Term Weighting
Experimental Setup and Results
Experimental Setup
Evaluation on Definition Documents
Parameters Setting
Comparing DRF with PRF
Comparing DRF with Feedback from DDs
Discussion
Conclusion and Future Work
References
A Knowledge-Based Semantic Kernel for Text Classification
Introduction
Semantics in Text Mining and Information Retrieval
Semantic Relatedness and the Omiotis Measure
Omiotis-Based Semantic Kernel
Semantic Smoothing Matrix and Semantic Kernel Design
Computational Aspects
Empirical Evaluation
Conclusions and Future Work
References
Compressed Text Indexing with Wildcards
Introduction
Preliminaries
Bit Vectors with Rank/Select
Suffix Trees and Suffix Arrays
Compressed Text Indexes
Compressed Index for Dictionary Matching
Orthogonal Range Reporting
Sparse Suffix Trees
Matching with Wildcards in Compressed Text
Type-1 Matching
Type-2 Matching
Type-3 Matching
References
Fast q-gram Mining on SLP Compressed Strings
Introduction
Preliminaries
Straight Line Programs
Suffix Arrays and LCP Arrays
Algorithm
Computing q-gram Frequencies on Uncompressed Strings
Computing q-gram Frequencies on SLP
Applications and Extensions
q-gram Spectrum Kernel
Optimal Substring Patterns of Length q
Different Lengths
Computational Experiments
Fibonacci Strings
Pizza and Chili Corpus
Conclusion
References
Succinct Gapped Suffix Arrays
Introduction
Definitions
Computing the Gapped LF Function in Succinct Space
References
Finding Frequent Elements in Compressed 2D Arrays and Strings
Introduction
Two-Dimensional Arrays
Improvements for Strings
References
On Suffix Extensions in Suffix Trees
Introduction
Suffix Trees and Ukkonen's Algorithm
The Locations of Implicit Nodes
Maintaining Implicit Nodes Efficiently
Conclusions
References
COCA Filters: Co-occurrence Aware Bloom Filters
Introduction
Previous Work and Background
COCA Filters
Experimental Results
Conclusion and Future Work
References
On-Line Construction of Position Heaps
Introduction
Definition of Position Heap
Properties of Position Heap
On-Line Construction Algorithm
Augmented Position Heap
Concluding Remarks
References
Computing All Subtree Repeats in Ordered Ranked Trees
Introduction
Preliminaries
Properties of Ranked Trees in Postfix Notation
Algorithms
Preprocessing
Finding Subtree Repeats
Conclusion
References
Sparse Spatial Selection for Novelty-Based Search Result Diversification
Introduction
Background and Related Work
Search Result Diversification
Search in Metric Spaces
Sparse Spatial Selection Diversification
Experimental Setup
Experimental Results
Conclusions
References
Candidate Document Retrieval for Web-Scale Text Reuse Detection
Introduction
User over Ranking
Related Work
Notation and Basic Definitions
Baseline: Maximal Termset Query Formulation
Heuristic Search Strategy
Experimental Analysis
Number of Submitted Queries
Candidate Document Quality
Conclusion and Outlook
References
A Multi-faceted Approach to Query Intent Classification
Introduction
Related Work
Experimental Design
Predicting Individual Facets
Combining Multiple Facets
Genre-Objective Combination
Genre-Task-Topic Combination
Conclusions
References
Navigating the User Query Space
Introduction
Background and Related Research
Experimental Analysis
Sampling the User Query Space
Correlation of User Extracted Queries
Usability of Predictors for Query Selection
Conclusion
References
Improved Compressed Indexes for Full-Text Document Retrieval
Introduction and Related Work
Document Listing with Frequencies
Top-k Document Retrieval
Range Color Listing with Frequencies
Faster Top-k Retrieval
Lowering the lgD Factor to lgk
Computing Arbitrary Frequencies
Using Mmphfs for Top-k Retrieval
Bounding the Number of Valid Candidates
Top-k Most Important Document Retrieval
Final Remarks
References
ESP-Index: A Compressed Index Based on Edit-Sensitive Parsing
Introduction
Pattern Matching on ESP
Edit-Sensitive Parsing (ESP)
Pattern Embedding Problem
Algorithms and Data Structures
Finding Evidence of Pattern
Finding Pattern Occurrence
Data Structures
Experiments
Discussion
References
Compressed Indexes for Aligned Pattern Matching
Introduction and Related Work
Preliminaries
Suffix Trees and Suffix Arrays
Sparse Suffix Trees
Orthogonal Range Reporting
A Simple Framework
Interleaving Technique
A Compressed Index for Long Query Patterns
Hardness of the Problem
A Compressed Index for All Query Patterns
Query Answering
Concluding Remarks
References
Reference Sequence Construction for Relative Compression of Genomes
Introduction
Reference Sequence Construction
Experimental Results
Concluding Remarks
References
Author Index

System requirements

Save as PDF Copy link into clipboard

Schweitzer Fachinformationen

String Processing and Information Retrieval

Description

More details

Other editions

Additional editions

Content

System requirements