An Introduction to String Algorithms

Name: An Introduction to String Algorithms
Brand: LAPA Publishers
Price: 68.49 EUR
Availability: OnlineOnly

Carl Kingsford(Author)

LAPA Publishers

Published on 19. May 2026

440 pages

E-Book

PDF with digital watermarking

System requirements

978-0-691-27454-6 (ISBN)

€68.49incl. 7% vat

System requirements

for PDF with digital watermarking

E-Book Single Licence

Available for download

Description

More details

Other editions

Person

Content

Cover
Contents
Preface
1. Introduction
1.1 What are strings?
1.2 Why study string algorithms?
1.3 What are our goals?
1.4 A roadmap
1.5 A Primer
1.5.1 String notations and definitions
1.5.2 Graph theory
1.5.3 Running times
1.5.4 Sets and combinatorics
1.5.5 Numbers
1.5.6 Data structures
1.5.7 Probability
I. Exact Matching
2. The Z Algorithm
2.1 The exact matching problem
2.2 Simple (slow) solution
2.3 The Z algorithm
2.4 Computing the Z values
2.5 Summary and notes
2.6 Exercises
3. Boyer-Moore
3.1 High-level description
3.1.1 First rule: Next Matching Character
3.1.2 Second rule: Good Shift Rule
3.1.3 Complete algorithm
3.2 Computing the Ri values for the next matching character rule
3.3 Formalizing the Good Shift Rule
3.4 Implementing the Good Shift Rule
3.4.1 Computing the L(i) values
3.4.2 Computing the l(i) values
3.5 Summary and notes
3.6 Exercises
4. Knuth-Morris-Pratt
4.1 KMP via deterministic finite automata
4.1.1 The KMP DFA
4.1.2 Using the memo array for string search
4.1.3 Correctness and running time
4.1.4 Computing memo
4.2 KMP via the Z-values
4.2.1 Running time of the spmi version of KMP
4.2.2 Computing the spmi array
4.3 Summary and notes
4.4 Exercises
5. Seminumerical String Matching
5.1 Rabin-Karp fingerprinting
5.1.1 Computing p
5.1.2 Computing ts for every position s
5.1.3 Time for addition, product, and comparison
5.1.4 Quantifying and reducing false positives
5.1.5 Reducing the error probability
5.2 The Shift-And algorithm
5.2.1 Space and runtime
5.2.2 Extension to approximate matching
5.3 Summary and notes
5.4 Exercises
6. Searching for Multiple Patterns
6.1 Aho-Corasick-a prefix-based approach
6.1.1 Search
6.1.2 Handling patterns contained in other patterns
6.1.3 Running time
6.1.4 Computing f
6.2 Wu-Manber-a suffix-based approach
6.3 Summary and notes
6.4 Exercises
II. Edit Distance
7. Edit Distance for Inexact Matching
7.1 Edit distance and alignments
7.2 The string alignment problem
7.2.1 Algorithm for minimum cost alignments
7.2.2 Implementing the recursive algorithm
7.3 Dynamic programming
7.4 Finding the actual alignment: traceback
7.5 Local and semi-global alignment
7.5.1 Local alignment
7.5.2 Semi-global alignment
7.6 Summary and notes
7.7 Exercises
8. Edit Distance in Linear Space
8.1 Using linear space to compute edit distance values
8.2 Finding the actual alignment in linear space
8.2.1 Hirschberg's algorithm
8.2.2 Proof of running time
8.3 Summary and notes
8.4 Exercises
9. Faster Edit Distance via the "Four Russians" Trick
9.1 Dynamic programming blocks
9.2 Precomputing f
9.2.1 Offset encoding
9.2.2 Number of possible inputs to f
9.2.3 Storing f for quick access
9.3 Total running time
9.4 Finding the alignment
9.5 Summary and notes
9.6 Exercises
10. General and Affine Gap Penalties
10.1 General gap penalties
10.2 Affine gap penalties
10.3 Why do we "need" 3 matrices?
10.4 Summary and notes
10.5 Exercises
11. Output-Sensitive Algorithms for Edit Distance
11.1 Banded alignment
11.2 Myers' Minimum Edit Script algorithm
11.3 Landau-Vishkin*
11.3.1 The k differences problem
11.3.2 An alternative dynamic program
11.3.3 The Landau-Vishkin algorithm
11.4 Summary and notes
11.5 Exercises
12. Aligning Multiple Strings
12.1 Multiple sequence alignments
12.2 Progressive alignment heuristic
12.3 Approximation algorithm for one MSA formulation
12.4 Summary and notes
12.5 Exercises
III. Data Structures
13. Suffix Trees
13.1 The fundamental string data structure
13.2 Tries
13.3 Defining suffix trees
13.4 Applications of suffix trees
13.4.1 Simple queries
13.4.2 Longest common substring of two strings
13.4.3 Generalized suffix trees
13.4.4 All pairs suffix-prefix matches
13.5 Summary and notes
13.6 Exercises
14. More Applications of Suffix Trees
14.1 Lempel-Ziv compression
14.2 Longest common extension
14.3 Finding palindromes
14.4 Finding all k-mismatch occurrences of a pattern
14.5 Longest common substring with online queries
14.6 Summary and notes
14.7 Exercises
15. Suffix Tree Construction
15.1 Constructing suffix tries
15.2 Ukkonen's algorithm
15.2.1 Updating the tree
15.2.2 Starting each phase
15.2.3 Example run of Ukkonen's algorithm
15.2.4 Running time of Ukkonen's algorithm
15.3 Summary and notes
15.4 Exercises
16. Suffix Arrays
16.1 Suffix tree operations in less space
16.2 Suffix array suffix tree
16.3 Searching for substrings using the suffix array
16.4 Faster search
16.4.1 Obtaining the lcp(i, j) values
16.5 Summary and notes
16.6 Exercises
17. Suffix Array Construction
17.1 An O(|T| log |T|) algorithm for suffix array construction
17.1.1 Radix sort
17.2 A linear-time construction algorithm
17.2.1 Running time
17.3 Summary and notes
17.4 Exercises
18. Burrows-Wheeler Transform
18.1 Definition of the BWT
18.2 Why is this useful for compression
18.3 Recovering the original string from its BWT
18.3.1 The LF mapping
18.3.2 Faster inversion algorithm
18.4 Faster computation of the BWT
18.5 Applications of the BWT
18.5.1 Search
18.5.2 Compression
18.6 Summary and notes
18.7 Exercises
19. RRR Compressed Bit Vectors
19.1 Rank and select operations
19.2 RRR bit vector data structure
19.3 Computing rank1 (S, i)
19.4 Compactly encoding the RRR information
19.4.1 Prefix-sum data structure
19.4.2 Space to store the vectors of integers
19.4.3 Space for the tables
19.4.4 Space for the p values
19.5 Supporting the select operation*
19.5.1 The select data structures
19.5.2 Answering select queries
19.6 Summary and notes
19.7 Exercises
20. Wavelet Trees
20.1 The wavelet tree data structure
20.1.1 The access S[i] operation
20.1.2 The rankc(S, i) operation
20.1.3 The selectc(S, j) operation
20.1.4 Running times
20.2 Applications of wavelet trees
20.2.1 Inverted indices
20.2.2 Document retrieval
20.2.3 Storing graphs
20.3 Summary and notes
20.4 Exercises
21. The FM Index for Compressed Searching
21.1 BWT and wavelet trees
21.2 A more compression-focused approach
21.2.1 Computing rank on the compressed string
21.2.2 Space usage for count arrays
21.2.3 Computing occurrence counts
21.2.4 Computing counts within a block
21.3 Summary and notes
21.4 Exercises
22. Storing Text for Editing
22.1 Gap buffers
22.2 Piece tables
22.3 Ropes
22.4 Summary and notes
22.5 Exercises
IV. Sketching
23. Locality Sensitive Hashing
23.1 Estimating similarity between strings
23.2 An LSH family for the Jaccard similarity
23.3 An LSH family for the weighted Jaccard
23.4 An LSH family for a cosine-related similarity
23.5 A gapped LSH family for edit distance
23.6 Summary and notes
23.7 Exercises
24. Bloom Filters
24.1 How do Bloom filters work
24.2 How should we set the filter parameters
24.3 Notes on and extensions to Bloom filters
24.4 Cascading Bloom filters
24.5 Summary and notes
24.6 Exercises
25. Sketching with Minimizers
25.1 Accelerating read overlap computation
25.2 Definition of minimizers
25.3 Properties of minimizers
25.4 Expected density
25.5 Random minimizers
25.6 Selecting minimizers
25.7 De Bruijn sequences and de Bruijn graphs
25.8 Set of unavoidable words
25.9 Summary and notes
25.10 Exercises
V. Graphs and Generative Models
26. Regular Grammars and the Chomsky Hierarchy
26.1 Regular grammars
26.1.1 Finite state automata
26.1.2 Deterministic vs. non-deterministic FSAs
26.1.3 Regular grammars and regular expressions
26.1.4 Summary of regular grammars
26.2 Context-free grammars
26.2.1 CFG parse trees
26.2.2 Chomsky normal form
26.3 Context-sensitive and unrestricted grammars
26.4 Summary and notes
26.5 Exercises
27. Hidden Markov Models
27.1 Definition of hidden Markov models
27.2 HMM decoding problems
27.3 Viterbi algorithm
27.4 The Forward-Backward algorithm
27.4.1 Forward algorithm
27.4.2 Backward algorithm
27.5 Estimating HMM parameters
27.5.1 Labeled training
27.5.2 Viterbi training
27.5.3 Baum-Welch algorithm
27.6 Summary and notes
27.7 Exercises
28. Stochastic Context-Free Grammars
28.1 The Inside algorithm
28.2 Cocke-Younger-Kasami (CYK) algorithm
28.3 Estimating the production probabilities
28.3.1 The Outside algorithm
28.4 Summary and notes
28.5 Exercises
29. Genome Graphs
29.1 Some types of genome graphs
29.1.1 De Bruijn graphs
29.1.2 Colored de Bruijn graphs
29.1.3 Variation graphs
29.2 Constructing genome graphs via compression
29.3 Alignment of strings to genome graphs
29.3.1 Alignment graph
29.4 Comparing genome graphs
29.5 Summary and notes
29.6 Exercises
30. BWT on Labeled Trees
30.1 String trees
30.2 String tree BWT
30.2.1 The interval corresponding to the ith node
30.2.2 A child interval
30.3 Extending to substring intervals
30.4 Backwards search
30.5 Summary and notes
30.6 Exercises
31. Transformers
31.1 Embeddings
31.2 Encoder-decoder architectures
31.3 Feedforward neural networks
31.4 Attention
31.5 Transformers
31.5.1 The transformer encoder
31.5.2 The transformer decoder
31.6 Training and using transformers
31.7 Summary and notes
31.8 Exercises
VI. Miscellaneous
32. Huffman Codes
32.1 Prefix-free codes
32.1.1 UTF-8 encodings
32.2 Huffman coding algorithm
32.2.1 Proof of optimality
32.2.2 Linear-time code construction
32.3 Summary and notes
32.4 Exercises
33. Shortest Superstring
33.1 Relationship to the traveling salesman problem
33.2 The Cycle Cover algorithm
33.3 Some lemmas about repeated strings
33.4 The Cycle Cover algorithm gives a 4-approximation
33.5 Summary and notes
33.6 Exercises
34. Limits on What's Possible
34.1 A brief introduction to NP-completeness
34.2 Shortest superstring is NP-complete
34.2.1 The reduction
34.2.2 The proof
34.2.3 Additional notes
34.3 Shortest supersequence is NP-complete
34.4 MSA with SP-score is NP-complete
34.5 Summary and notes
34.6 Exercises
35. Conclusion
Bibliography
List of Algorithms
Index

System requirements

Save as PDF Copy link into clipboard

Schweitzer Fachinformationen

An Introduction to String Algorithms

Description

More details

Other editions

Additional editions

Person

Content

System requirements