Advanced Algorithms and Data Structures

Name: Advanced Algorithms and Data Structures
Brand: Manning
Price: 49.44 EUR
Availability: OnlineOnly

Marcello La Rocca(Author)

Manning (Publisher)

1st Edition

Published on 10. August 2021

768 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-63835-022-4 (ISBN)

€49.44incl. 7% vat

System requirements

for ePUB with Adobe-DRM

E-Book Single Licence

Available for download

Description

Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.

Alles über E-Books, Kopierschutz & Dateiformate finden Sie in unserem Info- & Hilfebereich.

"An accessible introduction to the fundamental algorithms used to run the world." - Richard Vaughan, Purple Monkey Collective

Advanced Algorithms and Data Structures introduces a collection of algorithms for complex programming challenges in data analysis, machine learning, and graph computing.

Summary
As a software engineer, you'll encounter countless programming challenges that initially seem confusing, difficult, or even impossible. Don't despair! Many of these "new" problems already have well-established solutions. Advanced Algorithms and Data Structures teaches you powerful approaches to a wide range of tricky coding challenges that you can adapt and apply to your own applications. Providing a balanced blend of classic, advanced, and new algorithms, this practical guide upgrades your programming toolbox with new perspectives and hands-on techniques.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology
Can you improve the speed and efficiency of your applications without investing in new hardware? Well, yes, you can: Innovations in algorithms and data structures have led to huge advances in application performance. Pick up this book to discover a collection of advanced algorithms that will make you a more effective developer.

About the book
Advanced Algorithms and Data Structures introduces a collection of algorithms for complex programming challenges in data analysis, machine learning, and graph computing. You'll discover cutting-edge approaches to a variety of tricky scenarios. You'll even learn to design your own data structures for projects that require a custom solution.

What's inside
Build on basic data structures you already know
Profile your algorithms to speed up application
Store and query strings efficiently
Distribute clustering algorithms with MapReduce
Solve logistics problems using graphs and optimization algorithms

About the reader
For intermediate programmers.

About the author
Marcello La Rocca is a research scientist and a full-stack engineer. His focus is on optimization algorithms, genetic algorithms, machine learning, and quantum computing.

Table of Contents

1 Introducing data structures
PART 1 IMPROVING OVER BASIC DATA STRUCTURES
2 Improving priority queues: d-way heaps
3 Treaps: Using randomization to balance binary search trees
4 Bloom filters: Reducing the memory for tracking content
5 Disjoint sets: Sub-linear time processing
6 Trie, radix trie: Efficient string search
7 Use case: LRU cache
PART 2 MULTIDEMENSIONAL QUERIES
8 Nearest neighbors search
9 K-d trees: Multidimensional data indexing
10 Similarity Search Trees: Approximate nearest neighbors search for image retrieval
11 Applications of nearest neighbor search
12 Clustering
13 Parallel clustering: MapReduce and canopy clustering
PART 3 PLANAR GRAPHS AND MINIMUM CROSSING NUMBER
14 An introduction to graphs: Finding paths of minimum distance
15 Graph embeddings and planarity: Drawing graphs with minimal edge intersections
16 Gradient descent: Optimization problems (not just) on graphs
17 Simulated annealing: Optimization beyond local minima
18 Genetic algorithms: Biologically inspired, fast-converging optimization

More details

Other editions

Person

Content

Intro
inside front cover
Advanced Algorithms and Data Structures
Copyright
dedication
contents
front matter
foreword
preface
Welcome to Advanced Algorithms and Data Structures
acknowledgments
about this book
Who should read this book?
How this book is organized: a roadmap
About the code
liveBook discussion forum
about the author
about the cover illustration
1 Introducing data structures
1.1 Data structures
1.1.1 Defining a data structure
1.1.2 Describing a data structure
1.1.3 Algorithms and data structures: Is there a difference?
1.2 Setting goals: Your expectations after reading this book
1.3 Packing your knapsack: Data structures meet the real world
1.3.1 Abstracting the problem away
1.3.2 Looking for solutions
1.3.3 Algorithms to the rescue
1.3.4 Thinking (literally) outside of the box
1.3.5 Happy ending
Summary
Part 1. Improving over basic data structures
2 Improving priority queues: d-way heaps
2.1 Structure of this chapter
2.2 The problem: Handling priority
2.2.1 Priority in practice: Bug tracking
2.3 Solutions at hand: Keeping a sorted list
2.3.1 From sorted lists to priority queues
2.4 Describing the data structure API: Priority queues
2.4.1 Priority queue at work
2.4.2 Priority matters: Generalize FIFO
2.5 Concrete data structures
2.5.1 Comparing performance
2.5.2 What's the right concrete data structure?
2.5.3 Heap
2.5.4 Priority, min-heap, and max-heap
2.5.5 Advanced variant: d-ary heap
2.6 How to implement a heap
2.6.1 BubbleUp
2.6.2 PushDown
2.6.3 Insert
2.6.4 Top
2.6.5 Update
2.6.6 Dealing with duplicates
2.6.7 Heapify
2.6.8 Beyond API methods: Contains
2.6.9 Performance recap
2.6.10 From pseudo-code to implementation
2.7 Use case: Find the k largest elements
2.7.1 The right data structure . . .
2.7.2 . . . and the right use
2.7.3 Coding it up
2.8 More use cases
2.8.1 Minimum distance in graphs: Dijkstra
2.8.2 More graphs: Prim's algorithm
2.8.3 Data compression: Huffman codes
2.9 Analysis of branching factor25
2.9.1 Do we need d-ary heaps?
2.9.2 Running time
2.9.3 Finding the optimal branching factor
2.9.4 Branching factor vs memory
2.10 Performance analysis: Finding the best branching factor
2.10.1 Please welcome profiling
2.10.2 Interpreting results
2.10.3 The mystery with heapify
2.10.4 Choosing the best branching factor
Summary
3 Treaps: Using randomization to balance binary search trees
3.1 Problem: Multi-indexing
3.1.1 The gist of the solution
3.2 Solution: Description and API
3.3 Treap
3.3.1 Rotations
3.3.2 A few design questions
3.3.3 Implementing search
3.3.4 Insert
3.3.5 Delete
3.3.6 Top, peek, and update
3.3.7 Min, max
3.3.8 Performance recap
3.4 Applications: Randomized treaps
3.4.1 Balanced trees
3.4.2 Introducing randomization
3.4.3 Applications of Randomized Treaps
3.5 Performance analysis and profiling
3.5.1 Theory: Expected height
3.5.2 Profiling height
3.5.3 Profiling running time
3.5.4 Profiling memory usage
3.5.5 Conclusions
Summary
4 Bloom filters: Reducing the memory for tracking content
4.1 The dictionary problem: Keeping track of things
4.2 Alternatives to implementing a dictionary
4.3 Describing the data structure API: Associative array
4.4 Concrete data structures
4.4.1 Unsorted array: Fast insertion, slow search
4.4.2 Sorted arrays and binary search: Slow insertion, fast(-ish) search
4.4.3 Hash table: Constant-time on average, unless you need ordering
4.4.4 Binary search tree: Every operation is logarithmic
4.4.5 Bloom filter: As fast as hash tables, but saves memory (with a catch)
4.5 Under the hood: How do Bloom filters work?
4.6 Implementation
4.6.1 Using a Bloom filter
4.6.2 Reading and writing bits
4.6.3 Find where a key is stored
4.6.4 Generating hash functions
4.6.5 Constructor
4.6.6 Checking a key
4.6.7 Storing a key
4.6.8 Estimating accuracy
4.7 Applications
4.7.1 Cache
4.7.2 Routers
4.7.3 Crawler
4.7.4 IO fetcher
4.7.5 Spell checker
4.7.6 Distributed databases and file systems
4.8 Why Bloom filters work21
4.8.1 Why there are no false negatives . . .
4.8.2 . . . But there are false positives
4.8.3 Bloom filters as randomized algorithms
4.9 Performance analysis
4.9.1 Running time
4.9.2 Constructor
4.9.3 Storing an element
4.9.4 Looking up an element
4.10 Estimating Bloom filter precision25
4.10.1 Explanation of the false-positive ratio formula
4.11 Improved variants
4.11.1 Bloomier filter
4.11.2 Combining Bloom filters
4.11.3 Layered Bloom filter
4.11.4 Compressed Bloom filter
4.11.5 Scalable Bloom filter
Summary
5 Disjoint sets: Sub-linear time processing
5.1 The distinct subsets problem
5.2 Reasoning on solutions
5.3 Describing the data structure API: Disjoint set
5.4 Naïve solution
5.4.1 Implementing naïve solution
5.5 Using a tree-like structure
5.5.1 From list to trees
5.5.2 Implementing the tree version
5.6 Heuristics to improve the running time
5.6.1 Path compression
5.6.2 Implementing balancing and path compression
5.7 Applications
5.7.1 Graphs: Connected components
5.7.2 Graphs:15 Kruskal's algorithm for minimum spanning tree
5.7.3 Clustering
5.7.4 Unification
Summary
6 Trie, radix trie: Efficient string search
6.1 Spell-check
6.1.1 A prncess, a Damon, and an elf walkz into a bar
6.1.2 Compression is the key
6.1.3 Description and API
6.2 Trie
6.2.1 Why is it better again?
6.2.2 Search
6.2.3 Insert
6.2.4 Remove
6.2.5 Longest prefix
6.2.6 Keys matching a prefix
6.2.7 When should we use tries?
6.3 Radix tries
6.3.1 Nodes and edges
6.3.2 Search
6.3.3 Insert
6.3.4 Remove
6.3.5 Longest common prefix
6.3.6 Keys starting with a prefix
6.4 Applications
6.4.1 Spell-checker
6.4.2 String similarity
6.4.3 String sorting
6.4.4 T9
6.4.5 Autocomplete
Summary
7 Use case: LRU cache
7.1 Don't compute things twice
7.2 First attempt: Remembering values
7.2.1 Description and API
7.2.2 Fresh data, please
7.2.3 Handling asynchronous calls
7.2.4 Marking cache values as "Loading"
7.3 Memory is not enough (literally)
7.4 Getting rid of stale data: LRU cache
7.4.1 Sometimes you have to double down on problems
7.4.2 Temporal ordering
7.4.3 Performance
7.5 When fresher data is more valuable: LFU
7.5.1 So how do we choose?
7.5.2 What makes LFU different
7.5.3 Performance
7.5.4 Problems with LFU
7.6 How to use cache is just as important
7.7 Introducing synchronization
7.7.1 Solving concurrency (in Java)
7.7.2 Introducing locks
7.7.3 Acquiring a lock
7.7.4 Reentrant locks
7.7.5 Read locks
7.7.6 Other approaches to concurrency
7.8 Cache applications
Summary
Part 2. Multidimensional queries
8 Nearest neighbors search
8.1 The nearest neighbors search problem
8.2 Solutions
8.2.1 First attempts
8.2.2 Sometimes caching is not the answer
8.2.3 Simplifying things to get a hint
8.2.4 Carefully choose a data structure
8.3 Description and API
8.4 Moving to k-dimensional spaces
8.4.1 Unidimensional binary search
8.4.2 Moving to higher dimensions
8.4.3 Modeling 2-D partitions with a data structure
Summary
9 K-d trees: Multidimensional data indexing
9.1 Right where we left off
9.2 Moving to k-D spaces: Cycle through dimensions
9.2.1 Constructing the BST
9.2.2 Invariants
9.2.2 The importance of being balanced
9.3 Methods
9.3.1 Search
9.3.2 Insert
9.3.3 Balanced tree
9.3.4 Remove
9.3.5 Nearest neighbor
9.3.6 Region search
9.3.7 A recap of all methods
9.4 Limits and possible improvements
Summary
10 Similarity Search Trees: Approximate nearest neighbors search for image retrieval
10.1 Right where we left off
10.1.1 A new (more complex) example
10.1.2 Overcoming k-d trees' flaws
10.2 R-tree
10.2.1 A step back: Introducing B-trees
10.2.2 From B-Tree to R-tree
10.2.3 Inserting points in an R-tree
10.2.4 Search
10.3 Similarity search tree
10.3.1 SS-tree search
10.3.2 Insert
10.3.3 Insertion: Variance, means, and projections
10.3.4 Insertion: Split nodes
10.3.5 Delete
10.4 Similarity Search
10.4.1 Nearest neighbor search
10.4.2 Region search
10.4.3 Approximated similarity search
10.5 SS+-tree18
10.5.1 Are SS-trees better?
10.5.2 Mitigating hyper-sphere limitations
10.5.3 Improved split heuristic
10.5.4 Reducing overlap
Summary
11 Applications of nearest neighbor search
11.1 An application: Find nearest hub
11.1.1 Sketching a solution
11.1.2 Trouble in paradise
11.2 Centralized application
11.2.1 Filtering points
11.2.2 Complex decisions
11.3 Moving to a distributed application
11.3.1 Issues handling HTTP communication
11.3.2 Keeping the inventory in sync
11.3.3 Lessons learned
11.4 Other applications
11.4.1 Color reduction
11.4.2 Particle interaction
11.4.3 Multidimensional DB queries optimization
11.4.4 Clustering
Summary
12 Clustering
12.1 Intro to clustering
12.1.1 Types of learning
12.1.2 Types of clustering
12.2 K-means
12.2.1 Issues with k-means
12.2.2 The curse of dimensionality strikes again
12.2.3 K-means performance analysis
12.2.4 Boosting k-means with k-d trees
12.2.5 Final remarks on k-means
12.3 DBSCAN
12.3.1 Directly vs density-reachable
12.3.2 From definitions to an algorithm
12.3.3 And finally, an implementation
12.3.4 Pros and cons of DBSCAN
12.4 OPTICS
12.4.1 Definitions
12.4.2 OPTICS algorithm
12.4.3 From reachability distance to clustering
12.4.4 Hierarchical clustering
12.4.5 Performance analysis and final considerations
12.5 Evaluating clustering results: Evaluation metrics
12.5.1 Interpreting the results
Summary
13 Parallel clustering: MapReduce and canopy clustering
13.1 Parallelization
13.1.1 Parallel vs distributed
13.1.2 Parallelizing k-means
13.1.3 Canopy clustering
13.1.4 Applying canopy clustering
13.2 MapReduce
13.2.1 Imagine you are Donald Duck . . .
13.2.2 First map, then reduce
13.2.3 There is more under the hood
13.3 MapReduce k-means
13.3.1 Parallelizing canopy clustering
13.3.2 Centroid initialization with canopy clustering
13.3.3 MapReduce canopy clustering
13.4 MapReduce DBSCAN
Summary
Part 3. Planar graphs and minimum crossing number
14 An introduction to graphs: Finding paths of minimum distance
14.1 Definitions
14.1.1 Implementing graphs
14.1.2 Graphs as algebraic types
14.1.3 Pseudo-code
14.2 Graph properties
14.2.1 Undirected
14.2.2 Connected
14.2.3 Acyclic
14.3 Graph traversal: BFS and DFS
14.3.1 Optimizing delivery routes
14.3.2 Breadth first search
14.3.3 Reconstructing the path to target
14.3.4 Depth first search
14.3.5 It's queue vs stack again
14.3.6 Best route to deliver a parcel
14.4 Shortest path in weighted graphs: Dijkstra
14.4.1 Differences with BFS
14.4.2 Implementation
14.4.3 Analysis
14.4.4 Shortest route for deliveries
14.5 Beyond Dijkstra's algorithm: A*
14.5.1 How good is A* search?
14.5.2 Heuristics as a way to balance real-time data
Summary
15 Graph embeddings and planarity: Drawing graphs with minimal edge intersections
15.1 Graph embeddings
15.1.1 Some basic definitions
15.1.2 Complete and bipartite graphs
15.2 Planar graphs
15.2.1 Using Kuratowski's theorem in practice
15.2.2 Planarity testing
15.2.3 A naïve algorithm for planarity testing
15.2.4 Improving performance
15.2.5 Efficient algorithms
15.3 Non-planar graphs
15.3.1 Finding the crossing number
15.3.2 Rectilinear crossing number
15.4 Edge intersections
15.4.1 Straight-line segments
15.4.2 Polylines
15.4.3 Bézier curves
15.4.4 Intersections between quadratic Bézier curves
15.4.5 Vertex-vertex and edge-vertex intersections
Summary
16 Gradient descent: Optimization problems (not just) on graphs
16.1 Heuristics for the crossing number
16.1.1 Did you just say heuristics?
16.1.2 Extending to curve-line edges
16.2 How optimization works
16.2.1 Cost functions
16.2.2 Step functions and local minima
16.2.3 Optimizing random sampling
16.3 Gradient descent
16.3.1 The math of gradient descent
16.3.2 Geometrical interpretation
16.3.3 When is gradient descent appliable?
16.3.4 Problems with gradient descent
16.4 Applications of gradient descent
16.4.1 An example: Linear regression
16.5 Gradient descent for graph embedding
16.5.1 A different criterion
16.5.2 Implementation
Summary
17 Simulated annealing: Optimization beyond local minima
17.1 Simulated annealing
17.1.1 Sometimes you need to climb up to get to the bottom
17.1.2 Implementation
17.1.3 Why simulated annealing works
17.1.4 Short-range vs long-range transitions
17.1.5 Variants
17.1.6 Simulated annealing vs gradient descent: Which one should I use?
17.2 Simulated annealing + traveling salesman
17.2.1 Exact vs approximated solutions
17.2.2 Visualizing cost
17.2.3 Pruning the domain
17.2.4 State transitions
17.2.5 Adjacent vs random swaps
17.2.6 Applications of TSP
17.3 Simulated annealing and graph embedding
17.3.1 Minimum edge crossing
17.3.2 Force-directed drawing
Summary
18 Genetic algorithms: Biologically inspired, fast-converging optimization
18.1 Genetic algorithms
18.1.1 Inspired by nature
18.1.2 Chromosomes
18.1.3 Population
18.1.4 Fitness
18.1.5 Natural selection
18.1.6 Selecting individuals for mating
18.1.7 Crossover
18.1.8 Mutations
18.1.9 The genetic algorithm template
18.1.10 When does the genetic algorithm work best?
18.2 TSP
18.2.1 Fitness, chromosomes, and initialization
18.2.2 Mutations
18.2.3 Crossover
18.2.4 Results and parameters tuning
18.2.5 Beyond TSP: Optimizing the routes of the whole fleet
18.3 Minimum vertex cover
18.3.1 Applications of vertex cover
18.3.2 Implementing a genetic algorithm
18.4 Other applications of the genetic algorithm
18.4.1 Maximum flow
18.4.2 Protein folding
18.4.3 Beyond genetic algorithms
18.4.4 Algorithms, beyond this book
Summary
appendix A. A quick guide to pseudo-code
A.1 Variables and basics
A.2 Arrays
A.3 Conditional instructions
A.3.1 Else-if
A.3.2 Switch
A.4 Loops
A.4.1 For loop
A.4.2 While loop
A.4.3 Break and continue
A.5 Blocks and indent
A.6 Functions
A.6.1 Overloading and default arguments
A.6.2 Tuples
A.6.3 Tuples and destructuring objects
appendix B. Big-O notation
B.1 Algorithms and performance
B.2 The RAM model
B.3 Order of magnitude
B.4 Notation
B.5 Examples
appendix C. Core data structures
C.1 Core data structures
C.2 Array
C.3 Linked List
C.4 Tree
C.4.1 Binary search trees
C.5 Hash table
C.5.1 Storing key-value pairs
C.5.2 Hashing
C.5.3 Conflicts resolution in hashing
C.5.4 Performance
C.6 Comparative analysis of core data structures
appendix D. Containers as priority queues
D.1 Bag
D.2 Stack
D.3 Queue
D.4 A comparative analysis of containers
appendix E. Recursion
E.1 Simple recursion
E.1.1 Pitfalls
E.1.2 Good recursion
E.2 Tail recursion
E.3 Mutual recursion
appendix F. Classification problems and randomnized algorithm metrics
F.1 Decision problems
F.2 Las Vegas algorithms
F.3 Monte Carlo algorithms
F.4 Classification metrics
F.4.1 Accuracy
F.4.2 Precision and recall
F.4.3 Other metrics and recap
index
inside back cover

System requirements

Save as PDF Copy link into clipboard

Schweitzer Fachinformationen

Advanced Algorithms and Data Structures

Description

More details

Other editions

Additional editions

Person

Content

System requirements