
Similarity Search and Applications
Description
Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.
The 23 full papers presented were carefully reviewed and selected from 53 submissions. The papers deal with issues surrounding the theory, design, analysis, practice, and application of content-based and feature-based similarity search. They are organized in the following topical sections: approximate similarity search; improving similarity search methods and applications; distances for complex objects; outlier detection; indexing and applications; and applications and specific domains.
The paper 'A New Perspective on the Tree Edit Distance' is published open access under a CC BY 4.0 license at link.springer.com.
More details
Other editions
Additional editions

Content
- Intro
- Preface
- Organization
- Contents
- Approximate Similarity Search
- The Power of Distance Distributions: Cost Models and Scheduling Policies for Quality-Controlled Similarity Queries
- 1 Introduction
- 1.1 Preliminaries
- 2 Performance of Schedules
- 2.1 Understanding the Behavior of Scheduling Policies
- 3 A Basic Cost Model for Quality-Controlled Queries
- 4 The Query-Sensitive Cost Model
- 4.1 Storing the NN Distance Distributions
- 5 Optimal Schedules
- 6 Conclusions
- A Guaranteeing Quality of Results
- References
- Cache and Priority Queue Based Approximation Technique for a Stream of Similarity Search Queries
- 1 Introduction
- 2 Related Work
- 3 Problem Definition
- 4 Approximation Technique
- 5 Score Function
- 5.1 Knapsack Problem Analysis
- 5.2 Minimal Density
- 6 Experiments
- 6.1 Setup
- 6.2 Answer Probability
- 6.3 Position Limits
- 6.4 Cache Size
- 6.5 Optimal Position Limits
- 6.6 Adaptive Minimal Density Limit
- 7 Conclusion
- References
- ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms
- 1 Introduction
- 2 Problem Definition and Quality Measures
- 2.1 Quality Measures
- 2.2 Performance Measures
- 3 System Design
- 3.1 Installing Algorithms and Datasets
- 3.2 Loading Datasets and Computing Ground Truth
- 3.3 Creating Algorithm Instances
- 3.4 The Experiment Loop
- 3.5 Results and Metrics
- 3.6 Frontend
- 4 Evaluation
- 4.1 Objectives of the Experiments
- 4.2 Discussion
- 5 Conclusion and Further Work
- References
- Improving Similarity Search Methods and Applications
- Sketches with Unbalanced Bits for Similarity Search
- 1 Introduction
- 2 Background
- 2.1 Observations on GHP Sketches
- 2.2 Related Work
- 3 Analysis of Bits Balanced to b
- 4 Evaluation
- 4.1 Searching for Negatively Correlated Bits
- 4.2 Quality of Sketches
- 4.3 Indexability of Sketches
- 5 Conclusions
- References
- Local Intrinsic Dimensionality I: An Extreme-Value-Theoretic Foundation for Similarity Applications
- 1 Introduction
- 1.1 Characterizations of Intrinsic Dimensionality
- 1.2 Local Intrinsic Dimensionality and Extreme Value Theory
- 1.3 Contributions
- 2 Background and Preliminaries
- 2.1 Expansion Dimension
- 2.2 Intrinsic Dimensionality of Distance Distributions
- 3 ID-Based Representation of Smooth Functions
- 4 Second-Order Local ID
- 4.1 Second-Order ID Representation
- 4.2 Inlierness, Outlierness and LID
- 5 Local ID and Extreme Value Theory
- 5.1 First-Order EVT
- 5.2 Second-Order EVT
- 6 Conclusion
- References
- Local Intrinsic Dimensionality II: Multivariate Analysis and Distributional Support
- 1 Introduction
- 1.1 Discriminability and Dimensionality
- 1.2 Contributions
- 2 Intrinsic Dimensionality and Indiscriminability
- 2.1 Multivariate Formulation of Local ID
- 2.2 Equivalence of Indiscriminability and Intrinsic Dimensionality
- 2.3 Transformations of Variable
- 3 Support-Weighted Local Intrinsic Dimensionality
- 4 Composition Rules for Local ID
- 5 Convolution and Support-Weighted ID
- 6 Local ID and the Correlation Dimension
- 7 Conclusion
- References
- High-Dimensional Simplexes for Supermetric Search
- 1 Introduction
- 2 Related Work
- 3 Upper and Lower Bounds from Simplexes
- 4 Constructing Simplexes from Edge Lengths
- 4.1 Simplex Construction
- 4.2 Bounds
- 5 Measuring Distortion
- 6 Exact Search: Indexing with n-Simplex
- 6.1 Experiment - SISAP colors
- 7 Conclusions and Further Work
- References
- Improving k-NN Graph Accuracy Using Local Intrinsic Dimensionality
- 1 Introduction
- 2 Related Work
- 2.1 Supervised Feature Selection and k-NN Graph Construction
- 2.2 Unsupervised Feature Selection
- 3 Overview of NNF-Descent
- 3.1 Local Laplacian Score, Feature Ranking, and Sparsification
- 3.2 NNF-Descent
- 4 Improving NN-Descent Graph with Weighted ID
- 4.1 Support-Weighted Local Intrinsic Dimensionality
- 4.2 Defining Support-weighted ID (wID) for each Feature
- 4.3 NNWID-Descent
- 5 Experiments
- 5.1 Datasets
- 5.2 Competing Methods
- 5.3 Performance Measure
- 5.4 Default Parameters
- 5.5 Effects of Varying the Sparsification Rate Z
- 5.6 Effects of Varying the Neighbor List Size K
- 6 Conclusion and Future Work
- References
- Distances for Complex Objects
- Dynamic Time Warping and the (Windowed) Dog-Keeper Distance
- 1 Introduction
- 2 Preliminaries and Concepts
- 3 Windowed Dog-Keeper Distance
- 4 Experimental Evaluation
- 5 Conclusion and Future Work
- References
- Fast Similarity Search with the Earth Mover's Distance via Feasible Initialization and Pruning
- 1 Introduction
- 2 Preliminaries
- 2.1 Earth Mover's Distance (EMD)
- 2.2 Filter-and-refine Framework
- 3 Related Work
- 4 Speeding up Refinement Phase
- 4.1 Feasible Initialization Technique (INIT)
- 4.2 Early Pruning (EP)
- 5 Experimental Evaluation
- 5.1 Experimental Setup
- 5.2 Experimental Results
- 6 Conclusion
- References
- A New Perspective on the Tree Edit Distance
- 1 Introduction
- 2 Chen's Algorithm and Zhang Decompositions
- 2.1 Representing Relevant Subproblems
- 2.2 Comparing Recursions
- 2.3 Comparing Relevant Subproblems
- 3 Revisiting the Runtime Complexity
- 4 Reducing the Memory Complexity
- 5 Experimental Evaluation
- 6 Conclusion
- References
- Outlier Detection
- Good and Bad Neighborhood Approximations for Outlier Detection Ensembles
- 1 Introduction
- 2 Related Work
- 3 Outlier Detection Ensembles Based on Approximate Neighborhoods
- 4 Experiments
- 5 Conclusion
- References
- Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection
- 1 Introduction
- 2 Related Work
- 2.1 The Curse of Dimensionality
- 2.2 Intrinsic Dimensionality
- 2.3 Outlier Detection
- 2.4 Stochastic Neighbor Embedding
- 3 Intrinsic Stochastic Neighbors
- 3.1 Distance Power Transform for the Curse of Intrinsic Dimensionality
- 3.2 Consensus Affinity Combination
- 3.3 Intrinsic Stochastic Outlier Selection
- 4 Experiments
- 4.1 ISOS Outlier Detection
- 4.2 it-SNE Visualization
- 5 Conclusions
- References
- Indexing and Applications
- Scalable Similarity Search for Molecular Descriptors
- 1 Introduction
- 2 Similarity Search Problem for Molecular Descriptors
- 3 Method
- 3.1 Database Partitioning
- 3.2 Intervals-Splitting Tree for Efficient Similarity Search
- 3.3 Pruning the Search Space Using Summary Descriptors
- 3.4 Search Time and Memory
- 3.5 Space Reduction Using Inverted Index
- 3.6 Rank Dictionaries and RMQ Data Structures
- 3.7 Similarity Search Using Rank Dictionaries and RMQs
- 4 Experiments
- 4.1 Setup
- 4.2 Results
- 5 Conclusion
- References
- Self-indexed Motion Planning
- 1 Introduction
- 2 Related Work
- 3 Background
- 4 Approximate Proximity Graph
- 5 Our Work
- 6 Experiments
- 7 Conclusions and Future Work
- References
- Practical Space-Efficient Data Structures for High-Dimensional Orthogonal Range Searching
- 1 Introduction
- 1.1 Orthogonal Range Searching
- 1.2 Existing Work
- 1.3 Our Contribution
- 2 Wavelet Trees
- 2.1 Construction
- 2.2 Range Search Algorithm
- 3 Proposed Scheme
- 3.1 Index Construction
- 3.2 Range Search Algorithm
- 3.3 Extension from [n]d Space to [U]d Space
- 3.4 Complexity Analyses
- 4 Experimental Evaluation
- 4.1 Machines and Compilers
- 4.2 Space Usage
- 4.3 Query Time
- 5 Conclusion
- References
- Semantic Similarity Group By Operators for Metric Data
- 1 Introduction
- 2 Background
- 3 Related Works
- 4 Proposed Methods
- 4.1 The SGB-Vote Operator
- 4.2 Nearest Neighbors-Based SGB: The KNN-SGB Operators
- 4.3 Algorithms
- 5 Evaluation
- 5.1 The Datasets
- 5.2 Results
- 6 Conclusions
- References
- Succinct Quadtrees for Road Data
- 1 Introduction
- 1.1 Related Work
- 1.2 Our Contribution
- 2 Preliminaries
- 2.1 Data Structures for Nearest Queries
- 2.2 Succinct Data Structures
- 3 Succinct Quadtrees
- 3.1 Storing Points
- 3.2 Storing Polygonal Chains
- 4 Digital Road Map
- 5 Experimental Results
- 5.1 The Effect of Leaf Cache
- 5.2 Comparison with ANN
- 6 Concluding Remarks
- References
- Applications and Specific Domains
- On Competitiveness of Nearest-Neighbor-Based Music Classification: A Methodological Critique
- 1 Introduction
- 1.1 Trends in High-Dimensional Indexing
- 1.2 Critique of Methodology
- 2 Music Classification: A (Very Brief) Literature Review
- 3 Case Study: Genre Classification
- 3.1 Experimental Setup
- 3.2 Exact Classification: Impact of Feature Parameters
- 3.3 Approximate Classification: Impact of eCP Parameters
- 3.4 Discussion
- 4 Conclusions
- References
- DS-Prox: Dataset Proximity Mining for Governing the Data Lake
- 1 Introduction
- 2 Problem Statement
- 3 Related Work
- 4 The DS-Prox Approach
- 4.1 The Meta-Features Distance Measures
- 4.2 The Approach
- 5 Experimental Evaluation
- 5.1 Datasets
- 5.2 Experimental Setup
- 5.3 Results
- 5.4 Discussion
- 6 Conclusion and Future Work
- References
- DeepBrowse: Similarity-Based Browsing Through Large Lists (Extended Abstract)
- 1 Introduction
- 2 Related Work
- 3 The DeepBrowse Paradigm
- 3.1 Formulation
- 3.2 Implementation
- 3.3 Search Complexity Analysis
- 4 Constructing the Permutations
- 4.1 Ranking Construction
- 4.2 Similarity Permutation Construction
- 4.3 k-robust TSPs
- 4.4 Heuristic Optimization for k-robust TSP
- 4.5 k-robust TSP: Experimental Results
- 5 User Study
- 6 Conclusion
- References
- Malware Discovery Using Behaviour-Based Exploration of Network Traffic
- 1 Introduction
- 2 Retrieval Model
- 2.1 Communication Fingerprints
- 2.2 Descriptors on Fingerprints
- 2.3 Distance on Descriptors
- 3 Exploration Framework (demo)
- 3.1 Browsing Interface
- 3.2 Similarity Analysis
- 4 Conclusion and Future Work
- References
- Visual Analytics and Similarity Search: Concepts and Challenges for Effective Retrieval Considering Users, Tasks, and Data
- 1 Introduction
- 2 Foundations
- 3 Research Aspects
- 4 Methodology
- 5 Conclusion
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (only limited: Kindle).
The file format PDF always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). Unfortunately, on the small screens of e-readers or smartphones, PDFs are rather annoying, requiring too much scrolling.
This eBook uses Watermark-DRM, a „soft” copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.