
Similarity Search and Applications
Description
This book constitutes the proceedings of the 8th International Conference on Similarity Search and Applications, SISAP 2015, held in Glasgow, UK, in October 2015.
The 19 full papers, 12 short papers, and 9 demo and poster papers presented in this volume were carefully reviewed and selected from 68 submissions. They are organized in the following topical sections: improving similarity search methods and techniques; metrics and evaluation; applications and specific domains; implementation and engineering solutions; posters; demo papers.

Content
- Intro
- Preface
- Organization
- Keynotes
- Large-Scale Similarity Joins with Guarantees (Rasmus Pagh, IT University of Copenhagen, Copenhagen, Denmark): The ability to handle noisy or imprecise data is becoming increasingly important in computing. In the information retrieval community the notion of similarity join has been studied extensively, yet existing solutions have offered weak performance guarantees. Either they are based on deterministic filtering techniques that often, but not always, succeed in reducing computational costs, or they
- Directions for Similarity Search in Television Recommender Systems (Billy Wallace, Founding Developer, Think Analytics, Glasgow, UK): Recommender systems require similarity search in order to find a movie or TV show that is similar to another. There are interesting constraints, however, that differentiate this application from a pure similarity search. Just finding similar content does not give good recommendations, as we are trying to fulfil a business use case such as up-selling paid-for content or e
- Deep Learning and Similarity Search (Bobby Jaros, Yahoo Labs, San Francisco, California, USA): Deep Learning has received tremendous attention recently thanks to its impressive results in computer vision, speech, medicine, robotics, and beyond. Although many of the highly visible results have been in a classification setting, a prime motivation for deep learning has been to learn rich feature vectors that are useful across a wide array of tasks. One goal of such features, for example, in perception-or
- Contents
- Improving Similarity Search Methods and Techniques
- Approximate Furthest Neighbor in High Dimensions
- 1 Introduction
- 1.1 Related Work
- 2 Algorithms and Analysis
- 2.1 Provably Good Furthest Neighbor Data Structure
- 2.2 A Lower Bound on the Approximation Factor
- 3 Experiments
- 4 Conclusions and Future Work
- References
- Flexible Aggregate Similarity Search in High-Dimensional Data Sets
- 1 Introduction
- 2 Problem Description
- 3 Related Work
- 3.1 Flexible Aggregate Similarity Search
- 3.2 Multi-step Search
- 3.3 Generalized Expansion Dimension
- 4 The SUM and Avg Variants of FANN
- 4.1 Algorithm
- 4.2 Analysis
- 4.3 Variants
- 5 Experimental Results
- 5.1 Experimental Framework
- 5.2 Comparison with Other Methods
- References
- Similarity Joins and Beyond: An Extended Set of Binary Operators with Order
- 1 Introduction
- 2 Related Work
- 3 Proposal
- 3.1 Similarity Joins with Order: The Theory of Wide-Joins
- 3.2 Single-term Predicates
- 3.3 Negation of Single-term Predicates
- 3.4 Multiple-Term Predicates
- 3.5 Optimizing Wide-Joins Processing
- 4 Experiments
- 5 Conclusion
- References
- Diversity in Similarity Joins
- 1 Introduction
- 2 Related Work
- 3 Diversified Similarity Joins
- 4 Experiments
- 4.1 Performance and Result Size Evaluation
- 4.2 Scalability
- 5 Conclusion
- References
- CDA: Succinct Spaghetti
- 1 Introduction
- 2 Spaghetti
- 3 Time Complexity Analysis
- 3.1 Unsuccessful Search
- 3.2 Successful Search
- 4 CDA, the Succinct Spaghetti
- 4.1 Compact Representation of Permutations
- 4.2 Computing the Intersection
- 4.3 Time Complexity Analysis
- 5 Experimental Results
- 5.1 Index Size
- 5.2 Computing the Candidate Set
- 6 Conclusions and Future Work
- References
- Improving Metric Access Methods with Bucket Files
- 1 Introduction
- 2 Background
- 3 The Bucket-Slim-Tree
- 3.1 The Structure of the Buckets
- 3.2 Building the Bucket-Slim-Tree
- 3.3 Querying the Bucket-Slim-Tree
- 4 Experiments
- 5 Conclusion
- References
- Faster Dual-Tree Traversal for Nearest Neighbor Search
- 1 Introduction
- 2 Trees
- 3 Traversals
- 4 Nearest Neighbor Search
- 5 Delaying Reference Recursion
- 6 Experiments
- 7 Conclusion
- References
- Optimizing the Distance Computation Order of Multi-Feature Similarity Search Indexing
- 1 Introduction
- 1.1 Contribution
- 2 Related Work
- 3 Optimizing the Distance Computation Order
- 3.1 Partial and Aggregated Bounds
- 3.2 Expected Approximation Error
- 3.3 Computation Costs of Distance Functions
- 4 Experimental Evaluation
- 5 Summary and Outlook
- References
- Dynamic Permutation Based Index for Proximity Searching
- 1 Introduction
- 2 Previous and Related Work
- 3 Our Approach
- 3.1 Dynamic Permutants
- 4 Experiments
- 4.1 Synthetic Databases
- 4.2 Optimal Value of B
- 4.3 NASA Images
- 5 Conclusions
- References
- Finding Near Neighbors Through Local Search
- 1 Introduction
- 2 Improving APG
- 3 Experimental Results
- 4 Conclusions
- References
- Metrics and Evaluation
- When Similarity Measures Lie
- 1 Introduction
- 2 Conventional Task Performance
- 3 Direct Performance Assessment
- 3.1 Procedurally Generated Truths
- 3.2 Evaluation Procedure
- 4 Experiment
- 5 Results
- 6 Conclusion
- References
- An Empirical Evaluation of Intrinsic Dimension Estimators
- 1 Introduction
- 2 Intrinsic Dimension Estimators for Vector Spaces
- 3 Intrinsic Dimension Estimators for Metric Spaces
- 3.1 Fractal Based Methods
- 3.2 Distance Exponent
- 3.3 Fastmap
- 3.4 Intrinsic Search Difficulty
- 4 Experimental Results
- 4.1 Synthetic Metric Spaces
- 4.2 Real Metric Spaces
- 5 Conclusions
- References
- A Belief Framework for Similarity Evaluation of Textual or Structured Data
- 1 Framework Definition
- 2 Selection of a Belief Function
- 2.1 Subsets with Special Significance
- 2.2 Longest Common Subsequences as a Belief Function
- 2.3 Linear Gaps Accounting as a Belief Function
- 2.4 Longest Common Substrings as a Belief Function
- 2.5 The Belief Function for All Substrings Accounting
- 2.6 Semi-structured Data and Block Transposition
- 3 Acceptable Parts and Algorithm Performance
- References
- Similarity of Attributed Generalized Tree Structures: A Comparative Study
- 1 Introduction
- 2 Methodology
- 3 WT Versus AGT Algorithm
- 4 GT Versus AGT Algorithm
- 5 GED and MCS Versus AGT Algorithm
- 6 Conclusion
- References
- Evaluating Multilayer Multimedia Exploration
- 1 Introduction
- 2 Multilayer Multimedia Exploration
- 3 User Study
- 3.1 Find the Image Application
- 3.2 Settings
- 3.3 Results
- 3.4 Discussion
- 4 Conclusion
- References
- Semantic Similarity Between Images: A Novel Approach Based on a Complex Network of Free Word Associations
- 1 Introduction
- 2 Graph-Based Similarity
- 3 Complex Networks
- 4 The Model
- 5 Conclusions
- References
- Applications and Specific Domains
- Vector-Based Similarity Measurements for Historical Figures
- 1 Introduction
- 2 Related Work
- 3 Data Collection
- 4 Model Description
- 4.1 TF-IDF Model
- 4.2 Distributed Word Embedding Model
- 4.3 LDA Model
- 4.4 Deepwalk Embedding Model
- 5 Experimental Setup
- 6 Results and Analysis
- 7 Conclusion
- References
- Efficient Approximate 3-Dimensional Point Set Matching Using Root-Mean-Square Deviation Score
- 1 Introduction
- 1.1 Background
- 1.2 Related Work
- 1.3 Research Goal
- 1.4 Main Results of this Paper
- 1.5 Organization of This Paper
- 2 Preliminaries
- 2.1 Basic Definitions
- 2.2 The Minimum RMSD Score for k-point Sets
- 2.3 Approximate Point Subset Matching Problem
- 2.4 A Naive Algorithm for Approximate Point Subset Matching
- 3 A Faster Point set Matching Algorithm with Pruning
- 4 A Fixed-parameter-like Algorithm Using Spatial Constraint
- 4.1 Basic Idea
- 4.2 Probabilistic Analysis
- 5 Experiments
- 5.1 Data and Method
- 5.2 Results
- 6 Conclusion
- References
- Face Image Retrieval Revisited
- 1 Introduction
- 2 Face Detection
- 3 Face Retrieval
- 3.1 Fusion of Multiple Matching Methods
- 3.2 Multi-face Queries and Relevance Feedback
- 3.3 Efficient Query Processing
- 4 Conclusions
- References
- Semiautomatic Learning of 3D Objects from Video Streams
- 1 Introduction
- 2 Related Work
- 3 Object Extraction and Matching
- 3.1 Observations Matching
- 4 Online Object Clustering
- 5 Experiments
- 6 Conclusions
- References
- Banknote Recognition as a CBIR Problem
- 1 Introduction
- 2 Banknotes Retrieval
- 3 Method Assessment
- 4 Discussion
- References
- Efficient Image Search with Neural Net Features
- 1 Introduction: Content-Based Image Retrieval
- 2 Similarity Indexing and Searching
- 3 Efficiency Evaluation
- 3.1 In-memory Indexes
- 3.2 Disk-Oriented Indexes
- 4 Conclusions
- References
- Textual Similarity for Word Sequences
- 1 Introduction
- 2 Semantic Similarity
- 3 Semantic Distance
- 3.1 Euclid Similarity
- 3.2 Levenshtein Similarity
- 3.3 Using Semantic Distances
- 4 Experiments
- 5 Conclusion
- References
- Motion Images: An Effective Representation of Motion Capture Data for Similarity Search
- 1 Introduction and Related Work
- 2 Motion Image: Motion Capture Data as an Image
- 3 Similarity of Motion Images and Its Evaluation
- 4 Conclusions and Future Research Directions
- References
- Implementation and Engineering Solutions
- Brute-Force k-Nearest Neighbors Search on the GPU
- 1 Introduction
- 2 Related Work
- Squared Distance Matrix
- Selecting Nearest Neighbors
- 3 Implementation
- 3.1 Computing the Squared Distance Matrix
- 3.2 Selecting Nearest Neighbors
- Overview
- Merge Path
- Using MGPU
- 4 Results
- Evaluation and Analysis
- Comparisons
- 5 Conclusions
- References
- Regrouping Metric-Space Search Index for Search Engine Size Adaptation
- 1 Introduction
- 2 Background
- 2.1 Search Engine Distributed Architecture
- 3 Related Work
- 4 Adapting Search Engine Size
- 4.1 Computing H-groups and G-groups
- 5 Experimental Evidence Supporting Hypotheses
- 5.1 The Number of H-groups
- 5.2 Search Performance of TT-S
- 5.3 Comparing TT-A and TT-R
- 6 Conclusions
- References
- Improving Parallel Processing of Matrix-Based Similarity Measures on Modern GPUs
- 1 Introduction
- 2 Related Work
- 3 GPU Fundamentals
- 3.1 GPU Device
- 3.2 Thread Execution
- 3.3 Memory Organization
- 4 Implementation
- 4.1 Parallelogram Blocks
- 4.2 Using Shuffle Instructions
- 4.3 Synchronization via Shared Memory
- 4.4 Blocked Algorithm
- 5 Experiments
- 5.1 Single Distance
- 5.2 Multiple Distances
- 6 Conclusions
- References
- Time Series Subsequence Similarity Search Under Dynamic Time Warping Distance on the Intel Many-core Accelerators
- 1 Introduction
- 2 Formal Definitions and Related Work
- 2.1 Formal Definitions
- 2.2 The Intel Xeon Phi Architecture and Programming Model
- 2.3 Related Work
- 3 Acceleration by the Intel Xeon Phi Coprocessor
- 3.1 Serial Algorithm
- 3.2 Parallel Algorithm
- 3.3 Combining CPU and the Intel Xeon Phi
- 4 Experiments
- 4.1 Performance
- 4.2 Impact of Queue Size
- 4.3 Comparison with Algorithms for GPU and FPGA
- 5 Conclusion
- References
- Subspace Nearest Neighbor Search - Problem Statement, Approaches, and Discussion
- 1 Introduction
- 2 Related Problems
- 3 Definition of Subspace Nearest Neighbor Search
- 4 Discussion and Open Research Questions
- 5 Conclusion
- References
- Query-Based Improvement Procedure and Self-Adaptive Graph Construction Algorithm for Approximate Nearest Neighbor Search
- 1 Introduction
- 2 Greedy Walk Algorithm
- 3 Insertion Algorithm
- 4 Improvement Based on Queries
- 5 Simulations
- 5.1 Improvement Based on Queries
- 5.2 Insertion by Repairing
- 6 Conclusion
- References
- Posters
- Is There a Free Lunch for Image Feature Extraction in Web Applications
- 1 Introduction
- 2 Related Work
- 3 Image Feature Extraction
- 4 Extraction in Web Browser
- 4.1 JavaScript and HTML5 Technologies
- 4.2 Proposed Solution
- 4.3 Security and Reliability Issues
- 5 Experiments
- 6 Conclusions
- References
- On the Use of Similarity Search to Detect Fake Scientific Papers
- 1 Introduction
- 2 Related Work
- 3 Approach
- 3.1 Feature Extractors
- 3.2 Dataset
- 4 Experiments
- 4.1 Retrieving SCIGen Papers
- 4.2 Improving Performance Through Pseudo-Relevance Feedback
- 5 Conclusions
- References
- Reducing Hubness for Kernel Regression
- 1 Introduction
- 2 Multicollinearity in Kernel Regression
- 3 Hubness in Kernel-Induced Spaces
- 3.1 Origin of Hubness: Spatial Centrality
- 4 Reducing Hubness for Kernel Regression
- 5 Experiment
- 6 Conclusion
- Demo Papers
- FELICITY: A Flexible Video Similarity Search Framework Using the Earth Mover's Distance
- 1 Introduction
- 2 System Overview
- 3 A Demonstration Scenario and GUI
- 4 Conclusion
- References
- Searching the EAGLE Epigraphic Material Through Image Recognition via a Mobile Device
- 1 The EAGLE Project
- 2 The Flagship Mobile Application
- 2.1 Image Feature Extractor
- 2.2 Indexer and Support of Similarity Search and Exact Match Modes
- 3 Results
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF format displays every book page identically on any hardware, which makes it well suited to complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. No technical restrictions prevent illegal distribution. Instead, a personalised watermark embedded in the eBook can identify the purchaser in the event of misuse and provide evidence for legal purposes.
For more information, see our eBook Help page.