Computer Vision - ECCV 2020

Name: Computer Vision - ECCV 2020 | 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XXII
Brand: Springer
Price: 96.29 EUR
Availability: OnlineOnly

16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XXII

Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm (Editors)

Springer (Publisher)

Published on 16 November 2020

XLII, 785 pages

E-Book

PDF with digital watermarking

978-3-030-58542-6 (ISBN)

€96.29 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Foreword
Preface
Organization
Contents - Part XXII
Object Tracking Using Spatio-Temporal Networks for Future Prediction Location
1 Introduction
2 Related Works
3 Proposed Method
3.1 Tracker Module
3.2 Background Motion
3.3 Trajectory Prediction
4 Implementation Details
5 Experiment Results
5.1 Comparison with the State-of-the-Art
5.2 Attributes Analysis
5.3 Ablation Studies
6 Conclusions
References
Pillar-Based Object Detection for Autonomous Driving
1 Introduction
2 Related Work
3 Method
3.1 Preliminaries
3.2 Overall Architecture
3.3 Cylindrical View
3.4 Pillar-Based Prediction
3.5 Bilinear Interpolation
3.6 Loss Function
4 Experiments
4.1 Results Compared to State-of-the-Art
4.2 Comparing Anchor-Based, Point-Based, and Pillar-Based Prediction
4.3 View Combinations
4.4 Bilinear Interpolation or Nearest Neighbor Interpolation?
5 Discussion
References
Sparse Adversarial Attack via Perturbation Factorization
1 Introduction
2 Related Work
3 Sparse Adversarial Attack
3.1 Preliminaries of Adversarial Attack
3.2 Sparse Adversarial Attack via Perturbation Factorization
3.3 Continuous Optimization for the MIP Problem
3.4 Two Extensions of SAPF
4 Experiments
4.1 Experimental Settings
4.2 Experimental Comparisons Between SAPF and Other Methods
4.3 Results of Group-Wise Sparsity and Visual Imperceptibility
4.4 Supplementary Materials
5 Conclusion
References
3D Scene Reconstruction from a Single Viewport
1 Introduction
2 Related Work
3 Problem Description and General Approach
4 Generating Synthetic 3D Training Data
4.1 Viewport Alignment
4.2 Fast Generation of TSDF Voxel Data
4.3 Spatial Compression
5 Proposed Network Architecture
5.1 Tree Network
5.2 Multipath
5.3 General Architecture
6 Loss Shaping
6.1 Output Loss Shaping
6.2 Tree Loss Shaping
7 Experiments
7.1 Test Setup
7.2 Qualitative Results
7.3 Quantitative Results
8 Conclusion
References
Learning to Optimize Domain Specific Normalization for Domain Generalization
1 Introduction
2 Related Work
2.1 Domain Generalization
2.2 Multi-source Domain Adaptation
2.3 Normalization in Neural Networks
3 Domain-Specific Optimized Normalization for Domain Generalization
3.1 Overview
3.2 Instance Normalization for Domain Generalization
3.3 Optimization for Domain-Specific Normalization
3.4 Inference
4 Experiments
4.1 Experimental Settings
4.2 Comparison with Other Methods
4.3 Ablation Study
4.4 Additional Experiments
4.5 Analysis
5 Conclusion
References
Self-supervised Outdoor Scene Relighting
1 Introduction
2 Related Work
3 Overview
4 Inverse Rendering
5 Neural Rendering
5.1 Losses
5.2 Shadow Prediction Network
5.3 Sky GAN
5.4 Training
6 Results
6.1 Outdoor Relighting Benchmarking Dataset
6.2 Qualitative Evaluation
6.3 Quantitative Evaluation
7 Discussion
8 Conclusion
References
Privacy Preserving Visual SLAM
1 Introduction
2 Related Works
2.1 Visual SLAM
2.2 Map Representation with Line Cloud
2.3 Bundle Adjustment for Map Optimization
3 Proposed Method
3.1 System Overview
3.2 Relocalization and Loop Detection with a Line Cloud
3.3 2D-3D Matching with 3D Lines and Points
3.4 Bundle Adjustments with a Line Cloud
4 Experiments
4.1 Experimental Setting
4.2 Implementation Details
4.3 Dataset and Prebuilt Map Creation
4.4 Quantitative Evaluation
4.5 Qualitative Evaluation
5 Conclusions
References
Leveraging Acoustic Images for Effective Self-supervised Audio Representation Learning
1 Introduction
2 Related Works
3 ACIVW: ACoustic Images and Videos in the Wild
4 The Method
4.1 Input Data
4.2 Single Data Stream Models
4.3 Pretext Task
4.4 Knowledge Distillation
5 Experiments
5.1 Cross-Modal Retrieval
5.2 Classification
6 Conclusions
References
Learning Joint Visual Semantic Matching Embeddings for Language-Guided Retrieval
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Visual Semantic Embedding
3.2 Image-Text Compositional Embedding
4 Experiments
4.1 Implement Details
4.2 Fashion-200k
4.3 UT-Zap50K
4.4 Fashion-iq
4.5 Ablation Study and Other Tasks
5 Conclusion
References
Globally Optimal and Efficient Vanishing Point Estimation in Atlanta World
1 Introduction
2 Related Work
3 Algorithm Overview
4 Simplified Case Without Perturbation
4.1 Defining Dominant Plane and Candidate Region
4.2 Mining Candidate Interval
4.3 Stabbing Candidate Intervals by Probes
5 Practical Case with Perturbation
5.1 Bounds of Number of Inliers
5.2 Collaboration Between BnB and MnS
6 Experiments
6.1 Synthetic Dataset
6.2 Real-World Dataset
7 Conclusions
References
StyleGAN2 Distillation for Feed-Forward Image Manipulation
1 Introduction
2 Related Work
3 Method Overview
3.1 Data Collection
3.2 Training Process
4 Experiments
4.1 Evaluation Protocol
4.2 Distillation of Image-to-image Translation
4.3 Distillation of Style Mixing
5 Conclusions
References
Self-Prediction for Joint Instance and Semantic Segmentation of Point Clouds
1 Introduction
2 Related Work
3 Methodology
3.1 Self-Prediction
3.2 Associated Learning Framework
3.3 Optimization Objectives
4 Experiments
4.1 Experiment Settings
4.2 Segmentation Results on S3DIS
4.3 Segmentation Results on ShapeNet
4.4 Ablation Study
5 Conclusion
References
Learning Disentangled Representations via Mutual Information Estimation
1 Introduction
2 Related Work
3 Mutual Information
4 Method
4.1 Shared Representation Learning
4.2 Exclusive Representation Learning
4.3 Implementation Details
5 Experiments
5.1 Datasets
5.2 Representation Disentanglement Evaluation
5.3 Analysis of the Objective Function
5.4 Satellite Applications
6 Conclusions
References
Challenge-Aware RGBT Tracking
1 Introduction
2 Related Work
2.1 RGBT Tracking Methods
2.2 Multi-task Learning
3 Challenge-Aware RGBT Tracker
3.1 Challenge-Aware Neural Network
3.2 Training Algorithm
3.3 Online Tracking
4 Performance Evaluation
4.1 Experimental Setting
4.2 Quantitative Comparison
4.3 In-Depth Analysis of the Proposed CAT
5 Conclusion
References
Fully Trainable and Interpretable Non-local Sparse Models for Image Restoration
1 Introduction
2 Preliminaries and Related Work
3 Proposed Approach
3.1 Trainable Sparse Coding (without Self-similarities)
3.2 Differentiable Relaxation for Non-local Sparse Priors
3.3 Similarity Metrics
3.4 Extension to Blind Denoising and Parameter Sharing
3.5 Extension to Demosaicking
3.6 Practical Variants and Implementation
4 Experiments
5 Conclusion
References
AutoSimulate: (Quickly) Learning Synthetic Data Generation
1 Introduction
2 Related Work
3 Problem Formulation
4 AutoSimulate
4.1 Stochastic Simulator (Data Generating Distribution)
4.2 Efficient Numerical Computation
5 Experiments
5.1 CLEVR Blender
5.2 Photorealistic Renderer Arnold
5.3 Additional Studies
6 Conclusion
References
LatticeNet: Towards Lightweight Image Super-Resolution with Lattice Block
1 Introduction
2 Related Work
2.1 Deep SR Models
2.2 Lightweight SR Models
2.3 Attention Mechanism
3 Proposed Method
3.1 From Lattice Filter to Lattice Block
3.2 Network Architecture
3.3 Backward Fusion Module
3.4 Loss Function
3.5 Discussions
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 The Contribution of Lattice Block for Lightweight
4.4 Ablation Analysis
4.5 Comparisons with the State-of-the-arts
5 Conclusions
References
Learning from Scale-Invariant Examples for Domain Adaptation in Semantic Segmentation
1 Introduction
2 Related Works
3 Methodology
3.1 Preliminaries
3.2 Class-Based Sorting for Target Subset Selection
3.3 Dynamic Entropy Threshold for Class Dependent Filter Selection
3.4 Self-generated Scale-Invariant Examples
3.5 Leveraging Focal Loss for Class-Imbalance
3.6 Adaptation
4 Experiments and Results
4.1 Experimental Details
4.2 Comparisons with State-of-the-art Methods
4.3 Analysis
5 Conclusion
References
Active Visual Information Gathering for Vision-Language Navigation
1 Introduction
2 Related Work
3 Methodology
3.1 A Naïve Model with a Simple Exploration Ability
3.2 Where to Explore
3.3 Deeper Exploration
3.4 Training
4 Experiment
4.1 Experimental Setup
4.2 Comparison Results
4.3 Diagnostic Experiments
5 Conclusion
References
Deep Hough-Transform Line Priors
1 Introduction
2 Related Work
3 Hough Transform Block for Global Line Priors
3.1 HT: From Image Domain to Hough Domain
3.2 IHT: From Hough Domain to Image Domain
3.3 Convolution in Hough Transform Space
4 Experiments
4.1 Exp 1: Local and Global Information for Line Detection
4.2 Exp 2: The Effect of Convolution in the Hough Domain
4.3 Exp 3: HT-IHT Block for Line Segment Detection
5 Conclusion
References
Unsupervised Shape and Pose Disentanglement for 3D Meshes
1 Introduction
2 Related Work
3 Method
3.1 Overview
3.2 Cross-Consistency
3.3 Self-consistency
3.4 Loss Terms and Objective Function
3.5 Implementation Details
4 Experiments
4.1 Datasets
4.2 Quantitative Evaluation
4.3 Qualitative Evaluation
5 Conclusion and Future Work
References
CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection
1 Introduction
2 Related Work
3 Proposed Architecture
3.1 Training Data Organization
3.2 Backbone Network
3.3 Normalcy Suppression
3.4 Clustering Loss Module
3.5 Training Losses of the Proposed Algorithm
4 Experiments
4.1 Datasets
4.2 Evaluation Metric
4.3 Experimental Settings
4.4 Experiments on UCF-Crime Dataset
4.5 Experiments on ShanghaiTech
4.6 Ablation
4.7 Qualitative Analysis
5 Conclusions
References
Inclusive GAN: Improving Data and Minority Coverage in Generative Models
1 Introduction
2 Related Work
3 Inclusive GAN for Data and Minority Coverage
3.1 Adversarial Generation: GANs
3.2 Reconstructive Generation: IMLE
3.3 Harmonizing Adversarial and Reconstructive Generation: IMLE-GAN
3.4 Minority Coverage in IMLE-GAN
4 Experiments
4.1 Setup
4.2 Preliminary Study on Stacked MNIST
4.3 Empirical Study on Data and Model Biases
4.4 Comparisons on CelebA
4.5 Extension to Minority Inclusion
5 Conclusion
References
SESAME: Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects
1 Introduction
2 Related Work
3 SESAME
4 Experiments
5 Conclusion
References
Dive Deeper into Box for Object Detection
1 Introduction
2 Related Work
3 Our Approach
3.1 Box Decomposition and Recombination
3.2 Semantic Consistency Module
4 Experiments
4.1 Experimental Setting
4.2 Overall Performance
4.3 Ablation Study
5 Conclusion
References
PG-Net: Pixel to Global Matching Network for Visual Tracking
1 Introduction
2 Related Works
3 Pixel to Global Matching Network
3.1 Overview
3.2 Pixel to Global Matching Module
3.3 Shared Correlation Architecture
3.4 Multiple Losses Mechanism
4 Experiment
4.1 Implementation Details
4.2 Ablation Experiments
4.3 Evaluation on VOT2018
4.4 Evaluation on VOT2018-LT
4.5 Evaluation on LaSOT Dataset
4.6 Evaluation on OTB2015
5 Conclusion
References
Why Are Deep Representations Good Perceptual Quality Features?
1 Introduction
2 Deep CNN Representations as Perceptual Quality Features
3 Problem Formulation
4 Perceptual Efficacy of Deep Features
4.1 Inputs
4.2 Measurement of the Spatial Frequency Sensitivity
4.3 Measurement of the Orientation Selectivity
4.4 Perceptual Efficacy (PE)
5 Experiments
5.1 Quality Assessment (QA) Tests
5.2 Just Noticeable Difference (JND) Test
5.3 2AFC Similarity Tests
5.4 Visual Evaluation of the Features
5.5 Super-Resolution
5.6 Discussion
6 Comparison with LPIPS and Other Metrics
7 Conclusion
References
Geometric Estimation via Robust Subspace Recovery
1 Introduction
2 Related Work
3 Method
3.1 Preliminaries on DLT
3.2 Robust Generalization
3.3 Extended Exploration of Linear Structure
3.4 Implementation Details
4 Experimental Results
4.1 Qualitative Analysis of Linear Embedding
4.2 Fundamental and Homography Estimation
4.3 Sensitivity to Outlier Rate
5 Conclusion
References
Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification
1 Introduction
1.1 Contributions
2 Related Work
3 Method
3.1 Preliminaries: f-VAEGAN
3.2 Overall Architecture
3.3 Semantic Embedding Decoder
3.4 Feedback Module
3.5 (Generalized) Zero-Shot Classification
4 Experiments
4.1 State-of-the-Art Comparison
4.2 Ablation Study
5 (Generalized) Zero-Shot Action Recognition
6 Conclusion
References
Human Correspondence Consensus for 3D Object Semantic Understanding
1 Introduction
2 Related Work
3 CorresPondenceNet
3.1 Dataset Collections
3.2 Annotation Process
3.3 Annotation Type
4 Learning Dense Semantic Embeddings
4.1 Problem Statement
4.2 Method Details
4.3 Mean Geodesic Error
4.4 Experiments
5 Other Applications
5.1 Cross-Object Registration
5.2 Partial Object Matching
6 Conclusion
References
Learning Memory Augmented Cascading Network for Compressed Sensing of Images
1 Introduction
2 Related Work
3 Memory Augmented Cascading Reconstruction
3.1 Network Architecture
3.2 Single Cascading Stage
3.3 Contextual Memory Augmentation
3.4 Network Loss and Learning
4 Experimental Results and Analysis
4.1 Ablation Studies
4.2 Results on Natural Images
4.3 Compressive MRI Reconstruction
5 Conclusions
References
Least Squares Surface Reconstruction on Arbitrary Domains
1 Introduction
1.1 Related Work
2 Linear Least Squares Height-from-Normals
2.1 Linear Equations
2.2 Discrete Formulation
3 Numerical Differentiation Kernels
3.1 2D Savitzky-Golay Filters
3.2 K-Nearest Pixels Kernel
3.3 3D K-Nearest Neighbours Kernel
4 Implementation
5 Evaluation
6 Conclusions
References
Task-Conditioned Domain Adaptation for Pedestrian Detection in Thermal Imagery
1 Introduction
2 Related Work
2.1 Pedestrian Detection in the Visible Spectrum
2.2 Multispectral Pedestrian Detection Approaches
2.3 Pedestrian Detection in Thermal Imagery
2.4 Task-Conditioned Networks
3 Task-Conditioned Domain Adaptation
3.1 Auxiliary Classification Network
3.2 Conditioning Layers
3.3 Conditioned Network Architectures
3.4 Adaptation Loss
4 Experimental Results
4.1 Dataset and Evaluation Metrics
4.2 Implementation and Training
4.3 Ablation Studies
4.4 Comparison with the State-of-the-Art
5 Conclusions
References
Improving the Transferability of Adversarial Examples with Resized-Diverse-Inputs, Diversity-Ensemble and Region Fitting
1 Introduction
2 Related Work
3 Methodology
3.1 Gradient-Based Attack Methods
3.2 Observation Analyses
3.3 Resized-Diverse-Inputs Method
3.4 Diversity-Ensemble Method
3.5 Region Fitting
4 Experiments
4.1 Experimental Settings
4.2 The Internal Relationship
4.3 Single-Model Attacks
4.4 Ensemble-Based Attacks
5 Conclusion
References
Differentiable Automatic Data Augmentation
1 Introduction
2 Related Work
3 Differentiable Automatic Data Augmentation (DADA)
3.1 Search Space
3.2 Policy Sampling from a Joint Distribution
3.3 Differentiable Relaxation with Gumbel-Softmax
3.4 RELAX Gradient Estimator
3.5 Bi-level Optimization
4 Experiments
4.1 Settings
4.2 Results
4.3 DADA for Object Detection
4.4 Further Analysis
5 Conclusion
References
SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans
1 Introduction
2 Related Work
3 SceneCAD: Joint Object Alignment and Layout Estimation
3.1 Layout Prediction
3.2 CAD Model Alignment
3.3 Learning Object and Layout Relationships
4 Object+Layout Dataset
4.1 Extraction of Scene Layouts
4.2 Extraction of Object and Layout Relationships
4.3 Synthetic Data
5 Results
5.1 CAD Alignment Performance
5.2 Layout Prediction
6 Limitations
7 Conclusion
References
Kinship Identification Through Joint Learning Using Kinship Verification Ensembles
1 Introduction
2 Related Work
3 Kinship Identification Through Joint Learning with Kinship Verification
3.1 Definition of Kinship Verification, Kinship Identification and Kinship Classification
3.2 Relationship Between Kinship Verification and Kinship Identification and the Limitation of Existing Methods
4 Joint Learning of Kinship Identification and Kinship Verification
4.1 Architecture of the Proposed Joint Learning Network (JLNet)
4.2 Comparative Methods
5 Experiments
5.1 Unbias Dataset for Training and Testing
5.2 Experimental Design
5.3 Results and Evaluation
6 Conclusion
References
Kernelized Memory Network for Video Object Segmentation
1 Introduction
2 Related Work
3 Kernelized Memory Network
3.1 Architecture
3.2 Kernelized Memory Read
4 Pre-training by Hide-and-Seek
5 Experiments
5.1 Training Details
5.2 Inference Details
5.3 DAVIS 2016 and 2017
5.4 Youtube-VOS 2018
5.5 Qualitative Results
5.6 Analysis
6 Conclusion
References
A Single Stream Network for Robust and Real-Time RGB-D Salient Object Detection
1 Introduction
2 Related Work
3 Proposed Method
3.1 Single Stream Encoder Network
3.2 Depth-Enhanced Dual Attention Module
3.3 Pyramidally Attended Feature Extraction
4 Experiments
4.1 Dataset
4.2 Evaluation Metrics
4.3 Implementation Details
4.4 Comparison with State-of-the-Art Results
4.5 Ablation Studies
5 Conclusions
References
Splitting Vs. Merging: Mining Object Regions with Discrepancy and Intersection Loss for Weakly Supervised Semantic Segmentation
1 Introduction
2 Related Works
2.1 Weakly Supervised Semantic Segmentation
2.2 Region Mining
2.3 Co-training
3 Approach
3.1 Revisiting CAM
3.2 Splitting vs. Merging
3.3 Mask Generation
4 Experiments
4.1 Datasets and Implementation Details
4.2 Ablation Study
4.3 Segmentation Results
5 Conclusion
References
Temporal Keypoint Matching and Refinement Network for Pose Estimation and Tracking
1 Introduction
2 Related Work
2.1 Single-Image Pose Estimation
2.2 Multi-person Pose Tracking
2.3 Pose Estimation in Videos
3 Proposed Approach
3.1 Single-Frame Pose Estimation
3.2 Pose Tracking with Temporal Keypoint Matching
3.3 Pose Refinement with Temporal Context
3.4 Training
4 Experiments
4.1 Datasets and Evaluation
4.2 Implementation Details
4.3 Results on PoseTrack 2017
4.4 Results on PoseTrack 2018
5 Conclusion
References
Neural Point-Based Graphics
1 Introduction
2 Related Work
3 Methods
3.1 Rendering
3.2 Model Creation
4 Experiments
5 Discussion
References
FHDe2Net: Full High Definition Demoireing Network
1 Introduction
2 Related Work
3 Full High Definition Moiré Image Dataset
4 Methodology
4.1 Cascaded Global to Local Moiré Pattern Removal
4.2 Frequency Based High-Resolution Content Separation
4.3 Layer Fusion and Refinement
4.4 Training Loss and Implementation Details
5 Experiments
5.1 Quantitative Evaluation
5.2 Qualitative Evaluation
5.3 Ablation Study
6 Conclusion
References
Learning Structural Similarity of User Interface Layouts Using Graph Networks
1 Introduction
2 Related Work
3 Method
3.1 Graph Representation
3.2 GCN-CNN Encoder-Decoder
3.3 Metric Learning via Triplet Training
4 Experiments and Discussion
4.1 Datasets
4.2 Evaluation Metrics
4.3 Baseline Comparisons
4.4 Ablation Studies
4.5 Cumulative Ablation Study
4.6 Searching Auto-parsed Layouts
5 Conclusion
References
NAS-Count: Counting-by-Density with Neural Architecture Search
1 Introduction
2 Related Work
2.1 Crowd Counting Literature
2.2 NAS Fundamentals
2.3 NAS Applications
3 NAS-Count Methodology
3.1 Automatic Multi-Scale Network
3.2 Scale Pyramid Pooling Loss
4 Experiments
4.1 Implementation Details
4.2 Search Result Analysis
4.3 Ablation Study on Searched Architectures
4.4 Hyper-parameter Study
4.5 Performance and Comparison
5 Conclusion
References
Towards Generalization Across Depth for Monocular 3D Object Detection
1 Introduction
2 Related Work
3 Problem Description
4 Details of Our Contributions
4.1 Proposed Virtual Views
4.2 Proposed Single-Stage Architecture
5 Experiments
5.1 Implementation Details
5.2 Dataset and Experimental Protocol
5.3 3D Detection
6 Conclusions
References
Author Index
