Computer Vision - ECCV 2020

Name: Computer Vision - ECCV 2020 | 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part IV
Brand: Springer
Price: 96.29 EUR
Availability: OnlineOnly

16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part IV

Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm (Editors)

Springer (Publisher)

Published on 29 October 2020

XLIII, 817 pages

E-Book

PDF with digital watermarking

978-3-030-58548-8 (ISBN)

€96.29 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Foreword
Preface
Organization
Contents - Part IV
Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors
1 Introduction
2 Related Work
2.1 Object Detector Basics
3 Approach
3.1 Creating a Universal Adversarial Patch
4 Crafting Attacks in the Digital World
4.1 Evaluation of Digital Attacks
5 Physical World Attacks
5.1 Printed Posters
5.2 Paper Dolls
6 Wearable Adversarial Examples
7 Conclusion
References
TuiGAN: Learning Versatile Image-to-Image Translation with Two Unpaired Images
1 Introduction
2 Related Works
2.1 Image-to-Image Translation
2.2 Image Style Transfer
2.3 Single Image Generative Models
3 Method
3.1 Network Architecture
3.2 Loss Functions
3.3 Implementation Details
4 Experiments
4.1 Baselines
4.2 Evaluation Metrics
4.3 Results
4.4 Ablation Study
5 Conclusion
References
Semi-Siamese Training for Shallow Face Learning
1 Introduction
2 Related Work
2.1 Deep Face Recognition
2.2 Low-Shot Face Recognition
2.3 Self-supervised Learning
3 The Proposed Approach
3.1 Shallow Face Learning Problem
3.2 Semi-Siamese Training
4 Experiments
4.1 Datasets and Experimental Settings
4.2 Ablation Study
4.3 SST with Various Loss Functions
4.4 SST with Various Network Architectures
4.5 SST on Deep Data Learning
4.6 Pretrain and Finetune
5 Conclusions
References
GAN Slimming: All-in-One GAN Compression by a Unified Optimization Framework
1 Introduction
2 Related Works
2.1 Deep Model Compression
2.2 GAN Compression
3 The GAN Slimming Framework
3.1 The Unified Optimization Form
3.2 End-to-End Optimization
3.3 Algorithm Implementation
4 Experiments
4.1 Unpaired Image Translation with CycleGAN
4.2 Ablation Study
4.3 Real-World Application: CartoonGAN
5 Conclusion
A Image Generation with SNGAN
References
Human Interaction Learning on 3D Skeleton Point Clouds for Video Violence Recognition
1 Introduction
2 Related Work
3 Proposed Method
3.1 Framework
3.2 Skeleton Points Interaction Learning Module
3.3 Multi-head Mechanism
3.4 Skeleton Point Convolution
4 Experiments
4.1 Ablation Study
4.2 Comparison with the State of the Art
4.3 Failure Case
5 Conclusion
References
Binarized Neural Network for Single Image Super Resolution
1 Introduction
2 Related Work
2.1 Single Image Super Resolution
2.2 Quantitative Model
3 Proposed Approach
3.1 Motivation
3.2 Quantization of Weights
3.3 Quantization of Activations
3.4 Binary Super Resolution Network
4 Experiments
4.1 Datasets
4.2 Implementations
4.3 Evaluation
4.4 Model Analysis
5 Conclusions
References
Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
1 Introduction
2 Related Work
3 Method
3.1 Position-Sensitive Self-attention
3.2 Axial-Attention
4 Experimental Results
4.1 ImageNet
4.2 COCO
4.3 Mapillary Vistas
4.4 Cityscapes
4.5 Ablation Studies
5 Conclusion and Discussion
References
Adaptive Computationally Efficient Network for Monocular 3D Hand Pose Estimation
1 Introduction
2 Related Work
3 Method
3.1 Overview
3.2 Single Frame Hand Pose Estimator
3.3 Pose Refinement Recurrent Model
3.4 Adaptive Dynamic Gate Model
3.5 Training Strategy and Losses
4 Experiments
4.1 Datasets and Metrics
4.2 Implementation Details
4.3 Main Results
4.4 Ablation Study
5 Conclusion
References
Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking
1 Introduction
2 Related Work
2.1 Detection-Based MOT Methods
2.2 Partially End-to-End MOT Methods
2.3 Attention-Assistant MOT Methods
3 Methodology
3.1 Problem Settings
3.2 Chained-Tracker Pipeline
3.3 Network Architecture
3.4 Label Assignment and Loss Design
4 Experiment
4.1 Datasets and Evaluation Metrics
4.2 Implementation Details
4.3 Ablation Study
4.4 Benchmark Evaluation
5 Conclusion
References
Distribution-Balanced Loss for Multi-label Classification in Long-Tailed Datasets
1 Introduction
2 Related Work
3 Distribution-Balanced Loss
3.1 Re-balanced Weighting After Re-sampling
3.2 Negative-Tolerant Regularization
3.3 Distribution-Balanced Loss
4 Experiments
4.1 Datasets
4.2 Experimental Settings
4.3 Benchmarking Results
4.4 Ablation Study
4.5 Further Analysis
5 Conclusion
References
Hamiltonian Dynamics for Real-World Shape Interpolation
1 Introduction
2 Related Work
3 Background
3.1 Physical Assumptions for Shape Deformation
3.2 Shape Interpolation
4 Interpolation of Real-World Objects
5 From Hamiltonian Dynamics to Eulerian-Lagrangian Shape Interpolation
5.1 Deformation Model
5.2 Anisotropic As-rigid-As-Possible Deformation
5.3 Time Discretization
5.4 Interpolation Algorithm
6 Experiments
7 Conclusion
References
Learning to Scale Multilingual Representations for Vision-Language Tasks
1 Introduction
2 Related Work
3 Scalable Multilingual Aligned Language Representation
3.1 Efficient Multilingual Learning with a Hybrid Embedding Model
3.2 Masked Cross-Language Modeling (MCLM)
3.3 Multilingual Visual-Semantic Alignment
3.4 Cross-Lingual Consistency
4 Experimental Setup
5 Multilingual Image-Sentence Retrieval Results
6 Conclusion
References
Multi-modal Transformer for Video Retrieval
1 Introduction
2 Related Work
3 Methodology
3.1 Video Representation
3.2 Caption Representation
3.3 Similarity Estimation
3.4 Training
4 Experiments
4.1 Datasets and Metrics
4.2 Implementation Details
4.3 Ablation Studies and Comparisons
5 Summary
References
Feature Representation Matters: End-to-End Learning for Reference-Based Image Super-Resolution
1 Introduction
2 Related Work
2.1 Image Super-Resolution
2.2 Reference-Based Super-Resolution
3 Our Method
3.1 Notations
3.2 Feature Encoding Module
3.3 Match and Swap Module
3.4 Image Synthesis Module
3.5 Loss Function
4 Experiments
4.1 Implementation Details
4.2 Evaluations
4.3 Ablation Study
5 Conclusions
References
RobustFusion: Human Volumetric Capture with Data-Driven Visual Cues Using a RGBD Camera
1 Introduction
2 Related Work
3 Overview
4 Model Completion
5 Robust Performance Capture
6 Experiment
6.1 Comparison
6.2 Evaluation
7 Discussion
References
Surface Normal Estimation of Tilted Images via Spatial Rectifier
1 Introduction
2 Related Work
3 Method
3.1 Spatial Rectifier
3.2 Surface Normal Estimation by Synthesis
3.3 Truncated Angular Loss
3.4 Surface Normal Estimator Design
4 Results
4.1 Evaluation Dataset
4.2 Baseline
4.3 Surface Normal Estimation on Tilt-RGBD
4.4 Network Efficiency
4.5 Surface Normal Training Loss
5 Summary
References
Multimodal Shape Completion via Conditional Generative Adversarial Networks
1 Introduction
2 Related Work
3 Method
3.1 Learning Latent Spaces for Point Sets
3.2 Learning Multimodal Mapping for Shape Completion
3.3 Explicitly-Encoded Multimodality
3.4 Implementation Details
4 Experiments
4.1 Multimodal Completion Results
4.2 Comparison Results
4.3 Results on Real Scans
4.4 More Experiments
5 Conclusion
References
Generative Sparse Detection Networks for 3D Single-Shot Object Detection
1 Introduction
2 Related Work
3 Preliminaries
3.1 Sparse Tensor
3.2 Sparse Tensor for 3D Data Representation
4 Generative Sparse Detection Networks
4.1 Hierarchical Sparse Tensor Encoder
4.2 Generative Sparse Tensor Decoder
4.3 Multi-scale Bounding Box Anchor Prediction
4.4 Summary of GSDN Feed Forward
4.5 Losses
4.6 Prediction Post-processing
5 Experiments
5.1 Object Detection Performance Analysis
5.2 Speed and Memory Analysis
5.3 Scalability and Generalization of GSDN on Extremely Large Inputs
6 Conclusion
References
Grounded Situation Recognition
1 Introduction
2 Related Work
3 GSR and SWiG
4 Methods
5 Experiments
6 Discussion
7 Conclusion
References
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
1 Introduction
2 Related Works
3 Proposed Approach
3.1 Overview
3.2 Video Modality Interaction
3.3 Sentence Localization
3.4 Event Captioning
4 Experiments
4.1 Experimental Settings
4.2 Ablation Studies
4.3 Comparison with State-of-the-Art Methods
4.4 Qualitative Results
5 Conclusions
References
Unpaired Learning of Deep Image Denoising
1 Introduction
2 Related Work
2.1 Deep Image Denoising
2.2 Learning CNN Denoisers Without Paired Noisy-Clean Images
3 Proposed Method
3.1 Two-Stage Training and Knowledge Distillation
3.2 D-BSN and CNNest for Self-supervised Learning
3.3 Self-supervised Loss and Bayes Denoising
3.4 Extension to Real-World Noisy Photographs
4 Experimental Results
4.1 Implementation Details
4.2 Comparison of Different Supervision Settings
4.3 Experiments on Synthetic Noisy Images
4.4 Experiments on Real-World Noisy Photographs
5 Concluding Remarks
References
Self-supervising Fine-Grained Region Similarities for Large-Scale Image Localization
1 Introduction
2 Related Work
3 Method
3.1 Retrieval-Based IBL Methods Revisit
3.2 Self-supervising Query-Gallery Similarities
3.3 Self-supervising Fine-Grained Image-to-region Similarities
3.4 Discussions
4 Experiments
4.1 Implementation Details
4.2 Comparison with State-of-the-arts
4.3 Ablation Studies
4.4 Qualitative Evaluation
4.5 Generalization on Image Retrieval Datasets
5 Conclusion
References
Rotationally-Temporally Consistent Novel View Synthesis of Human Performance Video
1 Introduction
2 Related Work
3 Proposed Method
3.1 Problem Definition
3.2 Network Architecture
3.3 Rotational and Temporal Supervision
3.4 Multi-View Human Action (MVHA) Dataset
4 Experiments
4.1 Results on the MVHA Dataset
4.2 Results on PVHM and ShapeNet Datasets
5 Conclusion
References
Side-Aware Boundary Localization for More Precise Object Detection
1 Introduction
2 Related Work
3 Side-Aware Boundary Localization
3.1 Side-Aware Feature Extraction
3.2 Boundary Localization with Bucketing
3.3 Bucketing-Guided Rescoring
3.4 Application to Single-Stage Detectors
4 Experiments
4.1 Experimental Setting
4.2 Results
4.3 Ablation Study
5 Conclusion
References
SF-Net: Single-Frame Supervision for Temporal Action Localization
1 Introduction
2 Related Work
3 Method
3.1 Problem Definition
3.2 Framework
3.3 Pseudo Label Mining and Training Objectives
3.4 Inference
4 Experiment
4.1 Datasets
4.2 Implementation Details
4.3 Evaluation Metrics
4.4 Annotation Analysis
4.5 Analysis
5 Conclusions
References
Negative Margin Matters: Understanding Margin in Few-Shot Classification
1 Introduction
2 Related Work
3 Methodology
3.1 Negative-Margin Softmax Loss
3.2 Discriminability Analysis of Deep Features w.r.t Different Margins
3.3 Intuitive Explanation
3.4 Theoretical Analysis
3.5 Framework
4 Experiments
4.1 Setup
4.2 Results
4.3 Analysis
5 Conclusion
References
Particularity Beyond Commonality: Unpaired Identity Transfer with Multiple References
1 Introduction
2 Related Work
3 Our Method
3.1 Multi-reference Guided Generator
3.2 Discriminators
3.3 Training
4 Experiments
4.1 Datasets and Implementation
4.2 Quantitative Evaluation Metrics
4.3 Analysis of Different Components
4.4 More Results
5 Conclusion
References
Tracking Objects as Points
1 Introduction
2 Related Work
3 Preliminaries
4 Tracking Objects as Points
4.1 Tracking-Conditioned Detection
4.2 Association Through Offsets
4.3 Training on Video Data
4.4 Training on Static Image Data
4.5 End-to-End 3D Object Tracking
5 Experiments
5.1 Datasets and Evaluation Metrics
5.2 Implementation Details
5.3 Public Detection
5.4 Main Results
5.5 Ablation Studies
5.6 Comparison to Alternative Motion Models
6 Conclusion
References
CPGAN: Content-Parsing Generative Adversarial Networks for Text-to-Image Synthesis
1 Introduction
2 Related Work
3 Content-Parsing Generative Adversarial Networks
3.1 Coarse-to-fine Generative Framework
3.2 Memory-Attended Text Encoder
3.3 Object-Aware Image Encoder
3.4 Fine-Grained Conditional Discriminator
4 Experiments
4.1 Experimental Setup
4.2 Ablation Study
4.3 Comparison with State-of-the-arts
5 Conclusions
References
Transporting Labels via Hierarchical Optimal Transport for Semi-Supervised Learning
1 Introduction
2 Related Work
3 Preliminaries
3.1 Discrete OT and Dual Form
3.2 Hierarchical OT
3.3 Wasserstein Barycenters
4 Method
4.1 Finding Unlabeled Measures via Wasserstein Metric
4.2 Mapping Measures via Hierarchical OT for Pseudo-Labeling
4.3 Training CNN in SSL Fashion
5 Experiments and Setup
5.1 Fully Supervised and Deep SSL Methods
5.2 Soft-Pseudo-Labels Based on Hierarchical OT
5.3 Contribution of Hierarchical Optimal Transport to SSL
5.4 Clustering Resolution
5.5 Varying Labeled Data
6 Conclusion
References
MTI-Net: Multi-scale Task Interaction Networks for Multi-task Learning
1 Introduction and Prior Work
2 Method
2.1 Multi-task Learning by Multi-modal Distillation
2.2 Task Interactions at Different Scales
2.3 Multi-scale Multi-modal Distillation
2.4 Feature Propagation Across Scales
2.5 Feature Aggregation
3 Experiments
3.1 Experimental Setup
3.2 Ablation Studies
3.3 Comparison with the State-of-the-Art
4 Conclusion
References
Learning to Factorize and Relight a City
1 Introduction
2 Related Work
3 Google Street View Time Machine Data
4 Method
4.1 Encoder-Decoder Architecture
4.2 Training
4.3 Stack Alignment
4.4 Losses
5 Experiments
5.1 Within-Scene Decomposition
5.2 Cross-Scene Factorization
6 Applications
7 Discussion
References
Region Graph Embedding Network for Zero-Shot Learning
1 Introduction
2 Related Works
3 Methodology
3.1 Overview
3.2 Constrained Part Attention Branch
3.3 Parts Relation Reasoning Branch
3.4 The Transfer and Balance Losses
3.5 Training Objective
3.6 Zero-Shot Prediction
4 Experiments
4.1 Datasets and Settings
4.2 Implementation and Parameters
4.3 Zero-Shot Recognition
4.4 Generalized Zero-Shot Recognition
4.5 Ablations
4.6 Qualitative Analysis
5 Conclusions
References
GRAB: A Dataset of Whole-Body Human Grasping of Objects
1 Introduction
2 Related Work
3 Dataset
3.1 Motion Capture (MoCap)
3.2 From MoCap Markers to 3D Surfaces
3.3 Contact Annotation
3.4 Dataset Protocol
3.5 Analysis
4 GrabNet: Learning to Grab an Object
5 Discussion
References
DEMEA: Deep Mesh Autoencoders for Non-rigidly Deforming Objects
1 Introduction
2 Related Work
3 Approach
3.1 Mesh Hierarchy
3.2 Embedded Deformation Layer (EDL)
3.3 Differentiable Space Deformation
3.4 Training
3.5 Reconstructing Meshes from Images/Depth
3.6 Network Architecture Details
4 Experiments
4.1 Baseline Architectures
4.2 Evaluation Settings
4.3 Evaluations of the Autoencoder
5 Applications
5.1 RGB to Mesh
5.2 Depth to Mesh
5.3 Latent Space Arithmetic
6 Limitations
7 Conclusion
References
RANSAC-Flow: Generic Two-Stage Image Alignment
1 Introduction
2 Related Work
3 Method
3.1 Coarse Alignment by Feature-Based RANSAC
3.2 Fine Alignment by Local Flow Prediction
3.3 Multiple Homographies
3.4 Architecture and Implementation Details
4 Experiments
4.1 Direct Correspondences Evaluation
4.2 Evaluation for Downstream Tasks
4.3 Applications
5 Conclusion
References
Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds
1 Introduction
2 Related Works
3 Approach
3.1 Omni Auditory Perception Dataset
3.2 Auditory Semantic Prediction
3.3 Auditory Depth Perception
3.4 Spatial Sound Super-Resolution (S3R)
3.5 Network Architecture
4 Experiments
4.1 Auditory Semantic Prediction
4.2 Auditory Depth Prediction
4.3 Spatial Sound Super-Resolution
4.4 Qualitative Results
4.5 Limitations and Future Work
5 Conclusion
References
Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images
1 Introduction
2 Related Work
3 Neural Object Learning
3.1 Network Architecture
3.2 Training
3.3 Gradient Based Pose Refinement and Rendering
4 Evaluation
4.1 Datasets
4.2 Implementation Details
4.3 Metrics
4.4 Quality of Rendered Images
4.5 Pose Estimation: LineMOD
4.6 Pose Estimation: LineMOD-Occ
4.7 Pose Estimation: SMOT
5 Ablation Study
6 Conclusion
References
Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking
1 Introduction
2 Related Work
3 Reconstruction Pipeline
4 Dense Hybrid Recurrent MVSNet
4.1 Image Feature Extractor
4.2 Hybrid Recurrent Regularization
4.3 Training Loss
5 Dynamic Consistency Checking
6 Experiments
6.1 Implementation Details
6.2 Datasets and Results
6.3 Ablation Study
7 Discussion
8 Conclusions
References
Pixel-Pair Occlusion Relationship Map (P2ORM): Formulation, Inference and Application
1 Introduction
2 Formalizing and Representing Geometric Occlusion
3 Pixel-Pair Occlusion Relationship Estimation
4 Application to Depth Map Refinement
5 Experiments
6 Conclusion
References
MovieNet: A Holistic Dataset for Movie Understanding
1 Introduction
2 Related Datasets
3 Visit MovieNet: Data and Annotation
3.1 Data in MovieNet
3.2 Annotation in MovieNet
4 Play with MovieNet: Benchmark and Analysis
4.1 Genre Analysis
4.2 Cinematic Style Analysis
4.3 Character Recognition
4.4 Scene Analysis
4.5 Story Understanding
5 Discussion and Future Work
References
Short-Term and Long-Term Context Aggregation Network for Video Inpainting
1 Introduction
2 Related Work
2.1 Image Inpainting
2.2 Video Inpainting
3 Short-Term and Long-Term Context Aggregation Network
3.1 Network Overview
3.2 Boundary-Aware Short-Term Context Aggregation
3.3 Dynamic Long-Term Context Aggregation
3.4 Loss Function
4 Experiments
4.1 Quantitative Results
4.2 Qualitative Results
4.3 User Study
4.4 Ablation Study
5 Conclusion
References
DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization
1 Introduction
2 Related Work
3 Hierarchical 3D Descriptors Learning
3.1 3D Local Feature Encoder and Detector
3.2 Global Descriptor Learning
4 Experiments
4.1 3D Keypoint Repeatability
4.2 Point Cloud Registration
4.3 Point Cloud Retrieval
4.4 Application to Visual SLAM
4.5 Ablation Study
5 Conclusion
References
Face Super-Resolution Guided by 3D Facial Priors
1 Introduction
2 Related Work
3 The Proposed Method
3.1 Motivations and Advantages of 3D Facial Priors
3.2 Formulation of 3D Facial Priors
3.3 Spatial Attention Module
4 Experimental Results
4.1 Datasets and Implementation Details
4.2 Quantitative Results
4.3 Qualitative Evaluation
5 Analyses and Discussions
6 Conclusions
References
Label Propagation with Augmented Anchors: A Simple Semi-supervised Learning Baseline for Unsupervised Domain Adaptation
1 Introduction
2 Related Works
3 Semi-supervised Learning and Unsupervised Domain Adaptation
3.1 Semi-supervised Learning Preliminaries
3.2 From Graph-Based Semi-supervised Learning to Unsupervised Domain Adaptation
4 Label Propagation with Augmented Anchors
4.1 The Proposed Algorithms
5 Experiments
5.1 Analysis
5.2 Results
6 Conclusion
References
Are Labels Necessary for Neural Architecture Search?
1 Introduction
2 Related Work
3 Unsupervised Neural Architecture Search
3.1 Search Phase
3.2 Evaluation Phase
3.3 Analogy to Unsupervised Learning
4 Experiments Overview
4.1 Pretext Tasks
5 Sample-Based Experiments
5.1 Experimental Design
5.2 Implementation Details
5.3 Results
6 Search-Based Experiments
6.1 Experimental Design
6.2 Implementation Details
6.3 Results
7 Discussion
References
Author Index
