Computer Vision - ECCV 2020

Name: Computer Vision - ECCV 2020 | 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part I
Brand: Springer
Price: 96.29 EUR
Availability: OnlineOnly

16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part I

Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm (Editors)

Springer (Publisher)

Published on 3 November 2020

XLII, 815 pages

E-Book

PDF with digital watermarking

978-3-030-58452-8 (ISBN)

€96.29 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Foreword
Preface
Organization
Contents - Part I
Quaternion Equivariant Capsule Networks for 3D Point Clouds
1 Introduction
2 Related Work
3 Preliminaries and Technical Background
3.1 Equivariance
3.2 The Quaternion Group H1
3.3 3D Point Clouds
4 SO(3)-Equivariant Dynamic Routing
4.1 Equivariant Quaternion Mean
4.2 Equivariant Weiszfeld Dynamic Routing
5 Equivariant Capsule Network Architecture
5.1 QEC Module
5.2 Network Architecture
6 Experimental Evaluations
7 Conclusion and Discussion
References
DeepFit: 3D Surface Fitting via Neural Network Weighted Least Squares
1 Introduction
2 Background and Related Work
2.1 Deep Learning for Unstructured 3D Point Clouds
2.2 Normal Vector and Principal Curvature Estimation
2.3 Jet Fitting Using Least Squares and Weighted Least Squares
3 DeepFit
3.1 Learning Point-Wise Weights
3.2 Geometric Quantities Estimation
3.3 Consistency Loss
3.4 Implementation Notes
4 Results
4.1 Dataset and Training Details
4.2 Normal Estimation Performance
4.3 Principal Curvature Estimation Performance
4.4 Surface Reconstruction and Noise Removal
5 Summary
References
NSGANetV2: Evolutionary Multi-objective Surrogate-Assisted Neural Architecture Search
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Search Space
3.2 Overall Algorithm Description
3.3 Speeding Up Upper Level Optimization
3.4 Speeding Up Lower Level Optimization
4 Experiments and Results
4.1 Performance of the Surrogate Predictors
4.2 Search Efficiency
4.3 Results on Standard Datasets
5 Scalability of MSuNAS
5.1 Types of Datasets
5.2 Number of Objectives
6 Conclusion
References
Describing Textures Using Natural Language
1 Introduction
2 Related Work
3 Dataset and Tasks
3.1 Dataset Collection
3.2 Tasks and Evaluation Metrics
4 Methods
4.1 A Discriminative Approach
4.2 A Metric Learning Approach
4.3 A Generative Language Approach
5 Experiments and Analysis
5.1 Phrase and Image Retrieval
5.2 Description Generation
5.3 A Critical Analysis of Language Modeling
6 Applications
7 Conclusion
References
Empowering Relational Network by Self-attention Augmented Conditional Random Fields for Group Activity Recognition
1 Introduction
2 Related Works
3 Feature Extraction Network
4 CRF for Individual Action Recognition
5 Proposed Method
5.1 Temporal and Spatial Self-attention
5.2 Self-Attention Augmented Conditional Random Fields
5.3 Reformulation of Mean-Field Inference
5.4 Bidirectional UTE for Group Activity Recognition
6 Experimental Results
6.1 Experimental Settings
6.2 Ablation Studies
6.3 Comparison with the State-of-the-Art Works
7 Conclusions
References
AiR: Attention with Reasoning Capability
1 Introduction
2 Related Works
3 Method
3.1 Attention with Reasoning Capability
3.2 Measuring Attention Accuracy with ROIs
3.3 Reasoning-Aware Attention Supervision
3.4 Evaluation Benchmark and Human Attention Baseline
4 Experiments and Analyses
4.1 Do Machines or Humans Look at Places Important to Reasoning? How Does Attention Influence Task Performances?
4.2 How Does Attention Accuracy Evolve Throughout the Reasoning Process?
4.3 Does Progressive Attention Supervision Improve Attention and Task Performance?
5 Conclusion
References
Self6D: Self-supervised Monocular 6D Object Pose Estimation
1 Introduction
2 Related Work
2.1 Monocular 6D Pose Estimation
2.2 Neural Rendering
2.3 Recent Trends in Self-supervised Learning
2.4 Domain Adaptation for 6D Pose Estimation
3 Self-supervised 6D Pose Estimation
4 Evaluation
4.1 Analysis on the Quality of Predicted Masks
4.2 Ablation Study
4.3 Comparison with State-of-the-Art
5 Conclusion
References
Invertible Image Rescaling
1 Introduction
2 Related Work
2.1 Image Upscaling After Downscaling
2.2 Invertible Neural Network
2.3 Image Compression
3 Methods
3.1 Model Specification
3.2 Invertible Architecture
3.3 Training Objectives
4 Experiments
4.1 Dataset and Settings
4.2 Evaluation on Reconstructed HR Images
4.3 Evaluation on Downscaled LR Images
5 Conclusion
References
Synthesize Then Compare: Detecting Failures and Anomalies for Semantic Segmentation
1 Introduction
2 Related Work
3 Methodology
3.1 General Framework
3.2 Failure Detection
3.3 Anomaly Segmentation
3.4 Conceptual Explanation
4 Experiments
4.1 Failure Detection
4.2 Anomaly Segmentation
5 Conclusions
References
House-GAN: Relational Generative Adversarial Networks for Graph-Constrained House Layout Generation
1 Introduction
2 Related Work
3 Graph-Constrained House Layout Generation Problem
4 House-GAN
4.1 House Layout Generator
4.2 House Layout Discriminator
5 Implementation Details
6 Experimental Results
7 Conclusion
References
Crowdsampling the Plenoptic Function
1 Introduction
2 Related Work
3 Approach
3.1 Collecting Crowdsampled Data
3.2 The DeepMPI Scene Representation
3.3 Stage 1: Optimizing DeepMPI Color and Planes
3.4 Stage 2: Learning How Appearance Changes with Time
4 Experiments
5 Discussion and Conclusion
References
VoxelPose: Towards Multi-camera 3D Human Pose Estimation in Wild Environment
1 Introduction
2 Related Work
2.1 Single Person 3D Pose Estimation
2.2 Multiple Person 3D Pose Estimation
3 Cuboid Proposal Network
3.1 Feature Volume
3.2 Cuboid Proposals
3.3 Non-maximum Suppression
3.4 Network Structures of CPN
4 Pose Regression Network
4.1 Constructing Feature Volume
4.2 Regression of Human Poses
4.3 Training Strategies
5 Datasets and Metrics
6 Evaluation of CPN
7 Evaluation of PRN
7.1 2D Pose Estimation Accuracy
7.2 Ablation Study on 3D Pose Estimation
7.3 Comparison to the State-of-the-Arts
8 Conclusion
References
End-to-End Object Detection with Transformers
1 Introduction
2 Related Work
2.1 Set Prediction
2.2 Transformers and Parallel Decoding
2.3 Object Detection
3 The DETR Model
3.1 Object Detection Set Prediction Loss
3.2 DETR Architecture
4 Experiments
4.1 Comparison with Faster R-CNN and RetinaNet
4.2 Ablations
4.3 DETR for Panoptic Segmentation
5 Conclusion
References
DeepSFM: Structure from Motion via Deep Bundle Adjustment
1 Introduction
2 Related Work
3 Architecture
3.1 2D Feature Extraction
3.2 Depth Based Cost Volume (D-CV)
3.3 Pose Based Cost Volume (P-CV)
3.4 Cost Aggregation and Regression
3.5 Training
4 Experiments
4.1 Datasets
4.2 Evaluation
4.3 Model Analysis
5 Conclusions
References
Ladybird: Quasi-Monte Carlo Sampling for Deep Implicit Field Based 3D Reconstruction with Symmetry
1 Introduction
1.1 Sampling Methods in Monte Carlo Integration
2 Our Approach
2.1 Preliminary
2.2 Sampling
2.3 Feature Fusion Based on Symmetry
3 Experiments
3.1 Data Processing
3.2 Network Details
3.3 Samplers Impact on Training
3.4 Effect of Feature Fusion Based on Symmetry
3.5 Comparison with Other Methods
4 Conclusion
References
Segment as Points for Efficient Online Multi-Object Tracking and Segmentation
1 Introduction
2 Related Work
3 Method
3.1 Context-Aware Instance Embeddings Extraction
3.2 Instance Segmentation with Temporal Seed Consistency
4 Apollo MOTS Dataset
4.1 Overview
4.2 Annotation
5 Experiments
6 Conclusions
References
Conditional Convolutions for Instance Segmentation
1 Introduction
1.1 Related Work
2 Instance Segmentation with CondInst
2.1 Overall Architecture
2.2 Network Outputs and Training Targets
2.3 Loss Function
2.4 Inference
3 Experiments
3.1 Implementation Details
3.2 Architectures of the Mask Head
3.3 Design Choices of the Mask Branch
3.4 How Important to Upsample Mask Predictions?
3.5 CondInst without Bounding-Box Detection
3.6 Comparisons with State-of-the-Art Methods
4 Conclusions
References
MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution
1 Introduction
2 Related Work
3 Methodology
3.1 Preliminary
3.2 Rethinking Efficient Network Design
3.3 Mutual Learning Framework
4 Experiments
4.1 Evaluation on ImageNet Classification
4.2 Ablation Study
4.3 Transfer Learning
4.4 Object Detection and Instance Segmentation
5 Conclusion and Future Work
References
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
1 Introduction
2 Related Work
3 Dataset Specification and Collection
3.1 Ontology Specification
3.2 Image Collection and Annotation Pipeline
4 Dataset Analysis
4.1 Image Analysis
4.2 Mask Analysis
4.3 Category and Attributes Analysis
5 Evaluation Protocol and Baselines
5.1 Evaluation Metric
5.2 Attribute-Mask R-CNN
5.3 Results Discussion
6 Conclusion
References
Privacy Preserving Structure-from-Motion
1 Introduction
2 Related Work
3 Method
3.1 Initialization
3.2 Triangulation
3.3 Camera Resectioning
3.4 Bundle Adjustment
3.5 Implementation Details
4 Experiments
4.1 Evaluation of Camera Pose Accuracy
4.2 Evaluation of Initialization Scheme
4.3 Comparison with Traditional Structure-from-Motion
4.4 Structure-from-Motion on Internet Images
4.5 Qualitative Comparison of Feature Inversion Results
5 Conclusion
References
Rewriting a Deep Generative Model
1 Introduction
2 Related Work
3 Method
3.1 Objective: Changing a Rule with Minimal Collateral Damage
3.2 Viewing a Convolutional Layer as an Associative Memory
3.3 Updating W to Insert a New Value
3.4 Generalize to a Nonlinear Neural Layer
4 User Interface
5 Results
5.1 Putting Objects into a New Context
5.2 Removing Undesired Features
5.3 Changing Contextual Rules
6 Discussion
References
Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets
1 Introduction
2 Related Work
3 Methodology
3.1 Similar Images Set
3.2 Between-Set CIDEr (CIDErBtw)
3.3 CIDErBtw Training Strategies
4 Experiments
4.1 Implementation Details
4.2 Experiment Results
4.3 User Study
4.4 Qualitative Results
5 Conclusion
References
Long-Term Human Motion Prediction with Scene Context
1 Introduction
2 Related Work
3 Approach
3.1 GoalNet: Predicting 2D Path Destination
3.2 PathNet: Planning 3D Path towards Destination
3.3 PoseNet: Generating 3D Pose following Path
4 GTA Indoor Motion Dataset
5 Evaluation
5.1 Datasets
5.2 Evaluation Metric and Baselines
5.3 Comparison with Baselines
5.4 Evaluation and Visualization on Longer-Term Predictions
5.5 Discussion of Failure Cases
6 Conclusion
References
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
1 Introduction
2 Related Work
3 Neural Radiance Field Scene Representation
4 Volume Rendering with Radiance Fields
5 Optimizing a Neural Radiance Field
5.1 Positional Encoding
5.2 Hierarchical Volume Sampling
5.3 Implementation Details
6 Results
6.1 Datasets
6.2 Comparisons
6.3 Discussion
6.4 Ablation Studies
7 Conclusion
References
ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes
1 Introduction
2 Related Work
3 Developing Referential 3D-Centric Data
3.1 Creating Template Based Spatial References
3.2 Natural Reference in 3D Scenes
4 Developing 3D Neural Listeners
5 Experiments and Analysis
6 Conclusion
References
MatryODShka: Real-time 6DoF Video View Synthesis Using Multi-sphere Images
1 Introduction
2 Related Work
3 Method
3.1 Multi-sphere Image Representation
3.2 Model Architecture
3.3 Training Losses
3.4 High-resolution Rendering
4 Experiments
5 Discussion and Conclusion
References
Learning and Aggregating Deep Local Descriptors for Instance-Level Recognition
1 Introduction
2 Related Work
3 Background
4 Method
4.1 Derivation of the Architecture
4.2 Relation to Prior Work
5 Experiments
5.1 Datasets
5.2 Implementation Details
5.3 Ablation Experiments
5.4 Large-Scale Instance-Level Search
5.5 Large-Scale Instance-Level Classification
6 Conclusions
References
A Consistently Fast and Globally Optimal Solution to the Perspective-n-Point Problem
1 Introduction
1.1 Related Work
1.2 The PnP as a Quadratic Program with Quadratic Constraints
1.3 Contributions
2 Method
2.1 Minima on the 8-Sphere
2.2 Sequential Quadratic Programming
2.3 The General Case
3 The SQPnP Algorithm
4 Experiments
4.1 Synthetic Experiments
5 Conclusion
References
Learn to Recover Visible Color for Video Surveillance in a Day
1 Introduction
2 Related Work
3 Dataset
3.1 Data Capturing
3.2 Data Preprocessing
4 State Synchronization Network
5 Experimental Setup
5.1 Baselines
6 Results and Discussions
6.1 Quantitative Evaluation
6.2 Qualitative Results
6.3 Perceptual Experiments
6.4 Generalization Analysis
6.5 Ablation Experiment
7 Conclusion
References
Deep Fashion3D: A Dataset and Benchmark for 3D Garment Reconstruction from Single Images
1 Introduction
2 Related Work
3 Dataset Construction
4 A Baseline Approach for Single-View Reconstruction
4.1 Template Mesh Generation
4.2 Learning Surface Reconstruction
4.3 Training
5 Experimental Results
5.1 Benchmarking on Single-View Reconstruction
5.2 Ablation Analysis
6 Conclusions and Discussions
References
Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation
1 Introduction
2 Related Work
3 Methodology
3.1 Stochastic Sampling-Interpolation Network
3.2 Stochastic Sampling Module
3.3 Interpolation Module
3.4 Grid Prior
3.5 Integration with Residual Block
4 Experiments
4.1 Experimental Settings
4.2 Ablation Study
4.3 Object Detection
4.4 Semantic Segmentation
4.5 Image Classification
4.6 Analysis of Sampling and Interpolation Modules
4.7 Realistic Run-Time on CPU
5 Conclusion
References
BorderDet: Border Feature for Dense Object Detection
1 Introduction
2 Related Works
3 Our Approach
3.1 Motivation
3.2 Border Align
3.3 Network Architecture
3.4 Model Training and Inference
4 Experiments
4.1 Implementation Details
4.2 Ablation Study
4.3 Border Align
4.4 Analysis of BorderDet
4.5 Generalization of BorderDet
4.6 Comparisons with State-of-the-Art Detectors
5 Conclusion
References
Regularization with Latent Space Virtual Adversarial Training
1 Introduction
2 Related Work
3 Background
3.1 Virtual Adversarial Training and Local Constraint
3.2 Transformer
4 Method
5 Experiments
5.1 Datasets
5.2 Model Training
5.3 Results
6 Discussions
6.1 Adversarial Examples
6.2 Failure Analysis: Limitation of VAE Reconstruction Ability on CIFAR-10
7 Conclusion
References
Du2Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels
1 Introduction
2 Related Work
3 Dual-Pixel Sensors
4 Fusing Dual-Pixels and Dual-Cameras
4.1 Feature Extraction and Cost Volumes
4.2 Fused Confidence Volume
4.3 Disparity Refinement
4.4 Loss Function
5 Evaluation
5.1 Data Collection
5.2 Training Scheme
5.3 Ablation Study
5.4 Comparison to State-of-the-Art Methods
5.5 Applications in Computational Photography
6 Discussion
References
Model-Agnostic Boundary-Adversarial Sampling for Test-Time Generalization in Few-Shot Learning
1 Introduction
2 Related Work
2.1 Few-Shot Learning
2.2 Adversarial Learning
3 MABAS: Boundary-Adversarial Sample Generation
3.1 The Few-Shot Classification Problem
3.2 Test-Time Fine-Tuning of Embedding Functions
3.3 Fine-Tuning by Boundary-Adversarial Samples
4 Application to Various Few-Shot Methods
4.1 MetaOptNet
4.2 Few-Shot Without Forgetting
4.3 Standard Transfer Learning
5 Experiments
5.1 Experimental Setup
5.2 Quantitative Evaluation
5.3 Qualitative Evaluation
6 Conclusion
References
Targeted Attack for Deep Hashing Based Retrieval
1 Introduction
2 Related Work
2.1 Deep Hashing Based Similarity Retrieval
2.2 Adversarial Attack
3 The Proposed Method
3.1 Preliminaries
3.2 Deep Hashing Targeted Attack
4 Experiments
4.1 Benchmark Datasets and Evaluation Metrics
4.2 Overall Results on Image Retrieval
4.3 Overall Results on Video Retrieval
4.4 Discussion
4.5 Open-Set Targeted Attack
5 Conclusion and Future Work
References
Gradient Centralization: A New Optimization Technique for Deep Neural Networks
1 Introduction
2 Related Work
3 Gradient Centralization
3.1 Motivation
3.2 Notations
3.3 Formulation of GC
3.4 Embedding of GC to SGDM/Adam
4 Properties of GC
4.1 Improving Generalization Performance
4.2 Accelerating Training Process
5 Experimental Results
5.1 Setup of Experiments
5.2 Results on Mini-Imagenet
5.3 Experiments on CIFAR100
5.4 Results on ImageNet
5.5 Results on Fine-Grained Image Classification
5.6 Object Detection and Segmentation
6 Conclusions
References
Content-Aware Unsupervised Deep Homography Estimation
1 Introduction
2 Related Work
3 Algorithm
3.1 Network Structure
3.2 Triplet Loss for Robust Homography Estimation
3.3 Unsupervised Content-Awareness Learning
4 Experimental Results
4.1 Dataset and Implementation Details
4.2 Comparisons with Existing Methods
4.3 Ablation Studies
5 Conclusions
References
Multi-view Optimization of Local Feature Geometry
1 Introduction
2 Related Work
3 Method
3.1 Overview
3.2 Two-View Refinement
3.3 Multi-view Refinement
4 Implementation Details
5 Experimental Evaluation
5.1 Image Matching
5.2 Triangulation
5.3 Camera Localization
5.4 Structure-from-Motion
6 Conclusion
References
The Phong Surface: Efficient 3D Model Fitting Using Lifted Optimization
1 Introduction
1.1 Related Work
2 Method
2.1 Phong Surface Model
2.2 Lifted Optimization with the Phong Surface
2.3 Correspondence Update on Triangles
3 Experiments
3.1 Rigid Pose Alignment of an Ellipsoid
3.2 Performance on Hand Tracking
4 Conclusions
References
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video
1 Introduction
2 Related Works
3 Method
3.1 Joint Modeling of Human-Object Interaction
3.2 Motor Attention Module
3.3 Interaction Hotspots Module
3.4 Anticipation Module
3.5 Training and Inference
3.6 Network Architecture
4 Experiments
4.1 Datasets and Annotations
4.2 FPV Action Anticipation on EPIC-Kitchens
4.3 Ablation Study
4.4 Remarks and Discussion
5 Conclusions
References
Learning Stereo from Single Images
1 Introduction
2 Related Work
3 Method
3.1 Stereo Training Data from Monocular Depth
3.2 Handling Occlusion and Collisions
3.3 Depth Sharpening
3.4 Implementation Details
4 Experiments
4.1 Evaluation Datasets and Metrics
4.2 Comparison to Alternative Data Generation Methods
4.3 Model Architecture Ablation
4.4 Comparing Different Monocular Depth Networks
4.5 Ablating Components of Our Method
4.6 Adapting to the Target Domain
4.7 Varying the Amount of Training Data
5 Discussion
6 Conclusion
References
Prototype Rectification for Few-Shot Learning
1 Introduction
2 Related Works
3 Methodology
3.1 Denotation
3.2 Cosine Similarity Based Prototypical Network
3.3 Bias Diminishing for Prototype Rectification
4 Theoretical Analysis
4.1 Lower Bound of the Expected Performance
4.2 Derivation of Shifting Term
5 Experiments
5.1 Datasets
5.2 Implementation Details
5.3 Results on MiniImageNet and TieredImageNet
5.4 Results on Meta-Dataset
5.5 Ablation Study
5.6 Comparison with Transductive Fine-Tuning
6 Conclusions
References
Learning Feature Descriptors Using Camera Pose Supervision
1 Introduction
2 Related Work
3 Method
3.1 Loss Formulation
3.2 Differentiable Matching Layer
3.3 Coarse-to-Fine Architecture
3.4 Discussion
3.5 Implementation Details
4 Experimental Results
4.1 Feature Matching Results
4.2 Results on Downstream Tasks
4.3 Ablation Analysis
5 Conclusion
References
Semantic Flow for Fast and Accurate Scene Parsing
1 Introduction
2 Related Work
3 Method
3.1 Preliminary
3.2 Flow Alignment Module
3.3 Network Architectures
4 Experiments
4.1 Experiments on Cityscapes
4.2 Experiment on More Datasets
5 Conclusion
References
Appearance Consensus Driven Self-supervised Human Mesh Recovery
1 Introduction
2 Related Work
3 Approach
3.1 Representation and Notations
3.2 Mesh Estimation Architecture
3.3 Self-supervised Learning Objectives
4 Experiments
4.1 Ablative Study
4.2 Comparison with the State-of-the-Art
4.3 Qualitative Results
5 Conclusion
References
Author Index

Schweitzer Fachinformationen
