Computer Vision - ECCV 2020

Name: Computer Vision - ECCV 2020 | 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part IX
Brand: Springer
Price: 96.29 EUR
Availability: OnlineOnly

Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm (Editors)

Springer (Publisher)

Published on 4 November 2020

XLIII, 819 pages

E-Book

PDF with digital watermarking

978-3-030-58545-7 (ISBN)

€96.29 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Foreword
Preface
Organization
Contents - Part IX
Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization
1 Introduction
2 Related Work
3 Method
3.1 Mixed Precision Quantization Search
3.2 BP-NAS for Mixed Precision Quantization Search
4 Experiments
4.1 Cifar-10
4.2 ImageNet
4.3 COCO Detection
5 Ablation Study
5.1 Efficient Search
5.2 Differentiated Importance Factors
6 Conclusion
References
Monocular 3D Object Detection via Feature Domain Adaptation
1 Introduction
2 Related Works
2.1 LiDAR-Based 3D Object Detection
2.2 Monocular 3D Object Detection
2.3 Domain Adaptation
3 Methodology
3.1 Overview
3.2 Siamese Framework for Adapting Pseudo-LiDAR to LiDAR
3.3 Context-Aware Foreground Segmentation
3.4 Training Loss
4 Experiment
4.1 Implementation
4.2 Comparison with State-of-the-art Methods
4.3 Ablation Study
4.4 Generalization Ability
5 Conclusions
References
Talking-Head Generation with Rhythmic Head Motion
1 Introduction
2 Related Work
2.1 Talking-Head Image Generation
2.2 Related Techniques
3 Method
3.1 Problem Formulation
3.2 The Facial Expression Learner
3.3 The Head Motion Learner
3.4 The 3D-Aware Generative Network
4 3D-Aware Generation
4.1 3D-Aware Module
4.2 Hybrid Embedding Module
4.3 Non-linear Composition Module
4.4 Objective Function
5 Experiments Setup
6 Results and Analysis
7 Conclusion and Discussion
References
AUTO3D: Novel View Synthesis Through Unsupervisely Learned Variational Viewpoint and Global 3D Representation
1 Introduction
2 Related Work
3 Methodology
3.1 Global 3D Encoding with Arbitrary Number of Appearance Describing Images
3.2 Unsupervised Viewer-Centered Relative-Pose Encoding
3.3 Overall Framework and Optimization Objective
4 Experiments
4.1 Datasets
4.2 Qualitative Results
4.3 Quantitative Results
4.4 Ablation Study of Each Module
4.5 Sensitive Analysis
4.6 Investigating the Global 3D Feature
4.7 The Effect of Source Image Ordering
5 Conclusions
References
VPN: Learning Video-Pose Embedding for Activities of Daily Living
1 Introduction
2 Related Work
3 Proposed Action Recognition Model
3.1 Video Representation
3.2 VPN
3.3 Training Jointly the 3D ConvNet and VPN
4 Experiments
4.1 Implementation Details
4.2 Ablation Study
4.3 Qualitative Analysis
4.4 Comparison with the State-of-the-art
5 Conclusion
References
Soft Anchor-Point Object Detection
1 Introduction
2 Related Work
3 Soft Anchor-Point Detector
3.1 Detection Formulation with Anchor Points
3.2 Soft-Weighted Anchor Points
3.3 Soft-Selected Pyramid Levels
3.4 Implementation Details
4 Experiments
4.1 Ablation Studies
4.2 Comparison to State of the Art
5 Conclusion
References
Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid
1 Introduction
2 Related Works
3 Deformable Grid
3.1 Grid Parameterization
3.2 Training of DefGrid
4 Applications
4.1 Learnable Geometric Downsampling
4.2 Object Mask Annotation
4.3 Unsupervised Image Partitioning
5 Experiments
5.1 Learnable Geometric Downsampling
5.2 Object Annotation
5.3 Unsupervised Image Partitioning
6 Conclusion
References
Soft Expert Reward Learning for Vision-and-Language Navigation
1 Introduction
2 Related Work
2.1 Vision-and-Language Navigation
2.2 Reward Learning
3 Soft Expert Reward Learning Model
3.1 Overview and Problem Definition
3.2 Encoder-Decoder Structure
3.3 Soft Expert Distillation
3.4 Self Perceiving Reward
4 Experiments
4.1 Experimental Setup
4.2 Overall Performance
4.3 Ablation Study
4.4 Visualisation
5 Conclusions
References
Part-Aware Prototype Network for Few-Shot Semantic Segmentation
1 Introduction
2 Related Work
2.1 Few-Shot Classification
2.2 Few-Shot Semantic Segmentation
2.3 Graph Neural Networks
3 Problem Setting
4 Our Approach
4.1 Embedding Network
4.2 Prototypes Generation Network
4.3 Part-Aware Mask Generation Network
4.4 Model Training with Semantic Regularization
5 Experiments
5.1 Experimental Configuration
5.2 Experiments on PASCAL-5i
5.3 Experiments on COCO-20i
5.4 Ablation Study
6 Conclusion
References
Learning from Extrinsic and Intrinsic Supervisions for Domain Generalization
1 Introduction
2 Related Work
3 Method
3.1 Extrinsic Supervision with Momentum Metric Learning
3.2 Intrinsic Supervision with Self-supervised Auxiliary Task
4 Experiments
4.1 Datasets
4.2 Network Architecture and Implementation Details
4.3 Results on VLCS Dataset
4.4 Results on PACS Dataset
4.5 Analysis of Our Method
5 Conclusions
References
Joint Learning of Social Groups, Individuals Action and Sub-group Activities in Videos
1 Introduction
2 Related Work
3 Social Activity Recognition
3.1 Group Activity Recognition Framework
3.2 Social Activity Recognition Framework
4 Datasets
4.1 Group Activity Recognition Datasets
4.2 Social Activity Recognition Dataset
5 Experimental Results
5.1 Group Activity Recognition
5.2 Social Activity Recognition
6 Conclusion
References
Whole-Body Human Pose Estimation in the Wild
1 Introduction
2 Related Work
2.1 2D Keypoint Localization Dataset
2.2 Keypoints Localization Method
3 COCO-WholeBody Dataset
3.1 Data Annotation
3.2 Evaluation Protocol and Evaluation Metrics
3.3 Dataset Statistics
4 ZoomNet: Whole-Body Pose Estimation
4.1 Localizing Body Keypoints and Face/hand Boxes with BodyNet
4.2 Face/hand Keypoint Estimation with HandHead and FaceHead
5 Experiments
5.1 Evaluation on COCO-WholeBody Dataset
5.2 Cross-Dataset Evaluation
5.3 Analysis
6 Conclusion
References
Relative Pose Estimation of Calibrated Cameras with Known SE(3) Invariants
1 Introduction
2 Related Works
3 Preliminaries
3.1 Notation
3.2 Epipolar Constraint
3.3 SE(3) Invariants
4 Minimal Problem Formulations
5 Solution Formulations for Relative Pose Estimation
5.1 Solutions by Decomposing E
5.2 Solutions by Constraining E
6 Minimal Relative Pose Solvers with SE(3) Constraints
6.1 5P, 4P-RA, 5P-ST1 and 4P-RA-ST1
6.2 4P-ST0
6.3 3P-RA-ST0
7 Experiments
7.1 Implementation Details
7.2 Synthetic Data
7.3 Real-World Data
8 Conclusions
References
Sequential Convolution and Runge-Kutta Residual Architecture for Image Compressed Sensing
1 Introduction
2 Preliminaries
2.1 Compressed Sensing
2.2 Data-Driven Methods for Image Compressed Sensing
2.3 Residual Neural Network
2.4 ResNet and ODEs
3 The Proposed Model
3.1 Sequential Convolutional Module
3.2 Learned Runge-Kutta Block
3.3 The Overall Structure
4 Experimental Studies
4.1 Weights Initialization
4.2 Datasets and Implementation Details
4.3 Experimental Results
4.4 Ablation Studies
5 Conclusion
References
Deep Hough Transform for Semantic Line Detection
1 Introduction
2 Related Work
3 Deep Hough Transform for Line Detection
3.1 Line Parameterization and Reverse
3.2 Feature Transformation with Deep Hough Transform
3.3 Line Detection in the Parametric Space
3.4 Reverse Mapping
4 The Proposed Evaluation Metric
4.1 Review of Existing Metrics
4.2 The Proposed Metric
5 Experiments
5.1 Implementation Details
5.2 Evaluation Protocol
5.3 Grid Search for Quantization Interval
5.4 Comparisons
5.5 Ablation Study
6 Conclusions
References
Structured Landmark Detection via Topology-Adapting Deep Graph Learning
1 Introduction
2 Related Work
3 Method
3.1 Cascaded GCNs
3.2 Graph Signal with Appearance and Shape Information
3.3 Landmark Graph with Learnable Connectivity
3.4 Training
4 Experiments
4.1 Datasets
4.2 Experiment Settings
4.3 Comparison with the SOTA Methods
4.4 Graph Structure Visualization
4.5 Ablation Studies
5 Conclusion
References
3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning
1 Introduction
2 Related Work
3 Algorithm
3.1 3D Human Representation
3.2 Resolution-Aware 3D Human Estimation
3.3 Self-Supervision
3.4 Contrastive Learning
4 Experiments
4.1 Implementation Details
4.2 Comparison to State-of-the-Art Methods
4.3 Ablation Study
5 Conclusion
References
Learning to Balance Specificity and Invariance for In and Out of Domain Generalization
1 Introduction
2 Related Work
3 Approach
3.1 Problem Setup
3.2 Activation or Feature Selection via Domain-Specific Masks
4 Experiments
4.1 Experimental Settings
4.2 Results
5 Analysis
6 Conclusion
References
Contrastive Learning for Unpaired Image-to-Image Translation
1 Introduction
2 Related Work
3 Methods
4 Experiments
4.1 Unpaired Image Translation
4.2 Ablation Study and Analysis
4.3 High-Resolution Single Image Translation
5 Conclusion
Appendix A Additional Image-to-Image Results
A.1 Additional Comparisons
A.2 Additional Datasets
Appendix B Additional Single Image Translation Results
Appendix C Unpaired Translation Details and Analysis
C.1 Training Details
C.2 Evaluation Details
C.3 Pseudocode
C.4 Distribution Matching
C.5 Additional Ablation Studies
References
DLow: Diversifying Latent Flows for Diverse Human Motion Prediction
1 Introduction
2 Related Work
3 Diversifying Latent Flows (DLow)
4 Diverse Human Motion Prediction
4.1 Diversity Sampling with DLow
5 Experiments
5.1 Quantitative Results
5.2 Qualitative Results
6 Conclusion
References
GRNet: Gridding Residual Network for Dense Point Cloud Completion
1 Introduction
2 Related Work
3 Gridding Residual Network
3.1 Overview
3.2 Gridding
3.3 3D Convolutional Neural Network
3.4 Gridding Reverse
3.5 Cubic Feature Sampling
3.6 Multi-layer Perceptron
3.7 Gridding Loss
4 Experiments
4.1 Datasets
4.2 Evaluation Metrics
4.3 Implementation Details
4.4 Shape Completion on ShapeNet
4.5 Shape Completion on Completion3D
4.6 Shape Completion on KITTI
4.7 Ablation Study
5 Conclusion
References
Gait Lateral Network: Learning Discriminative and Compact Representations for Gait Recognition
1 Introduction
2 Related Work
3 Our Approach
3.1 Lateral Connections
3.2 Compact Block
3.3 Training Strategy
4 Experiment
4.1 Settings
4.2 Performance Comparison
4.3 Ablation Study
5 Conclusion
References
Blind Face Restoration via Deep Multi-scale Component Dictionaries
1 Introduction
2 Related Work
2.1 Single Image Restoration
2.2 Reference-Based Image Restoration
3 Proposed Method
3.1 Off-Line Generation of Component Dictionaries
3.2 Deep Face Dictionary Network
3.3 Model Objective
4 Experiments
4.1 Training Details
4.2 Results on Synthetic Images
4.3 Ablation Study
5 Conclusion
References
Robust Neural Networks Inspired by Strong Stability Preserving Runge-Kutta Methods
1 Introduction
2 Background and Related Work
2.1 Neural Networks and Differential Equations
2.2 Robust Machine Learning and Adversarial Attacks
3 Strong Stability Preserving Networks
3.1 Motivation of Strong Stability Preserving Method
3.2 Strong Stability Preserving Networks
4 Experiments
4.1 Experimental Setup
4.2 Evaluation on MNIST with Standard Training
4.3 SSP with Adversarial Training
5 Conclusion
References
Inequality-Constrained and Robust 3D Face Model Fitting
1 Introduction
1.1 Related Work
2 Inequality-Constrained 3D Model Fitting
2.1 Background and Notation
2.2 Inequality Constraints
2.3 Objective Function
2.4 Optimization
3 Experimental Validation
3.1 Experimental Setup
3.2 Results
4 Conclusions and Future Work
References
Gabor Layers Enhance Network Robustness
1 Introduction
2 Related Work
3 Methodology
3.1 Convolutional Gabor Filter as a Layer
3.2 Implementation of the Gabor Layer
3.3 Regularization
3.4 Lipschitz Constant Regularization
4 Experiments
4.1 Implementation Details
4.2 Robustness Assessment
4.3 Performance of Gabor-Layered Architectures
4.4 Distribution of Singular Values
4.5 Robustness in Gabor-Layered Architectures
4.6 Effects of Adversarial Training
5 Conclusions
References
Conditional Image Repainting via Semantic Bridge and Piecewise Value Function
1 Introduction
2 Related Works
3 Preliminaries
3.1 Object-Driven Attention for Content Generation
3.2 Segmentation-Based Adversarial Training for Compositing
4 Conditional Image Repainting
4.1 Semantic-Bridge Attention for Content Generation
4.2 Piecewise Value Function for Content Compositing
4.3 Network Architecture Design
4.4 Learning
5 Experiments
5.1 Content Generation
5.2 Content Compositing
5.3 Qualitative Study
6 Conclusion
References
Learnable Cost Volume Using the Cayley Representation
1 Introduction
2 Related Work
3 Learnable Correlation Volume
3.1 Vanilla Cost Volume
3.2 Learnable Cost Volume
3.3 Learning with the Cayley Representation
3.4 Interpretation
3.5 Relation with the Weighted Sum of Squared Difference
4 Experiments
4.1 Supervised Optical Flow Estimation
4.2 Unsupervised Optical Flow Estimation
4.3 Ablation Study
4.4 Robustness Analysis
5 Conclusions
References
HALO: Hardware-Aware Learning to Optimize
1 Introduction
2 Related Works
3 The Proposed HALO Framework
3.1 Faster and Better: A Jacobian-Regularized Learned Optimizer
3.2 More Hardware-Efficient: Stochastic Structural Sparsity
4 Experiments and Analysis
4.1 Experiment Setup
4.2 Ablation Studies of the Proposed HALO
4.3 HALO Under Different Datasets/Optimizees
5 Conclusions
References
Structured3D: A Large Photo-Realistic Dataset for Structured 3D Modeling
1 Introduction
2 Related Work
3 A Unified Representation of 3D Structure
3.1 The "Primitive + Relationship" Representation
3.2 Relation to Existing Models
4 The Structured3D Dataset
4.1 Extraction of Structured 3D Models
4.2 Photo-Realistic 2D Rendering
4.3 Use Cases
5 Experiments
5.1 Experiment Setup
5.2 Experiment Results
6 Conclusion
References
BroadFace: Looking at Tens of Thousands of People at once for Face Recognition
1 Introduction
2 Related Works
3 Proposed Method
3.1 Typical Learning
3.2 BroadFace
3.3 Discussion
4 Experiments
4.1 Implementation Details
4.2 Evaluations on Face Recognition
4.3 Evaluations on Image Retrieval
4.4 Analysis of BroadFace
5 Conclusion
References
Interpretable Visual Reasoning via Probabilistic Formulation Under Natural Supervision
1 Introduction
2 Related Work
2.1 Visual Question Answering and Reasoning
2.2 Hybrid Transparent with Bayesian Interpretation
3 Method
3.1 Model Definition
3.2 Learning
3.3 Intuitive Explanation
3.4 Parametrization and Implementation
4 Experiments
4.1 Datasets
4.2 Evaluation on Real-World Datasets
4.3 Evaluation on Synthetic Datasets
4.4 Discussion
5 Conclusion
References
Domain Adaptive Semantic Segmentation Using Weak Labels
1 Introduction
2 Related Work
3 Domain Adaptation with Weak Labels
3.1 Problem Definition
3.2 Algorithm Overview
3.3 Weak Labels for Category Classification
3.4 Weak Labels for Feature Alignment
3.5 Network Optimization
3.6 Acquiring Weak Labels
4 Experimental Results
4.1 Comparison with State-of-the-art Methods
4.2 Weakly-Supervised Domain Adaptation (WDA)
4.3 Ablation Study
5 Conclusions
References
Knowledge Distillation Meets Self-supervision
1 Introduction
2 Related Work
3 Methodology
3.1 Preliminaries
3.2 Learning SSKD
3.3 Imperfect Self-supervised Predictions
4 Experiments
4.1 Ablation Study
4.2 Benchmark
4.3 Further Analysis
5 Conclusion
References
Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions
1 Introduction
2 Related Work
3 Sparse Neighbourhood Consensus Networks
3.1 Review: Neighbourhood Consensus Networks
3.2 Sparse-NCNet: Efficient Neighbourhood Consensus Networks
3.3 Match Relocalization by Guided Search
4 Experimental Evaluation
4.1 HPatches Sequences
4.2 InLoc Benchmark
4.3 Aachen Day-Night
5 Conclusion
References
Reconstructing the Noise Variance Manifold for Image Denoising
1 Introduction
2 Related Work
2.1 Image Prior Based Methods
2.2 Discriminative Deep Learning Methods
2.3 Generative Models
3 Our Method
3.1 Image Noise Modeling in Real-World Images
3.2 Conditional Image Generation
3.3 Image Denoising Based on Noise Variance Manifold Reconstruction
4 Experimental Results
4.1 Training Settings
4.2 Comparisons on Real-World Images
5 Conclusions
References
Occlusion-Aware Depth Estimation with Adaptive Normal Constraints
1 Introduction
2 Related Work
3 Method
3.1 Differentiable Homography Warping
3.2 DepthNet for Initial Depth Prediction
3.3 Occlusion-Aware RefineNet
4 Datasets and Implementation Details
5 Experiments
5.1 Evaluation Metrics
5.2 Comparisons
5.3 Video Reconstruction
5.4 Ablation Studies
6 Conclusion and Limitations
References
VisualEchoes: Spatial Image Representation Learning Through Echolocation
1 Introduction
2 Related Work
3 Approach
3.1 Echolocation Simulation
3.2 Case Study: Spatial Cues in Echoes
3.3 VisualEchoes Spatial Representation Learning Framework
3.4 Downstream Tasks for the Learned Spatial Representation
4 Experiments
4.1 Transferring VisualEchoes Features for RGB2Depth
4.2 Evaluating on Downstream Tasks
4.3 Qualitative Results
5 Conclusions and Future Work
References
Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval
1 Introduction
2 Related Work
3 Background
4 Approximating Average Precision (AP)
4.1 Smoothing AP
5 Experimental Setup
5.1 Datasets
5.2 Test Protocol
5.3 Implementation Details
6 Results
6.1 Evaluation on Stanford Online Products (SOP)
6.2 Evaluation on VehicleID and INaturalist
6.3 Evaluation on Face Retrieval
6.4 Ablation Study
6.5 Further Discussion
7 Conclusions
References
Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation
1 Introduction
2 Related Works
3 Methods
4 Experiments
4.1 Urban Scene Segmentation Results
4.2 Ablation Studies
4.3 Modified Wide ResNet-38: WR-41
5 Conclusion
References
Spatially Aware Multimodal Transformers for TextVQA
1 Introduction
2 Related Work
3 Background: Multimodal Transformers
3.1 Self-attention Layer
3.2 Limitations
4 Approach
4.1 Graph over Input Tokens
4.2 Spatially aware Self-Attention Layer
4.3 Implementation Details
5 Experiments
5.1 Evaluation on TextVQA Dataset
5.2 Evaluation on ST-VQA
6 Analysis
7 Conclusion
References
Every Pixel Matters: Center-Aware Feature Alignment for Domain Adaptive Object Detector
1 Introduction
2 Related Work
2.1 Object Detection
2.2 UDA for Object Detector
3 Proposed Method
3.1 Algorithm Overview
3.2 Global Feature Alignment
3.3 Center-Aware Alignment
3.4 Overall Objective for Proposed Framework
3.5 Network Architecture and Discussions
4 Experimental Results
4.1 Implementation Details
4.2 Datasets
4.3 Overall Performance
4.4 More Results and Analysis
5 Conclusions
References
URIE: Universal Image Enhancement for Visual Recognition in the Wild
1 Introduction
2 Related Work
2.1 Fragility of Visual Recognition Models
2.2 Recognition of Distorted Images
2.3 Image Restoration
3 URIE: Architecture and Training
3.1 Selective Enhancement Module
3.2 Overall Architecture
3.3 Training Strategy
3.4 Discussion
4 Experiments
4.1 Training Configurations
4.2 Experimental Configurations
4.3 Performance Evaluation
5 Conclusion
References
Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation
1 Introduction
2 Related Work
3 Method
3.1 Overall
3.2 Self-adaptive View Aggregation
3.3 Depth Map Estimator
3.4 Multi-metric Pyramid Depth Map Aggregation
4 Experiments
4.1 Implementation Details
4.2 Benchmarks Results
4.3 Ablation Studies
4.4 Runtime and Memory Performance
5 Conclusion
References
SPL-MLL: Selecting Predictable Landmarks for Multi-label Learning
1 Introduction
2 Related Work
3 Our Algorithm: Selecting Predictable Landmarks for Multi-label Learning
3.1 Explicit Landmark Selection
3.2 Predictable Landmark Classification
3.3 Objective Function
3.4 Optimization
4 Experiments
4.1 Experiment Settings
4.2 Experimental Results
5 Conclusions and Future Work
References
Unpaired Image-to-Image Translation Using Adversarial Consistency Loss
1 Introduction
2 Related Work
3 Method
3.1 Adversarial-Translation Loss
3.2 Adversarial-Consistency Loss
3.3 Other Losses
3.4 Implementation Details
4 Experiments
4.1 Experimental Settings
4.2 Ablation Studies
4.3 Comparison with Baselines
5 Limitations and Discussion
References
Correction to: Unpaired Image-to-Image Translation Using Adversarial Consistency Loss
Correction to: Chapter "Unpaired Image-to-Image Translation Using Adversarial Consistency Loss" in: A. Vedaldi et al. (Eds.): Computer Vision - ECCV 2020, LNCS 12354, https://doi.org/10.1007/978-3-030-58545-7_46
Author Index
