Computer Vision - ECCV 2020

Name: Computer Vision - ECCV 2020 | 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XIV
Brand: Springer
Price: 96.29 EUR
Availability: Online only

Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm (Editors)

Springer (Publisher)

Published on 12 November 2020

XLII, 803 pages

E-Book

PDF with digital watermarking

978-3-030-58568-6 (ISBN)

€96.29 incl. 7% VAT

System requirements for PDF with digital watermarking

E-Book Single Licence

Available for download

Content

Intro
Foreword
Preface
Organization
Contents - Part XIV
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation
1 Introduction
2 Related Work
3 Method
3.1 Spatial Preservation Module
3.2 Mask-Specialized Regression Branch
3.3 Spatial Mask Prediction Module
3.4 Loss Function
3.5 Single-Stage Video Instance Segmentation
4 Experiments
4.1 Dataset and Implementation Details
4.2 State-of-the-art Comparison
4.3 Ablation Study
4.4 Video Instance Segmentation Results
5 Conclusion
References
SemanticAdv: Generating Adversarial Examples via Attribute-Conditioned Image Editing
1 Introduction
2 Related Work
3 SemanticAdv
3.1 Problem Definition
3.2 Attribute-Conditioned Image Editing
3.3 Generating Semantically Meaningful Adversarial Examples
4 Experiments
4.1 Experimental Setup
4.2 SemanticAdv on Face Identity Verification
4.3 SemanticAdv on Face Landmark Detection
4.4 SemanticAdv on Street-View Semantic Segmentation
5 Conclusions
References
Learning with Noisy Class Labels for Instance Segmentation
1 Introduction
2 Related Works
3 Methodology
3.1 Division of Samples
3.2 Classification Loss
3.3 Reverse Cross Entropy Loss
4 Theoretical Analyses
4.1 Noise Robustness
4.2 Gradients
5 Experiments
5.1 Datasets and Noise Settings
5.2 Implementation Details
5.3 Main Results
5.4 Discussion
6 Conclusion
References
Deep Image Clustering with Category-Style Representation
1 Introduction
2 Related Work
3 Method
3.1 Maximize Mutual Information
3.2 Disentangle Category-Style Information
3.3 Match to Prior Distribution
3.4 The Unified Model
4 Experiments
4.1 Experimental Settings
4.2 Main Result
4.3 Ablation Study
5 Conclusions
References
Self-supervised Motion Representation via Scattering Local Motion Cues
1 Introduction
2 Related Work
3 Proposed Method
3.1 Overview
3.2 Context Guided Motion Upsampling Layer
3.3 Context Guided Motion Network
3.4 Enhancing Action Recognition
3.5 Training Strategy
4 Experimental Results
4.1 Comparison with Other Motion Representation Method
4.2 Action Recognition
4.3 Ablation Study
5 Conclusion
References
Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets
1 Introduction
2 Related Work
3 Our Method
3.1 Network Architecture
3.2 Network Training
4 Datasets
4.1 HC Depth Dataset
4.2 Incremental Dataset Mixing Strategy
5 Experiments
5.1 Experimental Setup
5.2 Experimental Results
6 Conclusions
References
BMBC: Bilateral Motion Estimation with Bilateral Cost Volume for Video Interpolation
1 Introduction
2 Related Work
2.1 Deep-Learning-Based Video Interpolation
2.2 Cost Volume
3 Proposed Algorithm
3.1 Bilateral Motion Estimation
3.2 Motion Approximation
3.3 Frame Synthesis
3.4 Training
4 Experimental Results
4.1 Datasets
4.2 Comparison with the State-of-the-Arts
4.3 Model Analysis
5 Conclusions
References
Hard Negative Examples are Hard, but Useful
1 Introduction
2 Background
3 Triplet Diagram
4 Why Some Triplets are Hard to Optimize
5 Modification to Triplet Loss
6 Experiments and Results
6.1 Hard Negative Triplets During Training
6.2 Generalizability of SCT Features
7 Discussion
References
ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions
1 Introduction
2 Related Work
3 Revisit: 1-Bit Convolution
4 Methodology
4.1 Baseline Network
4.2 ReActNet
4.3 Distributional Loss
5 Experiments
5.1 Experimental Settings
5.2 Comparison with State-of-the-Art
5.3 Ablation Study
5.4 Visualization
6 Conclusions
References
Video Object Detection via Object-Level Temporal Aggregation
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Temporal Aggregation
3.2 Adaptive Keyframe Scheduling
4 Performance Evaluation
4.1 Experiment Settings
4.2 Quantitative Comparisons
4.3 Qualitative Comparisons
4.4 Speed-Accuracy Tradeoffs
4.5 Ablation Studies
5 Concluding Remarks
References
Object Detection with a Unified Label Space from Multiple Datasets
1 Introduction
2 Related Work
3 Training with Heterogeneous Label Spaces
3.1 Preliminaries
3.2 Unifying Label Spaces with a Single Detector
3.3 A Loss Function to Deal with the Ambiguous Label Spaces
3.4 Resolving the Label Space Ambiguity with Pseudo Labeling
3.5 Evaluating a Unified Object Detector
4 Experiments
4.1 Ablation Study
4.2 Comparing Pseudo Labeling with an Upper Bound
4.3 Main Results
5 Conclusions
References
Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D
1 Introduction
2 Related Work
2.1 Monocular Object Detection
2.2 Inference in the Bird's-Eye-View Frame
3 Method
3.1 Lift: Latent Depth Distribution
3.2 Splat: Pillar Pooling
3.3 Shoot: Motion Planning
4 Implementation
4.1 Architecture Details
4.2 Frustum Pooling Cumulative Sum Trick
5 Experiments and Results
5.1 Description of Baselines
5.2 Segmentation
5.3 Robustness
5.4 Zero-Shot Camera Rig Transfer
5.5 Benchmarking Against Oracle Depth
5.6 Motion Planning
6 Conclusion
References
Comprehensive Image Captioning via Scene Graph Decomposition
1 Introduction
2 Related Work
3 Method
3.1 Scene Graph Detection and Decomposition
3.2 Sub-graph Proposal Network
3.3 Decoding Sentences from Sub-graphs
3.4 Training and Inference
4 Experiments
4.1 Ablation Study
4.2 Accurate and Diverse Image Captioning
4.3 Grounded Image Captioning
4.4 Controllable Image Captioning
5 Conclusion
References
Symbiotic Adversarial Learning for Attribute-Based Person Search
1 Introduction
2 Related Works
3 Symbiotic Adversarial Learning (SAL)
3.1 Multi-modal Common Space Embedding Base
3.2 Middle-Level Granularity-Consistent Cycle Generation
3.3 High-Level Common Space Alignment with Augmented Adversarial Learning
3.4 Symbiotic Training Scheme for SAL
4 Experiments
4.1 Comparisons to the State-of-the-Arts
4.2 Further Analysis and Discussions
5 Conclusion
References
Amplifying Key Cues for Human-Object-Interaction Detection
1 Introduction
2 Related Work
3 Method
3.1 Overview
3.2 Model Architecture
3.3 Training and Inference
4 Experiments
4.1 Experimental Setup
4.2 Ablation Studies
4.3 Results and Comparisons
5 Conclusions
References
Rethinking Few-Shot Image Classification: A Good Embedding is All You Need?
1 Introduction
2 Related Works
3 Method
3.1 Problem Formulation
3.2 Learning Embedding Model Through Classification
3.3 Sequential Self-distillation
4 Experiments
4.1 Setup
4.2 Results on ImageNet Derivatives
4.3 Results on CIFAR Derivatives
4.4 Results on Meta-Dataset
4.5 Embeddings from Self-supervised Representation Learning
4.6 Ablation Experiments
4.7 Effects of Distillation
4.8 Choice of Base Classifier
4.9 Comparisons of Different Network Backbones
4.10 Multi-task vs Multi-way Classification?
References
Adversarial Background-Aware Loss for Weakly-Supervised Temporal Activity Localization
1 Introduction
2 Related Work
3 Method
3.1 Feature Embedding
3.2 Angular Center Loss with a Pair of Triplets (ACL-PT)
3.3 Adopting an Adversarial Approach (A2CL-PT)
3.4 Classification Loss
3.5 Classification and Localization
4 Experiments
4.1 Datasets and Evaluation
4.2 Implementation Details
4.3 Comparisons with the State-of-the-Art
4.4 Ablation Study and Analysis
4.5 Qualitative Analysis
5 Conclusion
References
Action Localization Through Continual Predictive Learning
1 Introduction
2 Related Work
3 Self-supervised Action Localization
3.1 Feature Extraction and Spatial Region Proposal
3.2 Self-supervised Future Prediction
3.3 Prediction Error-Based Attention Map
3.4 Extraction of Action Tubes
3.5 Implementation Details
4 Experimental Setup
4.1 Data
4.2 Metrics and Baselines
5 Quantitative Evaluation
5.1 Quality of Localization Proposals
5.2 Spatial-Temporal Action Localization
5.3 Comparison with Other LSTM-Based Approaches
5.4 Ablative Studies
5.5 Unsupervised Egocentric Gaze Prediction
5.6 Qualitative Evaluation
6 Conclusion
References
Generative View-Correlation Adaptation for Semi-supervised Multi-view Learning
1 Introduction
2 Related Work
2.1 Multi-view Learning
2.2 Semi-supervised Learning
3 Our Approach
3.1 Preliminaries and Motivation
3.2 Semi-supervised Mixup
3.3 Dual-Level View-Correlation Adaptation
3.4 Label-Level Fusion
4 Experiments
4.1 Dataset
4.2 Baselines
4.3 Implementation
4.4 Performance
4.5 Ablation Study
5 Conclusion
References
READ: Reciprocal Attention Discriminator for Image-to-Video Re-identification
1 Introduction
2 Related Work
3 Proposed Method
3.1 Image Embedding Network
3.2 Video Embedding Network
3.3 Reciprocal Attention Discriminator (READ)
3.4 Sampling
3.5 Training Objective
4 Experiments
4.1 Benchmark
4.2 Methods to be Compared
4.3 Implementation Detail
5 Results
5.1 Ablation Study
5.2 Comparison
5.3 Analysis
5.4 Visualization
5.5 Computational Cost
6 Conclusion
References
3D Human Shape Reconstruction from a Polarization Image
1 Introduction
2 Related Work
3 The Proposed SfP Approach
3.1 Surface Normal Estimation
3.2 Human Pose and Shape Estimation
3.3 Polarization Human Pose and Shape Dataset
4 Empirical Evaluations
4.1 Evaluation of Surface Normal Estimation
4.2 Evaluation of Pose and Shape Estimation
5 Conclusion
References
The Devil Is in the Details: Self-supervised Attention for Vehicle Re-identification
1 Introduction
2 Related Works
3 Self-supervised Attention for Vehicle Re-identification
3.1 Self-supervised Residual Generation
3.2 Deep Feature Extraction
3.3 End-To-End Training
4 Experiments
4.1 Vehicle Re-identification Datasets
4.2 Implementation Details
4.3 Experimental Evaluation
5 Ablation Studies
5.1 Residual Generation Techniques
5.2 Incorporating Residual Information
6 Conclusion
References
Improving One-Stage Visual Grounding by Recursive Sub-query Construction
1 Introduction
2 Related Work
3 Approach
3.1 Sub-query Learner
3.2 Sub-query Modulation
3.3 Framework Details
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Quantitative Results
4.4 Performance Break-Down Studies
4.5 Ablation Studies
4.6 Qualitative Results
5 Conclusions
References
Multi-level Wavelet-Based Generative Adversarial Network for Perceptual Quality Enhancement of Compressed Video
1 Introduction
2 Related Work
3 Motivation for WPT
4 The Proposed MW-GAN
4.1 Multi-level Wavelet-Based Generator
4.2 Multi-level Wavelet-Based Discriminator
4.3 Loss Functions
5 Experiments
5.1 Settings
5.2 Quantitative Comparison
5.3 Subjective Comparison
5.4 Ablation Study
6 Generalization Ability
7 Conclusion
References
Example-Guided Image Synthesis Using Masked Spatial-Channel Attention and Self-supervision
1 Introduction
2 Related Work
3 Method
3.1 Feature Extraction
3.2 Masked Spatial-Channel Attention Module
3.3 Image Synthesis
3.4 Self-supervised Training
4 Experiments
5 Conclusion
References
Content-Consistent Matching for Domain Adaptive Semantic Segmentation
1 Introduction
2 Related Works
3 Content-Consistent Matching
3.1 Semantic Layout Matching
3.2 Pixel-Wise Similarity Matching
3.3 Active Matching with Self-training
3.4 Objective
4 Experiments
4.1 Dataset and Evaluation Metric
4.2 Implementation Detail
4.3 Comparison with the State-of-the-arts
4.4 Ablation Studies
5 Conclusion
References
AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting
1 Introduction
2 Related Work
3 Proposed Method
3.1 Overall Architecture
3.2 Text Detection Module
3.3 Character-Based Recognition Module
3.4 Language Module
3.5 Loss Function
4 Experiment
4.1 Datasets
4.2 Implementation Details
4.3 Ablation Study
4.4 Comparisons with State-of-the-Art Methods
4.5 Time Cost Analysis of AE TextSpotter
4.6 Discussion
5 Conclusion and Future Work
References
History Repeats Itself: Human Motion Prediction via Motion Attention
1 Introduction
2 Related Work
3 Our Approach
3.1 Motion Attention Model
3.2 Prediction Model
3.3 Training
3.4 Network Structure
4 Experiments
4.1 Datasets
4.2 Evaluation Metrics and Baselines
4.3 Results
5 Conclusion
References
Unsupervised Video Object Segmentation with Joint Hotspot Tracking
1 Introduction
2 Related Work
3 Methodology
3.1 Target Object Initialization
3.2 Object Segmentation and Hotspot Tracking
3.3 Network Training
4 Experiments
4.1 Dataset and Metrics
4.2 Evaluation on Unsupervised Video Object Segmentation
4.3 Evaluation on Hotspot Tracking
4.4 Ablation Study
5 Conclusion
References
SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach
1 Introduction
2 Related Work
3 Method
4 Datasets and Rare-Pose Evaluation Protocols
4.1 Datasets and Evaluation Metrics
4.2 Evaluation Protocols
5 Experiments
5.1 Ablation Study
5.2 Comparison with State-of-The-Art Methods
6 Conclusion
References
CAFE-GAN: Arbitrary Face Attribute Editing with Complementary Attention Feature
1 Introduction
2 Related Work
3 CAFE-GAN
3.1 Discriminator
3.2 Generator
3.3 Model Objective
4 Experiments
4.1 Experimental Setup
4.2 Qualitative Result
4.3 Quantitative Result
4.4 Analysis of CAFE
5 Conclusion
References
MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object Detection
1 Introduction
2 Related Work
3 MimicDet Framework
3.1 Backbone and Staggered Feature Pyramid
3.2 Refinement Module
3.3 Detection Heads
3.4 Head Mimicking
4 Implementation Details
4.1 Training
4.2 Inference
5 Experiments
5.1 Ablation Study
5.2 Comparison with State-of-the-art Methods
6 Conclusion
References
Latent Topic-Aware Multi-label Classification
1 Introduction
2 Topic-Aware Multi-Label Classification-TMLC
2.1 Preliminaries
2.2 The Overview of TMLC
2.3 Topic-Aware Data Factorization
2.4 Inter-topic Correlation
2.5 Topic-Aware Label-Specific Feature Extraction
2.6 Topic-Aware Instance-Specific Sample Extraction
2.7 Optimization
3 Relations to Previous Works and Discussions
4 Experiments
4.1 Datasets
4.2 Evaluation Metrics
4.3 Methods
4.4 Experimental Results
4.5 Parameter Analysis
5 Conclusion
References
Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning
1 Introduction
2 Related Work
3 Method
3.1 Mirrored Viewpoint-Adapted Matching Encoder
3.2 Sentence Decoder
3.3 Learning Process
4 Experiment
4.1 Datasets and Metrics
4.2 Training Details
4.3 Model Variations
4.4 Results
5 Conclusion
References
Attract, Perturb, and Explore: Learning a Feature Alignment Network for Semi-supervised Domain Adaptation
1 Introduction
2 Related Work
2.1 Unsupervised Domain Adaptation
2.2 Semi-supervised Learning
2.3 Semi-supervised Domain Adaptation
3 Intra-domain Discrepancy
4 Method
4.1 Problem Formulation
4.2 Spherical Feature Space with Prototypes
4.3 Attraction Scheme
4.4 Perturbation Scheme
4.5 Exploration Scheme
4.6 Overall Framework and Training Objective
5 Experiments
5.1 Experimental Setup
5.2 Results
5.3 Analysis
6 Conclusions
References
Curriculum Manager for Source Selection in Multi-source Domain Adaptation
1 Introduction
2 Related Work
3 Preliminaries
4 CMSS: Curriculum Manager for Source Selection
4.1 CMSS: Theoretical Insights
5 Experimental Results
5.1 Experiments on Digit Recognition
5.2 Experiments on DomainNet
5.3 Experiments on PACS
5.4 Experiments on Office-Caltech10
5.5 Comparison with Other Re-weighting Methods
6 Interpretations
6.1 Visualizations of Source Selection
6.2 Selection over Time
7 Conclusion
References
Powering One-Shot Topological NAS with Stabilized Share-Parameter Proxy
1 Introduction
2 Related Work
3 Approach
3.1 Topology Augmented Search Space
3.2 Training the One-Shot Hyper-network
3.3 Stabilizing Performance Estimation
3.4 Evolution Algorithm
4 Experiments and Results
4.1 Experiments Settings
4.2 Main Results
4.3 Ablation Studies
5 Conclusion
References
Classes Matter: A Fine-Grained Adversarial Approach to Cross-Domain Semantic Segmentation
1 Introduction
2 Related Work
2.1 Semantic Segmentation
2.2 Domain Adaptation
2.3 Domain Adaptive Semantic Segmentation
3 Method
3.1 Revisit Traditional Feature Alignment
3.2 Fine-Grained Adversarial Learning
3.3 Extracting Class Knowledge for Domain Encodings
4 Experiments
4.1 Datasets
4.2 Evaluation Metrics
4.3 Implementation Details
4.4 Comparison with State-of-the-art Methods
4.5 Feature Distribution
4.6 Ablation Studies
5 Conclusion
References
Boundary-Preserving Mask R-CNN
1 Introduction
2 Related Work
3 Boundary-Preserving Mask R-CNN
3.1 Motivation
3.2 Boundary-Preserving Mask Head
3.3 Learning and Optimization
4 Experiments
4.1 Overall Results
4.2 Ablation Experiments
4.3 Experiments on Cityscapes
4.4 Discussions
4.5 Qualitative Results
5 Conclusion
References
Self-supervised Single-View 3D Reconstruction via Semantic Consistency
1 Introduction
2 Related Work
3 Approach
3.1 Resolving Camera-Shape Ambiguity via Semantic Consistency
3.2 Progressive Training
3.3 Texture Cycle Consistency Constraint
4 Experimental Results
4.1 Experimental Settings
4.2 Qualitative Results
4.3 Quantitative Evaluations
4.4 Ablation Studies
5 Failure Case and Limitations
6 Conclusion
References
MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation
1 Introduction
2 Related Work
3 Approach
3.1 Background: Knowledge Distillation
3.2 Network Self-Boosting
3.3 Top-Down Distillation
3.4 Meta-Learned Soft Teacher Label Generator
3.5 Training and Inference
4 Experimental Results
4.1 Setups
4.2 CIFAR-100
4.3 ILSVRC2012
4.4 Comparison with Traditional Distillation
4.5 Ablation Study
4.6 Visualization and Discussion
5 Conclusion
References
Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling
1 Introduction
2 Related Work
3 Method
3.1 Background
3.2 Cycle Consistency Within Memory-Aided Sequential Modeling
3.3 Long-Range Constraints via Stage-Wise Training
4 Experimental Results
4.1 Settings
4.2 Results
5 Conclusions
References
The Devil Is in Classification: A Simple Framework for Long-Tail Instance Segmentation
1 Introduction
2 Related Works
3 Analysis: Performance Drop on Long-Tail Distribution
4 Solutions: Alleviating Classification Bias
4.1 Using Existing Long-Tail Classification Approaches
4.2 Proposed SimCal: Calibrating the Classifier
5 Experiments
5.1 Datasets and Metrics
5.2 Evaluating Adapted Existing Classification Methods
5.3 Evaluating Proposed SimCal
5.4 Model Design Analysis of SimCal
5.5 Generalizability Test of SimCal
6 Conclusions
References
What Is Learned in Deep Uncalibrated Photometric Stereo?
1 Introduction
2 Related Work
3 Learning for Lighting Calibration
4 Guided Calibration Network
4.1 Guided Feature Extraction
4.2 Network Architecture
5 Experimental Results
5.1 Evaluation on Synthetic Data
5.2 Evaluation on Real Data
5.3 Failure Cases
6 Conclusions
References
Prior-Based Domain Adaptive Object Detection for Hazy and Rainy Conditions
1 Introduction
2 Related Work
3 Proposed Method
3.1 Detection Network
3.2 Prior-Adversarial Training
3.3 Residual Feature Recovery Block
3.4 Overall Loss
4 Experiments and Results
4.1 Implementation Details
4.2 Adaptation to Hazy Conditions
4.3 Adaptation to Rainy Conditions
5 Conclusions
References
Adversarial Ranking Attack and Defense
1 Introduction
2 Related Works
3 Adversarial Ranking
3.1 Candidate Attack
3.2 Query Attack
3.3 Robustness and Defense
4 Experiments
4.1 MNIST Dataset
4.2 Fashion-MNIST Dataset
4.3 Stanford Online Products Dataset
5 Discussions
5.1 Adversarial Example Transferability
5.2 Universal Perturbation for Ranking
6 Conclusion
References
Author Index