Computer Vision - ECCV 2020

Name: Computer Vision - ECCV 2020 | 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XV
Brand: Springer
Price: 96.29 EUR
Availability: OnlineOnly

Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm (Editors)

Springer (Publisher)

Published on 15 November 2020

XLII, 795 pages

E-Book

PDF with digital watermarking

978-3-030-58555-6 (ISBN)

€96.29 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Foreword
Preface
Organization
Contents - Part XV
ReDro: Efficiently Learning Large-Sized SPD Visual Representation
1 Introduction
2 Related Work
3 The Proposed Relation Dropout (ReDro)
3.1 Forward Propagation in the Presence of ReDro
3.2 Backward Propagation in the Presence of ReDro
3.3 Discussion
4 Experimental Result
4.1 On the Computational Advantage of ReDro
4.2 On the Efficiency of ReDro Versus Its Intensity Level
4.3 On the Performance of ReDro with Typical Methods
4.4 Ablation Study on the Group Number k
5 Conclusion
References
Graph-Based Social Relation Reasoning
1 Introduction
2 Related Work
3 Approach
3.1 Revisiting the Paradigm of Social Relation Recognition
3.2 From Image to Graph
3.3 Graph Relational Reasoning Network
3.4 Discussion
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Results and Analysis
5 Conclusion
References
EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection
1 Introduction
2 Related Work
3 Method
3.1 Two-Stream RPN
3.2 Refinement Network
3.3 Consistency Enforcing Loss
3.4 Overall Loss Function
4 Experiments
4.1 Datasets and Evaluation Metric
4.2 Implementation Details
4.3 Ablation Study
4.4 Experiments on KITTI Dataset
4.5 Experiments on SUN-RGBD Dataset
5 Conclusion
References
Self-Supervised Monocular 3D Face Reconstruction by Occlusion-Aware Multi-view Geometry Consistency
1 Introduction
2 Related Work
2.1 Single-View Method
2.2 Multi-view or Video Based Method
3 Method
3.1 Overview
3.2 Model
3.3 2D Feature Loss
3.4 Occlusion-Aware View Synthesis
3.5 Pixel Consistency Loss
3.6 Dense Depth Consistency Loss
3.7 Facial Epipolar Loss
3.8 Combined Loss
4 Experiment
4.1 Implementation Details
4.2 Qualitative Result
4.3 2D Face Alignment
4.4 3D Face Reconstruction
4.5 Ablation Study
5 Conclusion
References
Asynchronous Interaction Aggregation for Action Detection
1 Introduction
2 Related Works
3 Proposed Method
3.1 Instance Level and Temporal Memory Features
3.2 Interaction Modeling and Aggregation
3.3 Asynchronous Memory Update Algorithm
4 Experiments on AVA
4.1 Implementation Details
4.2 Ablation Experiments
4.3 Main Results
5 Experiments on UCF101-24
6 Experiments on EPIC-Kitchens
7 Conclusion
References
Shape and Viewpoint Without Keypoints
1 Introduction
2 Related Work
3 Approach
3.1 Preliminaries
3.2 Our Method
4 Experiments
4.1 Experimental Detail
4.2 Qualitative Evaluation
4.3 Quantitative Evaluation
4.4 Evaluations on Other Categories
4.5 Limitations
5 Conclusion
References
Learning Attentive and Hierarchical Representations for 3D Shape Recognition
1 Introduction
2 Related Work
3 Proposed Method
3.1 Framework
3.2 Hybrid Attentions
3.3 Hierarchical Representation Learning
4 Experimental Results and Analysis
4.1 3D Shape Classification and Retrieval
4.2 Sketch-Based 3D Shape Retrieval
4.3 Ablation Study
5 Conclusion
References
TF-NAS: Rethinking Three Search Freedoms of Latency-Constrained Differentiable Neural Architecture Search
1 Introduction
2 Related Work
3 Our Method
3.1 Review of Differentiable NAS
3.2 The Search Space
3.3 Three-Freedom NAS
4 Experiments
4.1 Dataset and Settings
4.2 Comparisons with Current SOTA
4.3 Analyses of Bi-sampling Search Algorithm
4.4 Analyses of Sink-connecting Search Space
4.5 Analyses of Elasticity-scaling Strategy
5 Conclusion
References
Associative3D: Volumetric Reconstruction from Sparse Views
1 Introduction
2 Related Work
3 Approach
3.1 Object Branch
3.2 Camera Branch
3.3 Stitching Object and Camera Branches
4 Experiments
4.1 Experimental Setup
4.2 Full Scene Evaluation
4.3 Inter-view Object Affinity Matrix
4.4 Stitching Stage
4.5 Failure Cases
4.6 Results on NYU Dataset
5 Conclusion
References
PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit
1 Introduction
2 Related Works
3 Approach
3.1 Overall Framework
3.2 Pluggable Super-Resolution Unit
3.3 Feature Enhancement
3.4 Training and Inference
4 Experiment
4.1 Datasets
4.2 Implementation Details
4.3 Ablation Study
4.4 Experiments on the Parameter
4.5 Comparison with State of the Art
5 Conclusion
References
Memory Selection Network for Video Propagation
1 Introduction
2 Related Work
3 Proposed Method
3.1 Overview
3.2 Memory Pool Construction
3.3 Memory Selection Network
3.4 Video Propagation Frameworks
3.5 Training Pipeline
4 Experiments
4.1 Comparison with State-of-the-arts
4.2 Ablation Study
5 Conclusion
References
Disentangled Non-local Neural Networks
1 Introduction
2 Related Works
3 Non-local Networks in Depth
3.1 Dividing Non-local Block into Pairwise and Unary Terms
3.2 What Visual Clues Are Expected to Be Learnt by Pairwise and Unary Terms?
3.3 Does the Non-local Block Learn Visual Clues Well?
3.4 Why the Non-Local Block Does Not Learn Visual Clues Well?
4 Disentangled Non-local Neural Networks
4.1 Formulation
4.2 Behavior of DNL on Learning Visual Clues
5 Experiments
5.1 Semantic Segmentation
5.2 Object Detection/Segmentation and Action Recognition
6 Conclusion
References
URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark
1 Introduction
2 Related Work
3 Refer-Youtube-VOS Dataset
4 Unified Referring VOS Network
4.1 Our Framework
5 Experiments
5.1 Implementation Details
5.2 Evaluation Metrics
5.3 Quantitative Results
5.4 Qualitative Results
5.5 Analysis
6 Conclusion
References
Generalizing Person Re-Identification by Camera-Aware Invariance Learning and Cross-Domain Mixup
1 Introduction
2 Related Work
3 Method
3.1 Overview
3.2 Camera-Aware Neighborhood Invariance
3.3 Cross-Domain Mixup
3.4 Overall Loss Function
4 Experiment
4.1 Dataset and Evaluation Protocol
4.2 Implementation Details
4.3 Parameter Analysis
4.4 Ablation Study
4.5 Comparison with State-of-the-art Methods
5 Conclusion
References
Semi-supervised Crowd Counting via Self-training on Surrogate Tasks
1 Introduction
2 Related Works
3 Background: Crowd Counting as Density Estimation
4 Methodology
4.1 Using Unlabeled Data for Feature Learning
4.2 Constructing Surrogate Tasks for Feature Learning
4.3 Inter-Relationship-Aware Self-Training (IRAST) for Semi-supervised Training on Surrogate Tasks
5 Overall Training Process
6 Experimental Results
6.1 Experimental Settings
6.2 Datasets and Results
6.3 Ablation Study
7 Conclusions
References
Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training
1 Introduction
2 Related Work
3 Dynamic Quality in the Training Procedure
3.1 Proposal Classification
3.2 Bounding Box Regression
4 Dynamic R-CNN
4.1 Dynamic Label Assignment
4.2 Dynamic SmoothL1 Loss
5 Experiments
5.1 Dataset and Evaluation Metrics
5.2 Implementation Details
5.3 Main Results
5.4 Ablation Experiments
5.5 Studies on the Effect of Hyperparameters
5.6 Universality
5.7 Comparison with State-of-the-Arts
6 Conclusion
References
Boosting Decision-Based Black-Box Adversarial Attacks with Random Sign Flip
1 Introduction
2 Related Work
3 Approach
3.1 Preliminaries
3.2 Threat Models
3.3 Sign Flip Attack
4 Experiments
4.1 Attacks on Undefended Models
4.2 Attacks on Defensive Models
4.3 Attacks on Real-World Applications
5 Conclusion
References
Knowledge Transfer via Dense Cross-Layer Mutual-Distillation
1 Introduction
2 Related Work
3 Proposed Method
3.1 KD and DML
3.2 Dense Cross-Layer Mutual-Distillation
4 Experiments
4.1 Experiments on CIFAR-100
4.2 Experiments on ImageNet
4.3 Deep Analysis of DCM
5 Conclusions
References
Matching Guided Distillation
1 Introduction
2 Related Work
3 Methodology
3.1 Feature Distillation Revisit
3.2 Channel Matching
3.3 Channel Reduction
3.4 Implementation Details
4 Experiments
4.1 Main Results
4.2 Ablation Study
5 Discussion and Future Work
References
Clustering Driven Deep Autoencoder for Video Anomaly Detection
1 Introduction
2 Related Work
2.1 Video Anomaly Detection with Two Stream Networks
2.2 Data Representation and Data Clustering
3 Methods
3.1 Spatial Autoencoder
3.2 Motion Autoencoder
3.3 Variance Attention Module
3.4 Clustering
3.5 Training Objective
3.6 Anomaly Score
4 Experiments
4.1 Video Anomaly Detection Datasets
4.2 Implementation Details
4.3 Evaluation Metric
4.4 Results
4.5 Ablation Study
4.6 Exploration of Cluster Numbers
4.7 Attention Visualization
4.8 Comparison with Optical Flow
5 Conclusion
References
Learning to Compose Hypercolumns for Visual Correspondence
1 Introduction
2 Related Work
3 Dynamic Hyperpixel Flow
3.1 Multi-layer Feature Extraction
3.2 Dynamic Layer Gating
3.3 Correlation Computation and Matching
3.4 Training Objective
4 Experiments
4.1 Results and Comparisons
4.2 Comparison to Soft Layer Gating
5 Conclusion
References
Stochastic Bundle Adjustment for Efficient and Scalable 3D Reconstruction
1 Introduction
2 Related Works
3 Bundle Adjustment Revisited
4 Stochastic Bundle Adjustment
4.1 Clustering Based Reformulation
4.2 Chance Constrained Relaxation
4.3 Steepest Descent Correction
4.4 Stochastic Graph Clustering
5 Experiments
5.1 Experiment Settings
5.2 Performance Profiles
5.3 Results on Large-Scale Dataset
5.4 Ablation Study on Steepest Descent Correction
6 Conclusion
References
Object-Based Illumination Estimation with Rendering-Aware Neural Networks
1 Introduction
2 Related Work
3 Overview
4 Network Structures
5 Training
5.1 Supervision and Training Losses
5.2 Training Data Preparation
5.3 Implementation
6 Results
6.1 Validations
6.2 Comparisons
6.3 Ablation Studies
6.4 Performance
7 Conclusion
References
Progressive Point Cloud Deconvolution Generation Network
1 Introduction
2 Related Work
2.1 Deep Learning on 3D Data
2.2 3D Point Cloud Generation
3 Our Approach
3.1 Progressive Deconvolution Generation Network
3.2 Shape-Preserving Adversarial Loss
4 Experiments
4.1 Experimental Settings
4.2 Evaluation of Point Cloud Generation
4.3 Ablation Study and Analysis
5 Conclusions
References
SSCGAN: Facial Attribute Editing via Style Skip Connections
1 Introduction
2 Related Work
3 Method
3.1 Multiple Skip Connections Architecture
3.2 Style Skip Connections
3.3 Spatial Information Transfer
3.4 Loss Functions
4 Results and Analysis
4.1 Ablation Study
4.2 Comparisons with State-of-the-Arts
5 Conclusions
References
Negative Pseudo Labeling Using Class Proportion for Semantic Segmentation in Pathology
1 Introduction
2 Related Works
3 Negative Pseudo Labeling with Label Proportions
3.1 Pseudo Labeling
3.2 Negative Pseudo Labeling
3.3 Multi Negative Pseudo Labeling
4 Adaptive Pseudo Labeling
5 Experimental Results
5.1 Dataset
5.2 Experiment Settings
5.3 Quantitative Evaluation
5.4 Qualitative Evaluation
6 Conclusion
References
Learn to Propagate Reliably on Noisy Affinity Graphs
1 Introduction
2 Related Work
3 Propagation on Noisy Affinity Graphs
3.1 Problem Statement
3.2 Algorithm Overview
3.3 GCN-Based Local Predictor
3.4 Confidence-Based Path Scheduler
3.5 Training of Local Predictor
4 Experiments
4.1 Experimental Settings
4.2 Method Comparison
4.3 Ablation Study
4.4 Further Analysis
4.5 Applications
5 Conclusion
References
Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search
1 Introduction
2 Related Work
3 The Downside of DARTS
3.1 Preliminary of Differentiable Architecture Search
3.2 Performance Collapse Caused by Intractable Skip Connections
3.3 Non-negligible Discrepancy of Discretization
4 Fair DARTS
4.1 Stepping Out the Pitfalls of Skip Connections
4.2 Resolve Discrepancy from Continuous Representation to Discrete Encoding
5 Experiments and Results
5.1 Searching Architectures for CIFAR-10
5.2 Transferring to ImageNet
5.3 Searching Proxylessly on ImageNet
6 Ablation Study and Analysis
6.1 Removing Skip Connections from S1
6.2 How Does Zero-One Loss Matter?
6.3 Discussions from Fairness Perspective
7 Conclusion
References
TANet: Towards Fully Automatic Tooth Arrangement
1 Introduction
2 Related Work
3 Method
3.1 Overview
3.2 Preprocessing
3.3 Network
3.4 Loss Function
3.5 Implementation and Training Details
4 Experiments
4.1 Dataset
4.2 Evaluation Metric
4.3 Ablation Study
4.4 User Study
4.5 Visualization
5 Discussion
6 Conclusion
References
UnionDet: Union-Level Detector Towards Real-Time Human-Object Interaction Detection
1 Introduction
2 Related Work
2.1 One-Stage Object Detection
2.2 Human-Object Interactions
3 Method
3.1 Challenges in Union-Level Detection
3.2 Union-Level Detector: Union Branch
3.3 Instance-Level Detector: Instance Branch
3.4 Training UnionDet
3.5 HOI Detection Inference
4 Experiments
5 Conclusions
References
GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-Aware Supervision
1 Introduction
2 Related Work
3 Pose and Shape Representation
4 Network Architecture Design
5 Geometrical and Scene-Aware Supervision
6 Experiments
6.1 Datasets and Experimental Settings
6.2 Ablation Study of Network Architecture and Loss Design
6.3 Comparison with State-of-the-Art Methods
7 Conclusion
References
Resolution Switchable Networks for Runtime Efficient Image Recognition
1 Introduction
2 Related Work
3 Proposed Method
3.1 Multi-resolution Parallel Training
3.2 Multi-resolution Interaction Effects
3.3 Multi-resolution Ensemble Distillation
4 Experiments
4.1 Implementation Details
4.2 Results
4.3 Ablation Study
5 Conclusions
References
SMAP: Single-Shot Multi-person Absolute 3D Pose Estimation
1 Introduction
2 Related Work
3 Methods
3.1 Intermediate Representations
3.2 Depth-Aware Part Association
3.3 3D Pose Reconstruction
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Evaluation Metrics
4.4 Comparison with State-of-the-Art Methods
4.5 Ablation Analysis
5 Conclusion
References
Learning to Detect Open Classes for Universal Domain Adaptation
1 Introduction
2 Related Work
3 Calibrated Multiple Uncertainties
3.1 Limitations of Previous Works
3.2 Multiple Uncertainties
3.3 Uncertainty Calibration
3.4 Calibrated Multiple Uncertainties Framework
4 Experiments
4.1 Setup
4.2 Results
4.3 Analysis
5 Conclusion
References
Visual Compositional Learning for Human-Object Interaction Detection
1 Introduction
2 Related Works
2.1 Human-Object Interaction Detection
2.2 Low-Shot and Zero-Shot Learning
2.3 Feature Disentangling and Composing
3 Visual Compositional Learning
3.1 Overview
3.2 Multi-branch Network
3.3 Composing Interactions
3.4 Training and Inference
4 Experiment
4.1 Datasets and Metrics
4.2 Implementation Details
4.3 Results and Comparisons
4.4 Generalized Zero-Shot HOI Detection
4.5 Ablation Analysis
4.6 Visualization of Features
5 Conclusion
References
Deep Plastic Surgery: Robust and Controllable Image Editing with Human-Drawn Sketches
1 Introduction
2 Related Work
3 The Deep Plastic Surgery Algorithm
3.1 Sketch Refinement via Dilation
3.2 Controllable Sketch-Based Image Editing
4 Experimental Results
4.1 Implementation Details
4.2 Comparisons with State-of-the-Art Methods
4.3 Ablation Study
4.4 Applications
4.5 Limitation and User Interaction
5 Conclusion
References
Rethinking Class Activation Mapping for Weakly Supervised Object Localization
1 Introduction
2 Approach
2.1 Preliminary: Class Activation Mapping (CAM)
2.2 Thresholded Average Pooling (TAP)
2.3 Negative Weight Clamping (NWC)
2.4 Percentile as a Standard for Thresholding (PaS)
3 Related Work
3.1 CAM-Based WSOL Methods
3.2 Spatial Pooling Methods
4 Experiments
4.1 Experiment Setting
4.2 Quantitative Results
4.3 Qualitative Results
5 Conclusion
References
OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features
1 Introduction
2 Preliminaries: Matching Networks
3 The OS2D Model
4 Training the Model
5 Related Works
6 Experiments
6.1 Ablation Study
6.2 Evaluation of OS2D Against Baselines
7 Conclusion
References
Interpretable Neural Network Decoupling
1 Introduction
2 Related Work
3 Architecture Decoupling
3.1 Architecture Controlling Module
3.2 Network Training
4 Experiments
4.1 Implementation Details
4.2 Network Interpretability
4.3 Network Acceleration
4.4 Adversarial Samples Detection
5 Conclusion
References
Omni-Sourced Webly-Supervised Learning for Video Recognition
1 Introduction
2 Related Work
3 Method
3.1 Overview
3.2 Framework Formulation
3.3 Task-Specific Data Collection
3.4 Teacher Filtering
3.5 Transforming to the Target Domain
3.6 Joint Training
4 Datasets
4.1 Target Datasets
4.2 Web Sources
5 Experiments
5.1 Video Architectures
5.2 Verifying the Efficacy of OmniSource
5.3 Comparisons with State-of-the-art
5.4 Validating the Good Practices in OmniSource
6 Conclusion
References
CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending
1 Introduction
2 Related Work
3 CurveLane-NAS Framework
3.1 Elastic Backbone Search Module
3.2 Feature Fusion Search Module
3.3 Adaptive Point Blending Search Module
3.4 Unified Multi-objective Search
4 Experiments
4.1 New CurveLanes Benchmark
4.2 Other Datasets and Evaluation Metrics
4.3 Lane Detection Results
5 Conclusion
References
Contextual-Relation Consistent Domain Adaptation for Semantic Segmentation
1 Introduction
2 Related Works
3 Methods
3.1 Problem Definition
3.2 Overview of Network Architecture
3.3 Contextual-Relation Consistent Domain Adaptation
3.4 CrCDA with Pixel-/Global-Scale
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Comparison with State-of-Art
4.4 Ablation Studies and Analysis
5 Conclusions
References
Estimating People Flows to Better Count Them in Crowded Scenes
1 Introduction
2 Related Work
3 Approach
3.1 Formalization
3.2 Regressing the Flows
3.3 Exploiting Optical Flow
4 Experiments
4.1 Evaluation Metrics
4.2 Benchmark Datasets and Ground-Truth Data
4.3 Comparing Against Recent Techniques
4.4 Ablation Study
5 Conclusion
References
Generate to Adapt: Resolution Adaption Network for Surveillance Face Recognition
1 Introduction
2 Related Work
3 Methodology
3.1 Framework Overview
3.2 Low-Resolution Face Synthesis
3.3 Loss Function
3.4 Feature Adaption Network
4 Experiments
4.1 Experiment Settings
4.2 Implementation Details
4.3 Ablation Study
4.4 Compare with SOTA Methods
5 Conclusion
References
Learning Feature Embeddings for Discriminant Model Based Tracking
1 Introduction
2 Related Work
2.1 Siamese Network Based Trackers
2.2 Online Discriminatively Trained Trackers
2.3 Meta-Learning Based Few-Shot Learning
3 Learning Feature Embeddings
3.1 Features Extraction Network
3.2 Discriminant Model Solver
3.3 Fast Convergence with Shrinkage Loss
4 Online Tracking with Learned Feature Embeddings
4.1 Features Extraction
4.2 Online Learning and Update
4.3 Localization and Refine
5 Experiments
5.1 Implementation Details
5.2 Ablation Studies
5.3 State-of-the-Art Comparisons
6 Conclusion
References
WeightNet: Revisiting the Design Space of Weight Networks
1 Introduction
2 Related Work
3 WeightNet
3.1 Rethinking CondConv
3.2 Rethinking SENet
3.3 WeightNet Structure
4 Experiments
4.1 Classification
4.2 Object Detection
4.3 Ablation Study and Analysis
5 Conclusion and Future Works
References
Author Index
