Computer Vision - ECCV 2020

Name: Computer Vision - ECCV 2020 | 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XXI
Brand: Springer
Price: 96.29 EUR
Availability: OnlineOnly

Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm (Editors)

Springer (Publisher)

Published on 11 November 2020

XLIII, 791 pages

E-Book

PDF with digital watermarking

978-3-030-58589-1 (ISBN)

€96.29 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Foreword
Preface
Organization
Contents - Part XXI
DVI: Depth Guided Video Inpainting for Autonomous Driving
1 Introduction
2 Related Work
3 Proposed Approach
3.1 3D Depth Map
3.2 Candidate Color Sampling Criteria
3.3 Regularization with Belief Propagation
3.4 Color Harmonization
3.5 Video Fusion
3.6 Temporal Smoothing
4 Experiments and Results
4.1 Inpainting Dataset
4.2 Comparisons
4.3 Ablation Study
5 Conclusion
References
Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation
1 Introduction
2 Related Work
2.1 Generative Models
2.2 Reinforcement Learning in Sequence Generation
3 Background: VQ-VAE & VQ-VAE-2
4 Reinforced Adversarial Learning
4.1 Policy Gradients
4.2 Discriminator
4.3 Partial Generation
4.4 Training
5 Experiments
5.1 Implementation Details
5.2 Synthetic Experiments
5.3 Real World Experiments
5.4 Ablation Study
5.5 Image Completion
6 Conclusion
References
APRICOT: A Dataset of Physical Adversarial Attacks on Object Detection
1 Introduction
2 Related Work
3 The APRICOT Dataset
3.1 Generating Adversarial Patches
3.2 Dataset Description
4 Effectiveness of Adversarial Patches
4.1 Digital Performance
4.2 Physical Performance
5 Flagging Adversarial Patches
5.1 Detecting Adversarial Patches Using Synthetic Supervision
5.2 Determining if a Detection is Adversarial Using Uncertainty and Density
5.3 Localizing Adversarial Regions with Density and Reconstruction
6 Conclusion
References
Visual Question Answering on Image Sets
1 Introduction
2 Related Works
3 Dataset
3.1 Annotation Collection
3.2 Dataset Analysis
4 ISVQA Problem Formulation and Baselines
4.1 Problem Definition
4.2 Model Definitions
5 Experiments
5.1 Human Performance
5.2 Implementation Details
5.3 Results
6 Conclusion and Discussion
References
Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots
1 Introduction
2 Related Work
3 Object as Hotspots
3.1 Hotspot Definition
3.2 Hotspot Selection and Assignment
4 HotSpot Network
4.1 Object-as-Hotspots Head
4.2 Learning and Inference
5 Experiments
5.1 Datasets and Evaluation
5.2 Implementation Details
5.3 Experiment Results on KITTI Benchmark
5.4 Experiment Results on NuScenes Dataset
5.5 Analysis
5.6 Ablation Studies
6 Conclusion
References
Placepedia: Comprehensive Place Understanding with Multi-faceted Annotations
1 Introduction
2 Related Work
3 The Placepedia Dataset
3.1 Hierarchical Administrative Areas and Places
3.2 Place Images
4 Study on Comprehensive Place Understanding
4.1 Benchmarks
4.2 PlaceNet
4.3 Experimental Settings
4.4 Analysis on Recognition Results
5 Study on Multi-faceted City Embedding
5.1 City Embedding
5.2 Experimental Results
6 Conclusion
References
DELTAS: Depth Estimation by Learning Triangulation and Densification of Sparse Points
1 Motivation
2 Related Work
3 Method
3.1 Interest Point Detector and Descriptor
3.2 Point Matching and Triangulation
3.3 Densification of Sparse Depth Points
3.4 Overall Training Objective
4 Experimental Results
4.1 Implementation Details
4.2 Detector and Descriptor Quality
4.3 Depth Results
5 Conclusion
References
Dynamic Low-Light Imaging with Quanta Image Sensors
1 Introduction
2 Background
2.1 Quanta Image Sensors
2.2 How Dark Is One Photon per Pixel?
2.3 Related Work
3 Method
3.1 QIS Imaging Model
3.2 The Dilemma of Noise and Motion
3.3 Student-Teacher Learning
3.4 Choice of Teacher and Student Networks
4 Experiments
4.1 Setting
4.2 Synthetic Experiments
4.3 Real Experiments
4.4 Ablation Study
5 Conclusion
References
Disambiguating Monocular Depth Estimation with a Single Transient
1 Introduction
2 Related Work
3 Method
3.1 Image Formation Model of a Diffused SPAD
3.2 Ambient Rejection and Falloff Correction
3.3 Histogram Matching
4 Evaluation and Assessment
4.1 Implementation Details
4.2 Simulated Results
5 Experimental Demonstration
5.1 Prototype RGB-SPAD Camera Hardware
5.2 Experimental Results
6 Discussion
References
DSDNet: Deep Structured Self-driving Network
1 Introduction
2 Related Work
3 Deep Structured Self-driving Network
3.1 Backbone Feature Network and Object Detection
3.2 Probabilistic Multimodal Social Prediction
3.3 Safe Motion Planning Under Uncertain Future
3.4 Learning
4 Experimental Evaluation
4.1 Multi-modal Interactive Prediction
4.2 Motion Planning
4.3 Object Detection Results
4.4 Ablation Study and Qualitative Results
5 Conclusion
References
QuEST: Quantized Embedding Space for Transferring Knowledge
1 Introduction
2 Related Work
3 Approach
3.1 Preliminaries
3.2 Distilling Visual Teacher-Word Assignments
3.3 Discussion
4 Experiments
4.1 Comparison with Prior Work
4.2 Transfer Learning to Small-Sized Datasets
4.3 Further Analysis
5 Conclusions
References
EGDCL: An Adaptive Curriculum Learning Framework for Unbiased Glaucoma Diagnosis
1 Introduction
2 Related Work
3 Methodology
3.1 Student Network for Spatial Evidence Identification
3.2 Curriculum Generation
3.3 Teacher Network for Glaucoma Diagnosis
4 Experiments and Results
4.1 Dataset and Evaluation
4.2 Training and Inference
4.3 Performance of Unbiased Glaucoma Diagnosis
4.4 Effectiveness of Dual-Curriculum Learning
4.5 Performance Comparison
5 Conclusions
References
Backpropagated Gradient Representations for Anomaly Detection
1 Introduction
2 Related Works
2.1 Anomaly Detection
2.2 Backpropagated Gradients
3 Gradient-Based Representations
3.1 Geometric Interpretation of Gradients
3.2 Theoretical Interpretation of Gradients
4 Method: Gradient Constraint
5 Experiments
5.1 Experimental Setup
5.2 Baseline Comparison
5.3 Comparison with State-of-the-Art Algorithms
6 Conclusion
A Appendix
A.1 Additional Results on fMNIST
A.2 Histogram Analysis on CIFAR-10
A.3 Parameter Setting for the Gradient Loss
A.4 Additional Details on CURE-TSR Dataset
References
Dense RepPoints: Representing Visual Objects with Dense Point Sets
1 Introduction
2 Related Work
3 Methodology
3.1 Review of RepPoints for Object Detection
3.2 Dense RepPoints
3.3 Efficient Computation
3.4 Different Sampling Strategies
3.5 Sampling Supervision
3.6 Representative Points to Object Segment
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Ablation Study
4.4 Comparison with Other SOTA Methods
5 Conclusion
References
On Dropping Clusters to Regularize Graph Convolutional Neural Networks
1 Introduction
2 Related Work
3 DropCluster
3.1 Preliminaries
3.2 Spatial Correlation
3.3 Depth-Wise Correlation
3.4 Number of Seed Entries
3.5 Discussions
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Hyperparameter Analysis
4.4 Ablation Study
4.5 Comparisons with Other State-of-the-Art Methods
4.6 Implementation on Networks with Extended Depths
4.7 Further Implementations
5 Conclusion
References
Adaptive Video Highlight Detection by Learning from User History
1 Introduction
2 Related Work
3 Our Approach
3.1 Background: Temporal Convolution Networks
3.2 Temporal-Adaptive Instance Normalization
3.3 Adaptive Highlight Detector
3.4 Learning and Optimization
4 Experiments
4.1 Dataset
4.2 Setup and Implementation Details
4.3 Baselines
4.4 Results and Comparison
4.5 Analysis
4.6 Application to Video Summarization
5 Conclusion
References
Improving 3D Object Detection Through Progressive Population Based Augmentation
1 Introduction
2 Related Work
3 Methods
3.1 Search Space for 3D Point Cloud Augmentation
3.2 Learning Through Progressive Population Based Search
3.3 Schedule Optimization with Historical Data
4 Experiments
4.1 Surpassing Single-Stage Models on the KITTI Dataset
4.2 Automated Data Augmentation Benefits Large-Scale Data
4.3 Better Results with Less Computation
4.4 Automated Data Augmentation Improves Data Efficiency
4.5 Progressive Population Based Augmentation Generalizes on Image Classification
5 Conclusion
References
DR-KFS: A Differentiable Visual Similarity Metric for 3D Shape Reconstruction
1 Introduction
2 Related Work
3 Differential Visual Shape Similarity Metric
3.1 Differentiable Renderer
4 Results and Evaluation
4.1 Quantitative Evaluation
4.2 Qualitative Evaluation
4.3 Design Analysis
5 Conclusion
References
SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization
1 Introduction
2 Related Work
3 Method
3.1 Overview
3.2 Local Self-attention Block
3.3 Positional Projection
3.4 Pyramid Propagation
3.5 Framework Training
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Evaluation and Comparison
5 Conclusion
References
Adversarial Learning for Zero-Shot Domain Adaptation
1 Introduction
2 Related Work
3 Background
4 Approach
4.1 Problem Definition
4.2 Main Idea
4.3 Training
5 Experiments
5.1 Adaptation Across Synthetic Domains
5.2 Adaptation in Public Dataset
6 Conclusion and Future Work
References
YOLO in the Dark - Domain Adaptation Method for Merging Multiple Models
1 Introduction
2 Related Work
3 Proposed Model: YOLO in the Dark
3.1 Overview
3.2 Generative Model for Domain Adaptation
3.3 Training Environment
4 Experiments
4.1 Object Detection in RAW Images
4.2 Ablation Study
5 Conclusion
References
Identity-Aware Multi-sentence Video Description
1 Introduction
2 Related Work
3 Connecting Identities to Video Descriptions
3.1 Fill-in the Identity
3.2 Identity-Aware Video Description
4 Dataset
5 Experiments
5.1 Implementation Details
5.2 Fill-in the Identity
5.3 Identity-Aware Video Description
6 Conclusion
References
VQA-LOL: Visual Question Answering Under the Lens of Logic
1 Introduction
2 Related Work
3 The Lens of Logic
3.1 Composite Questions
3.2 Dataset Creation Process
3.3 Analytical Setup
4 Method
4.1 Cross-Modal Feature Encoder
4.2 Our Model: Lens of Logic (LOL)
4.3 Loss Functions
4.4 Implementation Details
5 Experiments
5.1 Can't We Just Parse the Question into Components?
5.2 Explicit Training with Logically Composed Questions
5.3 Analysis
5.4 Evaluation on VQA V2.0 Test Data
6 Discussion
7 Conclusion
References
Piggyback GAN: Efficient Lifelong Learning for Image Conditioned Generation
1 Introduction
2 Related Work
3 Method
3.1 Piggyback Filter Learning
3.2 Unconstrained Filter Learning
3.3 Expanding Filter Bank
3.4 Learning Piggyback GAN
4 Experiments
4.1 Paired Image-Conditioned Generation
4.2 Unpaired Image-Conditioned Generation
5 Conclusion
References
TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering
1 Introduction
2 Related Works
2.1 Visual Question Answering
2.2 Visual Reasoning in VQA
3 Our Approach
3.1 Overview
3.2 Root Attention
3.3 Root to Leaf Attention Passing
3.4 Leaf Attention
3.5 Message Passing Module for Units Interaction
3.6 Multi-stage Reasoning and Policy Network
3.7 The Readout Layer
4 Experiments
4.1 Experimental Setup
4.2 Ablation Study
4.3 Experimental Results on GQA
4.4 Experimental Results on VQAv2 and CLEVR
4.5 Visualization
5 Conclusion
References
Mining Inter-Video Proposal Relations for Video Object Detection
1 Introduction
2 Related Works
3 Our HVR-Net
3.1 Video-Level Triplet Selection
3.2 Intra-video Proposal Relation
3.3 Proposal-Level Triplet Selection
3.4 Inter-Video Proposal Relation
4 Experiments
4.1 Implementation Details
4.2 Ablation Studies
4.3 SOTA Comparison
4.4 Visualization
5 Conclusion
References
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
1 Introduction
2 Related Work
3 Dataset
3.1 Data Collection
3.2 Data Analysis and Comparison
4 Cross-Modal Moment Localization (XML)
4.1 XML Backbone Network
4.2 Convolutional Start-End Detector
4.3 Training and Inference
5 Experiments
5.1 Data, Metrics and Implementation Details
5.2 Baselines Comparison
5.3 Model Analysis
6 Conclusion
References
Minimum Class Confusion for Versatile Domain Adaptation
1 Introduction
2 Related Work
3 Approach
3.1 Minimum Class Confusion
3.2 Versatile Approach to Domain Adaptation
3.3 Regularizer to Existing DA Methods
4 Experiments
4.1 Setup
4.2 Results and Discussion
4.3 Empirical Analyses
5 Conclusion
References
Large Batch Optimization for Object Detection: Training COCO in 12 Minutes
1 Introduction
2 Related Work
2.1 CNN-Based Detectors
2.2 Large Batch Optimization
3 Method
3.1 Problems of Linear Scaling Rule
3.2 Periodical Moments Decay LAMB
3.3 LargeDet Framework and Guidelines
4 Experiments
4.1 Experiments on COCO
4.2 Training COCO in 12 min
4.3 Experiments on Open Images
5 Conclusions
References
Towards Practical and Efficient High-Resolution HDR Deghosting with CNN
1 Introduction
2 Related Works
3 Proposed Method
4 Experiments
4.1 Implementation
4.2 Quantitative Evaluation
4.3 Ablation Experiments
4.4 Qualitative Evaluation
4.5 Running Time
5 Discussion
6 Conclusion
References
Monocular Differentiable Rendering for Self-supervised 3D Object Detection
1 Introduction
2 Related Work
3 Method
3.1 Loss Functions
3.2 Escaping Rotational Local Minima
3.3 Detection Confidence Score
4 Experiments
4.1 Comparison to SoTA
4.2 Ablation Studies
4.3 Limitations and Failure Cases
5 Conclusion
References
Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation
1 Introduction
2 Related Work
3 Our Method
3.1 Categorical Shape Prior
3.2 Our Network Architecture
3.3 6D Pose Estimation
3.4 Loss Functions
4 Experiments
4.1 Experimental Setup
4.2 Implementation Details
4.3 Comparison to Baseline
4.4 Evaluation of Shape Reconstruction
4.5 Ablation Studies
4.6 Qualitative Results
5 Conclusions
References
Dynamic and Static Context-Aware LSTM for Multi-agent Motion Prediction
1 Introduction
2 Related Work
3 Method
3.1 The Function of Queues
3.2 Individual Context Module
3.3 Social-Aware Context Module
3.4 Semantic Guidance from Scene Context
3.5 Model Training
4 Experiments
4.1 Datasets and Evaluation Metrics
4.2 Implementation Details
4.3 Standard Evaluations
5 Discussion
5.1 Memory Cell Visualization
5.2 The Capture of Motion Pattern
5.3 Exploration on the Queue Length
5.4 Social Behaviors Understanding
5.5 Analysis of Multimodal Predictions
6 Conclusions
References
Image-Based Table Recognition: Data, Model, and Evaluation
1 Introduction
2 Related Work
3 Automatic Generation of PubTabNet
4 Encoder-Dual-Decoder (EDD) Model
5 Tree-Edit-Distance-Based Similarity (TEDS)
6 Experiments
6.1 Implementation Details
6.2 Quantitative Analysis
6.3 Qualitative Analysis
6.4 Error Analysis
6.5 Generalization
7 Conclusion
References
Group Activity Prediction with Sequential Relational Anticipation Model
1 Introduction
2 Related Work
3 Our Approach
3.1 Relation Modeling for Group Activity
3.2 Observation Encoder E
3.3 Sequential Decoder D
3.4 Feature Aggregation for Prediction
3.5 Loss Functions and Model Learning
3.6 Discussion
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Comparison with State-of-the-Art
4.4 Ablation Study
4.5 Position Prediction Evaluation
5 Conclusion
References
PiP: Planning-Informed Trajectory Prediction for Autonomous Driving
1 Introduction
2 Related Work
3 Method
3.1 Problem Formulation
3.2 Planning Coupled Module
3.3 Target Fusion Module
3.4 Maneuver Based Decoding
3.5 Implementation Details
4 Experiments
4.1 Datasets
4.2 Baseline Methods
4.3 Quantitative Evaluation
4.4 User Study
4.5 Qualitative Analysis
5 Conclusion
References
PSConv: Squeezing Feature Pyramid into One Compact Poly-Scale Convolutional Layer
1 Introduction
2 Related Work
3 Method
3.1 Sketch of Convolution Operations
3.2 Design Details
4 Experiments
4.1 ILSVRC 2012
4.2 Ablation and Analysis
4.3 MS COCO 2017
5 Conclusion
References
Hierarchical Context Embedding for Region-Based Object Detection
1 Introduction
2 Related Work
2.1 Region-Based Object Detection
2.2 Context Information for Object Detection
2.3 Context Information for Other Vision Tasks
3 Approach
3.1 Framework Overview
3.2 Image-Level Categorical Embedding
3.3 Hierarchical Contextual RoI Feature Generation
3.4 Early-and-Late Fusion and Inference
4 Experiments
4.1 Implementation Details
4.2 Comparisons with Baselines
4.3 Error Analyses
4.4 Ablation Studies
4.5 Comparisons with State-of-the-Art
5 Conclusions
References
Attention-Driven Dynamic Graph Convolutional Network for Multi-label Image Recognition
1 Introduction
2 Related Work
3 Method
3.1 Overview of ADD-GCN
3.2 Semantic Attention Module
3.3 Dynamic GCN
3.4 Final Classification and Loss
4 Experiments
4.1 Evaluation Metrics
4.2 Implementation Details
4.3 Comparison with State of the Arts
4.4 Ablation Studies
4.5 Visualization
5 Conclusion
References
Gen-LaneNet: A Generalized and Scalable Approach for 3D Lane Detection
1 Introduction
2 Related Work
3 Gen-LaneNet
3.1 Geometry in 3D Lane Detection
3.2 Geometry-Guided Anchor Representation
3.3 Two-Stage Framework with Decoupled Image Encoding and Geometry Reasoning
3.4 Training
4 Synthetic Dataset and Construction Strategy
5 Experiments
5.1 Experimental Setup
5.2 Anchor Effect
5.3 The Upper Bound of the Two-Stage Framework
5.4 Whole System Evaluation
6 Conclusion
References
Sparse-to-Dense Depth Completion Revisited: Sampling Strategy and Graph Construction
1 Introduction
2 Related Work
3 Spatial Sampling Strategy
3.1 Low-Discrepancy Sequences and Quasi-Random Sampling
3.2 Quasi-Random Sampling Pattern Comparison and Criterion
4 Graph Construction for GNN-Based Depth Completion
4.1 Spatially-Variant Filter and Neighborhood Consideration
4.2 Graph Construction and Network Propagation
5 Experimental Results
5.1 Datasets
5.2 Ablation Study
5.3 Comparison with State-of-the-Art
5.4 Cross-Dataset Evaluation
6 Conclusion
References
MEAD: A Large-Scale Audio-Visual Dataset for Emotional Talking-Face Generation
1 Introduction
2 Related Work
3 MEAD
3.1 Design Criteria
3.2 Data Acquisition
3.3 Analysis and Comparison
3.4 Evaluation
4 Emotional Talking-Face Baseline
5 Experiments and Results
5.1 Experiment Setup
5.2 Baseline Comparison
5.3 Evaluation Results for Our Baseline
6 Limitations and Future Work
7 Conclusion
References
Detecting Human-Object Interactions with Action Co-occurrence Priors
1 Introduction
2 Related Work
3 Proposed Method
3.1 Action Co-occurrence Priors
3.2 Anchor Action Selection via Non-exclusive Action Suppression
3.3 Hierarchical Architecture
3.4 ACP Projection for Knowledge Distillation
4 Experiments
4.1 Datasets and Metrics
4.2 Quantitative Results
4.3 Additional Analysis
5 Conclusion
References
Learning Connectivity of Neural Networks from a Topological Perspective
1 Introduction
2 Related Work
3 Methodology
3.1 Topological Perspective of Neural Networks
3.2 Search Space
3.3 Optimization of Topological Connectivity
4 Experiments and Analysis
4.1 Connectivity Optimization for Classical Networks
4.2 Expanding to Larger Search Spaces by TopoNet
4.3 Transferability on Different Tasks
4.4 Exploring Topological Properties by Graph Damage
4.5 Visualization of the Optimization Process
5 Conclusion and Future Work
References
JSTASR: Joint Size and Transparency-Aware Snow Removal Algorithm Based on Modified Partial Convolution and Veiling Effect Removal
1 Introduction
2 Related Work
2.1 Single Image Snow Removal Algorithm
2.2 Single Image Haze Removal Algorithm
3 Proposed Method
3.1 Snow Model Formulation
3.2 Joint Size and Transparency-Aware Snow Removal
3.3 Veiling Effect Removal
4 Experimental Result
4.1 Dataset Generation
4.2 Training Detail
4.3 Comparison with State-of-the-art Methods
4.4 Ablation Study
5 Conclusion
References
Ocean: Object-Aware Anchor-Free Tracking
1 Introduction
2 Related Work
3 Object-Aware Anchor-Free Networks
3.1 Anchor-Free Regression Network
3.2 Object-Aware Classification Network
3.3 Loss Function
3.4 Relation to Prior Anchor-Free Work
4 Object-Aware Anchor-Free Tracking
4.1 Framework
4.2 Integrating Online Update
5 Experiments
5.1 Implementation Details
5.2 State-of-the-art Comparison
5.3 Analysis of the Proposed Method
6 Conclusion
References
Author Index

