Computer Vision - ECCV 2020

Name: Computer Vision - ECCV 2020 | 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XXVIII
Brand: Springer
Price: 96.29 EUR
Availability: OnlineOnly

Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm (Editors)

Springer (Publisher)

Published on 2 November 2020

XLII, 789 pages

E-Book

PDF with digital watermarking

978-3-030-58604-1 (ISBN)

€96.29 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Foreword
Preface
Organization
Contents - Part XXVIII
SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation
1 Introduction
2 Related Work
2.1 Point-Cloud Segmentation
2.2 Adaptive Convolution
2.3 Efficient Neural Networks
3 Spherical Projection of LiDAR Point-Cloud
4 Spatially-Adaptive Convolution
4.1 Standard Convolution
4.2 Spatially-Adaptive Convolution
4.3 Efficient Computation of SAC
4.4 Relationship with Prior Work
5 SqueezeSegV3
5.1 The Architecture of SqueezeSegV3
5.2 Loss Function
6 Experiments
6.1 Dataset and Evaluation Metrics
6.2 Implementation Details
6.3 Comparing with Prior Methods
6.4 Ablation Study
7 Conclusion
References
An Attention-Driven Two-Stage Clustering Method for Unsupervised Person Re-identification
1 Introduction
2 Related Work
2.1 Unsupervised Person Re-ID
2.2 Attention in Person Re-ID
3 Our Approach
3.1 Voxel Attention (VA)
3.2 Two-Stage Clustering (TC)
3.3 Progressive Training
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Model Performances on Benchmark Datasets
4.4 Contribution of the Voxel Attention
4.5 Contribution of Two-Stage Clustering
4.6 Contribution of Progressive Training
4.7 Component Analysis of ADTC
5 Conclusion
References
Toward Fine-Grained Facial Expression Manipulation
1 Introduction
2 Related Work
3 Methodology
3.1 Relative Action Units (AUs)
3.2 Network Structure
3.3 Multi-scale Feature Fusion
3.4 Loss Functions
4 Experiments
4.1 Implementation Details
4.2 Evaluation Metrics
4.3 Qualitative Evaluation
4.4 Quantitative Evaluation
4.5 Ablation Study
5 Conclusion
References
Adaptive Object Detection with Dual Multi-label Prediction
1 Introduction
2 Related Work
3 Method
3.1 Multi-label Prediction
3.2 Conditional Adversarial Feature Alignment
3.3 Category Prediction Based Regularization
3.4 Overall End-to-End Learning
4 Experiments
4.1 Implementation Details
4.2 Domain Adaptation from Real to Virtual Scenes
4.3 Adaptation from Clear to Foggy Scenes
4.4 Ablation Study
4.5 Further Analysis
5 Conclusion
References
Table Structure Recognition Using Top-Down and Bottom-Up Cues
1 Introduction
2 Related Work
3 TabStruct-Net
3.1 Top-Down: Cell Detection
3.2 Bottom-Up: Structure Recognition
3.3 Post-Processing
4 Experiments
4.1 Datasets
4.2 Baseline Methods
4.3 Implementation Details
4.4 Evaluation Measures
4.5 Experimental Setup
5 Results on Table Structure Recognition
5.1 Analysis of Results
5.2 Ablation Study
6 Summary
References
Novel View Synthesis on Unpaired Data by Conditional Deformable Variational Auto-Encoder
1 Introduction
2 Related Works
3 Method
3.1 Overview Framework
3.2 Conditional Deformable Module (CDM)
3.3 Deformed Feature Based Normalization Module (DFNM)
3.4 Overall Optimization Objective
4 Experiments
4.1 Dataset and Implementation Details
4.2 Results and Ablation Studies on 3D Chair and MultiPIE
4.3 Results and Analysis on MultiPIE
5 Conclusions
References
Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
1 Introduction
2 Related Work
3 VLN in Continuous Environments (VLN-CE)
3.1 Transferring Nav-Graph Trajectories
3.2 VLN-CE Dataset
4 Instruction-Guided Navigation Models in VLN-CE
4.1 Sequence-to-Sequence Baseline
4.2 Cross-Modal Attention Model
4.3 Auxiliary Losses and Training Regimes
5 Experiments
5.1 Establishing Baseline Performance for VLN-CE
5.2 Model Performance in VLN-CE
5.3 Examining the Impact of the Nav-Graph in VLN
6 Discussion
References
Boundary Content Graph Neural Network for Temporal Action Proposal Generation
1 Introduction
2 Related Work
3 Our Approach
3.1 Problem Definition
3.2 Feature Encoding
3.3 Boundary Content Graph Network
3.4 Training of BC-GNN
3.5 Inference of BC-GNN
4 Experiment
4.1 Dataset and Setup
4.2 Temporal Action Proposal Generation
4.3 Temporal Action Detection with Our Proposals
5 Conclusion
References
Pose Augmentation: Class-Agnostic Object Pose Transformation for Object Recognition
1 Introduction and Related Work
2 Object Pose Transforming Network
2.1 Eliminate-Add Structure of the Generator
2.2 Pose-Eliminate Module
2.3 Continuous Pose Transforming Training
2.4 Loss Function
3 Experimental Methods
3.1 Datasets
3.2 Network Implementation
4 Experiments and Results
4.1 Object Pose Transformation Experiments
4.2 Object Recognition Experiment
4.3 Class-Agnostic Object Transformation Experiment
4.4 Object Pose Significance on Different Object Recognition Tasks
4.5 Generalization to Imagenet
5 Conclusions
References
VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval
1 Introduction
2 Related Work
2.1 Temporal Action Detection
2.2 Video Moment Retrieval
3 Method
3.1 Method Overview
3.2 Input Representation
3.3 Surrogate Proposal Selection Module
3.4 Cascaded Cross-Modal Attention Module
4 Experiment
4.1 Datasets
4.2 Quantitative Result
4.3 Model Variants and Ablation Study
4.4 Analysis of Multi-modal Similarity
4.5 Visualization of Attention Map
4.6 Visualization of Inference
5 Conclusions
References
Attention-Based Query Expansion Learning
1 Introduction
2 Related Work
3 Attention-Based Query Expansion Learning
3.1 Generalized Query Expansion
3.2 Query Expansion Learning
3.3 Learnable Attention-Based Query Expansion (LAttQE)
3.4 Database-Side Augmentation
4 Experiments
4.1 Training Setup and Implementation Details
4.2 Test Datasets and Evaluation Protocol
4.3 Model Study
4.4 Comparison with Existing Methods
5 Conclusions
References
Interpretable Foreground Object Search as Knowledge Distillation
1 Introduction
2 Related Works
2.1 Foreground Object Search
2.2 Knowledge Distillation
3 Foreground Object Search Dataset
3.1 Pipeline to Establish Pattern-Level FoS Dataset
3.2 Interchangeable Foregrounds Labelling
3.3 Evaluation Set and Metrics
4 Proposed Approach
4.1 Overall Training Scheme
4.2 Foreground Encoder
4.3 Query Encoder
4.4 Pattern-Level Foreground Object Search
4.5 Implementation Details
5 Experiments
5.1 Foreground Encoder
5.2 Query Encoder
6 Conclusions
References
Improving Knowledge Distillation via Category Structure
1 Introduction
2 Related Work
2.1 Model Compression
2.2 Knowledge Distillation
3 Category Structure Knowledge Distillation
3.1 Knowledge Distillation
3.2 Category Structure
3.3 Loss for Category Structure Transfer
4 Experiments
4.1 Experimental Settings
4.2 Results on CIFAR-10
4.3 Results on CIFAR-100
4.4 Results on Tiny ImageNet
4.5 Ablation Study
4.6 Analysis
5 Conclusion
References
High Resolution Zero-Shot Domain Adaptation of Synthetically Rendered Face Images
1 Introduction
2 Related Work
2.1 Generative Models
2.2 Zero-Shot Domain Transfer
2.3 Domain Adaptation
3 Method
3.1 Step 1: Sampling
3.2 Step 2: Latent Code Refinement
3.3 Step 3: Synthetic Fit and Latent Code Interpolation
3.4 Step 4: Result Sample Selection
4 Experiments
4.1 Qualitative Experiments
4.2 Quantitative Experiments
5 Conclusions
References
Attentive Prototype Few-Shot Learning with Capsule Network-Based Embedding
1 Introduction
2 Related Work
2.1 Few-Shot Learning
2.2 Capsule Networks
3 Method
3.1 Approach Details
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Results Evaluation
5 Conclusion
References
Weakly Supervised Instance Segmentation by Learning Annotation Consistent Instances
1 Introduction
2 Related Work
3 Method
3.1 Notation
3.2 Conditional Distribution
3.3 Prediction Distribution
4 Learning Objective
4.1 Task-Specific Loss Function
4.2 Learning Objective for Instance Segmentation
5 Optimization
5.1 Visualization of the Learning Process
6 Experiments
6.1 Data Set and Evaluation Metric
6.2 Initialization
6.3 Comparison with Other Methods
6.4 Ablation Experiments
7 Conclusion
References
DA4AD: End-to-End Deep Attention-Based Visual Localization for Autonomous Driving
1 Introduction
2 Related Work
3 Problem Statement
4 Method
4.1 Network Architecture
4.2 System Workflow
4.3 Local Feature Embedding
4.4 Attentive Keypoint Selection
4.5 Weighted Feature Matching
4.6 Loss
5 Experiments
5.1 Apollo-DaoxiangLake Dataset
5.2 Performances
5.3 Ablations and Visualization
6 Conclusion
References
Visual-Relation Conscious Image Generation from Structured-Text
1 Introduction
2 Related Work
3 Proposed Method
3.1 Visual-Relation Layout Module
3.2 Stacking-GANs
3.3 Loss Function
4 Experiments
4.1 Dataset and Compared Methods
4.2 Implementation and Training Details
4.3 Comparison with State-of-the-Arts
4.4 Ablation Study
5 Conclusion
References
Patch-Wise Attack for Fooling Deep Neural Network
1 Introduction
2 Related Work
2.1 Adversarial Examples
2.2 Attack Settings
2.3 Ensemble Strategy
3 Methodology
3.1 Development of Gradient-Based Attack Methods
3.2 Patch-Wise Iterative Fast Gradient Sign Method
4 Experiment
4.1 Setup
4.2 Amplification Factor
4.3 Project Kernel Size
4.4 Attacks vs. Normally Trained Models
4.5 Attacks vs. Defense Models
5 Conclusions
References
Feature Pyramid Transformer
1 Introduction
2 Related Work
3 Feature Pyramid Transformer
3.1 Non-local Interaction Revisited
3.2 Self-Transformer
3.3 Grounding Transformer
3.4 Rendering Transformer
3.5 Overall Architecture
4 Experiments
4.1 Instance-Level Recognition
4.2 Experiments on Pixel-Level Recognition
5 Conclusion
References
MABNet: A Lightweight Stereo Network Based on Multibranch Adjustable Bottleneck Module
1 Introduction
2 Related Work
3 Proposed Network
3.1 Multibranch Adjustable Bottleneck (MAB) Module
3.2 MABNet Overview
3.3 Feature Extraction by 2D MAB
3.4 Cost Volume
3.5 Cost Aggregation by 3D MAB
3.6 Disparity Regression
3.7 Training Loss
4 Experiments
4.1 Implementation Details
4.2 Ablation Studies
4.3 Evaluations on Benchmarks
5 Conclusions
References
Guided Saliency Feature Learning for Person Re-identification in Crowded Scenes
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Architecture of the Proposed Model
3.2 Guided Saliency Feature Learning
3.3 Guided Adaptive Spatial Matching
4 Experiments
4.1 Experiment Settings
4.2 Datasets
4.3 Occluded Person Re-identification
4.4 Non-occluded Person Re-identification
4.5 Cross-domain Person Re-identification
4.6 Ablation Study
5 Conclusion
References
Asymmetric Two-Stream Architecture for Accurate RGB-D Saliency Detection
1 Introduction
2 Related Work
3 The Proposed Method
3.1 The Overall Architecture
3.2 DepthNet
3.3 RGBNet
3.4 Depth Attention Module
4 Experiments
4.1 Dataset
4.2 Experimental Setup
4.3 Ablation Analysis
4.4 Comparison with State-of-the-Art
5 Conclusion
References
Explaining Image Classifiers Using Statistical Fault Localization
1 Introduction
2 Related Work
3 Preliminaries
3.1 Deep Neural Networks (DNNs)
3.2 Statistical Fault Localization (SFL)
4 What Is an Explanation?
5 SFL Explanation for DNNs
5.1 SFL Explanation Algorithm
5.2 Relationship Between Pexp and Definition 1
6 Experimental Evaluation
6.1 Experimental Setup
6.2 Are the Explanations from DeepCover Useful?
6.3 Comparison with the State-of-the-art
6.4 Generating "Ground Truth" with a Chimera Benchmark
6.5 Trojaning Attacks
6.6 Threats to Validity
7 Conclusions
References
Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers
1 Introduction
2 Related Work
3 Methods
3.1 Differentiability of Combinatorial Solvers
3.2 Graph Matching
3.3 Cost Margin
3.4 Solvers
3.5 Architecture Design
4 Experiments
4.1 Pascal VOC
4.2 Willow ObjectClass
4.3 SPair-71k
4.4 Ablations Studies
5 Conclusion
References
Video Representation Learning by Recognizing Temporal Transformations
1 Introduction
2 Prior Work
3 Learning Video Dynamics
3.1 Transformations of Time
3.2 Training
3.3 Implementation
4 Experiments
5 Conclusions
References
Unsupervised Monocular Depth Estimation for Night-Time Images Using Adversarial Domain Feature Adaptation
1 Introduction
2 Proposed Method
2.1 Learning Fd and Gd from Day-Time Images
2.2 Learning Fn Using Night-Time Images
2.3 Training Losses
3 Experiments and Results
3.1 Oxford Robotcar Dataset: Training and Testing Data Setup
3.2 Experimental Setup
3.3 Study 1: Depth Evaluation
3.4 Study 2: Visual Place Recognition: Day Versus Night
4 Conclusions and Future Scope
References
Variational Connectionist Temporal Classification
1 Introduction
2 Related Work
2.1 Methodology
2.2 Connectionist Temporal Classification
2.3 Variational Connectionist Temporal Classification
3 Experimental Results
3.1 Scene Text Recognition
3.2 Offline Handwritten Text Recognition
3.3 Further Analysis
4 Conclusion
References
End-to-end Dynamic Matching Network for Multi-view Multi-person 3D Pose Estimation
1 Introduction
2 Related Work
2.1 Single-View 2D Pose Estimation
2.2 Multi-view 3D Pose Estimation
3 Method
3.1 2D Pose Estimator Backbone
3.2 Dynamic Matching
3.3 3D Pose Estimation
3.4 Loss Function
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Ablation Study
4.4 Comparison with Previous Works
5 Conclusion
References
Orderly Disorder in Point Cloud Domain
1 Introduction
2 Related Work
3 Proposed Pattern-Wise Network
3.1 Network Properties
3.2 Network Architecture
3.3 Classification and Segmentation Networks
4 Experimental Results
5 Conclusions
References
Deep Decomposition Learning for Inverse Imaging Problems
1 Introduction
2 Background
2.1 Deep Learning for the Inverse Problem
2.2 Range-Nullspace (R-N) Decomposition
3 Deep Decomposition Learning
3.1 Training Strategy
3.2 The Relationship to Other Work
4 Experiments
4.1 Implementation
4.2 CS-MRF Reconstruction
4.3 Ablation Study
5 Conclusion
References
FLOT: Scene Flow on Point Clouds Guided by Optimal Transport
1 Introduction
2 Related Works
3 Method
3.1 Step 1: Finding Soft-Correspondences Between Points
3.2 Step 2: Flow Estimation from Soft-Correspondences
3.3 Training
3.4 Similarities and Differences with Existing Techniques
4 Experiments
4.1 Datasets
4.2 Performance Metrics
4.3 Study of FLOT
4.4 Performance on FT3Ds and KITTIs
4.5 Performance on FT3Do and KITTIo
5 Conclusion
References
Accurate Reconstruction of Oriented 3D Points Using Affine Correspondences
1 Introduction
2 Epipolar Geometry-Consistent ACs
2.1 Extension to the Multi-view Case
3 Multi-view Linear Estimation of Surface Normals
4 Photoconsistency Optimization for Accurate Normals
4.1 2-DoF Formulation
4.2 3-DoF Formulation
5 Experimental Validation
5.1 Synthetic Data
5.2 Photoconsistency Refinement
5.3 Tracking
6 Conclusions and Future Work
References
Volumetric Transformer Networks
1 Introduction
2 Related Work
3 Volumetric Transformer Networks
3.1 Preliminaries
3.2 Motivation and Overview
3.3 Volumetric Transformation Estimator
3.4 Loss Function
3.5 Implementation and Training Details
4 Experiments
4.1 Experimental Setup
4.2 Fine-Grained Image Recognition
4.3 Instance-Level Image Retrieval
5 Conclusion
References
360 Camera Alignment via Segmentation
1 Introduction
1.1 Related Work
2 Methods
2.1 Background on Equirectangular Images
2.2 Segmentation Framework
2.3 Vanishing Point Image
2.4 Training Method
2.5 Test-Time Prediction
3 Experiments
3.1 Sun360 Dataset
3.2 Noise Dataset
3.3 Construction Dataset
3.4 Downstream Segmentation Task
4 Conclusion
References
A Novel Line Integral Transform for 2D Affine-Invariant Shape Retrieval
1 Introduction
2 Related Work
3 Affine Theory of Line Integral
4 The Proposed Line Integral Transform
4.1 Binding Line Pair and Its Affine Property
4.2 The Proposed Transform
4.3 Affine Invariants
5 Experimental Results and Discussions
6 Conclusions
References
Explanation-Based Weakly-Supervised Learning of Visual Relations with Graph Networks
1 Introduction
2 Related Works
3 Method
3.1 Object Detection
3.2 Predicate Classification
3.3 Explanation-Based Relationship Detection
3.4 Prior over Relationships
4 Experiments
4.1 Setup
4.2 HICO-DET
4.3 Visual Relationship Detection Dataset
4.4 Unusual Relations Dataset
5 Conclusion
References
Guided Semantic Flow
1 Introduction
2 Related Works
3 Problem Statement
4 Guided Semantic Flow
4.1 Network Architecture
4.2 Objective Functions
4.3 Training Details
5 Experimental Results
5.1 Implementation Details
5.2 Results
5.3 Ablation Study
6 Conclusion
References
Document Structure Extraction Using Prior Based High Resolution Hierarchical Semantic Segmentation
1 Introduction
2 Related Work
3 Methodology
3.1 Network Pipeline
3.2 Network Architecture
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Results
5 Conclusion
References
Measuring the Importance of Temporal Features in Video Saliency
1 Introduction
2 Related Work
3 Methods
3.1 Center Bias
3.2 Gold Standard Model
3.3 Static Baseline Model
4 Experiments
4.1 Metrics
4.2 Datasets
4.3 Performance Results
4.4 Analyzing Temporal Effects
4.5 Evaluating Temporal Modelling
5 Discussion
References
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
1 Introduction
2 Related Work
2.1 3D Perception Models
2.2 Neural Architecture Search
3 SPVConv: Designing Effective 3D Modules
3.1 Point-Voxel Convolution: Coarse Voxelization
3.2 Sparse Convolution: Aggressive Downsampling
3.3 Solution: Sparse Point-Voxel Convolution
4 3D-NAS: Searching Efficient 3D Architectures
4.1 Design Space
4.2 Training Paradigm
4.3 Search Algorithm
5 Experiments
5.1 3D Scene Segmentation
5.2 3D Object Detection
6 Analysis
6.1 Sparse Point-Voxel Convolution
6.2 Architecture Search
7 Conclusion
References
Towards Reliable Evaluation of Algorithms for Road Network Reconstruction from Aerial Images
1 Introduction
2 Existing Metrics
2.1 Pixel-Based Metrics
2.2 Path-Based Metrics
2.3 Junction-Based Metric (JUNCT)
2.4 Subgraph-Based Metric (SUBG)
2.5 Summary
3 New Metrics
3.1 Path-Based Metric (OPT-P)
3.2 Junction-Based Metric (OPT-J)
3.3 Subgraph-Based Metric (OPT-G)
4 Experiments
4.1 Synthetic Data
4.2 Real Data
5 Conclusion
References
Online Continual Learning Under Extreme Memory Constraints
1 Introduction
2 Related Work
3 Memory-Constrained Online Continual Learning
3.1 Problem and Notation
3.2 Batch-Level Distillation
3.3 Warm-Up Stage
3.4 Joint Training Stage
3.5 Memory Efficient Data Augmentation
4 Experiments
4.1 Experimental Protocol
4.2 Experimental Evaluation
5 Conclusions
References
Learning to Cluster Under Domain Shift
1 Introduction
2 Related Works
3 Proposed Method
3.1 Multi-domain Clustering with Mutual Information
3.2 Domain Alignment
3.3 Training and Adaptation Procedures
4 Experiments
4.1 Ablation Study
4.2 Comparison with Other Methods
4.3 Limited Target Data Scenario
5 Conclusions
References
Defense Against Adversarial Attacks via Controlling Gradient Leaking on Embedded Manifolds
1 Introduction
2 Related Work
3 Gradient Leaking Hypothesis
3.1 Preliminary
3.2 Gradient Leaking
3.3 Empirical Study
4 Adversarial Defenses
4.1 Making the Data Manifold Flat
4.2 Adding Noise in the Normal Space
5 Experiments
5.1 Primary Experiments
5.2 Integration into Other Defense Algorithms
6 Conclusion and Future Work
References
Improving Optical Flow on a Pyramid Level
1 Introduction
2 Related Work
3 Main Contributions
3.1 Pyramid Flow Networks
3.2 Improving Pyramid Levels in PFNs
3.3 Improving Gradient Flow Across PFN Levels
3.4 Additional Refinements
4 Experiments
4.1 Setup and Modifications over HD3
4.2 Flow Ablation Experiments
4.3 Optical Flow Benchmark Results
5 Conclusions
References
Author Index
