Computer Vision - ECCV 2020

Name: Computer Vision - ECCV 2020 | 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XXIV
Brand: Springer
Price: 96.29 EUR
Availability: OnlineOnly

Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm (Editors)

Springer (Publisher)

Published on 29 November 2020

XLIII, 803 pages

E-Book

PDF with digital watermarking

978-3-030-58586-0 (ISBN)

€96.29 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Foreword
Preface
Organization
Contents - Part XXIV
Deep Novel View Synthesis from Colored 3D Point Clouds
1 Introduction
2 Related Work
3 Method
3.1 Architecture
3.2 Training Loss
4 Experiments
4.1 Cascaded Outputs Comparison
4.2 Scene Revealing Evaluation
4.3 Novel View Synthesis Evaluation
4.4 Results on the KITTI Dataset
5 Conclusion
References
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching
1 Introduction
2 Related Work
2.1 Knowledge Based Deep Learning
2.2 Image-Text Matching
3 Consensus-Aware Visual-Semantic Embedding
3.1 Exploit Consensus Knowledge to Enhance Concept Representations
3.2 Consensus-Aware Representation Learning
3.3 Training and Inference
4 Experiments
4.1 Dataset and Settings
4.2 Implementation Details
4.3 Comparison to State-of-the-art
4.4 Ablation Studies
4.5 Further Analysis
5 Conclusions
References
Spatial Hierarchy Aware Residual Pyramid Network for Time-of-Flight Depth Denoising
1 Introduction
2 Related Work
3 ToF Imaging Model
4 Spatial Hierarchy Aware Residual Pyramid Network
4.1 Residual Regression Module
4.2 Residual Fusion Module
4.3 Depth Refinement Module
4.4 Loss Function
5 Experiments
5.1 Datasets
5.2 Data Pre-processing
5.3 Training Settings
5.4 Ablation Studies
5.5 Results on Synthetic Datasets
5.6 Results on the Realistic Dataset
6 Conclusion
References
Sat2Graph: Road Graph Extraction Through Graph-Tensor Encoding
1 Introduction
2 Related Work
3 Sat2Graph
3.1 Graph-Tensor Encoding (GTE)
3.2 Training Sat2Graph
4 Evaluation
4.1 Datasets
4.2 Baselines
4.3 Implementation Details
4.4 Evaluation Metrics
4.5 Quantitative Evaluation
4.6 Qualitative Evaluation
5 Discussion
6 Conclusion
References
Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition
1 Introduction
2 Related Work
3 Dataset
4 Methodology
4.1 Preservation of Audio Source Knowledge
4.2 Audiovisual Representation for Multi-task
4.3 Sound Events in Different Scenes
5 Experiments
5.1 Implementation Details
5.2 Aerial Scene Recognition
5.3 Ablation Study
6 Conclusions
References
Polarimetric Multi-view Inverse Rendering
1 Introduction
2 Related Work
3 Polarimetric Ambiguities in Surface Normal Prediction
3.1 Polarimetric Calculation
3.2 Ambiguities
4 Polarimetric Multi-view Inverse Rendering
4.1 Color Polarization Sensor Data Processing
4.2 Initial Geometric Reconstruction
4.3 Photometric and Polarimetric Optimization
5 Experimental Results
5.1 Implementation Details
5.2 Comparison Using Synthetic Data
5.3 Comparison Using Real Data
5.4 Refinement for Polarimetric MVS
6 Conclusions
References
SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information
1 Introduction
2 Related Work
2.1 Interactive Segmentation
2.2 Semantic Segmentation with Side Information
3 SideInfNet
3.1 CNN-Based Semantic Segmentation
3.2 Side Information Feature Map Construction
3.3 Fusion Weight Learning
3.4 Adaptive Architecture
4 Experiments and Results
4.1 Zone Segmentation
4.2 BreAst Cancer Histology Segmentation
4.3 Urban Segmentation
4.4 Varying Levels of Side Information
4.5 SideInfNet with Another CNN Backbone
5 Conclusion
References
Improving Face Recognition by Clustering Unlabeled Faces in the Wild
1 Introduction
2 Related Work
3 Learning from Unlabeled Faces
3.1 Separating Overlapping Identities
3.2 Clustering Faces with GCN
3.3 Joint Data Re-training with Clustering Uncertainty
4 Experiments
4.1 Controlled Disjoint: MS-Celeb-1M Splits
4.2 Controlled Overlap: Overlapping Identities
4.3 Semi-controlled: Limited Labeled, Large-Scale Unlabeled Data
4.4 Soft Labels for Clustering Uncertainty
4.5 Uncontrolled: Large-Scale Labeled and Unlabeled Data
5 Conclusion
References
NeuRoRA: Neural Robust Rotation Averaging
1 Introduction
2 Related Works
3 Multiple Rotation Averaging
4 Learning to Predict Absolute Orientations
4.1 The Network Design Choice
4.2 View-Graph Cleaning Network
4.3 Bootstrapping Absolute Orientations
4.4 Fine-Tuning Network
4.5 Training
5 Results
6 Discussion
References
SG-VAE: Scene Grammar Variational Autoencoder to Generate New Indoor Scenes
1 Introduction
2 Deep Generative Model for Scene Generation
2.1 Scene-Grammar Variational Autoencoder
2.2 CFG of Indoor Scenes
2.3 The VAE Network
3 Discovery of the Scene Grammar
3.1 Data-Driven Relationship Discovery
3.2 Creating a CFG from the Causal Graph
4 Experiments
4.1 Comparison with Baselines on Other Datasets
4.2 Scene Layout Estimation from the RGB-D Image
5 Related Works
6 Conclusion
References
Unsupervised Learning of Optical Flow with Deep Feature Similarity
1 Introduction
2 Related Work
3 Our Approach
3.1 Background on Unsupervised Optical Flow Learning
3.2 Feature Similarity from Multiple Layers
3.3 Learning Optical Flow with Feature Similarity
4 Experimental Results
4.1 Evaluation on Benchmarks
5 Conclusion
References
Blended Grammar Network for Human Parsing
1 Introduction
2 Related Work
3 Blended Grammar Network
3.1 Grammar
3.2 Blended Grammar Network Structure
3.3 Part-Aware CRNN
3.4 Loss Function
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Ablation Study
4.4 Comparison with State-of-the-Art
5 Conclusion
References
P2Net: Patch-Match and Plane-Regularization for Unsupervised Indoor Depth Estimation
1 Introduction
2 Related Work
2.1 Supervised Depth Estimation
2.2 Unsupervised Depth Estimation
2.3 Piece-Wise Planar Scene Reconstruction
3 Method
3.1 Overview
3.2 Keypoints Extraction
3.3 Patch-Based Multi-view Photometric Consistency Error
3.4 Planar Consistency Loss
3.5 Loss Function
4 Experiments
4.1 Implementation Details
4.2 Datasets
4.3 Ablation Experiments
5 Conclusion
References
Efficient Attention Mechanism for Visual Dialog that Can Handle All the Interactions Between Multiple Inputs
1 Introduction
2 Related Work
2.1 Attention Mechanisms for Vision-Language Tasks
2.2 Visual Dialog
3 Efficient Attention Mechanism for Many Utilities
3.1 Attention Mechanism of Transformer
3.2 Application to Bi-modal Tasks
3.3 Light-Weight Transformer for Many Inputs
3.4 Interactions Between All Utilities
4 Implementation Details for Visual Dialog
4.1 Problem Definition
4.2 Representation of Utilities
4.3 Overall Network Design
4.4 Design of Decoders
4.5 Multi-task Learning
5 Experimental Results
5.1 Experimental Setup
5.2 Comparison with State-of-the-Art Methods
5.3 Ablation Study
5.4 Visualization of Generated Attention
6 Summary and Conclusion
References
Adaptive Mixture Regression Network with Local Counting Map for Crowd Counting
1 Introduction
2 Related Work
2.1 Density Estimation Based Approaches
2.2 Direct Counting Regression Approaches
3 Proposed Method
3.1 Local Counting Map
3.2 Scale-Aware Module
3.3 Mixture Regression Module
3.4 Adaptive Soft Interval Module
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Comparisons with State of the Art
4.4 Ablation Studies
5 Conclusion
References
BIRNAT: Bidirectional Recurrent Neural Networks with Adversarial Training for Video Snapshot Compressive Imaging
1 Introduction
1.1 Video Snapshot Compressive Imaging
1.2 Related Work
1.3 Contributions and Organization of This Paper
2 Mathematical Model of Video SCI
3 Proposed Network for Reconstruction
3.1 Measurement Energy Normalization
3.2 AttRes-CNN
3.3 Bidirectional Recurrent Reconstruction Network
3.4 Optimization
4 Experiments
4.1 Training, Testing Datasets and Experimental Settings
4.2 Results on Simulation Datasets
4.3 Results on Real SCI Data
5 Conclusions
References
Ultra Fast Structure-Aware Deep Lane Detection
1 Introduction
2 Related Work
3 Method
3.1 New Formulation for Lane Detection
3.2 Lane Structural Loss
3.3 Feature Aggregation
4 Experiments
4.1 Experimental Setting
4.2 Ablation Study
4.3 Results
5 Conclusion
References
Cross-Identity Motion Transfer for Arbitrary Objects Through Pose-Attentive Video Reassembling
1 Introduction
2 Related Work
3 Method
3.1 Overview
3.2 Network Architecture
3.3 Training and Losses
3.4 Inference
4 Experiments
4.1 Experimental Setup
4.2 Experimental Results
4.3 Human Evaluation
4.4 Analysis
5 Conclusion
References
Domain Adaptive Object Detection via Asymmetric Tri-Way Faster-RCNN
1 Introduction
2 Related Work
3 The Proposed ATF Approach
3.1 Network Architecture of ATF
3.2 Principle of the Chief Net
3.3 Principle of the Ancillary Net
3.4 Training Loss of Our ATF
4 Experiments
4.1 Implementation Details
4.2 Datasets
4.3 Cross-Domain Detection in Different Visibility and Cameras
4.4 Cross-Domain Detection on Large Domain Shift
4.5 Cross-Domain Detection from Synthetic to Real
4.6 Analysis and Discussion
5 Conclusions
References
Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition
1 Introduction
2 Related Work
3 Proposed Formulation
3.1 Weight Exclusivity
3.2 Feature Consistency
3.3 Exclusivity-Consistency Regularized Knowledge Distillation
4 Experiments
4.1 Datasets
4.2 Experimental Settings
4.3 Ablation Study and Exploratory Experiments
4.4 Comparison to State-of-the-Art Methods
5 Conclusion
References
Learning Camera-Aware Noise Models
1 Introduction
2 Related Work
3 Proposed Method
3.1 Noise-Generating Network
3.2 Camera-Encoding Network
3.3 Learning
4 Experimental Results
4.1 Quantitative and Qualitative Results
4.2 Ablation Studies
4.3 Robustness Analysis of the Camera-Encoding Network
5 Application to Real Image Denoising
5.1 Real-World Image Denoising
5.2 Camera-Specific Denoising Networks
6 Conclusions
References
Towards Precise Completion of Deformable Shapes
1 Introduction
2 Related Work
3 Method
3.1 Overview
3.2 Encoder
3.3 Generator
3.4 Loss Function
3.5 Training Procedure
3.6 Implementation Considerations
4 Experiments
4.1 Datasets
4.2 Methods in Comparison
4.3 Single View Completion
4.4 Non-rigid Partial Correspondences
4.5 Real Scans
5 Conclusions
References
Iterative Distance-Aware Similarity Matrix Convolution with Mutual-Supervised Point Elimination for Efficient Point Cloud Registration
1 Introduction
2 Related Work
3 Model
3.1 Notation
3.2 Similarity Matrix Convolution
3.3 Two-Stage Point Elimination
3.4 Mutual-Supervision Loss
3.5 Balanced Sampling for Training
4 Experiments
4.1 Experimental Setup
4.2 Results
4.3 Efficiency
4.4 Ablation Study
5 Conclusions
References
Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization
1 Introduction
2 Related Work
3 Problem Description and Background
4 Proposed Method
4.1 Re-localization
4.2 Knowledge Transfer
4.3 Network Architectures
5 Experiments
5.1 COCO 2017 Dataset
5.2 ILSVRC 2013 Detection Dataset
6 Conclusion
References
Environment-Agnostic Multitask Learning for Natural Language Grounded Navigation
1 Introduction
2 Background
3 Environment-Agnostic Multitask Learning
3.1 Overview
3.2 Multitask Reinforcement Learning
3.3 Environment-Agnostic Representation Learning
4 Model Architecture
5 Experiments
5.1 Experimental Setup
5.2 Environment-Agnostic Multitask Learning
5.3 Multitask Learning
5.4 Environment-Agnostic Learning
5.5 Reward Shaping for NDH Task
6 Conclusion
References
TPFN: Applying Outer Product Along Time to Multimodal Sentiment Analysis Fusion on Incomplete Data
1 Introduction
1.1 Related Works
2 Preliminaries
3 Time Product Fusion Network
3.1 Main Idea: Outer Product Through Time
3.2 Low-Rank Inference Module
3.3 Low-Rank Regularization
4 Experiments
4.1 Performance on MOSI and MOSEI
4.2 Effect of Time Window Size k
4.3 Effect of Regularization Parameter
4.4 Discussion on CP-rank
5 Conclusions
References
ProxyNCA++: Revisiting and Revitalizing Proxy Neighborhood Component Analysis
1 Introduction
2 Related Work
3 Methods
3.1 Neighborhood Component Analysis (NCA)
3.2 ProxyNCA
3.3 Aligning with NCA by Optimizing Proxy Assignment Probability
3.4 About Temperature Scaling
3.5 About Global Pooling
3.6 About Fast Moving Proxies
3.7 Layer Norm (Norm) and Class Balanced Sampling (CBS)
4 Experiments
4.1 Experimental Setup
4.2 Evaluation
4.3 Ablation Study
5 Conclusion
References
Learning with Privileged Information for Efficient Image Super-Resolution
1 Introduction
2 Related Work
3 Method
3.1 Teacher
3.2 Student
4 Experiments
4.1 Experimental Details
4.2 Ablation Studies
4.3 Analysis on Compact Features
4.4 Results
5 Conclusion
References
Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-identification
1 Introduction
2 Related Work
3 Proposed Method
3.1 Formulation
3.2 Self-adaptive Classification
3.3 Memory-Based Temporal-Guided Cluster
4 Experiment
4.1 Dataset
4.2 Implementation Details
4.3 Ablation Study
4.4 Comparison with State-of-the-Art Methods
5 Conclusion
References
Autoencoder-Based Graph Construction for Semi-supervised Learning
1 Introduction
2 Related Work
2.1 Semi-supervised Learning
2.2 Graph-Based SSL
2.3 Matrix Completion
3 Problem Formulation
4 Our Approach
4.1 Learning the Graph with Autoencoder
4.2 Simultaneous Training
5 Experiments
5.1 Performance Evaluations
5.2 Comparison to Graph-Based SSL
5.3 Ablation Study
5.4 Wide ResNet Results
6 Conclusion
References
Virtual Multi-view Fusion for 3D Semantic Segmentation
1 Introduction
2 Related Work
3 Method Overview
4 Virtual View Selection
5 Multiview Fusion
5.1 2D Semantic Segmentation Model
5.2 3D Fusion of 2D Semantic Features
6 Experiments
6.1 Evaluation on ScanNet Dataset
6.2 Evaluation on Stanford 3D Indoor Spaces (S3DIS)
6.3 Ablation Studies
7 Conclusion
References
Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition
1 Introduction
2 Background
3 Approach
3.1 Decoupling Graph Convolutional Network
3.2 Attention-Guided DropGraph
4 Experiments
4.1 Datasets and Model Configuration
4.2 Ablation Study
4.3 Comparisons to the State-of-the-Art
5 Conclusion
References
Deep Shape from Polarization
1 Introduction
2 Related Work
3 Proposed Method
3.1 Image Formation and Physical Solution
3.2 Learning with Physics
4 Dataset and Implementation Details
4.1 Dataset
4.2 Software Implementation
5 Experimental Results
5.1 Comparisons to Physics-Based SfP
5.2 Robustness to Lighting Variations
5.3 Importance of Polarization
5.4 Importance of Physics Revealed by Ablating Priors
5.5 Quantitative Evaluation on Our Test Set
5.6 Qualitative Evaluation on Our Test Set
6 Discussion
References
A Boundary Based Out-of-Distribution Classifier for Generalized Zero-Shot Learning
1 Introduction
2 Related Work
3 Revisit Spherical Variational Auto-Encoders
4 Proposed Approach
4.1 Problem Formulation
4.2 Boundary Based Out-of-Distribution Classifier
4.3 Implementation Details
5 Experiments
5.1 Datasets, Evaluation and Baselines
5.2 Out-of-Distribution Classification
5.3 Comparison with State-of-the-Arts
5.4 Model Analysis
6 Conclusions
References
Mind the Discriminability: Asymmetric Adversarial Domain Adaptation
1 Introduction
2 Revisiting Transferability and Discriminability in Symmetric Adversarial Domain Adaptation
2.1 The Theory of Domain Adaptation
2.2 Limitations and Insights
3 Asymmetric Adversarial Domain Adaptation
3.1 The Learning Framework
3.2 Discussions and Theories
4 Experiments
4.1 Experimental Setup
4.2 Overall Results
4.3 Spectral Analysis Using SVD
4.4 Analytics and Discussions
5 Related Work
6 Conclusion
References
SeqXY2SeqZ: Structure Learning for 3D Shapes by Sequentially Predicting 1D Occupancy Segments from 2D Coordinates
1 Introduction
2 Related Work
3 Overview
4 Voxel Tubelization
5 SeqXY2SeqZ
6 Experiments and Analysis
6.1 Representation Ability
6.2 Single Image 3D Reconstruction
6.3 Ablation Studies and Analysis
7 Conclusion
References
Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking
1 Introduction
2 Related Work
3 Omni-MOT Dataset
4 Deep Motion Modeling Network
4.1 Data Preparation
4.2 Anchor Tubes
4.3 Motion Model
4.4 Architecture
4.5 Deployment
5 Experiments
6 Conclusion
References
Deep FusionNet for Point Cloud Semantic Segmentation
1 Introduction
2 Related Work
2.1 Voxel-Based Networks
2.2 Point-Based Networks
3 FusionNet
3.1 Point Cloud Representation
3.2 Neighborhood Aggregations
3.3 Inner-Voxel Aggregation
3.4 Down/Up-Sampling
3.5 Architecture and Sparse Implementation
4 Experiments
4.1 Datasets
4.2 Ablation Study
4.3 Benchmark Evaluations
4.4 Analysis of the Voxel-Based "Mini-PointNet"
4.5 Efficiency for Large-Scale Point Cloud Processing
5 Conclusions and Future Work
References
Deep Material Recognition in Light-Fields via Disentanglement of Spatial and Angular Information
1 Introduction
2 Related Work
3 Light-Field Imaging Model
3.1 Light-Field Geometry
3.2 Analysis of Baseline Methods
4 Methods
4.1 Representative Selection
4.2 Angular Registration
4.3 Network Architecture
5 Experimental Results
5.1 Data and Training Procedure
5.2 Overall Performance
5.3 Ablation Studies
5.4 Visualization
6 Conclusion
References
Dual Adversarial Network for Deep Active Learning
1 Introduction
2 Related Work
3 Method
3.1 Dual Adversarial Network for Deep Active Learning
3.2 Training Procedure of DAAL
4 Experiments
4.1 Datasets
4.2 Experimental Settings
4.3 Image Classification on CIFAR10/100
4.4 Semantic Segmentation on Cityscapes
4.5 Performance Analysis
4.6 Ablation Study
5 Conclusion
References
Fully Convolutional Networks for Continuous Sign Language Recognition
1 Introduction
2 Related Work
3 Method
3.1 Main Stream Design
3.2 Gloss Feature Enhancement
4 Experiments
4.1 Experimental Setup
4.2 Results
4.3 Ablation Studies
4.4 Online Recognition
5 Conclusions
References
Self-adapting Confidence Estimation for Stereo
1 Introduction
2 Related Work
3 Learning a Confidence Measure Out-of-the-Box
3.1 Taxonomy of Stereo Matching Systems
3.2 Self-supervision Cues for Black-Box Models
3.3 Multi-modal Binary Cross Entropy
4 Experimental Results
4.1 Implementation Details
4.2 Ablation Study
4.3 Comparison with Offline Methods
4.4 Self-adapting In-the-wild
5 Conclusion
References
Deep Surface Normal Estimation on the 2-Sphere with Confidence Guided Semantic Attention
1 Introduction
2 Related Work
3 Surface Normal Estimation on the 2-Sphere
3.1 Network Architecture
3.2 Multi-scale Mutual Feature Fusion
3.3 2-Sphere vs. 3D Euclidean Space
3.4 Loss Function
4 Confidence Guided Semantic Attention
4.1 Confidence Map for Raw Depth
4.2 The CGSA Module
5 Experiments
5.1 Implementation Details
5.2 Comparisons with the State-of-the-Arts
5.3 Ablation Study
6 Conclusion and Future Work
References
AutoSTR: Efficient Backbone Search for Scene Text Recognition
1 Introduction
2 Related Works
2.1 Scene Text Recognition (STR)
2.2 Neural Architecture Search (NAS)
3 Methodology
3.1 Problem Formulation
3.2 Search Space
3.3 Search Algorithm
3.4 Comparison with Other NAS Works
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Comparison with State of the Art
4.4 Case Study: Searched Backbones and Discussion
4.5 Comparison with Other NAS Approaches
4.6 Ablation Study
5 Conclusion
References
Mitigating Embedding and Class Assignment Mismatch in Unsupervised Image Classification
1 Introduction
2 Related Work
2.1 Unsupervised Embedding
2.2 Unsupervised Classification
3 Model
3.1 Stage 1: Unsupervised Deep Embedding
3.2 Stage 2: Unsupervised Class Assignment with Refining Pretrained Embeddings
4 Experiments
4.1 Image Classification Task
4.2 Component Analyses
4.3 Improvement over SOTA
4.4 Implication on Semi-supervised Learning
5 Conclusion
References
Adversarial Training with Bi-directional Likelihood Regularization for Visual Classification
1 Introduction
2 Related Work
2.1 Adversarial Attacks
2.2 Defensive Methods
3 Proposed Algorithm
3.1 Preliminaries
3.2 Modeling the Feature Distribution
3.3 Likelihood Regularization for Features
4 Experiments
4.1 MNIST
4.2 CIFAR
4.3 Evolution of the Likelihood Regularization
4.4 Hyper-parameter Analysis
4.5 Adversaries for Training
5 Conclusion
References
Author Index

Schweitzer Fachinformationen