Computer Vision - ECCV 2022 Workshops

Name: Computer Vision - ECCV 2022 Workshops | Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part VIII
Brand: Springer
Price: 85.59 EUR
Availability: Online only


Leonid Karlinsky, Tomer Michaeli, Ko Nishino (Editors)

Springer (Publisher)

Published on 11 February 2023

XXV, 497 pages

E-Book

PDF with digital watermarking


978-3-031-25085-9 (ISBN)

€85.59 incl. 7% VAT

System requirements for PDF with digital watermarking

E-Book Single Licence

Available for download

Description

The 8-volume set, comprising LNCS volumes 13801 through 13809, constitutes the refereed proceedings of 38 of the 60 workshops held at the 17th European Conference on Computer Vision, ECCV 2022. The conference took place in Tel Aviv, Israel, during October 23-27, 2022; the workshops were held in hybrid or online format.

The 367 full papers included in this set were carefully reviewed and selected for inclusion in the ECCV 2022 workshop proceedings. They are organized into individual parts as follows:

Part I: W01 - AI for Space; W02 - Vision for Art; W03 - Adversarial Robustness in the Real World; W04 - Autonomous Vehicle Vision;

Part II: W05 - Learning With Limited and Imperfect Data; W06 - Advances in Image Manipulation;

Part III: W07 - Medical Computer Vision; W08 - Computer Vision for Metaverse; W09 - Self-Supervised Learning: What Is Next?;

Part IV: W10 - Self-Supervised Learning for Next-Generation Industry-Level Autonomous Driving; W11 - ISIC Skin Image Analysis; W12 - Cross-Modal Human-Robot Interaction; W13 - Text in Everything; W14 - BioImage Computing; W15 - Visual Object-Oriented Learning Meets Interaction: Discovery, Representations, and Applications; W16 - AI for Creative Video Editing and Understanding; W17 - Visual Inductive Priors for Data-Efficient Deep Learning; W18 - Mobile Intelligent Photography and Imaging;

Part V: W19 - People Analysis: From Face, Body and Fashion to 3D Virtual Avatars; W20 - Safe Artificial Intelligence for Automated Driving; W21 - Real-World Surveillance: Applications and Challenges; W22 - Affective Behavior Analysis In-the-Wild;

Part VI: W23 - Visual Perception for Navigation in Human Environments: The JackRabbot Human Body Pose Dataset and Benchmark; W24 - Distributed Smart Cameras; W25 - Causality in Vision; W26 - In-Vehicle Sensing and Monitorization; W27 - Assistive Computer Vision and Robotics; W28 - Computational Aspects of Deep Learning;

Part VII: W29 - Computer Vision for Civil and Infrastructure Engineering; W30 - AI-Enabled Medical Image Analysis: Digital Pathology and Radiology/COVID19; W31 - Compositional and Multimodal Perception;

Part VIII: W32 - Uncertainty Quantification for Computer Vision; W33 - Recovering 6D Object Pose; W34 - Drawings and Abstract Imagery: Representation and Analysis; W35 - Sign Language Understanding; W36 - A Challenge for Out-of-Distribution Generalization in Computer Vision; W37 - Vision With Biased or Scarce Data; W38 - Visual Object Tracking Challenge.

Content

Intro
Foreword
Preface
Organization
Contents - Part VIII
W31 - Challenge on Compositional and Multimodal Perception
YORO - Lightweight End to End Visual Grounding
1 Introduction
2 Related Work
3 YORO Architecture
3.1 Multi-modal Inputs
3.2 Transformer Encoder
3.3 Multi-modal Transformer Outputs
3.4 Feature Projector and Detection Heads
4 Training Losses
5 Experiments
5.1 Dataset
5.2 Training and Evaluation Details
5.3 Ablation Study
5.4 Quantitative Comparison
5.5 Comparison of Trade-Offs Between Size, Speed, and Accuracy
5.6 Qualitative Results
6 Conclusion
References
W32 - Uncertainty Quantification for Computer Vision
Localization Uncertainty Estimation for Anchor-Free Object Detection
1 Introduction
2 Related Works
2.1 Anchor-Free Object Detection
2.2 Uncertainty Estimation
3 Uncertainty-Aware Detection (UAD)
3.1 Power Likelihood
3.2 Uncertainty-Aware Classification
3.3 Training and Inference
4 Experiments
4.1 Ablation Study
4.2 Comparison with Other Methods
4.3 Discussion
5 Conclusion
References
Variational Depth Networks: Uncertainty-Aware Monocular Self-supervised Depth Estimation
1 Introduction
2 Related Work
3 Methods
3.1 Background and Motivation
3.2 Variational Depth Networks
4 Experiments
4.1 Setup
4.2 ScanNet: Uncertainty-Aware Reconstruction
4.3 ScanNet: Prior Ablation Study
4.4 KITTI: 2D Depth Evaluation
5 Conclusions
References
Unsupervised Joint Image Transfer and Uncertainty Quantification Using Patch Invariant Networks
1 Introduction
2 Related Work
2.1 Generative Adversarial Networks
2.2 Unpaired Image Transfer and Domain Mapping
2.3 Uncertainty Quantification
3 Method
3.1 Preliminaries and GAN Architecture
3.2 Patch Invariance
3.3 Uncertainty by Loss Attenuation
3.4 Implementation Details
4 Experiments
4.1 Datasets
4.2 Compared Methods and Scenarios
4.3 Quantitative Evaluation
4.4 Qualitative Evaluation
4.5 Uncertainty Scores
5 Conclusions
References
Uncertainty Quantification Using Query-Based Object Detectors
1 Introduction
2 Background and Related Work
2.1 Object Detection
2.2 Uncertainty Quantification in Object Detection
3 Uncertainty Estimation in Query-Based Detectors
3.1 Detection Transformer and AdaMixer
3.2 Merging Bounding Boxes
3.3 Uncertainty Quantification
4 Experiments
4.1 Experimental Protocols
4.2 Experiment 1: Evaluating Location Uncertainty
4.3 Experiment 2: Evaluating Classification Performance
4.4 Experiment 3: Evaluating Objectness Uncertainty
4.5 Experiment 4: Runtime-Aware Performance Comparison
5 Conclusions
References
W33 - Recovering 6D Object Pose
CenDerNet: Center and Curvature Representations for Render-and-Compare 6D Pose Estimation
1 Introduction
1.1 Context
1.2 Related Work
1.3 Contributions
2 CenDerNet
2.1 From Images to Center and Curvature Heatmaps
2.2 From Center Heatmaps to 3D Centers
2.3 6D Pose Estimation
3 Experiments
3.1 DIMO
3.2 T-LESS
4 Conclusions
References
Trans6D: Transformer-Based 6D Object Pose Estimation and Refinement
1 Introduction
2 Related Work
2.1 6D Object Pose Estimation
2.2 Vision Transformer
3 Methodology
3.1 Transformer-Based Baselines for 6D Object Pose Estimation
3.2 Patch-Aware Feature Fusion
3.3 Pure Transformer-Based Pose Refinement
3.4 Training
4 Experiments
4.1 Implementation Details
4.2 Datasets
4.3 Evaluation Metrics
4.4 Ablation Studies
4.5 Comparison with State of the Arts
5 Conclusions
References
Learning to Estimate Multi-view Pose from Object Silhouettes
1 Introduction
2 Related Work
3 Method
3.1 Network Architecture
3.2 Confidence-Based Loss Function
3.3 Training
4 Experiments
4.1 Datasets
4.2 Results
5 Conclusion
References
TransNet: Category-Level Transparent Object Pose Estimation
1 Introduction
2 Related Works
2.1 Transparent Object Visual Perception for Manipulation
2.2 Opaque Object Category-Level Pose Estimation
3 TransNet
3.1 Architecture Overview
3.2 Object Instance Segmentation
3.3 Transparent Object Depth Completion
3.4 Transparent Object Surface Normal Estimation
3.5 Generalized Point Cloud
3.6 Transformer Feature Embedding
3.7 Pose and Scale Estimation
4 Experiments
4.1 Comparison with Baseline
4.2 Embedding Method Analysis
4.3 Ablation Study of Generalized Point Cloud
4.4 Depth and Surface Normal Exploration on TransNet
5 Conclusions
References
W34 - Drawings and Abstract Imagery: Representation and Analysis
Fuse and Attend: Generalized Embedding Learning for Art and Sketches
1 Introduction
2 Proposed Method
2.1 Gated Fusion for Representative Refinement
2.2 Attention-Based Query Embedding Refinement and Augmentation for Positive Generation
2.3 Student-Teacher Based Contrastive Learning
3 Related Work and Experiments
3.1 Comparison of Our Method RCERM Against the SOTA ERM and Fishr Methods:
3.2 Ablation/Analysis of Our Method:
3.3 Key Takeaways:
4 Conclusion
References
3D Shape Reconstruction from Free-Hand Sketches
1 Introduction
2 Related Works
3 3D Reconstruction from Sketches
3.1 Synthetic Sketch Generation
3.2 Sketch Standardization
3.3 Sketch-Based 3D Reconstruction
4 Experimental Results
4.1 3D Sketching Dataset
4.2 Training Details and Evaluation Metrics
4.3 Implementation Details
4.4 Results and Comparisons
4.5 Sketch Standardization Module
4.6 View Estimation Module
5 Summary
References
Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine
1 Introduction
2 Background
2.1 Metrics to Evaluate Search Engines
2.2 Similarity Approaches to Establish Relevance
3 Related Work
4 Methods
5 Results and Discussion
6 Future Work
7 Conclusion
References
W35 - Sign Language Understanding
ECCV 2022 Sign Spotting Challenge: Dataset, Design and Results
1 Introduction
2 Related Work
3 Challenge Design
3.1 The Dataset
3.2 Evaluation Protocol
3.3 The Baseline
4 Challenge Results and Winning Methods
4.1 The Leaderboard
4.2 Top Winning Approaches: MSSL Track
4.3 Top Winning Approaches: OSLWL Track
4.4 Performance on Marginal Distributions of the Test Set
5 Conclusions
References
Hierarchical I3D for Sign Spotting
1 Introduction
2 Related Work
2.1 Sign Language Recognition
2.2 Sign Spotting
3 Approach
3.1 I3D Feature Extraction Layers
3.2 Hierarchical Network Head
3.3 Learning Objectives
4 Experiments
4.1 LSE_eSaude_UVIGO Dataset
4.2 Evaluation Metric
4.3 Random Sampling Probabilities
4.4 Implementation Details
4.5 Results
4.6 Ablation Studies
5 Conclusion
References
Multi-modal Sign Language Spotting by Multi/One-Shot Learning
1 Introduction
2 Related Work
3 Methods
3.1 Feature Extraction
3.2 MSSL Framework
3.3 OSLWL Framework
4 Experiments
4.1 Experiment Settings
4.2 Ablation on MSSL
4.3 Ablation on OSLWL
5 Conclusion
5.1 Limitation and Future Work
References
Sign Spotting via Multi-modal Fusion and Testing Time Transferring
1 Introduction
2 Related Work
3 Proposed Method
3.1 Data Processing
3.2 Multi-modal Feature Extraction
3.3 Loss Function
3.4 Sign Spotting from Isolated Signs
3.5 Top-K Transferring Technique
4 Experiments
4.1 Dataset
4.2 Implementation Details
4.3 Evaluation Metrics
4.4 Main Results
5 Conclusions
References
W36 - A Challenge for Out-of-Distribution Generalization in Computer Vision
Domain-Conditioned Normalization for Test-Time Domain Generalization
1 Introduction
2 Related Work
2.1 Domain Generalization
2.2 Normalization in Neural Networks
2.3 Adaptation and Generalization at Test Time
3 Methodology
3.1 Preliminary
3.2 Domain Conditioned Normalization
3.3 Training and Inference
4 Experiments
4.1 Experiment Setup
4.2 Comparison with State-of-the-Art Methods
4.3 Ablation Study
4.4 Further Analysis
5 Conclusions
References
Unleashing the Potential of Adaptation Models via Go-getting Domain Labels
1 Introduction
2 Related Work
3 Adversarial Domain Adaptation with Go-labels
3.1 Prior Knowledge Recap and Problem Definition
3.2 Proposed Go-getting Domain Labels
3.3 Theoretical Insights of Go-labels
4 Experiments
4.1 Validation on Toy Problems
4.2 Experiments on the General UDA Benchmarks
4.3 Comparison with State-of-the-Arts
5 Conclusion
References
ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization
1 Introduction
2 Related Work
2.1 Multimodal Action Recognition
2.2 Modality Contribution Quantification
2.3 Domain Generalization and Adaptation
3 Approach
3.1 Action Recognition Task
3.2 Datasets
3.3 Modality Extraction and Training
3.4 Quantification Study: Modality Contributions
3.5 ModSelect: Unsupervised Modality Selection
4 Experiments
4.1 Late Fusion: Results
4.2 Quantification Study: Results
4.3 Results from ModSelect: Unsupervised Modality Selection
5 Limitations and Conclusion
References
Consistency Regularization for Domain Adaptation
1 Introduction
2 Related Work
2.1 Unsupervised Domain Adaptation
2.2 Semantic Segmentation
2.3 Consistency Regularization
3 Our Method
3.1 Overall Training
4 Experiments
4.1 Implementation Details
4.2 Results
4.3 Ablation Study
5 Conclusion
References
W37 - Vision With Biased or Scarce Data
CAT: Controllable Attribute Translation for Fair Facial Attribute Classification
1 Introduction
2 Related Work
3 Approach
4 Experimental Evaluation
4.1 Attributes Study
4.2 Synthetic Attribute-Level Balanced Datasets
4.3 Sex Classification
4.4 Facial Attribute Classification
4.5 Ablation Study
5 Conclusion
References
Weakly Supervised Invariant Representation Learning via Disentangling Known and Unknown Nuisance Factors
1 Introduction
2 Related Work
3 Learning Disentangled and Invariant Representation
3.1 Model Architecture
3.2 Learning Independent Known Nuisance Factors $z_{nk}$
3.3 Learning Invariant Predictive Factors $z_p$
4 Experiments Evaluation
4.1 Benchmarks, Baselines and Metrics
4.2 Comparison with Previous Work
4.3 Ablation Study
5 Conclusion
References
Learning Visual Explanations for DCNN-Based Image Classifiers Using an Attention Mechanism
1 Introduction
2 Related Work
3 Proposed Method
3.1 Problem Formulation
3.2 Training the Attention Mechanism
3.3 Inference of Model Decision's Explanation
4 Experiments
4.1 Dataset
4.2 Experimental Setup
4.3 Evaluation Measures
4.4 Results
5 Conclusion
References
Self-supervised Orientation-Guided Deep Network for Segmentation of Carbon Nanotubes in SEM Imagery
1 Introduction
2 Methods
2.1 Network Architecture
2.2 Generation of Training Labels
3 Experimental Results
3.1 Datasets
3.2 Training Process
3.3 Segmentation Evaluation
4 Conclusions
References
W38 - Visual Object Tracking Challenge
The Tenth Visual Object Tracking VOT2022 Challenge Results
1 Introduction
1.1 The VOT2022 Challenge
2 Performance Evaluation Protocol
2.1 The Short-Term Evaluation Protocols
2.2 The Long-Term Evaluation Protocol
3 Description of Individual Challenges
3.1 VOT-ST2022 Challenge Outline
3.2 VOT-RT2022 Challenge Outline
3.3 VOT-LT2022 Challenge Outline
3.4 VOT-RGBD2022 Challenge Outline
4 The VOT2022 Challenge Results
4.1 The VOT-STs2022 Challenge Results
4.2 The VOT-STb2022 Challenge Results
4.3 The VOT-RTs2022 Challenge Results
4.4 The VOT-RTb2022 Challenge Results
4.5 The VOT-LT2022 Challenge Results
4.6 The VOT-RGBD2022 Challenge Results
4.7 The VOT-D2022 Challenge Results
5 Conclusions
References
Efficient Visual Tracking via Hierarchical Cross-Attention Transformer
1 Introduction
2 Related Work
3 Method
3.1 Overall Architecture
3.2 Feature Sparsification Module
3.3 Hierarchical Cross-Attention Transformer
3.4 Training Loss
4 Experiments
4.1 Implementation Details
4.2 Evaluation on TrackingNet, LaSOT, and GOT-10k Datasets
4.3 Evaluation on VOT2020 Datasets
4.4 Evaluation on Other Datasets
4.5 Ablation Study and Analysis
5 Conclusion
References
Learning Dual-Fused Modality-Aware Representations for RGBD Tracking
1 Introduction
2 Related Work
3 Methodology
3.1 Overview
3.2 Cross-Modal Integration Module (CMIM)
3.3 Specificity Preserving Module (SPM)
3.4 Training Loss
3.5 Implementation Details
3.6 Visualization
4 Experiments
4.1 Experiment Settings
4.2 Main Results
4.3 Ablation Study
5 Conclusions
References
Author Index
