
Computer Vision - ECCV 2022 Workshops
Description
The 367 full papers included in this volume set were carefully reviewed and selected for inclusion in the ECCV 2022 workshop proceedings. They were organized in individual parts as follows:
Part I: W01 - AI for Space; W02 - Vision for Art; W03 - Adversarial Robustness in the Real World; W04 - Autonomous Vehicle Vision;
Part II: W05 - Learning With Limited and Imperfect Data; W06 - Advances in Image Manipulation;
Part III: W07 - Medical Computer Vision; W08 - Computer Vision for Metaverse; W09 - Self-Supervised Learning: What Is Next?;
Part IV: W10 - Self-Supervised Learning for Next-Generation Industry-Level Autonomous Driving; W11 - ISIC Skin Image Analysis; W12 - Cross-Modal Human-Robot Interaction; W13 - Text in Everything; W14 - BioImage Computing; W15 - Visual Object-Oriented Learning Meets Interaction: Discovery, Representations, and Applications; W16 - AI for Creative Video Editing and Understanding; W17 - Visual Inductive Priors for Data-Efficient Deep Learning; W18 - Mobile Intelligent Photography and Imaging;
Part V: W19 - People Analysis: From Face, Body and Fashion to 3D Virtual Avatars; W20 - Safe Artificial Intelligence for Automated Driving; W21 - Real-World Surveillance: Applications and Challenges; W22 - Affective Behavior Analysis In-the-Wild;
Part VI: W23 - Visual Perception for Navigation in Human Environments: The JackRabbot Human Body Pose Dataset and Benchmark; W24 - Distributed Smart Cameras; W25 - Causality in Vision; W26 - In-Vehicle Sensing and Monitorization; W27 - Assistive Computer Vision and Robotics; W28 - Computational Aspects of Deep Learning;
Part VII: W29 - Computer Vision for Civil and Infrastructure Engineering; W30 - AI-Enabled Medical Image Analysis: Digital Pathology and Radiology/COVID19; W31 - Compositional and Multimodal Perception;
Part VIII: W32 - Uncertainty Quantification for Computer Vision; W33 - Recovering 6D Object Pose; W34 - Drawings and Abstract Imagery: Representation and Analysis; W35 - Sign Language Understanding; W36 - A Challenge for Out-of-Distribution Generalization in Computer Vision; W37 - Vision With Biased or Scarce Data; W38 - Visual Object Tracking Challenge.

Content
- Intro
- Foreword
- Preface
- Organization
- Contents - Part VIII
- W31 - Challenge on Compositional and Multimodal Perception
- YORO - Lightweight End to End Visual Grounding
- 1 Introduction
- 2 Related Work
- 3 YORO Architecture
- 3.1 Multi-modal Inputs
- 3.2 Transformer Encoder
- 3.3 Multi-modal Transformer Outputs
- 3.4 Feature Projector and Detection Heads
- 4 Training Losses
- 5 Experiments
- 5.1 Dataset
- 5.2 Training and Evaluation Details
- 5.3 Ablation Study
- 5.4 Quantitative Comparison
- 5.5 Comparison of Trade-Offs Between Size, Speed, and Accuracy
- 5.6 Qualitative Results
- 6 Conclusion
- References
- W32 - Uncertainty Quantification for Computer Vision
- Localization Uncertainty Estimation for Anchor-Free Object Detection
- 1 Introduction
- 2 Related Works
- 2.1 Anchor-Free Object Detection
- 2.2 Uncertainty Estimation
- 3 Uncertainty-Aware Detection (UAD)
- 3.1 Power Likelihood
- 3.2 Uncertainty-Aware Classification
- 3.3 Training and Inference
- 4 Experiments
- 4.1 Ablation Study
- 4.2 Comparison with Other Methods
- 4.3 Discussion
- 5 Conclusion
- References
- Variational Depth Networks: Uncertainty-Aware Monocular Self-supervised Depth Estimation
- 1 Introduction
- 2 Related Work
- 3 Methods
- 3.1 Background and Motivation
- 3.2 Variational Depth Networks
- 4 Experiments
- 4.1 Setup
- 4.2 ScanNet: Uncertainty-Aware Reconstruction
- 4.3 ScanNet: Prior Ablation Study
- 4.4 KITTI: 2D Depth Evaluation
- 5 Conclusions
- References
- Unsupervised Joint Image Transfer and Uncertainty Quantification Using Patch Invariant Networks
- 1 Introduction
- 2 Related Work
- 2.1 Generative Adversarial Networks
- 2.2 Unpaired Image Transfer and Domain Mapping
- 2.3 Uncertainty Quantification
- 3 Method
- 3.1 Preliminaries and GAN Architecture
- 3.2 Patch Invariance
- 3.3 Uncertainty by Loss Attenuation
- 3.4 Implementation Details
- 4 Experiments
- 4.1 Datasets
- 4.2 Compared Methods and Scenarios
- 4.3 Quantitative Evaluation
- 4.4 Qualitative Evaluation
- 4.5 Uncertainty Scores
- 5 Conclusions
- References
- Uncertainty Quantification Using Query-Based Object Detectors
- 1 Introduction
- 2 Background and Related Work
- 2.1 Object Detection
- 2.2 Uncertainty Quantification in Object Detection
- 3 Uncertainty Estimation in Query-Based Detectors
- 3.1 Detection Transformer and AdaMixer
- 3.2 Merging Bounding Boxes
- 3.3 Uncertainty Quantification
- 4 Experiments
- 4.1 Experimental Protocols
- 4.2 Experiment 1: Evaluating Location Uncertainty
- 4.3 Experiment 2: Evaluating Classification Performance
- 4.4 Experiment 3: Evaluating Objectness Uncertainty
- 4.5 Experiment 4: Runtime-Aware Performance Comparison
- 5 Conclusions
- References
- W33 - Recovering 6D Object Pose
- CenDerNet: Center and Curvature Representations for Render-and-Compare 6D Pose Estimation
- 1 Introduction
- 1.1 Context
- 1.2 Related Work
- 1.3 Contributions
- 2 CenDerNet
- 2.1 From Images to Center and Curvature Heatmaps
- 2.2 From Center Heatmaps to 3D Centers
- 2.3 6D Pose Estimation
- 3 Experiments
- 3.1 DIMO
- 3.2 T-LESS
- 4 Conclusions
- References
- Trans6D: Transformer-Based 6D Object Pose Estimation and Refinement
- 1 Introduction
- 2 Related Work
- 2.1 6D Object Pose Estimation
- 2.2 Vision Transformer
- 3 Methodology
- 3.1 Transformer-Based Baselines for 6D Object Pose Estimation
- 3.2 Patch-Aware Feature Fusion
- 3.3 Pure Transformer-Based Pose Refinement
- 3.4 Training
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Datasets
- 4.3 Evaluation Metrics
- 4.4 Ablation Studies
- 4.5 Comparison with State of the Arts
- 5 Conclusions
- References
- Learning to Estimate Multi-view Pose from Object Silhouettes
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Network Architecture
- 3.2 Confidence-Based Loss Function
- 3.3 Training
- 4 Experiments
- 4.1 Datasets
- 4.2 Results
- 5 Conclusion
- References
- TransNet: Category-Level Transparent Object Pose Estimation
- 1 Introduction
- 2 Related Works
- 2.1 Transparent Object Visual Perception for Manipulation
- 2.2 Opaque Object Category-Level Pose Estimation
- 3 TransNet
- 3.1 Architecture Overview
- 3.2 Object Instance Segmentation
- 3.3 Transparent Object Depth Completion
- 3.4 Transparent Object Surface Normal Estimation
- 3.5 Generalized Point Cloud
- 3.6 Transformer Feature Embedding
- 3.7 Pose and Scale Estimation
- 4 Experiments
- 4.1 Comparison with Baseline
- 4.2 Embedding Method Analysis
- 4.3 Ablation Study of Generalized Point Cloud
- 4.4 Depth and Surface Normal Exploration on TransNet
- 5 Conclusions
- References
- W34 - Drawings and Abstract Imagery: Representation and Analysis
- Fuse and Attend: Generalized Embedding Learning for Art and Sketches
- 1 Introduction
- 2 Proposed Method
- 2.1 Gated Fusion for Representative Refinement
- 2.2 Attention-Based Query Embedding Refinement and Augmentation for Positive Generation
- 2.3 Student-Teacher Based Contrastive Learning
- 3 Related Work and Experiments
- 3.1 Comparison of Our Method RCERM Against the SOTA ERM and Fishr Methods:
- 3.2 Ablation/Analysis of Our Method:
- 3.3 Key Takeaways:
- 4 Conclusion
- References
- 3D Shape Reconstruction from Free-Hand Sketches
- 1 Introduction
- 2 Related Works
- 3 3D Reconstruction from Sketches
- 3.1 Synthetic Sketch Generation
- 3.2 Sketch Standardization
- 3.3 Sketch-Based 3D Reconstruction
- 4 Experimental Results
- 4.1 3D Sketching Dataset
- 4.2 Training Details and Evaluation Metrics
- 4.3 Implementation Details
- 4.4 Results and Comparisons
- 4.5 Sketch Standardization Module
- 4.6 View Estimation Module
- 5 Summary
- References
- Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine
- 1 Introduction
- 2 Background
- 2.1 Metrics to Evaluate Search Engines
- 2.2 Similarity Approaches to Establish Relevance
- 3 Related Work
- 4 Methods
- 5 Results and Discussion
- 6 Future Work
- 7 Conclusion
- References
- W35 - Sign Language Understanding
- ECCV 2022 Sign Spotting Challenge: Dataset, Design and Results
- 1 Introduction
- 2 Related Work
- 3 Challenge Design
- 3.1 The Dataset
- 3.2 Evaluation Protocol
- 3.3 The Baseline
- 4 Challenge Results and Winning Methods
- 4.1 The Leaderboard
- 4.2 Top Winning Approaches: MSSL Track
- 4.3 Top Winning Approaches: OSLWL Track
- 4.4 Performance on Marginal Distributions of the Test Set
- 5 Conclusions
- References
- Hierarchical I3D for Sign Spotting
- 1 Introduction
- 2 Related Work
- 2.1 Sign Language Recognition
- 2.2 Sign Spotting
- 3 Approach
- 3.1 I3D Feature Extraction Layers
- 3.2 Hierarchical Network Head
- 3.3 Learning Objectives
- 4 Experiments
- 4.1 LSE_eSaude_UVIGO Dataset
- 4.2 Evaluation Metric
- 4.3 Random Sampling Probabilities
- 4.4 Implementation Details
- 4.5 Results
- 4.6 Ablation Studies
- 5 Conclusion
- References
- Multi-modal Sign Language Spotting by Multi/One-Shot Learning
- 1 Introduction
- 2 Related Work
- 3 Methods
- 3.1 Feature Extraction
- 3.2 MSSL Framework
- 3.3 OSLWL Framework
- 4 Experiments
- 4.1 Experiment Settings
- 4.2 Ablation on MSSL
- 4.3 Ablation on OSLWL
- 5 Conclusion
- 5.1 Limitation and Future Work
- References
- Sign Spotting via Multi-modal Fusion and Testing Time Transferring
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Data Processing
- 3.2 Multi-modal Feature Extraction
- 3.3 Loss Function
- 3.4 Sign Spotting from Isolated Signs
- 3.5 Top-K Transferring Technique
- 4 Experiments
- 4.1 Dataset
- 4.2 Implementation Details
- 4.3 Evaluation Metrics
- 4.4 Main Results
- 5 Conclusions
- References
- W36 - A Challenge for Out-of-Distribution Generalization in Computer Vision
- Domain-Conditioned Normalization for Test-Time Domain Generalization
- 1 Introduction
- 2 Related Work
- 2.1 Domain Generalization
- 2.2 Normalization in Neural Networks
- 2.3 Adaptation and Generalization at Test Time
- 3 Methodology
- 3.1 Preliminary
- 3.2 Domain Conditioned Normalization
- 3.3 Training and Inference
- 4 Experiments
- 4.1 Experiment Setup
- 4.2 Comparison with State-of-the-Art Methods
- 4.3 Ablation Study
- 4.4 Further Analysis
- 5 Conclusions
- References
- Unleashing the Potential of Adaptation Models via Go-getting Domain Labels
- 1 Introduction
- 2 Related Work
- 3 Adversarial Domain Adaptation with Go-labels
- 3.1 Prior Knowledge Recap and Problem Definition
- 3.2 Proposed Go-getting Domain Labels
- 3.3 Theoretical Insights of Go-labels
- 4 Experiments
- 4.1 Validation on Toy Problems
- 4.2 Experiments on the General UDA Benchmarks
- 4.3 Comparison with State-of-the-Arts
- 5 Conclusion
- References
- ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization
- 1 Introduction
- 2 Related Work
- 2.1 Multimodal Action Recognition
- 2.2 Modality Contribution Quantification
- 2.3 Domain Generalization and Adaptation
- 3 Approach
- 3.1 Action Recognition Task
- 3.2 Datasets
- 3.3 Modality Extraction and Training
- 3.4 Quantification Study: Modality Contributions
- 3.5 ModSelect: Unsupervised Modality Selection
- 4 Experiments
- 4.1 Late Fusion: Results
- 4.2 Quantification Study: Results
- 4.3 Results from ModSelect: Unsupervised Modality Selection
- 5 Limitations and Conclusion
- References
- Consistency Regularization for Domain Adaptation
- 1 Introduction
- 2 Related Work
- 2.1 Unsupervised Domain Adaptation
- 2.2 Semantic Segmentation
- 2.3 Consistency Regularization
- 3 Our Method
- 3.1 Overall Training
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Results
- 4.3 Ablation Study
- 5 Conclusion
- References
- W37 - Vision With Biased or Scarce Data
- CAT: Controllable Attribute Translation for Fair Facial Attribute Classification
- 1 Introduction
- 2 Related Work
- 3 Approach
- 4 Experimental Evaluation
- 4.1 Attributes Study
- 4.2 Synthetic Attribute-Level Balanced Datasets
- 4.3 Sex Classification
- 4.4 Facial Attribute Classification
- 4.5 Ablation Study
- 5 Conclusion
- References
- Weakly Supervised Invariant Representation Learning via Disentangling Known and Unknown Nuisance Factors
- 1 Introduction
- 2 Related Work
- 3 Learning Disentangled and Invariant Representation
- 3.1 Model Architecture
- 3.2 Learning Independent Known Nuisance Factors z_nk
- 3.3 Learning Invariant Predictive Factors z_p
- 4 Experiments Evaluation
- 4.1 Benchmarks, Baselines and Metrics
- 4.2 Comparison with Previous Work
- 4.3 Ablation Study
- 5 Conclusion
- References
- Learning Visual Explanations for DCNN-Based Image Classifiers Using an Attention Mechanism
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Problem Formulation
- 3.2 Training the Attention Mechanism
- 3.3 Inference of Model Decision's Explanation
- 4 Experiments
- 4.1 Dataset
- 4.2 Experimental Setup
- 4.3 Evaluation Measures
- 4.4 Results
- 5 Conclusion
- References
- Self-supervised Orientation-Guided Deep Network for Segmentation of Carbon Nanotubes in SEM Imagery
- 1 Introduction
- 2 Methods
- 2.1 Network Architecture
- 2.2 Generation of Training Labels
- 3 Experimental Results
- 3.1 Datasets
- 3.2 Training Process
- 3.3 Segmentation Evaluation
- 4 Conclusions
- References
- W38 - Visual Object Tracking Challenge
- The Tenth Visual Object Tracking VOT2022 Challenge Results
- 1 Introduction
- 1.1 The VOT2022 Challenge
- 2 Performance Evaluation Protocol
- 2.1 The Short-Term Evaluation Protocols
- 2.2 The Long-Term Evaluation Protocol
- 3 Description of Individual Challenges
- 3.1 VOT-ST2022 Challenge Outline
- 3.2 VOT-RT2022 Challenge Outline
- 3.3 VOT-LT2022 Challenge Outline
- 3.4 VOT-RGBD2022 Challenge Outline
- 4 The VOT2022 Challenge Results
- 4.1 The VOT-STs2022 Challenge Results
- 4.2 The VOT-STb2022 Challenge Results
- 4.3 The VOT-RTs2022 Challenge Results
- 4.4 The VOT-RTb2022 Challenge Results
- 4.5 The VOT-LT2022 Challenge Results
- 4.6 The VOT-RGBD2022 Challenge Results
- 4.7 The VOT-D2022 Challenge Results
- 5 Conclusions
- References
- Efficient Visual Tracking via Hierarchical Cross-Attention Transformer
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Overall Architecture
- 3.2 Feature Sparsification Module
- 3.3 Hierarchical Cross-Attention Transformer
- 3.4 Training Loss
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Evaluation on TrackingNet, LaSOT, and GOT-10k Datasets
- 4.3 Evaluation on VOT2020 Datasets
- 4.4 Evaluation on Other Datasets
- 4.5 Ablation Study and Analysis
- 5 Conclusion
- References
- Learning Dual-Fused Modality-Aware Representations for RGBD Tracking
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Overview
- 3.2 Cross-Modal Integration Module (CMIM)
- 3.3 Specificity Preserving Module (SPM)
- 3.4 Training Loss
- 3.5 Implementation Details
- 3.6 Visualization
- 4 Experiments
- 4.1 Experiment Settings
- 4.2 Main Results
- 4.3 Ablation Study
- 5 Conclusions
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle only with limitations).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). However, on the small screens of e-readers or smartphones, PDFs can be awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, a personalised watermark is embedded in the eBook; it can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.