
Computer Vision - ECCV 2022 Workshops
Description
The 367 full papers included in this volume set were carefully reviewed and selected for inclusion in the ECCV 2022 workshop proceedings. They were organized in individual parts as follows:
Part I: W01 - AI for Space; W02 - Vision for Art; W03 - Adversarial Robustness in the Real World; W04 - Autonomous Vehicle Vision;
Part II: W05 - Learning With Limited and Imperfect Data; W06 - Advances in Image Manipulation;
Part III: W07 - Medical Computer Vision; W08 - Computer Vision for Metaverse; W09 - Self-Supervised Learning: What Is Next?;
Part IV: W10 - Self-Supervised Learning for Next-Generation Industry-Level Autonomous Driving; W11 - ISIC Skin Image Analysis; W12 - Cross-Modal Human-Robot Interaction; W13 - Text in Everything; W14 - BioImage Computing; W15 - Visual Object-Oriented Learning Meets Interaction: Discovery, Representations, and Applications; W16 - AI for Creative Video Editing and Understanding; W17 - Visual Inductive Priors for Data-Efficient Deep Learning; W18 - Mobile Intelligent Photography and Imaging;
Part V: W19 - People Analysis: From Face, Body and Fashion to 3D Virtual Avatars; W20 - Safe Artificial Intelligence for Automated Driving; W21 - Real-World Surveillance: Applications and Challenges; W22 - Affective Behavior Analysis In-the-Wild;
Part VI: W23 - Visual Perception for Navigation in Human Environments: The JackRabbot Human Body Pose Dataset and Benchmark; W24 - Distributed Smart Cameras; W25 - Causality in Vision; W26 - In-Vehicle Sensing and Monitorization; W27 - Assistive Computer Vision and Robotics; W28 - Computational Aspects of Deep Learning;
Part VII: W29 - Computer Vision for Civil and Infrastructure Engineering; W30 - AI-Enabled Medical Image Analysis: Digital Pathology and Radiology/COVID19; W31 - Compositional and Multimodal Perception;
Part VIII: W32 - Uncertainty Quantification for Computer Vision; W33 - Recovering 6D Object Pose; W34 - Drawings and Abstract Imagery: Representation and Analysis; W35 - Sign Language Understanding; W36 - A Challenge for Out-of-Distribution Generalization in Computer Vision; W37 - Vision With Biased or Scarce Data; W38 - Visual Object Tracking Challenge.

Content
- Intro
- Foreword
- Preface
- Organization
- Contents - Part IV
- W09 - Self-supervised Learning: What Is Next?
- Towards Self-Supervised and Weight-preserving Neural Architecture Search
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Preliminary: Differentiable NAS
- 3.2 Towards Weight-preserving
- 3.3 Towards Self-supervised Learning
- 3.4 Network Inflation Challenge
- 3.5 Searching with SSWP-NAS
- 4 Experiments
- 4.1 Experimental Settings
- 4.2 Benchmarking SSWP-NAS
- 4.3 Ablation Study
- 4.4 Weight-preserving Benefits Semi-supervised Learning
- 5 Conclusion
- References
- MoQuad: Motion-focused Quadruple Construction for Video Contrastive Learning
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 MoQuad Sample Construction
- 3.2 Extra Training Strategies for MoQuad
- 4 Experiments
- 4.1 Experimental Settings
- 4.2 Comparison with State of the Arts
- 4.3 Ablation Studies
- 4.4 More Analyses
- 5 Conclusion
- References
- On the Effectiveness of ViT Features as Local Semantic Descriptors
- 1 Introduction
- 2 Related Work
- 3 ViT Features as Local Patch Descriptors
- 3.1 Properties of ViT's Features
- 4 Deep ViT Features Applied to Vision Tasks
- 5 Results
- 5.1 Part Co-segmentation
- 5.2 Co-segmentation
- 5.3 Point Correspondences
- 6 Conclusion
- References
- Anomaly Detection Requires Better Representations
- 1 Introduction
- 2 Related Work
- 3 Anomaly Detection as a Downstream Task for Representation Learning
- 4 Successful Representation Learning Enables Anomaly Detection
- 5 Gaps in Anomaly Detection Point to Bottlenecks in Representations Learning
- 5.1 Masked-Autoencoder: Advances in Self-Supervised Learning Do Not Always Imply Better Anomaly Detection
- 5.2 Complex Datasets: Current Representations Struggle on Scenes, Finegrained Classes, Multiple Objects
- 5.3 Unidentifiability: Representations for Anomaly Detection May Be Ambiguous Without Further Guidance
- 5.4 3D Point Clouds: Self-supervised Representations Do Not Always Improve over Handcrafted Ones
- 5.5 Tabular Data: When Representations Do Not Improve over the Original Data
- 6 Final Remarks
- A Appendix
- A.1 Anomaly detection comparison of MAE and DINO
- A.2 Multi-modal datasets
- A.3 Tabular domain
- References
- Leveraging Self-Supervised Training for Unintentional Action Recognition
- 1 Introduction
- 2 Related Work
- 3 Approach to Exploit the UA Inherent Biases
- 3.1 Framework Overview
- 3.2 Temporal Transformations of Inherent Biases of Unintentional Actions (T2IBUA)
- 4 Multi-Stage Learning for Unintentional Action Recognition
- 4.1 Transformer Block
- 4.2 [Stage 1] Frame2Clip (F2C) Learning
- 4.3 [Stage 2] Frame2Clip2Video (F2C2V) Learning
- 4.4 [Stage 3] Downstream Transfer to Unintentional Action Tasks
- 5 Experimental Results
- 5.1 Comparison to State-of-the-art
- 5.2 Ablation Study
- 6 Conclusion
- References
- A Study on Self-Supervised Object Detection Pretraining
- 1 Introduction
- 2 Related Work
- 2.1 Self-Supervised Learning from Images
- 2.2 Object Detection
- 3 Approach
- 3.1 View Construction and Box Sampling
- 3.2 SSL Backbone
- 3.3 Comparison with Prior Work
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Effect of Box Sampling Strategies
- 4.3 Effect of Methods to Extract Box Features
- 4.4 Effect of Multiple Views
- 4.5 Effect of Box Localization Auxiliary Task
- 5 Conclusion
- References
- Internet Curiosity: Directed Unsupervised Learning on Uncurated Internet Data
- 1 Introduction
- 2 Internet Curiosity
- 2.1 Image Search
- 2.2 Self-Supervised Training
- 2.3 "Densifying" the Target Dataset via Curiosity
- 2.4 Generating Queries
- 3 Results
- References
- W10 - Self-supervised Learning for Next-Generation Industry-Level Autonomous Driving
- Towards Autonomous Grading in the Real World
- 1 Introduction
- 2 Related Work
- 2.1 Bulldozer Automation
- 2.2 Sand Simulation
- 2.3 Sim-to-Real
- 3 Method
- 3.1 Problem Formulation
- 3.2 Dozer Simulation
- 3.3 Baseline Algorithm
- 3.4 Privileged Behavioral Cloning
- 3.5 Scaled Prototype Environment
- 4 Experiments
- 4.1 Simulation Results
- 4.2 Scaled Prototype Environment Results
- 5 Conclusions
- References
- Bootstrapping Autonomous Lane Changes with Self-supervised Augmented Runs
- 1 Introduction
- 1.1 Challenge
- 1.2 Related Works
- 2 Problem Formulation
- 2.1 States of Lanes and Surrounding Vehicles
- 2.2 Formulation as a Learning Problem
- 3 Sample Preparation by Augmented Run
- 3.1 Extracting Anchor Features from Real Runs
- 3.2 Auto-labeling from Augmented Runs
- 4 Supervised Learning
- 4.1 Interval-Level Feature Aggregation
- 4.2 Classification
- 5 Experiments
- 5.1 Performance of Proposed Approach
- 5.2 Performance of Alternative Approach
- 6 Conclusion
- References
- W11 - ISIC Skin Image Analysis
- Artifact-Based Domain Generalization of Skin Lesion Models
- 1 Introduction
- 2 Background
- 3 Methodology
- 3.1 Trap Sets
- 3.2 Artifact-Based Environments
- 3.3 NoiseCrop: Test-Time Feature Selection
- 4 Results
- 4.1 Data
- 4.2 Model Selection and Implementation Details
- 4.3 Debiasing of Skin Lesion Models
- 4.4 Ablation Study
- 4.5 Out-of-Distribution Evaluation
- 4.6 Qualitative Analysis
- 5 Related Work
- 6 Conclusion
- References
- An Evaluation of Self-supervised Pre-training for Skin-Lesion Analysis
- 1 Introduction
- 2 Related Work
- 2.1 Self-supervised Learning for Visual Tasks
- 2.2 Self-supervised Learning on Medical Tasks
- 2.3 Self-supervised Learning on Skin Lesion Analysis
- 3 Materials and Methods
- 3.1 Datasets
- 3.2 Experimental Design
- 3.3 SSL UCL/SCL FT Pipelines
- 3.4 Implementation Details
- 4 Results
- 4.1 Self-supervision Schemes vs. Baseline Comparison
- 4.2 Systematic Evaluation of Pipelines
- 4.3 Low Training Data Scenario
- 4.4 Qualitative Analysis
- 5 Conclusions
- References
- Skin_Hair Dataset: Setting the Benchmark for Effective Hair Inpainting Methods for Improving the Image Quality of Dermoscopic Images
- 1 Introduction
- 2 Related Work
- 3 Skin_Hair Dataset
- 4 Effective Hair Inpainting Algorithms
- 4.1 Navier-Stokes
- 4.2 Telea
- 4.3 Hair_SinGAN Architecture
- 4.4 R-MNet Method
- 5 Result Analysis
- 6 Conclusions
- References
- FairDisCo: Fairer AI in Dermatology via Disentanglement Contrastive Learning
- 1 Introduction
- 2 Related Works
- 2.1 Skin Lesion Diagnosis
- 2.2 Fairness
- 3 Methodology
- 3.1 Proposed FairDisCo Model
- 3.2 An Investigation for Three Approaches
- 4 Experiments
- 4.1 Results on the Fitzpatrick17k Dataset
- 4.2 Results on the DDI Dataset
- 4.3 Loss Analysis
- 5 Conclusion
- References
- CIRCLe: Color Invariant Representation Learning for Unbiased Classification of Skin Lesions
- 1 Introduction
- 2 Method
- 2.1 Problem Definition
- 2.2 Feature Extractor and Classifier
- 2.3 Regularization Network
- 3 Experiments
- 3.1 Dataset
- 3.2 Implementation Details
- 3.3 Metrics
- 3.4 Models
- 4 Results and Analysis
- 4.1 Classification and Fairness Performance
- 4.2 Domain Adaptation Performance
- 4.3 Classification Performance Relation with Training Size
- 5 Discussion and Future Work
- 6 Conclusion
- References
- W12 - Cross-Modal Human-Robot Interaction
- Distinctive Image Captioning via CLIP Guided Group Optimization
- 1 Introduction
- 2 Related Work
- 2.1 Image Captioning
- 2.2 Objectives for Image Captioning
- 2.3 Metrics for Distinctive Image Captioning
- 3 Methodology
- 3.1 Similar Image Group
- 3.2 Metrics
- 3.3 Group Embedding Gap Reward
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Main Results
- 4.3 Comparison with State-of-the-Art
- 4.4 Ablation Study
- 4.5 User Study
- 4.6 Qualitative Results
- 5 Conclusions
- References
- W13 - Text in Everything
- OCR-IDL: OCR Annotations for Industry Document Library Dataset
- 1 Introduction
- 2 Related Work
- 3 OCR-IDL Dataset
- 3.1 Data Collection
- 3.2 Comparison to Existing Datasets
- 3.3 Dataset Statistics
- 4 Conclusion
- References
- Self-paced Learning to Improve Text Row Detection in Historical Documents with Missing Labels
- 1 Introduction
- 2 Related Work
- 3 Method
- 4 Experiments
- 4.1 Data Sets
- 4.2 Evaluation Setup
- 4.3 Results
- 5 Conclusion
- References
- On Calibration of Scene-Text Recognition Models
- 1 Introduction
- 2 Related Work
- 3 Background
- 4 Sequence-Level Calibration
- 5 Experiments
- 5.1 Experimental Setup
- 5.2 Results and Analysis
- 6 Conclusion
- References
- End-to-End Document Recognition and Understanding with Dessurt
- 1 Introduction
- 2 Related Work
- 2.1 LayoutLM Family
- 2.2 End-to-End Models
- 3 Model
- 4 Pre-training Procedure
- 4.1 IIT-CDIP Dataset
- 4.2 Synthetic Wikipedia
- 4.3 Synthetic Handwriting
- 4.4 Synthetic Forms
- 4.5 Distillation
- 4.6 Training
- 5 Experiments
- 5.1 RVL-CDIP
- 5.2 DocVQA and HW-SQuAD
- 5.3 FUNSD and NAF
- 5.4 IAM Database
- 5.5 Ablation
- 6 Conclusion
- References
- Task Grouping for Multilingual Text Recognition
- 1 Introduction
- 2 Related Work
- 2.1 Multilingual Text Spotting
- 2.2 Multitask Learning and Grouping
- 3 Methodology
- 3.1 Grouping Module
- 3.2 Integrated Loss
- 3.3 Integrated Loss with a Base Loss Coefficient
- 3.4 Grouping Loss
- 4 Experimentals
- 4.1 Datasets
- 4.2 Model Training
- 4.3 Task Grouping Results
- 4.4 Ablation Study
- 4.5 Task Assignment on Models with Different Hyper-parameters
- 4.6 E2E Text Recognition
- 5 Conclusions
- References
- Incorporating Self-attention Mechanism and Multi-task Learning into Scene Text Detection
- 1 Introduction
- 2 Related Work
- 2.1 Mask R-CNN
- 2.2 Attention Based Methods
- 2.3 Multi-task Learning Based Methods
- 3 Methodology
- 3.1 Self-attention Mechanism-based Backbone
- 3.2 Multi-task Cascade Refinement Text Detection
- 4 Experiments
- 4.1 Experiment Setup
- 4.2 Main Results
- 4.3 Ablation Studies
- 4.4 Visualization
- 4.5 Inference Speed
- 4.6 Error Analysis
- 5 Conclusion
- References
- Doc2Graph: A Task Agnostic Document Understanding Framework Based on Graph Neural Networks
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Documents Graph Structure
- 3.2 Node and Edge Features
- 3.3 Architecture
- 4 Experiments and Results
- 4.1 Proposed Model
- 4.2 FUNSD
- 4.3 RVL-CDIP Invoices
- 5 Conclusion
- References
- MUST-VQA: MUltilingual Scene-Text VQA
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Task Definition
- 3.2 Visual Encoder
- 3.3 Textual Encoders
- 3.4 Baselines
- 4 Experiments
- 4.1 Implementation Details
- 4.2 TextVQA Results
- 4.3 ST-VQA Results
- 5 Analysis
- 6 Conclusions and Future Work
- References
- Out-of-Vocabulary Challenge Report
- 1 Introduction
- 2 Related Work
- 3 Competition Protocol
- 4 The OOV Dataset
- 4.1 Dataset Analysis
- 5 The OOV Challenge
- 5.1 Task 1
- 5.2 Task 2
- 5.3 Baselines
- 5.4 Evaluation Metrics
- 6 Results
- 6.1 Task 1
- 6.2 Task 2
- 7 Analysis
- 7.1 Task 1
- 7.2 Task 2
- 8 Conclusion and Future Work
- References
- W14 - BioImage Computing
- Towards Structured Noise Models for Unsupervised Denoising
- 1 Introduction
- 2 Related Work
- 2.1 Self- and Unsupervised Methods for Removing Structured Noise
- 2.2 Noise Modelling
- 3 Background
- 3.1 DivNoising and HDN
- 3.2 Pixel-Independent Noise Models
- 3.3 Signal-Independent Noise Models (a Simplification)
- 3.4 Deep Autoregressive Models
- 4 Methods
- 5 Experiments
- 5.1 Synthetic Noise Datasets
- 5.2 Photoacoustic Dataset
- 5.3 Training the Noise Model
- 5.4 Training HDN
- 5.5 Denoising with Autoregressive Noise Models
- 5.6 Evaluating the Noise Model
- 5.7 Choice of Autoregressive Pixel Ordering
- 6 Conclusion
- References
- Comparison of Semi-supervised Learning Methods for High Content Screening Quality Control
- 1 Introduction
- 2 Related Work
- 2.1 Transfer Learning
- 2.2 Self-supervised Learning
- 3 Method
- 3.1 Data
- 3.2 Encoder Training
- 3.3 Downstream Classification Tasks
- 3.4 Evaluation Criteria
- 4 Results
- 4.1 Evaluations on Downstream Tasks
- 4.2 Effect of a Decreasing Amount of Annotated Data
- 4.3 Effect of a Domain Shift
- 5 Conclusion
- References
- Discriminative Attribution from Paired Images
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Creation of Counterfactuals
- 3.2 Discriminative Attribution
- 3.3 Evaluation of Attribution Maps
- 4 Experiments
- 5 Discussion
- References
- Learning with Minimal Effort: Leveraging in Silico Labeling for Cell and Nucleus Segmentation
- 1 Introduction
- 2 Materials and Methods
- 2.1 Image Acquisition
- 2.2 Nucleus Segmentation
- 2.3 Cell Segmentation
- 3 Results
- 3.1 Evaluation Metrics
- 3.2 Nucleus Segmentation
- 3.3 Cell Segmentation
- 4 Discussion
- 5 Conclusion
- References
- Towards Better Guided Attention and Human Knowledge Insertion in Deep Convolutional Neural Networks
- 1 Introduction
- 2 Related Work
- 3 Methods
- 3.1 Multi-Scale Attention Branch Network
- 3.2 Puzzle Module to Improve Fine-grained Recognition
- 3.3 Embedding Human Knowledge with Copy-Replace Augmentation
- 4 Experiments
- 4.1 Image Classification
- 4.2 Fine-grained Recognition
- 4.3 Attention Editing Performance
- 5 Conclusion
- References
- Characterization of AI Model Configurations for Model Reuse
- 1 Introduction
- 2 Methods
- 3 Experimental Results
- 4 Discussion
- 5 Summary
- 6 Disclaimer
- References
- Empirical Evaluation of Deep Learning Approaches for Landmark Detection in Fish Bioimages
- 1 Introduction
- 2 Related Work
- 3 Dataset Description
- 3.1 Zebrafish Microscopy Dataset
- 3.2 Medaka Microscopy Dataset
- 3.3 Seabream Radiography Dataset
- 4 Method Description
- 4.1 Direct Coordinates Regression
- 4.2 Heatmap-Based Regression
- 4.3 Training and Prediction Phases
- 4.4 Network Architectures
- 4.5 Experimental Protocol and Implementation
- 5 Results and Discussion
- 6 Conclusions
- References
- PointFISH: Learning Point Cloud Representations for RNA Localization Patterns
- 1 Introduction
- 2 Related Work
- 3 Problem Statement
- 4 PointFISH
- 4.1 Input Preparation
- 4.2 Model Architecture
- 5 Experiment
- 5.1 Training on Simulated Patterns
- 5.2 Analysis of the Embeddings Provided by PointFISH
- 5.3 Ablation Studies
- 6 Discussion
- 7 Conclusion
- References
- N2V2 - Fixing Noise2Void Checkerboard Artifacts with Modified Sampling Strategies and a Tweaked Network Architecture
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 A Modified Network Architecture for N2V2
- 3.2 New Sampling Strategies to Cover Blind-Spots
- 4 Evaluation
- 4.1 Dataset Descriptions and Training Details
- 4.2 Evaluation Metrics
- 4.3 Results on Mouse SP3, SP6, and SP12 (Salt&Pepper Noise)
- 4.4 Evaluation Flywing G70, Mouse G20, BSD68
- 4.5 Evaluation of Real Noisy Data: Convallaria_95 and Convallaria_1
- 5 Discussion and Conclusions
- References
- W15 - Visual Object-Oriented Learning Meets Interaction: Discovery, Representations, and Applications
- Object Detection in Aerial Images with Uncertainty-Aware Graph Network
- 1 Introduction
- 2 Related Works
- 3 Method
- 3.1 Uncertainty-Aware Initial Object Detection
- 3.2 Uncertainty-Based Spatial-Semantic Graph Generation
- 3.3 Feature Refinement via GNNs with Spatial-Semantic Graph
- 3.4 Final Detection Pipeline and Training Losses
- 4 Experiments
- 4.1 Datasets
- 4.2 Experimental Setups
- 4.3 Quantitative Results
- 4.4 Analyses
- 5 Conclusion
- References
- W16 - AI for Creative Video Editing and Understanding
- STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation
- 1 Introduction
- 2 Related Works
- 2.1 Instance Segmentation
- 2.2 Video Instance Segmentation
- 2.3 Contrastive Learning
- 3 Method
- 3.1 Mask Generation for Still-Image
- 3.2 Proposed Framework for VIS
- 3.3 Spatio-Temporal Contrastive Learning
- 3.4 Temporal Consistency
- 3.5 Training and Inference
- 4 Experiments
- 4.1 Dataset
- 4.2 Metrics
- 4.3 Implementation Details
- 4.4 Main Results
- 4.5 Ablation Studies
- 4.6 Visualizations
- 5 Conclusion
- References
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks
- 1 Introduction
- 2 Related Work
- 3 Formulation
- 4 Evaluation Benchmark
- 4.1 Crafting Evaluation Datasets
- 4.2 Evaluation of Existing Methods
- 5 Method
- 5.1 Spatial-aware Multi-Aspect Debiasing
- 5.2 Exploiting Web Media with OmniDebias
- 6 Experiments
- 6.1 Experiment Setting
- 6.2 Main Results
- 6.3 Spatial-aware Multi-Aspect Debiasing
- 6.4 Exploiting Web Media with OmniDebias
- 7 Conclusion
- References
- SegTAD: Precise Temporal Action Detection via Semantic Segmentation
- 1 Introduction
- 2 Related Work
- 2.1 Temporal Action Detection
- 2.2 Object Detection and Semantic Segmentation
- 3 Proposed SegTAD
- 3.1 Problem Formulation and SegTAD Framework
- 3.2 1D Semantic Segmentation Network
- 3.3 Proposal Detection Network
- 3.4 Training and Inference
- 4 Experimental Results
- 4.1 Datasets and Implementation Details
- 4.2 Comparison to State-of-the-Art
- 4.3 Ablation Study
- 4.4 Visualization of Segmentation Output
- 5 Conclusion
- References
- Text-Driven Stylization of Video Objects
- 1 Introduction
- 2 Related Work
- 2.1 Video Editing
- 2.2 Text-Based Stylization
- 3 Method
- 3.1 CLIP
- 3.2 Neural Layered Atlases (NLA)
- 3.3 Our Stylization Pipeline
- 4 Experiments
- 4.1 Varied Stylizations
- 4.2 Prompt Specificity
- 4.3 Text Augmentation
- 4.4 Ablation Study
- 4.5 Limitations
- 5 Conclusion
- References
- MND: A New Dataset and Benchmark of Movie Scenes Classified by Their Narrative Function
- 1 Introduction
- 1.1 Background
- 1.2 Research Objectives and Contributions
- 2 Background and Related Work
- 2.1 Introduction to Story Models
- 2.2 Related Datasets and Research
- 3 The Story Model, 15 Story Beats
- 4 The MND Dataset
- 4.1 Data Collection - Movies and Scenes
- 4.2 Collecting Story Beats Labels for the Scenes
- 4.3 Dataset Analytics
- 5 A MND Task: Movie Scenes Classification by Their Narrative Function
- 5.1 Data Pre-processing
- 5.2 Feature Engineering
- 5.3 Baseline Approach
- 5.4 Classification Experiment and Baseline Results
- 6 Conclusion and Future Research
- References
- Are All Combinations Equal? Combining Textual and Visual Features with Multiple Space Learning for Text-Based Video Retrieval
- 1 Introduction
- 2 Related Work
- 3 Proposed Approach
- 3.1 Overall Architecture
- 3.2 Multiple Space Learning
- 3.3 Dual Softmax Inference
- 3.4 Specifics of Textual Information Processing
- 3.5 Specifics of Visual Information Processing
- 4 Experimental Results
- 4.1 Datasets and Experimental Setup
- 4.2 Results and Comparisons
- 4.3 Ablation Study
- 5 Conclusions
- References
- Scene-Adaptive Temporal Stabilisation for Video Colourisation Using Deep Video Priors
- 1 Introduction
- 2 Related Work
- 2.1 Video Colourisation
- 2.2 Deep Video Prior
- 2.3 Few-Shot Learning
- 3 Method
- 3.1 Extension of DVP to Multiple Scenes
- 3.2 Few-Shot Training Strategy
- 3.3 Network Architecture
- 4 Experiments
- 4.1 Training Strategy
- 4.2 Evaluation Metrics
- 4.3 Results
- 4.4 Ablations
- 5 Conclusions
- References
- Movie Lens: Discovering and Characterizing Editing Patterns in the Analysis of Short Movie Sequences
- 1 Introduction
- 2 Related Works
- 3 Data
- 4 Methodology
- 4.1 Label Estimation Phase
- 4.2 Editing Patterns Analysis
- 4.3 Technical Details
- 5 Preliminary Results
- 5.1 Character-Environment Relationship
- 5.2 Environment Description
- 5.3 Character-Character Interaction
- 5.4 Undefined Classes
- 5.5 Misclassified Sequences
- 6 Discussion
- References
- W17 - Visual Inductive Priors for Data-Efficient Deep Learning
- SKDCGN: Source-free Knowledge Distillation of Counterfactual Generative Networks Using cGANs
- 1 Introduction
- 2 Related Work
- 3 Approach
- 3.1 SKDCGN
- 4 Experiments and Results
- 4.1 Datasets
- 4.2 Baseline Model: CGN with Generator Replaced by TinyGAN Generator
- 4.3 Results of SKDCGN
- 4.4 Improving the SKDCGN Model
- 4.5 Additional Results: Study of the Shape IM
- 5 Discussion and Conclusion
- 6 Future Work
- References
- C-3PO: Towards Rotation Equivariant Feature Detection and Description
- 1 Introduction
- 2 Background
- 2.1 Theory
- 2.2 Related Work
- 3 Methodology
- 4 Experiments
- 5 Conclusion
- References
- Towards Flexible Inductive Bias via Progressive Reparameterization Scheduling
- 1 Introduction
- 2 Related Work
- 2.1 Convolution Neural Networks
- 2.2 Vision Transformers
- 2.3 Vision Transformers and Convolutions
- 3 Preliminaries
- 4 Inductive Bias Analysis of Various Architectures
- 4.1 Our Hypothesis
- 4.2 Data Scale Experiment
- 4.3 Fourier Analysis
- 5 Reparameterization Can Interpolate Inductive Biases
- 5.1 Experimental Settings
- 5.2 Interpolation of Convolutional Inductive Bias
- 6 Progressive Reparameterization Scheduling
- 7 Conclusion
- References
- Zero-Shot Image Enhancement with Renovated Laplacian Pyramid
- 1 Introduction
- 2 Related Work
- 2.1 Traditional Signal Processing Method and Deep Learning
- 2.2 Laplacian Pyramid and Image Restoration
- 3 Multiscale Laplacian Enhancement for Image Manipulation
- 3.1 Formulation of Multiscale Laplacian Enhancement
- 3.2 Internal Results of Multiscale Laplacian Enhancement
- 3.3 Comparison with Unsharp Masking Filter
- 3.4 Ablation Study of MLE
- 3.5 Application of MLE to Underwater Images
- 4 Zero-Shot Attention Network with Multiscale Laplacian Enhancement (ZA-MLE)
- 4.1 Process of ZA-MLE
- 4.2 Loss Function
- 5 Experiment
- 5.1 Experimental Setting
- 5.2 Results and Discussions of ZA-MLE
- 5.3 Ablation Study of Loss Function
- 6 Conclusion
- References
- Beyond a Video Frame Interpolator: A Space Decoupled Learning Approach to Continuous Image Transition
- 1 Introduction
- 2 Related Work
- 2.1 Video Frame Interpolation (VFI)
- 2.2 Continuous Image Transition (CIT)
- 3 Proposed Method
- 3.1 Problem Formulation
- 3.2 Space Decoupled Learning
- 3.3 Training Strategy
- 4 Experiments and Applications
- 4.1 Datasets and Training Settings for VFI
- 4.2 Comparisons with State-of-the-Arts
- 4.3 Ablation Experiments
- 4.4 Applications Beyond VFI
- 5 Conclusion
- References
- Diversified Dynamic Routing for Vision Tasks
- 1 Introduction
- 2 Related Work
- 3 DivDR: Diversified Dynamic Routing
- 3.1 Dynamic Routing Preliminaries
- 3.2 Metric Learning in A-space
- 4 Experiments
- 4.1 Datasets
- 4.2 Implementation Details
- 4.3 Semantic Segmentation
- 4.4 Object Detection and Instance Segmentation
- 5 Discussion and Future Work
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format displays every book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. This means that no technical restrictions prevent illegal distribution. However, a personalised watermark is embedded in the eBook; in the event of misuse, it can be used to identify the purchaser and to provide evidence for legal purposes.
For more information, see our eBook Help page.