
Computer Vision - ECCV 2022 Workshops
Description
The 367 full papers included in this volume set were carefully reviewed and selected for inclusion in the ECCV 2022 workshop proceedings. They were organized in individual parts as follows:
Part I: W01 - AI for Space; W02 - Vision for Art; W03 - Adversarial Robustness in the Real World; W04 - Autonomous Vehicle Vision;
Part II: W05 - Learning With Limited and Imperfect Data; W06 - Advances in Image Manipulation;
Part III: W07 - Medical Computer Vision; W08 - Computer Vision for Metaverse; W09 - Self-Supervised Learning: What Is Next?;
Part IV: W10 - Self-Supervised Learning for Next-Generation Industry-Level Autonomous Driving; W11 - ISIC Skin Image Analysis; W12 - Cross-Modal Human-Robot Interaction; W13 - Text in Everything; W14 - BioImage Computing; W15 - Visual Object-Oriented Learning Meets Interaction: Discovery, Representations, and Applications; W16 - AI for Creative Video Editing and Understanding; W17 - Visual Inductive Priors for Data-Efficient Deep Learning; W18 - Mobile Intelligent Photography and Imaging;
Part V: W19 - People Analysis: From Face, Body and Fashion to 3D Virtual Avatars; W20 - Safe Artificial Intelligence for Automated Driving; W21 - Real-World Surveillance: Applications and Challenges; W22 - Affective Behavior Analysis In-the-Wild;
Part VI: W23 - Visual Perception for Navigation in Human Environments: The JackRabbot Human Body Pose Dataset and Benchmark; W24 - Distributed Smart Cameras; W25 - Causality in Vision; W26 - In-Vehicle Sensing and Monitorization; W27 - Assistive Computer Vision and Robotics; W28 - Computational Aspects of Deep Learning;
Part VII: W29 - Computer Vision for Civil and Infrastructure Engineering; W30 - AI-Enabled Medical Image Analysis: Digital Pathology and Radiology/COVID19; W31 - Compositional and Multimodal Perception;
Part VIII: W32 - Uncertainty Quantification for Computer Vision; W33 - Recovering 6D Object Pose; W34 - Drawings and Abstract Imagery: Representation and Analysis; W35 - Sign Language Understanding; W36 - A Challenge for Out-of-Distribution Generalization in Computer Vision; W37 - Vision With Biased or Scarce Data; W38 - Visual Object Tracking Challenge.

Content
- Intro
- Foreword
- Preface
- Organization
- Contents - Part V
- W18 - Challenge on Mobile Intelligent Photography and Imaging
- MIPI 2022 Challenge on RGB+ToF Depth Completion: Dataset and Report
- 1 Introduction
- 2 Challenge
- 2.1 Problem Definition
- 2.2 Dataset: TetrasRGBD
- 2.3 Challenge Phases
- 2.4 Scoring System
- 2.5 Running Time Evaluation
- 3 Challenge Results
- 4 Challenge Methods
- 4.1 ZoomNeXt
- 4.2 GAMEON
- 4.3 Singer
- 4.4 NPU-CVR
- 4.5 JingAM
- 4.6 Anonymous
- 4.7 MainHouse113
- 4.8 UCLA Vision Lab
- 5 Conclusions
- A Teams and Affiliations
- References
- MIPI 2022 Challenge on Quad-Bayer Re-mosaic: Dataset and Report
- 1 Introduction
- 2 Challenge
- 2.1 Problem Definition
- 2.2 Dataset: Tetras-Quad
- 2.3 Challenge Phases
- 2.4 Scoring System
- 3 Challenge Results
- 4 Challenge Methods
- 4.1 MegNR
- 4.2 IMEC-IPI & NPU-MPI
- 4.3 HITZST01
- 4.4 BITSpectral
- 4.5 JHC-SJTU
- 4.6 Op-summer-po
- 5 Conclusions
- A Teams and Affiliations
- References
- MIPI 2022 Challenge on RGBW Sensor Re-mosaic: Dataset and Report
- 1 Introduction
- 2 Challenge
- 2.1 Problem Definition
- 2.2 Dataset: Tetras-RGBW-RMSC
- 2.3 Challenge Phases
- 2.4 Scoring System
- 3 Challenge Results
- 4 Challenge Methods
- 4.1 Op-summer-po
- 4.2 HIT-IIL
- 4.3 Eating, Drinking and Playing
- 5 Conclusions
- A Teams and Affiliations
- References
- MIPI 2022 Challenge on RGBW Sensor Fusion: Dataset and Report
- 1 Introduction
- 2 Challenge
- 2.1 Problem Definition
- 2.2 Dataset: Tetras-RGBW-Fusion
- 2.3 Challenge Phases
- 2.4 Scoring System
- 3 Challenge Results
- 4 Challenge Methods
- 4.1 BITSpectral
- 4.2 BIVLab
- 4.3 HIT-IIL
- 4.4 Jzsherlock
- 4.5 LLCKP
- 4.6 MegNR
- 5 Conclusions
- A Teams and Affiliations
- References
- MIPI 2022 Challenge on Under-Display Camera Image Restoration: Methods and Results
- 1 Introduction
- 2 MIPI 2022 Under-Display Camera Image Restoration
- 2.1 Datasets
- 2.2 Evaluation
- 2.3 Challenge Phase
- 3 Challenge Results
- 4 Challenge Methods and Teams
- 5 Conclusions
- References
- Continuous Spectral Reconstruction from RGB Images via Implicit Neural Representation
- 1 Introduction
- 2 Related Work
- 2.1 Implicit Neural Representation
- 2.2 Spectral Reconstruction from RGB Images
- 3 Neural Spectral Reconstruction
- 3.1 Overview
- 3.2 Spectral Profile Interpolation
- 3.3 Neural Attention Mapping
- 3.4 Loss Function
- 4 Experiments
- 4.1 Comparison to State-of-the-Art Methods
- 4.2 Continuous Spectral Reconstruction
- 4.3 Extreme Spectral Reconstruction
- 4.4 Spectral Super Resolution
- 4.5 Ablation Studies
- 5 Conclusion
- References
- Event-Based Image Deblurring with Dynamic Motion Awareness
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 System Overview
- 3.2 Deblur Module
- 3.3 Multi-scale Coarse-to-Fine Approach
- 3.4 Loss Function
- 4 RGBlur+E Dataset
- 5 Experiments
- 5.1 Experimental Settings
- 5.2 Ablation of the Effectiveness of the Components
- 5.3 Comparison with State-of-the-Art Models
- 6 Conclusions
- References
- UDC-UNet: Under-Display Camera Image Restoration via U-shape Dynamic Network
- 1 Introduction
- 2 Related Work
- 2.1 UDC Restoration
- 2.2 Image Restoration
- 3 Methodology
- 3.1 Problem Formulation
- 3.2 Network Structure
- 3.3 Loss Function
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Ablation Study
- 4.3 Comparison with State-of-the-Art Methods
- 5 Results of the MIPI UDC Image Restoration Challenge
- 6 Conclusions
- References
- Enhanced Coarse-to-Fine Network for Image Restoration from Under-Display Cameras
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Enhanced Encoder
- 3.2 Enhanced Decoder
- 3.3 Cross-Gating Fusion Module
- 3.4 Loss Functions
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Dataset
- 4.3 Evaluation Metrics
- 4.4 Comparisons
- 4.5 Ablation Study
- 5 Conclusions
- References
- Learning to Joint Remosaic and Denoise in Quad Bayer CFA via Universal Multi-scale Channel Attention Network
- 1 Introduction
- 2 Related Work
- 2.1 Color Filter Arrays
- 2.2 Denoise Raw Images
- 3 Proposed Method
- 3.1 Problem Formulation
- 3.2 Network Structure
- 3.3 Overall Framework
- 3.4 Loss Function
- 4 Experiment
- 4.1 Datasets
- 4.2 Evaluation Metrics
- 4.3 Implementation Details
- 4.4 Testing Results of MIPI 2022 Challenge on Quad Joint Remosaic and Denoise
- 4.5 Ablation Study
- 4.6 Qualitative Evaluation
- 5 Conclusion
- References
- Learning an Efficient Multimodal Depth Completion Model
- 1 Introduction
- 2 Related Work
- 2.1 Unguided Depth Completion
- 2.2 RGB-Guided Depth Completion
- 3 Methods
- 3.1 Global and Local Depth Prediction with Fusion
- 3.2 Funnel Convolutional Spatial Propagation Network
- 3.3 Loss Functions
- 4 Experiments
- 4.1 Datasets and Metrics
- 4.2 Implementational Details
- 4.3 Experimental Results
- 5 Conclusions
- References
- Learning Rich Information for Quad Bayer Remosaicing and Denoising
- 1 Introduction
- 2 Related Works
- 2.1 Denoising
- 2.2 ISP and Demosaicing
- 3 Proposed Method
- 3.1 Quad Bayer Pre-processing
- 3.2 Network for Jointly Remosaicing and Denoising
- 3.3 Two-Stage Training Strategy
- 4 Experimental Results
- 4.1 Dataset
- 4.2 Implementation Details
- 4.3 Evaluation Metrics
- 4.4 Ablation Study
- 4.5 Model Complexity and Runtime
- 4.6 Challenge Submission
- 4.7 Limitations
- 5 Conclusions
- References
- Depth Completion Using Laplacian Pyramid-Based Depth Residuals
- 1 Introduction
- 2 Related Work
- 3 Our Approach
- 3.1 Network Structure
- 3.2 Global-Local Refinement Network
- 3.3 Affinity Decay Spatial Propagation Network
- 3.4 The Training Loss
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Datasets
- 4.3 Metrics
- 4.4 Evaluation on KITTI Dataset
- 4.5 Ablation Study on ToF Synthetic Dataset
- 5 Conclusion
- References
- W19 - Challenge on People Analysis: From Face, Body and Fashion to 3D Virtual Avatars
- PSUMNet: Unified Modality Part Streams Are All You Need for Efficient Pose-Based Action Recognition
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Part Stream Factorization
- 3.2 PSUMNet
- 3.3 Multi Modality Data Generator (MMDG)
- 3.4 Spatio Temporal Relational Module (STRM)
- 4 Experiments
- 4.1 Datasets
- 4.2 Implementation and Optimization Details
- 4.3 Results
- 4.4 Analysis
- 4.5 Ablations
- 5 Conclusion
- References
- YOLO5Face: Why Reinventing a Face Detector
- 1 Introduction
- 2 Related Work
- 2.1 Object Detection
- 2.2 Face Detection
- 2.3 YOLO
- 3 YOLO5Face Face Detector
- 3.1 Network Architecture
- 3.2 Landmark Regression
- 3.3 Stem Block Structure
- 3.4 SPP with Smaller Kernels
- 3.5 P6 Output Block
- 3.6 ShuffleNetV2 as Backbone
- 4 Experiments
- 4.1 Dataset
- 4.2 Implementation Details
- 4.3 Ablation Study
- 4.4 YOLO5Face for Face Recognition
- 4.5 YOLO5Face on WiderFace Dataset
- 4.6 YOLO5Face on FDDB Dataset
- 5 Conclusion
- References
- Counterfactual Fairness for Facial Expression Recognition
- 1 Introduction
- 2 Literature Review
- 2.1 Fairness in Machine Learning
- 2.2 Facial Affect Fairness
- 2.3 Counterfactuals and Bias
- 3 Methodology
- 3.1 Notation and Problem Definition
- 3.2 Counterfactual Fairness
- 3.3 Counterfactual Image Generation
- 3.4 Baseline Approach
- 3.5 Pre-processing: Data Augmentation with Counterfactuals
- 3.6 In-processing: Contrastive Counterfactual Fairness
- 3.7 Post-processing: Reject Option Classification
- 4 Experimental Setup
- 4.1 Dataset
- 4.2 Implementation and Training Details
- 4.3 Evaluation Measures
- 5 Results
- 5.1 An Analysis of Dataset Bias and Counterfactual Bias
- 5.2 Bias Mitigation Results with Counterfactual Images
- 6 Conclusion and Discussion
- References
- Improved Cross-Dataset Facial Expression Recognition by Handling Data Imbalance and Feature Confusion
- 1 Introduction
- 2 Related Work
- 3 Problem Definition and Notations
- 4 Proposed Approach
- 4.1 Handling Data Imbalance in DIFC
- 4.2 Handling Feature Confusion in DIFC
- 4.3 Choosing Baseline UDA Approach
- 4.4 Integrating DIFC with Baseline UDA
- 5 Experiments
- 5.1 Datasets Used
- 5.2 Implementation Details
- 5.3 Results on Benchmark Datasets
- 5.4 Additional Analysis
- 6 Conclusion
- References
- Video-Based Gait Analysis for Spinal Deformity
- 1 Introduction
- 1.1 State-of-the-Art Review
- 1.2 Motivation and Contribution
- 2 Proposed System
- 2.1 Pose Estimation
- 2.2 BiLSTM-Based Network
- 3 Spinal Deformity Dataset
- 4 Results
- 5 Conclusions
- References
- TSCom-Net: Coarse-to-Fine 3D Textured Shape Completion Network
- 1 Introduction
- 2 Related Works
- 3 Proposed Approach - TSCom-Net
- 3.1 Joint Shape and Texture Completion
- 3.2 Texture Refinement
- 4 Experimental Results
- 4.1 Network Training Details
- 4.2 Results and Evaluation
- 4.3 Ablation Study
- 5 Conclusion
- References
- Deep Learning-Based Assessment of Facial Periodic Affect in Work-Like Settings
- 1 Introduction
- 2 A Protocol for Human Facial Behaviour Data Acquisition in Work-Like Settings
- 3 The WorkingAge Facial Behaviour Dataset
- 4 Periodical Facial Affect Recognition
- 5 Experiments
- 5.1 Experimental Setup
- 5.2 Baseline Results of Leave-One-Site-Out Cross-Validation
- 5.3 Ablation Studies
- 6 Conclusion
- References
- Supervision by Landmarks: An Enhanced Facial De-occlusion Network for VR-Based Applications
- 1 Introduction
- 2 Related Work
- 2.1 Facial De-occlusion and HMD Removal Methods
- 2.2 Structure-Guided Image Inpainting
- 3 Proposed Method
- 3.1 The Architecture
- 3.2 Spatial Supervision Using Landmarks
- 3.3 Loss Functions
- 4 Experiments and Results
- 4.1 Dataset and Training Settings
- 4.2 Results
- 5 Ablation Studies
- 6 Conclusion
- References
- Consistency-Based Self-supervised Learning for Temporal Anomaly Localization
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Model
- 3.2 Training Objective
- 3.3 Temporal Proposal
- 4 Experiments
- 5 Conclusions
- References
- Perspective Reconstruction of Human Faces by Joint Mesh and Landmark Regression
- 1 Introduction
- 2 Our Method
- 2.1 3D Face Geometric Reconstruction
- 2.2 6DoF Estimation
- 3 Experimental Results
- 3.1 Dataset
- 3.2 Evaluation Metrics
- 3.3 Implementation Details
- 3.4 Ablation Study
- 3.5 Benchmark Results
- 3.6 Result Visualization
- 4 Conclusion
- References
- Pixel2ISDF: Implicit Signed Distance Fields Based Human Body Model from Multi-view and Multi-pose Images
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Multi-view and Multi-pose Image Feature Encoding
- 3.2 Structured Latent Codes
- 3.3 Implicit Neural Shape Field
- 3.4 Loss Function
- 4 Experiments
- 4.1 Datasets
- 4.2 Image Preprocessing
- 4.3 SMPLX Estimation and Optimization
- 4.4 Implementation Details
- 4.5 Results
- 5 Conclusion
- References
- UnconFuse: Avatar Reconstruction from Unconstrained Images
- 1 Method
- 1.1 Data Preprocessing
- 1.2 Model Design and Configuration
- 2 Results
- 3 Discussion
- References
- HiFace: Hybrid Task Learning for Face Reconstruction from Single Image
- 1 Introduction
- 2 Related Work
- 2.1 3DMM Based
- 2.2 Model-Free
- 3 Method
- 3.1 Architecture
- 3.2 Vertex Refinement Module
- 3.3 Multi-task Learning Module
- 4 Experiments
- 4.1 Dataset
- 4.2 Implementation Details
- 4.3 Comparisons and Ablation Studies
- 4.4 Discussions
- 4.5 Conclusion
- References
- Multi-view Canonical Pose 3D Human Body Reconstruction Based on Volumetric TSDF
- 1 Introduction
- 2 Dataset
- 2.1 Data Analysis
- 2.2 Data Preprocessing
- 3 Method
- 3.1 Network Architecture
- 3.2 TSDF
- 3.3 Loss Function
- 3.4 Implementation Details
- 4 Experiment
- 5 Conclusion
- References
- End to End Face Reconstruction via Differentiable PnP
- 1 Introduction
- 2 Methodology
- 2.1 Data Preparation
- 2.2 Facial Landmark Detection
- 2.3 3D Face Reconstruction
- 2.4 PnPLoss
- 2.5 Inference Phase
- 3 Experiments
- 3.1 Experimental Details
- 3.2 Head Pose Estimation
- 3.3 Face Reconstruction
- 3.4 Finetuning with PnP Layer
- 3.5 Qualitative Results
- 4 Tricks
- 4.1 Flip and Merge
- 5 Conclusions
- References
- W20 - Safe Artificial Intelligence for Automated Driving
- One Ontology to Rule Them All: Corner Case Scenarios for Autonomous Driving
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Master Ontology
- 3.2 Scenario Ontology Generation
- 3.3 Scenario Simulation
- 4 Evaluation
- 4.1 Scenario Ontologies
- 5 Conclusion
- References
- Parametric and Multivariate Uncertainty Calibration for Regression and Object Detection
- 1 Introduction
- 2 Definitions for Regression Calibration and Related Work
- 3 Joint Parametric Regression Calibration
- 4 Measuring Miscalibration
- 5 Experiments
- 6 Conclusion
- References
- Reliable Multimodal Trajectory Prediction via Error Aligned Uncertainty Optimization
- 1 Introduction
- 2 Related Work
- 3 Setup for Vehicle Trajectory Prediction
- 4 Error Aligned Uncertainty Optimization
- 5 Experiments and Results
- 5.1 Multimodal Vehicle Trajectory Prediction
- 5.2 UCI Regression
- 5.3 Hyperparameter Selection
- 6 Conclusions
- References
- PAI3D: Painting Adaptive Instance-Prior for 3D Object Detection
- 1 Introduction
- 2 Related Work
- 2.1 LiDAR Based 3D Object Detection
- 2.2 Multi-modal Fusion Based 3D Object Detection
- 3 Painting Adaptive Instance for Multi-modal 3D Object Detection
- 3.1 Instance Painter
- 3.2 Adaptive Projection Refiner
- 3.3 Additional Enhancements for 3D Object Detection
- 4 Experiments
- 4.1 The NuScenes Dataset
- 4.2 Training and Evaluation Settings
- 4.3 nuScenes Test Set Results
- 4.4 Ablation Study
- 5 Conclusions
- References
- Validation of Pedestrian Detectors by Classification of Visual Detection Impairing Factors
- 1 Introduction
- 2 Related Works
- 3 Methodology
- 3.1 Synthetic Data Generation
- 3.2 Visual Detection Impairing Factors
- 3.3 Classification of Detectable Pedestrians
- 4 Results and Discussion
- 4.1 Evaluation of Pedestrian Detection Data Biases
- 4.2 Detection Impact of Visual Impairment Factors
- 5 Conclusions
- References
- Probing Contextual Diversity for Dense Out-of-Distribution Detection
- 1 Introduction
- 2 Related Work
- 3 Multi-head Context Networks
- 3.1 Contextual Diversity in Semantic Segmentation Networks
- 3.2 Probing Contextual Diversity
- 3.3 Out-of-Distribution Detection with MOoSe
- 4 Experiments
- 4.1 Datasets and Benchmarks
- 4.2 Evaluation Metrics
- 4.3 Experimental Setup
- 4.4 Comparison with Ensembles
- 4.5 Comparison with the State of the Art
- 5 Analysis
- 5.1 Quantifying Diversity: Variance and Mutual Information
- 5.2 Context as a Source of Diversity
- 5.3 Effect of Contextual Diversity on OoD Detection
- 6 Conclusion
- References
- Adversarial Vulnerability of Temporal Feature Networks for Object Detection
- 1 Introduction
- 2 Related Work
- 2.1 Adversarial Attacks
- 2.2 Adversarial Training
- 3 Adversarial Attacks
- 3.1 Threat Model
- 3.2 Adversarial Training
- 4 Experimental Setup
- 4.1 Dataset
- 4.2 Baselines
- 4.3 Adversarial Noise Attack
- 4.4 Adversarial Patch Attack
- 4.5 Adversarial Training
- 5 Evaluation
- 5.1 Attacks on 1-Class Vs. on 2-Class Baselines
- 5.2 Impact of Temporal Horizon
- 5.3 Robustness of the Adversarially Trained Models
- 5.4 Robustness of the AT-Trained Models Against Per-instance Attacks
- 6 Conclusion
- References
- Towards Improved Intermediate Layer Variational Inference for Uncertainty Estimation
- 1 Introduction
- 2 Related Work
- 2.1 State-of-the-Art Uncertainty Estimation Approaches
- 2.2 Dirichlet Distributions for Uncertainty Estimation
- 2.3 Out-of-distribution Detection
- 3 Methodology
- 3.1 Dirichlet Models
- 3.2 Proposed Loss Function
- 4 Experiments and Results
- 4.1 OoD Methodology
- 4.2 Full Training with All Classes
- 5 Conclusion and Future Work
- References
- Explainable Sparse Attention for Memory-Based Trajectory Predictors
- 1 Introduction
- 2 Related Works
- 2.1 Trajectory Prediction
- 2.2 Memory and Attention
- 3 Memory-Based Trajectory Predictors
- 4 Method
- 4.1 Sparsemax vs Softmax
- 5 Experiments
- 5.1 Evaluation Metrics and Datasets
- 5.2 Results
- 5.3 Explainability
- 6 Conclusions
- References
- Cycle-Consistent World Models for Domain Independent Latent Imagination
- 1 Introduction
- 2 Preliminaries
- 3 Cycle-Consistent World Models
- 4 Related Work
- 5 Experiments
- 6 Conclusion
- References
- W21 - Real-World Surveillance: Applications and Challenges
- Strengthening Skeletal Action Recognizers via Leveraging Temporal Patterns
- 1 Introduction
- 2 Related Work
- 2.1 Recognizing Skeleton-Based Actions
- 2.2 Capturing Chronological Information
- 2.3 Noise Alleviation
- 3 Proposed Approach
- 3.1 Discrete Cosine Encoding
- 3.2 Chronological Loss Function
- 3.3 Framework
- 4 Experiments
- 4.1 Datasets
- 4.2 Experimental Setups
- 4.3 Ablation Studies
- 4.4 Improvement Analysis
- 4.5 Noise Alleviation
- 4.6 Compatibility with Existing Models
- 4.7 Comparison with SOTA Accuracy
- 5 Conclusion
- References
- Which Expert Knows Best? Modulating Soft Learning with Online Batch Confidence for Domain Adaptive Person Re-Identification
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Supervised Pre-training on Source Domain
- 3.2 Unsupervised Adaptation on Target Domain
- 4 Experiments
- 4.1 Datasets and Evaluation
- 4.2 Architectures and Setup
- 4.3 Comparison with State-of-the-Art Methods
- 4.4 Ablation Study
- 5 Conclusions
- References
- Cross-Modality Attention and Multimodal Fusion Transformer for Pedestrian Detection
- 1 Introduction
- 2 Related Work
- 2.1 Multimodal Pedestrian Detection
- 2.2 Multimodal Transformers
- 3 Proposed Method
- 3.1 Cross-Modality Attention Transformer
- 3.2 Multimodal Fusion
- 4 Experiments
- 4.1 Dataset and Implementation Details
- 4.2 Quantitative Results
- 4.3 Ablation Study
- 5 Conclusions
- References
- See Finer, See More: Implicit Modality Alignment for Text-Based Person Retrieval
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Overview
- 3.2 Unified Visual-Textual Network
- 3.3 Implicit Semantic Alignment
- 3.4 Loss Function
- 4 Experiment
- 4.1 Experimental Setup
- 4.2 Comparison with State-of-the-Art Methods
- 4.3 Ablation Study
- 4.4 Qualitative Results
- 4.5 Computational Efficiency Analysis
- 5 Discussion
- 6 Conclusion
- 7 Broader Impact
- References
- Look at Adjacent Frames: Video Anomaly Detection Without Offline Training
- 1 Introduction
- 2 Related Work
- 3 Proposed Solution
- 3.1 Workflow
- 3.2 Incremental Learner
- 4 Experiments
- 4.1 Performance
- 4.2 Further Studies
- 5 Conclusion
- References
- SOMPT22: A Surveillance Oriented Multi-pedestrian Tracking Dataset
- 1 Introduction
- 2 Related Work
- 2.1 MOT Methods
- 2.2 Datasets
- 2.3 Person Detection Datasets
- 3 Problem Description
- 4 SOMPT22 Dataset
- 4.1 Dataset Construction
- 4.2 Dataset Statistics
- 4.3 Evaluation Metrics
- 5 Experiments
- 5.1 Experiment Setup
- 5.2 Benchmark Results
- 6 Conclusion and Future Work
- References
- Detection of Fights in Videos: A Comparison Study of Anomaly Detection and Action Recognition
- 1 Introduction
- 2 Related Work
- 2.1 General Anomaly Detection
- 2.2 Fight Detection Using Action Recognition
- 3 Proposed Methods
- 3.1 Action Recognition of Fights
- 3.2 Anomaly Detection of Fights
- 3.3 Iterative Anomaly Detection and Action Recognition
- 4 Experiments
- 4.1 Datasets
- 4.2 Implementation Details
- 4.3 Action Recognition Results
- 4.4 Anomaly Detection or Action Recognition?
- 4.5 Iterative Anomaly Detection and Action Recognition
- 4.6 Comparison with SOTA Results
- 5 Conclusion
- References
- Privacy-Preserving Person Detection Using Low-Resolution Infrared Cameras
- 1 Introduction
- 2 Related Work
- 3 Person Detection with Different Levels of Supervision
- 3.1 Detection Through Thresholding
- 3.2 Unsupervised Anomaly-Based Detection Using Auto-encoders
- 3.3 Weakly-Supervised Detection Using Class Activation Mapping
- 3.4 Fully-Supervised Detection Using Single Shot Detectors
- 4 Experimental Methodology
- 4.1 Datasets
- 4.2 Implementation Details
- 4.3 Performance Metrics
- 5 Results and Discussion
- 6 Conclusions
- References
- Gait Recognition from Occluded Sequences in Surveillance Sites
- 1 Introduction
- 2 Related Work
- 3 Proposed Approach
- 3.1 Occlusion Detection Using VGG-16
- 3.2 Occlusion Reconstruction Through RGait-Net
- 4 Experimental Analysis
- 4.1 Training Details
- 4.2 Results
- 4.3 Comparative Analysis
- 5 Conclusions and Future Work
- References
- Visible-Infrared Person Re-Identification Using Privileged Intermediate Information
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 LUPI Framework
- 3.2 Intermediate Domain Generation
- 3.3 Loss Functions
- 4 Results and Discussion
- 4.1 Experimental Methodology
- 4.2 Comparison with the State-of-Art
- 4.3 Ablation Study
- 5 Conclusions
- References
- Video in 10 Bits: Few-Bit VideoQA for Efficiency and Privacy
- 1 Introduction
- 2 Related Work
- 3 Few-Bit VideoQA
- 3.1 Problem Formulation
- 3.2 Approach: Task-Specific Feature Compression
- 4 Experiments
- 4.1 Few-Bit VideoQA Results
- 4.2 Qualitative Analysis
- 5 Applications of Few-Bit VideoQA
- 5.1 Tiny Datasets
- 5.2 Privacy Advantages from Tiny Features
- 6 Conclusion
- References
- ChaLearn LAP Seasons in Drift Challenge: Dataset, Design and Results
- 1 Introduction
- 2 Related Work
- 3 Challenge Design
- 3.1 The Dataset
- 3.2 Evaluation Protocol
- 3.3 The Baseline
- 4 Challenge Results and Winning Methods
- 4.1 The Leaderboard
- 4.2 Top-1: Team GroundTruth
- 4.3 Top-2: Team Heboyong
- 4.4 What Challenge the Models the Most?
- 5 Conclusions
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.