
Computer Vision - ECCV 2020
Description
The 1360 revised papers presented in these proceedings were carefully reviewed and selected from a total of 5025 submissions. The papers deal with topics such as computer vision; machine learning; deep neural networks; reinforcement learning; object recognition; image classification; image processing; object detection; semantic segmentation; human pose estimation; 3D reconstruction; stereo vision; computational photography; neural networks; image coding; image reconstruction; motion estimation.

Content
- Intro
- Foreword
- Preface
- Organization
- Contents - Part XIII
- Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards
- 1 Introduction
- 2 Related Work
- 3 The FAshion CAptioning Dataset
- 3.1 Data Collection, Labeling and Pre-Processing
- 3.2 Comparison with Other Datasets
- 4 Respecting Semantics for Fashion Captioning
- 4.1 Basic Problem Formulation
- 4.2 Attribute Embedding
- 4.3 Increasing the Accuracy of Captioning with Semantic Rewards
- 4.4 Joint Training of MLE and RL
- 5 Experiments
- 5.1 Basic Setting
- 5.2 Performance Evaluations
- 6 Conclusion
- References
- Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
- 1 Introduction
- 2 Related Works
- 3 Visually-Grounded Question Encoder (VGQE)
- 3.1 VGQE Cell
- 3.2 Using VGQE Cell to Encode the Question
- 3.3 Baseline VQA Architecture
- 4 Experiments and Results
- 4.1 Experimental Setup
- 4.2 Results
- 4.3 Performance of VGQE on Other Baselines
- 4.4 Performance of VGQE on the Standard VQAv2 Benchmark
- 5 Conclusion
- References
- Unsupervised Cross-Modal Alignment for Multi-person 3D Pose Estimation
- 1 Introduction
- 2 Related Work
- 3 Approaches
- 3.1 Architecture
- 3.2 Learning Cross-Modal Latent Space
- 3.3 Learning Beyond the Teacher Network
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Ablation Studies
- 4.3 Datasets and Quantitative Evaluation
- 5 Discussion
- 6 Conclusion
- References
- Class-Incremental Domain Adaptation
- 1 Introduction
- 2 Background
- 3 Class-Incremental Domain Adaptation
- 3.1 Foresighted Source-Model Training
- 3.2 Class-Incremental DA on the Target Domain
- 4 Experiments
- 4.1 Discussion
- 5 Conclusion
- References
- Anti-bandit Neural Architecture Search for Model Defense
- 1 Introduction
- 2 Related Work
- 3 Anti-bandit Neural Architecture Search
- 3.1 Search Space
- 3.2 Adversarial Optimization for ABanditNAS
- 3.3 Anti-bandit
- 3.4 Anti-bandit Strategy for ABanditNAS
- 4 Experiments
- 4.1 Experiment Protocol
- 4.2 Results on Different Datasets
- 4.3 Ablation Study
- 5 Conclusion
- References
- Wavelet-Based Dual-Branch Network for Image Demoiréing
- 1 Introduction
- 2 Related Work
- 3 Our Method
- 3.1 Working in the Wavelet Domain
- 3.2 Dense Branch
- 3.3 Dilation Branch
- 3.4 Direction Perception Module
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Datasets and State-of-the-Art Methods
- 4.3 Comparison with the State-of-the-Art
- 4.4 Ablation Studies
- 4.5 Extension to Deraining and Deraindrop
- 5 Conclusion
- References
- Low Light Video Enhancement Using Synthetic Data Produced with an Intermediate Domain Mapping
- 1 Introduction
- 2 Related Work
- 3 Learning the Low-Light Video RAW-to-RGB Mapping
- 3.1 Synthetic Data Generation Using an Intermediate Domain
- 3.2 Training Low-Light RAW-to-RGB Forward Models
- 3.3 GAN Architectures for Data Generation
- 4 Experimental Results
- 4.1 Datasets and Implementation Details
- 4.2 Synthetic Data Quality Evaluation
- 4.3 Output Image and Video Quality Evaluation
- 4.4 Real Training Data Quantity and Ratios
- 5 Conclusions
- References
- Non-local Spatial Propagation Network for Depth Completion
- 1 Introduction
- 2 Related Work
- 3 Non-local Spatial Propagation
- 3.1 Local Spatial Propagation Network
- 3.2 Non-local Spatial Propagation Network
- 4 Confidence-Incorporated Affinity Learning
- 4.1 Affinity Normalization
- 4.2 Confidence-Incorporated Affinity Normalization
- 5 Depth Completion Network
- 5.1 Network Architecture
- 5.2 Loss Function
- 6 Experimental Results
- 6.1 NYU Depth V2
- 6.2 KITTI Depth Completion
- 6.3 Ablation Studies
- 7 Conclusion
- References
- DanbooRegion: An Illustration Region Dataset
- 1 Introduction
- 2 Background
- 3 The DanbooRegion Dataset
- 4 Feasibility-Based Assignment Recommendation
- 5 Benchmark
- 6 Application
- 7 Conclusion
- References
- Event Enhanced High-Quality Image Recovery
- 1 Introduction
- 2 Related Works
- 3 Problem Statement
- 3.1 Events and Intensity Images
- 3.2 Event Enhanced Degeneration Model
- 4 Event Enhanced High-Quality Image Recovery
- 4.1 Event-Enhanced Sparse Learning
- 4.2 Network
- 4.3 Network Training
- 5 High Frame-Rate Video Generation
- 6 Dataset Preparation
- 7 Experiments
- 7.1 Intensity Reconstruction Experiments
- 7.2 High Frame-Rate Video Experiments
- 8 Conclusion
- References
- PackDet: Packed Long-Head Object Detector
- 1 Introduction
- 2 Related Work
- 3 PackDet: Packed Long-Head Object Detector
- 3.1 Network Architecture
- 3.2 Packing Operator
- 3.3 Packed Long Head
- 3.4 Learning Targets
- 3.5 Loss Functions
- 3.6 Implementation Details
- 4 Experiments
- 4.1 Ablation Study
- 4.2 Comparison to State of the Art
- 5 Conclusions
- References
- A Generic Graph-Based Neural Architecture Encoding Scheme for Predictor-Based NAS
- 1 Introduction
- 2 Related Work
- 2.1 Architecture Evaluation Module
- 2.2 Architecture Searching Module
- 2.3 Neural Architecture Encoders
- 3 Method
- 3.1 Predictor-Based Neural Architecture Search
- 3.2 GATES: A Generic Neural Architecture Encoder
- 3.3 Neural Architecture Search Utilizing the Predictor
- 4 Experiments
- 4.1 Predictor Evaluation on NAS-Bench-101
- 4.2 Predictor Evaluation on NAS-Bench-201
- 4.3 Neural Architecture Search on NAS-Bench-101
- 4.4 Neural Architecture Search in the ENAS Search Space
- 5 Conclusion
- References
- Learning Semantic Neural Tree for Human Parsing
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Architecture
- 3.2 Loss Function
- 3.3 Handling Multiple Human Parsing
- 4 Experiment
- 4.1 Single Human Parsing
- 4.2 Multiple Human Parsing
- 4.3 Ablation Study
- 5 Conclusion
- References
- Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation
- 1 Introduction
- 2 Related Works
- 3 Proposed Approach
- 3.1 Overview
- 3.2 HET Construction
- 3.3 Structured Context Encoding and Scene Graph Generation
- 3.4 Relation Ranking Module
- 3.5 Loss Function
- 4 Experimental Evaluation
- 4.1 Dataset, Evaluation and Settings
- 4.2 Ablation Studies
- 4.3 Comparisons with State-of-the-Arts
- 4.4 Analyses About HET
- 5 Experiments on Image Captioning
- 6 Conclusion
- References
- Burst Denoising via Temporally Shifted Wavelet Transforms
- 1 Introduction
- 2 Background and Related Work
- 3 Methodology
- 3.1 Overview
- 3.2 Features Matter in Burst Denoising
- 3.3 Temporal Fusion of Deep Features
- 3.4 Loss Function
- 4 Data Preparation
- 4.1 Camera Pipeline
- 4.2 Datasets and Synthetic Burst Generation
- 5 Experimental Results
- 5.1 Overview
- 5.2 Ablation Study
- 5.3 Burst Denoising Qualitative Evaluation
- 5.4 Burst Denoising Quantitative Evaluation
- 5.5 Algorithm Efficiency
- 5.6 Limitations
- 5.7 Generalization to Real Burst Captures
- 6 Conclusion and Future Work
- References
- JSSR: A Joint Synthesis, Segmentation, and Registration System for 3D Multi-modal Image Alignment of Large-Scale Pathological CT Scans
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Unpaired Image Synthesis
- 3.2 Multi-modal Image Registration
- 3.3 Multi-modal Image Segmentation
- 3.4 Joint Optimization Strategy
- 4 Experiments
- 4.1 Baseline
- 4.2 Implementation Details
- 4.3 Main Results
- 5 Ablation and Discussion
- 6 Conclusion
- References
- SimAug: Learning Robust Representations from Simulation for Trajectory Prediction
- 1 Introduction
- 2 Related Work
- 3 Approach
- 3.1 Problem Formulation
- 3.2 Training Data Generation from Simulation
- 3.3 Multi-view Simulation Augmentation (SimAug)
- 3.4 Backbone Model for Trajectory Prediction
- 4 Experiments
- 4.1 Evaluation Metrics
- 4.2 Main Results
- 4.3 State-of-the-Art Comparison on Stanford Drone Dataset
- 4.4 State-of-the-Art Comparison on VIRAT/ActEV
- 4.5 Ablation Experiments
- 5 Conclusion
- References
- ScribbleBox: Interactive Annotation Framework for Video Object Segmentation
- 1 Introduction
- 2 Related Work
- 3 Our Approach
- 3.1 Interactive Tracking Annotation
- 3.2 Interactive Segmentation Annotation
- 4 Experimental Results
- 4.1 In-Domain Annotation
- 4.2 Out-of-Domain Annotation
- 4.3 User Study
- 5 Conclusion
- References
- Rethinking Pseudo-LiDAR Representation
- 1 Introduction
- 2 Related Work
- 2.1 3D Detectors Based on Image Representation
- 2.2 3D Detectors Based on Pseudo-LiDAR Representation
- 3 Delving into Pseudo-LiDAR Representation
- 3.1 Review of Pseudo-LiDAR Based Detectors
- 3.2 PatchNet-Vanilla: Equivalent Implementation of Pseudo-LiDAR
- 3.3 Preliminary Conclusion
- 4 PatchNet
- 5 Experiments
- 5.1 Setup
- 5.2 Investigation of Pseudo-LiDAR Representation
- 5.3 Boosting the Performance of PatchNet
- 5.4 Comparing with State-of-the-Art Methods
- 5.5 Qualitative Results
- 6 Conclusions
- References
- Deep Multi Depth Panoramas for View Synthesis
- 1 Introduction
- 2 Related Work
- 3 Multi Depth Panoramas
- 4 Reconstructing and Rendering MDPs
- 4.1 Reconstructing Per-View MPIs from Images
- 4.2 Per-View MPIs to Per-View MDPs
- 4.3 Per-View MDP Blending
- 4.4 Differentiable MDP Rendering with Forward Splatting
- 5 Implementation Details
- 5.1 Dataset
- 5.2 Training
- 6 Results
- 7 Conclusion
- References
- MINI-Net: Multiple Instance Ranking Network for Video Highlight Detection
- 1 Introduction
- 2 Related Work
- 3 Approach
- 3.1 Vision-Audio Fusion Module f_F(·)
- 3.2 Highlight Estimation Module f_E(·)
- 3.3 Bag Classification Module f_C(·)
- 3.4 Objective Functions
- 4 Experiments
- 4.1 Datasets and Metrics
- 4.2 Compared Methods
- 4.3 Highlight Detection Results
- 4.4 Ablation Studies
- 5 Conclusion
- References
- ContactPose: A Dataset of Grasps with Object Contact and Hand Pose
- 1 Introduction
- 2 Related Work
- 3 The ContactPose Dataset
- 3.1 Data Capture Protocol and Equipment
- 3.2 Grasp Capture Without Hand Markers
- 4 Data Analysis
- 5 Contact Modeling Experiments
- 6 Results
- 7 Conclusion and Future Work
- References
- API-Net: Robust Generative Classifier via a Single Discriminator
- 1 Introduction
- 2 Related Work
- 3 The Proposed Method
- 3.1 Motivation
- 3.2 Anti-Perturbation Inference Net
- 3.3 Optimization
- 4 Experiments
- 4.1 Experimental Settings
- 4.2 Robustness
- 4.3 Ablation Study
- 5 Discussion and Conclusion
- References
- Bias-Based Universal Adversarial Patch Attack for Automatic Check-Out
- 1 Introduction
- 2 Related Work
- 2.1 Adversarial Attacks
- 2.2 Automatic Check-Out
- 3 Proposed Framework
- 3.1 Problem Definition
- 3.2 The Framework
- 3.3 Perceptually Biased Prior Generation
- 3.4 Training with Semantically Biased Prototypes
- 4 Experiments
- 4.1 Dataset and Evaluation Metrics
- 4.2 Experimental Settings
- 4.3 Digital-World Attack
- 4.4 Real-World Attack
- 4.5 Generalization Ability
- 4.6 Analysis of Textural Priors
- 4.7 Ablation Study
- 5 Conclusions
- References
- Imbalanced Continual Learning with Partitioning Reservoir Sampling
- 1 Introduction
- 2 Motivation: Fatal Forgetting on the Tail Classes
- 3 Approach
- 3.1 Problem Formulation
- 3.2 Conventional Reservoir Sampling
- 3.3 Fundamental Problems in Imbalanced Learning
- 3.4 Partitioning Reservoir Sampling
- 4 Related Work
- 5 The Multi-label Sequential Datasets
- 5.1 The COCOseq
- 5.2 The NUS-WIDEseq
- 6 Experiments
- 6.1 Experimental Design
- 6.2 Results
- 7 Conclusion
- References
- Guided Collaborative Training for Pixel-Wise Semi-Supervised Learning
- 1 Introduction
- 2 Related Work
- 2.1 SSL for Image Classification
- 2.2 SSL for Pixel-Wise Tasks
- 2.3 Prediction Confidence in SSL
- 2.4 Perturbations in SSL
- 3 Guided Collaborative Training
- 3.1 Overview of GCT
- 3.2 Flaw Detector
- 3.3 Dynamic Consistency Constraint
- 3.4 Flaw Correction Constraint
- 4 Experiments
- 4.1 Semantic Segmentation Experiments
- 4.2 Real Image Denoising Experiments
- 4.3 Portrait Image Matting Experiments
- 4.4 Night Image Enhancement Experiments
- 4.5 Ablation Experiments
- 5 Conclusions
- References
- Stacking Networks Dynamically for Image Restoration Based on the Plug-and-Play Framework
- 1 Introduction
- 2 Related Work
- 2.1 Plug-and-Play Methods
- 2.2 Image Denoising
- 2.3 Non-uniform Blind Deblurring
- 3 Proposed Methods
- 3.1 Overview
- 3.2 Deep Plug-and-Play
- 3.3 Deep Prior
- 3.4 Adaptive Update Scheme
- 4 Experiments
- 4.1 Image Denoising
- 4.2 Image Deblurring
- 4.3 Analysis of Convergence
- 4.4 Mathematical Explanations
- 5 Conclusion
- References
- Efficient Transfer Learning via Joint Adaptation of Network Architecture and Weight
- 1 Introduction
- 2 Related Work
- 2.1 Transfer Learning
- 2.2 Neural Architecture Search
- 3 Method
- 3.1 Problem Setting
- 3.2 Source Super-Network Training
- 3.3 Neural Architecture Search on Target
- 3.4 Neural Weight Search on Target
- 3.5 Generalization over Diverse Structures
- 4 Experiment
- 4.1 Object Detection
- 4.2 Fine-Grained Classification
- 4.3 Semantic Segmentation
- 5 Conclusion
- References
- Spatial Attention Pyramid Network for Unsupervised Domain Adaptation
- 1 Introduction
- 2 Related Works
- 3 Spatial Attention Pyramid Network
- 4 Experiment
- 4.1 Domain Adaptation for Detection
- 4.2 Domain Adaptation for Segmentation
- 4.3 Ablation Study
- 5 Conclusions
- References
- GSIR: Generalizable 3D Shape Interpretation and Reconstruction
- 1 Introduction
- 2 Related Work
- 3 Approach
- 3.1 Single-View Depth Estimation Module
- 3.2 Structure Interpretation Module
- 3.3 Structure-Guided Shape Completion Module
- 3.4 Voxel Refinement Module
- 3.5 Interpretation Consistency
- 3.6 Technical Details
- 4 Experiments
- 4.1 3D Shape Interpretation
- 4.2 Structure-Guided Shape Completion
- 4.3 3D Shape Reconstruction
- 4.4 Shape Interpretation with Consistency
- 4.5 Generalization to Real Images
- 4.6 Ablation Study
- 4.7 Shape Manipulation
- 5 Conclusion
- References
- Weakly Supervised 3D Object Detection from Lidar Point Cloud
- 1 Introduction
- 2 Related Work
- 3 Data Annotation Strategy for Our Weak Supervision
- 4 Proposed Algorithm
- 4.1 Learn to Generate Cylindrical Proposals from Click Annotations
- 4.2 Learn to Refine Proposals from a Few Well-Labeled Instances
- 4.3 Implementation Detail
- 5 Experiment
- 5.1 Experimental Setup
- 5.2 Quantitative and Qualitative Performance
- 5.3 Diagnostic Experiment
- 5.4 Performance as an Annotation Tool
- 6 Conclusion and Discussion
- References
- Two-Phase Pseudo Label Densification for Self-training Based Domain Adaptation
- 1 Introduction
- 2 Related Works
- 3 Preliminaries
- 3.1 Problem Setting
- 3.2 Self-training for UDA
- 3.3 Noisy Label Handling
- 4 Method
- 4.1 1st Phase: Voting Based Densification
- 4.2 2nd Phase: Easy-Hard Classification Based Densification
- 5 Experiments
- 5.1 Dataset
- 5.2 Implementation Details
- 5.3 Main Results
- 5.4 Ablation Study
- 5.5 Parameter Analysis
- 5.6 Loss Function Analysis
- 6 Conclusions
- References
- Adaptive Offline Quintuplet Loss for Image-Text Matching
- 1 Introduction
- 2 Related Work
- 3 Methods
- 3.1 Triplet Loss for Image-Text Matching
- 3.2 Offline Quintuplet Loss
- 3.3 Adaptive and Hierarchical Penalization
- 4 Experiments
- 4.1 Dataset and Experiment Settings
- 4.2 Implementation Details
- 4.3 Results on MS-COCO and Flickr30K
- 4.4 Ablation Study and Visualization
- 5 Conclusion
- References
- Learning Object Placement by Inpainting for Compositional Data Augmentation
- 1 Introduction
- 2 Related Work
- 2.1 Learning Object Placements
- 2.2 Data Augmentation for Object Detection
- 3 Methods
- 3.1 Data Acquisition by Inpainting
- 3.2 Learning Object Placements
- 3.3 Data Augmentation
- 4 Experiments
- 4.1 Baselines
- 4.2 Object Placements
- 4.3 Overfitting Inpainting Artifacts?
- 4.4 Data Augmentation for Object Detection
- 4.5 Data Augmentation for Instance Segmentation
- 4.6 Feature Representation Learning
- 5 Conclusion
- References
- Deep Vectorization of Technical Drawings
- 1 Introduction
- 2 Related Work
- 3 Our Vectorization System
- 3.1 Preprocessing of the Input Raster Image
- 3.2 Initial Estimation of Primitives
- 3.3 Refinement of the Estimated Primitives
- 3.4 Merging Estimations from All Patches
- 4 Experimental Evaluation
- 4.1 Clean Line Drawings
- 4.2 Degraded Line Drawings
- 4.3 Ablation Study
- 5 Conclusion
- References
- CAD-Deform: Deformable Fitting of CAD Models to 3D Scans
- 1 Introduction
- 2 Related Work
- 3 Overview of CAD-Deform Framework
- 4 Data-Driven Shape Deformation
- 4.1 Deformation Energy
- 4.2 Quadratic Terms
- 4.3 Data Term
- 4.4 Optimization
- 5 Datasets
- 6 Results
- 6.1 Evaluation Setup
- 6.2 Fitting Accuracy: How Well Do CAD Deformations Fit?
- 6.3 CAD Quality: How CAD-like Are Deformed Models?
- 6.4 Ablation Study
- 6.5 Shape Morphing Results
- 7 Conclusion
- A Statistics on the Used Datasets
- B Optimization Details
- C Qualitative Fitting Results
- D Morphing
- E PartNet Annotation
- F Fitting Accuracy Analysis
- G Perceptual Assessment and User Study Details
- References
- An Image Enhancing Pattern-Based Sparsity for Real-Time Inference on Mobile Devices
- 1 Introduction
- 2 Background
- 3 Overview
- 4 Pattern Library - Theory and Design
- 4.1 A Unique Perspective on Weight Pruning
- 4.2 Pattern Library Design
- 5 Pattern-Aware Network Pruning Framework for Pattern Library Extraction
- 5.1 Pattern Library Extraction - A Single Step
- 5.2 Pattern Library Extraction - Overall
- 6 Connectivity Sparsity and the New Sparsity Induced Inference Framework
- 6.1 Connectivity Sparsity
- 6.2 Compiler-Assisted Inference Framework for Real-Time Execution
- 7 Experimental Results
- 7.1 Pattern Library Extraction Result
- 7.2 Visualization Demonstration and Accuracy Analysis for Pattern Pruning
- 7.3 Connectivity Pruning and Overall Model Compression Results
- 7.4 Performance Evaluation on Mobile Platform
- 8 Conclusion
- References
- AutoTrajectory: Label-Free Trajectory Extraction and Prediction from Videos Using Dynamic Points
- 1 Introduction
- 2 Related Work
- 2.1 Trajectory Prediction
- 2.2 Supervised Multi-object Tracking
- 2.3 Unsupervised Learning for Dynamic Modeling
- 3 Our Approach
- 3.1 Problem Definition
- 3.2 Method Overview
- 3.3 Dynamic-Point Modeling
- 3.4 Dynamic-to-Instance Aggregation
- 3.5 Instance Matching
- 3.6 Trajectory Prediction
- 3.7 Optimization
- 3.8 Network Architecture
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Datasets
- 4.3 Results
- 4.4 Ablation Study
- 4.5 Limitations and Future Work
- 5 Conclusion
- References
- Multi-agent Embodied Question Answering in Interactive Environments
- 1 Introduction
- 2 Related Work
- 2.1 Question Answering in Embodied Environments
- 2.2 Multi-agent Systems
- 2.3 3D Computer Vision
- 2.4 Environments and Datasets
- 3 Overview of the Proposed Framework
- 4 Multi-agent 3D Reconstruction in Interactive Environments
- 4.1 Data Structure in Support of Interactive Environments
- 4.2 Structural Memory and Semantic Memory
- 4.3 Scanning Boundaries and Scanning Tasks
- 4.4 Viewpoint-Voxel Coverage Matrix
- 4.5 Termination Condition
- 4.6 Multi-agent 3D Reconstruction
- 5 Question Answering with 3D-CNN and LSTM
- 5.1 3D-CNN Scene Encoder and Question Encoder
- 5.2 Question Answering Model
- 5.3 Termination Model
- 5.4 Training the QA Model and the Termination Model
- 6 Experimental Results
- 6.1 Single-Agent IQA
- 6.2 Multi-agent IQA on IQuADv1 Dataset
- 6.3 Qualitative Examples
- 7 Conclusion
- References
- Conditional Sequential Modulation for Efficient Global Image Retouching
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Analysis of Retouching Operations
- 3.2 Conditional Sequential Retouching Network
- 4 Experiments
- 4.1 Comparison with State-of-the-Art Methods
- 4.2 Multiple Styles and Strength Control
- 4.3 Ablation Study
- References
- Segmenting Transparent Objects in the Wild
- 1 Introduction
- 2 Related Work
- 3 Trans10K Dataset and Annotation
- 3.1 Data Description
- 3.2 Annotation
- 3.3 Dataset Complexity
- 3.4 Evaluation Metrics
- 4 Proposed Method
- 4.1 Network Architecture
- 4.2 Boundary Attention Module
- 4.3 Decoder
- 4.4 Loss Function
- 5 Experiments
- 5.1 Implementation Details
- 5.2 Ablation Studies
- 5.3 Comparison to the State-of-the-Arts
- 6 Conclusion
- References
- Length-Controllable Image Captioning
- 1 Introduction
- 2 Background and Related Works
- 2.1 Autoregressive Image Captioning (AIC)
- 2.2 Diverse and Controllable Image Captioning
- 2.3 Non-autoregressive Text Generation
- 3 Method
- 3.1 Acquisition of Length Information
- 3.2 Non-autoregressive Length-Controllable Decoding
- 4 Experiments
- 4.1 Dataset and Metrics
- 4.2 Implementation Details
- 4.3 Performance on AoANet and VLP
- 4.4 Performance on LaBERT
- 4.5 Performance Analysis of LaBERT
- 4.6 Controllability and Diversity Analysis
- 5 Conclusion
- References
- Few-Shot Semantic Segmentation with Democratic Attention Networks
- 1 Introduction
- 2 Related Work
- 2.1 Semantic Segmentation
- 2.2 Few-Shot Learning
- 2.3 Few-Shot Semantic Segmentation
- 3 Democratic Attention Network
- 3.1 Problem Definition
- 3.2 Architecture Overview
- 3.3 Democratized Graph Attention
- 3.4 Multi-scale Guidance
- 4 Experiments
- 4.1 PASCAL-5i
- 4.2 COCO-20i
- 4.3 FSS-1000
- 4.4 Ablation Study
- 5 Conclusion
- References
- Defocus Blur Detection via Depth Distillation
- 1 Introduction
- 2 Related Works
- 3 Methods
- 3.1 Depth Distillation
- 3.2 Network Structure
- 3.3 Loss Function
- 4 Experiments
- 4.1 Comparisons with State-of-the-Art Methods
- 4.2 Ablation Studies of Network Structure
- 4.3 Failure Cases
- 5 Conclusions
- References
- Motion Guided 3D Pose Estimation from Videos
- 1 Introduction
- 2 Related Work
- 3 Approach
- 3.1 Motion Loss
- 3.2 U-Shaped Graph Convolutional Networks
- 4 Experiments
- 4.1 Dataset
- 4.2 Evaluation Metric
- 4.3 Ablation Study
- 4.4 Comparison with State-of-the-art
- 5 Conclusion
- References
- Reflection Separation via Multi-bounce Polarization State Tracing
- 1 Introduction
- 2 Related Work
- 3 Physically-Based Image Formation Model
- 3.1 Polarization Image Formation Model
- 3.2 Polarization Simulation Engine
- 4 Proposed Method
- 4.1 Network Architecture
- 4.2 Perceptual and Simulation-Based Loss Function
- 5 Experiments
- 5.1 Datasets
- 5.2 Visual Comparison in Synthetic and Real Scene
- 5.3 Quantitative Evaluation
- 5.4 Ablation Study
- 6 Conclusion
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, a personalised watermark is embedded in the eBook; it can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.