
Computer Vision - ECCV 2020
Description
The 1360 revised papers presented in these proceedings were carefully reviewed and selected from a total of 5025 submissions. The papers deal with topics such as computer vision; machine learning; deep neural networks; reinforcement learning; object recognition; image classification; image processing; object detection; semantic segmentation; human pose estimation; 3D reconstruction; stereo vision; computational photography; neural networks; image coding; image reconstruction; motion estimation.

Content
- Intro
- Foreword
- Preface
- Organization
- Contents - Part XXIV
- Deep Novel View Synthesis from Colored 3D Point Clouds
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Architecture
- 3.2 Training Loss
- 4 Experiments
- 4.1 Cascaded Outputs Comparison
- 4.2 Scene Revealing Evaluation
- 4.3 Novel View Synthesis Evaluation
- 4.4 Results on the KITTI Dataset
- 5 Conclusion
- References
- Consensus-Aware Visual-Semantic Embedding for Image-Text Matching
- 1 Introduction
- 2 Related Work
- 2.1 Knowledge Based Deep Learning
- 2.2 Image-Text Matching
- 3 Consensus-Aware Visual-Semantic Embedding
- 3.1 Exploit Consensus Knowledge to Enhance Concept Representations
- 3.2 Consensus-Aware Representation Learning
- 3.3 Training and Inference
- 4 Experiments
- 4.1 Dataset and Settings
- 4.2 Implementation Details
- 4.3 Comparison to State-of-the-art
- 4.4 Ablation Studies
- 4.5 Further Analysis
- 5 Conclusions
- References
- Spatial Hierarchy Aware Residual Pyramid Network for Time-of-Flight Depth Denoising
- 1 Introduction
- 2 Related Work
- 3 ToF Imaging Model
- 4 Spatial Hierarchy Aware Residual Pyramid Network
- 4.1 Residual Regression Module
- 4.2 Residual Fusion Module
- 4.3 Depth Refinement Module
- 4.4 Loss Function
- 5 Experiments
- 5.1 Datasets
- 5.2 Data Pre-processing
- 5.3 Training Settings
- 5.4 Ablation Studies
- 5.5 Results on Synthetic Datasets
- 5.6 Results on the Realistic Dataset
- 6 Conclusion
- References
- Sat2Graph: Road Graph Extraction Through Graph-Tensor Encoding
- 1 Introduction
- 2 Related Work
- 3 Sat2Graph
- 3.1 Graph-Tensor Encoding (GTE)
- 3.2 Training Sat2Graph
- 4 Evaluation
- 4.1 Datasets
- 4.2 Baselines
- 4.3 Implementation Details
- 4.4 Evaluation Metrics
- 4.5 Quantitative Evaluation
- 4.6 Qualitative Evaluation
- 5 Discussion
- 6 Conclusion
- References
- Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition
- 1 Introduction
- 2 Related Work
- 3 Dataset
- 4 Methodology
- 4.1 Preservation of Audio Source Knowledge
- 4.2 Audiovisual Representation for Multi-task
- 4.3 Sound Events in Different Scenes
- 5 Experiments
- 5.1 Implementation Details
- 5.2 Aerial Scene Recognition
- 5.3 Ablation Study
- 6 Conclusions
- References
- Polarimetric Multi-view Inverse Rendering
- 1 Introduction
- 2 Related Work
- 3 Polarimetric Ambiguities in Surface Normal Prediction
- 3.1 Polarimetric Calculation
- 3.2 Ambiguities
- 4 Polarimetric Multi-view Inverse Rendering
- 4.1 Color Polarization Sensor Data Processing
- 4.2 Initial Geometric Reconstruction
- 4.3 Photometric and Polarimetric Optimization
- 5 Experimental Results
- 5.1 Implementation Details
- 5.2 Comparison Using Synthetic Data
- 5.3 Comparison Using Real Data
- 5.4 Refinement for Polarimetric MVS
- 6 Conclusions
- References
- SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information
- 1 Introduction
- 2 Related Work
- 2.1 Interactive Segmentation
- 2.2 Semantic Segmentation with Side Information
- 3 SideInfNet
- 3.1 CNN-Based Semantic Segmentation
- 3.2 Side Information Feature Map Construction
- 3.3 Fusion Weight Learning
- 3.4 Adaptive Architecture
- 4 Experiments and Results
- 4.1 Zone Segmentation
- 4.2 BreAst Cancer Histology Segmentation
- 4.3 Urban Segmentation
- 4.4 Varying Levels of Side Information
- 4.5 SideInfNet with Another CNN Backbone
- 5 Conclusion
- References
- Improving Face Recognition by Clustering Unlabeled Faces in the Wild
- 1 Introduction
- 2 Related Work
- 3 Learning from Unlabeled Faces
- 3.1 Separating Overlapping Identities
- 3.2 Clustering Faces with GCN
- 3.3 Joint Data Re-training with Clustering Uncertainty
- 4 Experiments
- 4.1 Controlled Disjoint: MS-Celeb-1M Splits
- 4.2 Controlled Overlap: Overlapping Identities
- 4.3 Semi-controlled: Limited Labeled, Large-Scale Unlabeled Data
- 4.4 Soft Labels for Clustering Uncertainty
- 4.5 Uncontrolled: Large-Scale Labeled and Unlabeled Data
- 5 Conclusion
- References
- NeuRoRA: Neural Robust Rotation Averaging
- 1 Introduction
- 2 Related Works
- 3 Multiple Rotation Averaging
- 4 Learning to Predict Absolute Orientations
- 4.1 The Network Design Choice
- 4.2 View-Graph Cleaning Network
- 4.3 Bootstrapping Absolute Orientations
- 4.4 Fine-Tuning Network
- 4.5 Training
- 5 Results
- 6 Discussion
- References
- SG-VAE: Scene Grammar Variational Autoencoder to Generate New Indoor Scenes
- 1 Introduction
- 2 Deep Generative Model for Scene Generation
- 2.1 Scene-Grammar Variational Autoencoder
- 2.2 CFG of Indoor Scenes
- 2.3 The VAE Network
- 3 Discovery of the Scene Grammar
- 3.1 Data-Driven Relationship Discovery
- 3.2 Creating a CFG from the Causal Graph
- 4 Experiments
- 4.1 Comparison with Baselines on Other Datasets
- 4.2 Scene Layout Estimation from the RGB-D Image
- 5 Related Works
- 6 Conclusion
- References
- Unsupervised Learning of Optical Flow with Deep Feature Similarity
- 1 Introduction
- 2 Related Work
- 3 Our Approach
- 3.1 Background on Unsupervised Optical Flow Learning
- 3.2 Feature Similarity from Multiple Layers
- 3.3 Learning Optical Flow with Feature Similarity
- 4 Experimental Results
- 4.1 Evaluation on Benchmarks
- 5 Conclusion
- References
- Blended Grammar Network for Human Parsing
- 1 Introduction
- 2 Related Work
- 3 Blended Grammar Network
- 3.1 Grammar
- 3.2 Blended Grammar Network Structure
- 3.3 Part-Aware CRNN
- 3.4 Loss Function
- 4 Experiments
- 4.1 Datasets
- 4.2 Implementation Details
- 4.3 Ablation Study
- 4.4 Comparison with State-of-the-Art
- 5 Conclusion
- References
- P2Net: Patch-Match and Plane-Regularization for Unsupervised Indoor Depth Estimation
- 1 Introduction
- 2 Related Work
- 2.1 Supervised Depth Estimation
- 2.2 Unsupervised Depth Estimation
- 2.3 Piece-Wise Planar Scene Reconstruction
- 3 Method
- 3.1 Overview
- 3.2 Keypoints Extraction
- 3.3 Patch-Based Multi-view Photometric Consistency Error
- 3.4 Planar Consistency Loss
- 3.5 Loss Function
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Datasets
- 4.3 Ablation Experiments
- 5 Conclusion
- References
- Efficient Attention Mechanism for Visual Dialog that Can Handle All the Interactions Between Multiple Inputs
- 1 Introduction
- 2 Related Work
- 2.1 Attention Mechanisms for Vision-Language Tasks
- 2.2 Visual Dialog
- 3 Efficient Attention Mechanism for Many Utilities
- 3.1 Attention Mechanism of Transformer
- 3.2 Application to Bi-modal Tasks
- 3.3 Light-Weight Transformer for Many Inputs
- 3.4 Interactions Between All Utilities
- 4 Implementation Details for Visual Dialog
- 4.1 Problem Definition
- 4.2 Representation of Utilities
- 4.3 Overall Network Design
- 4.4 Design of Decoders
- 4.5 Multi-task Learning
- 5 Experimental Results
- 5.1 Experimental Setup
- 5.2 Comparison with State-of-the-Art Methods
- 5.3 Ablation Study
- 5.4 Visualization of Generated Attention
- 6 Summary and Conclusion
- References
- Adaptive Mixture Regression Network with Local Counting Map for Crowd Counting
- 1 Introduction
- 2 Related Work
- 2.1 Density Estimation Based Approaches
- 2.2 Direct Counting Regression Approaches
- 3 Proposed Method
- 3.1 Local Counting Map
- 3.2 Scale-Aware Module
- 3.3 Mixture Regression Module
- 3.4 Adaptive Soft Interval Module
- 4 Experiments
- 4.1 Datasets
- 4.2 Implementation Details
- 4.3 Comparisons with State of the Art
- 4.4 Ablation Studies
- 5 Conclusion
- References
- BIRNAT: Bidirectional Recurrent Neural Networks with Adversarial Training for Video Snapshot Compressive Imaging
- 1 Introduction
- 1.1 Video Snapshot Compressive Imaging
- 1.2 Related Work
- 1.3 Contributions and Organization of This Paper
- 2 Mathematical Model of Video SCI
- 3 Proposed Network for Reconstruction
- 3.1 Measurement Energy Normalization
- 3.2 AttRes-CNN
- 3.3 Bidirectional Recurrent Reconstruction Network
- 3.4 Optimization
- 4 Experiments
- 4.1 Training, Testing Datasets and Experimental Settings
- 4.2 Results on Simulation Datasets
- 4.3 Results on Real SCI Data
- 5 Conclusions
- References
- Ultra Fast Structure-Aware Deep Lane Detection
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 New Formulation for Lane Detection
- 3.2 Lane Structural Loss
- 3.3 Feature Aggregation
- 4 Experiments
- 4.1 Experimental Setting
- 4.2 Ablation Study
- 4.3 Results
- 5 Conclusion
- References
- Cross-Identity Motion Transfer for Arbitrary Objects Through Pose-Attentive Video Reassembling
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Overview
- 3.2 Network Architecture
- 3.3 Training and Losses
- 3.4 Inference
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Experimental Results
- 4.3 Human Evaluation
- 4.4 Analysis
- 5 Conclusion
- References
- Domain Adaptive Object Detection via Asymmetric Tri-Way Faster-RCNN
- 1 Introduction
- 2 Related Work
- 3 The Proposed ATF Approach
- 3.1 Network Architecture of ATF
- 3.2 Principle of the Chief Net
- 3.3 Principle of the Ancillary Net
- 3.4 Training Loss of Our ATF
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Datasets
- 4.3 Cross-Domain Detection in Different Visibility and Cameras
- 4.4 Cross-Domain Detection on Large Domain Shift
- 4.5 Cross-Domain Detection from Synthetic to Real
- 4.6 Analysis and Discussion
- 5 Conclusions
- References
- Exclusivity-Consistency Regularized Knowledge Distillation for Face Recognition
- 1 Introduction
- 2 Related Work
- 3 Proposed Formulation
- 3.1 Weight Exclusivity
- 3.2 Feature Consistency
- 3.3 Exclusivity-Consistency Regularized Knowledge Distillation
- 4 Experiments
- 4.1 Datasets
- 4.2 Experimental Settings
- 4.3 Ablation Study and Exploratory Experiments
- 4.4 Comparison to State-of-the-Art Methods
- 5 Conclusion
- References
- Learning Camera-Aware Noise Models
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Noise-Generating Network
- 3.2 Camera-Encoding Network
- 3.3 Learning
- 4 Experimental Results
- 4.1 Quantitative and Qualitative Results
- 4.2 Ablation Studies
- 4.3 Robustness Analysis of the Camera-Encoding Network
- 5 Application to Real Image Denoising
- 5.1 Real-World Image Denoising
- 5.2 Camera-Specific Denoising Networks
- 6 Conclusions
- References
- Towards Precise Completion of Deformable Shapes
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Overview
- 3.2 Encoder
- 3.3 Generator
- 3.4 Loss Function
- 3.5 Training Procedure
- 3.6 Implementation Considerations
- 4 Experiments
- 4.1 Datasets
- 4.2 Methods in Comparison
- 4.3 Single View Completion
- 4.4 Non-rigid Partial Correspondences
- 4.5 Real Scans
- 5 Conclusions
- References
- Iterative Distance-Aware Similarity Matrix Convolution with Mutual-Supervised Point Elimination for Efficient Point Cloud Registration
- 1 Introduction
- 2 Related Work
- 3 Model
- 3.1 Notation
- 3.2 Similarity Matrix Convolution
- 3.3 Two-Stage Point Elimination
- 3.4 Mutual-Supervision Loss
- 3.5 Balanced Sampling for Training
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Results
- 4.3 Efficiency
- 4.4 Ablation Study
- 5 Conclusions
- References
- Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization
- 1 Introduction
- 2 Related Work
- 3 Problem Description and Background
- 4 Proposed Method
- 4.1 Re-localization
- 4.2 Knowledge Transfer
- 4.3 Network Architectures
- 5 Experiments
- 5.1 COCO 2017 Dataset
- 5.2 ILSVRC 2013 Detection Dataset
- 6 Conclusion
- References
- Environment-Agnostic Multitask Learning for Natural Language Grounded Navigation
- 1 Introduction
- 2 Background
- 3 Environment-Agnostic Multitask Learning
- 3.1 Overview
- 3.2 Multitask Reinforcement Learning
- 3.3 Environment-Agnostic Representation Learning
- 4 Model Architecture
- 5 Experiments
- 5.1 Experimental Setup
- 5.2 Environment-Agnostic Multitask Learning
- 5.3 Multitask Learning
- 5.4 Environment-Agnostic Learning
- 5.5 Reward Shaping for NDH Task
- 6 Conclusion
- References
- TPFN: Applying Outer Product Along Time to Multimodal Sentiment Analysis Fusion on Incomplete Data
- 1 Introduction
- 1.1 Related Works
- 2 Preliminaries
- 3 Time Product Fusion Network
- 3.1 Main Idea: Outer Product Through Time
- 3.2 Low-Rank Inference Module
- 3.3 Low-Rank Regularization
- 4 Experiments
- 4.1 Performance on MOSI and MOSEI
- 4.2 Effect of Time Window Size k
- 4.3 Effect of Regularization Parameter
- 4.4 Discussion on CP-rank
- 5 Conclusions
- References
- ProxyNCA++: Revisiting and Revitalizing Proxy Neighborhood Component Analysis
- 1 Introduction
- 2 Related Work
- 3 Methods
- 3.1 Neighborhood Component Analysis (NCA)
- 3.2 ProxyNCA
- 3.3 Aligning with NCA by Optimizing Proxy Assignment Probability
- 3.4 About Temperature Scaling
- 3.5 About Global Pooling
- 3.6 About Fast Moving Proxies
- 3.7 Layer Norm (Norm) and Class Balanced Sampling (CBS)
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Evaluation
- 4.3 Ablation Study
- 5 Conclusion
- References
- Learning with Privileged Information for Efficient Image Super-Resolution
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Teacher
- 3.2 Student
- 4 Experiments
- 4.1 Experimental Details
- 4.2 Ablation Studies
- 4.3 Analysis on Compact Features
- 4.4 Results
- 5 Conclusion
- References
- Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-identification
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Formulation
- 3.2 Self-adaptive Classification
- 3.3 Memory-Based Temporal-Guided Cluster
- 4 Experiment
- 4.1 Dataset
- 4.2 Implementation Details
- 4.3 Ablation Study
- 4.4 Comparison with State-of-the-Art Methods
- 5 Conclusion
- References
- Autoencoder-Based Graph Construction for Semi-supervised Learning
- 1 Introduction
- 2 Related Work
- 2.1 Semi-supervised Learning
- 2.2 Graph-Based SSL
- 2.3 Matrix Completion
- 3 Problem Formulation
- 4 Our Approach
- 4.1 Learning the Graph with Autoencoder
- 4.2 Simultaneous Training
- 5 Experiments
- 5.1 Performance Evaluations
- 5.2 Comparison to Graph-Based SSL
- 5.3 Ablation Study
- 5.4 Wide ResNet Results
- 6 Conclusion
- References
- Virtual Multi-view Fusion for 3D Semantic Segmentation
- 1 Introduction
- 2 Related Work
- 3 Method Overview
- 4 Virtual View Selection
- 5 Multiview Fusion
- 5.1 2D Semantic Segmentation Model
- 5.2 3D Fusion of 2D Semantic Features
- 6 Experiments
- 6.1 Evaluation on ScanNet Dataset
- 6.2 Evaluation on Stanford 3D Indoor Spaces (S3DIS)
- 6.3 Ablation Studies
- 7 Conclusion
- References
- Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition
- 1 Introduction
- 2 Background
- 3 Approach
- 3.1 Decoupling Graph Convolutional Network
- 3.2 Attention-Guided DropGraph
- 4 Experiments
- 4.1 Datasets and Model Configuration
- 4.2 Ablation Study
- 4.3 Comparisons to the State-of-the-Art
- 5 Conclusion
- References
- Deep Shape from Polarization
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Image Formation and Physical Solution
- 3.2 Learning with Physics
- 4 Dataset and Implementation Details
- 4.1 Dataset
- 4.2 Software Implementation
- 5 Experimental Results
- 5.1 Comparisons to Physics-Based SfP
- 5.2 Robustness to Lighting Variations
- 5.3 Importance of Polarization
- 5.4 Importance of Physics Revealed by Ablating Priors
- 5.5 Quantitative Evaluation on Our Test Set
- 5.6 Qualitative Evaluation on Our Test Set
- 6 Discussion
- References
- A Boundary Based Out-of-Distribution Classifier for Generalized Zero-Shot Learning
- 1 Introduction
- 2 Related Work
- 3 Revisit Spherical Variational Auto-Encoders
- 4 Proposed Approach
- 4.1 Problem Formulation
- 4.2 Boundary Based Out-of-Distribution Classifier
- 4.3 Implementation Details
- 5 Experiments
- 5.1 Datasets, Evaluation and Baselines
- 5.2 Out-of-Distribution Classification
- 5.3 Comparison with State-of-the-Arts
- 5.4 Model Analysis
- 6 Conclusions
- References
- Mind the Discriminability: Asymmetric Adversarial Domain Adaptation
- 1 Introduction
- 2 Revisiting Transferability and Discriminability in Symmetric Adversarial Domain Adaptation
- 2.1 The Theory of Domain Adaptation
- 2.2 Limitations and Insights
- 3 Asymmetric Adversarial Domain Adaptation
- 3.1 The Learning Framework
- 3.2 Discussions and Theories
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Overall Results
- 4.3 Spectral Analysis Using SVD
- 4.4 Analytics and Discussions
- 5 Related Work
- 6 Conclusion
- References
- SeqXY2SeqZ: Structure Learning for 3D Shapes by Sequentially Predicting 1D Occupancy Segments from 2D Coordinates
- 1 Introduction
- 2 Related Work
- 3 Overview
- 4 Voxel Tubelization
- 5 SeqXY2SeqZ
- 6 Experiments and Analysis
- 6.1 Representation Ability
- 6.2 Single Image 3D Reconstruction
- 6.3 Ablation Studies and Analysis
- 7 Conclusion
- References
- Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking
- 1 Introduction
- 2 Related Work
- 3 Omni-MOT Dataset
- 4 Deep Motion Modeling Network
- 4.1 Data Preparation
- 4.2 Anchor Tubes
- 4.3 Motion Model
- 4.4 Architecture
- 4.5 Deployment
- 5 Experiments
- 6 Conclusion
- References
- Deep FusionNet for Point Cloud Semantic Segmentation
- 1 Introduction
- 2 Related Work
- 2.1 Voxel-Based Networks
- 2.2 Point-Based Networks
- 3 FusionNet
- 3.1 Point Cloud Representation
- 3.2 Neighborhood Aggregations
- 3.3 Inner-Voxel Aggregation
- 3.4 Down/Up-Sampling
- 3.5 Architecture and Sparse Implementation
- 4 Experiments
- 4.1 Datasets
- 4.2 Ablation Study
- 4.3 Benchmark Evaluations
- 4.4 Analysis of the Voxel-Based "Mini-PointNet"
- 4.5 Efficiency for Large-Scale Point Cloud Processing
- 5 Conclusions and Future Work
- References
- Deep Material Recognition in Light-Fields via Disentanglement of Spatial and Angular Information
- 1 Introduction
- 2 Related Work
- 3 Light-Field Imaging Model
- 3.1 Light-Field Geometry
- 3.2 Analysis of Baseline Methods
- 4 Methods
- 4.1 Representative Selection
- 4.2 Angular Registration
- 4.3 Network Architecture
- 5 Experimental Results
- 5.1 Data and Training Procedure
- 5.2 Overall Performance
- 5.3 Ablation Studies
- 5.4 Visualization
- 6 Conclusion
- References
- Dual Adversarial Network for Deep Active Learning
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Dual Adversarial Network for Deep Active Learning
- 3.2 Training Procedure of DAAL
- 4 Experiments
- 4.1 Datasets
- 4.2 Experimental Settings
- 4.3 Image Classification on CIFAR10/100
- 4.4 Semantic Segmentation on Cityscapes
- 4.5 Performance Analysis
- 4.6 Ablation Study
- 5 Conclusion
- References
- Fully Convolutional Networks for Continuous Sign Language Recognition
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Main Stream Design
- 3.2 Gloss Feature Enhancement
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Results
- 4.3 Ablation Studies
- 4.4 Online Recognition
- 5 Conclusions
- References
- Self-adapting Confidence Estimation for Stereo
- 1 Introduction
- 2 Related Work
- 3 Learning a Confidence Measure Out-of-the-Box
- 3.1 Taxonomy of Stereo Matching Systems
- 3.2 Self-supervision Cues for Black-Box Models
- 3.3 Multi-modal Binary Cross Entropy
- 4 Experimental Results
- 4.1 Implementation Details
- 4.2 Ablation Study
- 4.3 Comparison with Offline Methods
- 4.4 Self-adapting In-the-wild
- 5 Conclusion
- References
- Deep Surface Normal Estimation on the 2-Sphere with Confidence Guided Semantic Attention
- 1 Introduction
- 2 Related Work
- 3 Surface Normal Estimation on the 2-Sphere
- 3.1 Network Architecture
- 3.2 Multi-scale Mutual Feature Fusion
- 3.3 2-Sphere vs. 3D Euclidean Space
- 3.4 Loss Function
- 4 Confidence Guided Semantic Attention
- 4.1 Confidence Map for Raw Depth
- 4.2 The CGSA Module
- 5 Experiments
- 5.1 Implementation Details
- 5.2 Comparisons with the State-of-the-Arts
- 5.3 Ablation Study
- 6 Conclusion and Future Work
- References
- AutoSTR: Efficient Backbone Search for Scene Text Recognition
- 1 Introduction
- 2 Related Works
- 2.1 Scene Text Recognition (STR)
- 2.2 Neural Architecture Search (NAS)
- 3 Methodology
- 3.1 Problem Formulation
- 3.2 Search Space
- 3.3 Search Algorithm
- 3.4 Comparison with Other NAS Works
- 4 Experiments
- 4.1 Datasets
- 4.2 Implementation Details
- 4.3 Comparison with State of the Art
- 4.4 Case Study: Searched Backbones and Discussion
- 4.5 Comparison with Other NAS Approaches
- 4.6 Ablation Study
- 5 Conclusion
- References
- Mitigating Embedding and Class Assignment Mismatch in Unsupervised Image Classification
- 1 Introduction
- 2 Related Work
- 2.1 Unsupervised Embedding
- 2.2 Unsupervised Classification
- 3 Model
- 3.1 Stage 1: Unsupervised Deep Embedding
- 3.2 Stage 2: Unsupervised Class Assignment with Refining Pretrained Embeddings
- 4 Experiments
- 4.1 Image Classification Task
- 4.2 Component Analyses
- 4.3 Improvement over SOTA
- 4.4 Implication on Semi-supervised Learning
- 5 Conclusion
- References
- Adversarial Training with Bi-directional Likelihood Regularization for Visual Classification
- 1 Introduction
- 2 Related Work
- 2.1 Adversarial Attacks
- 2.2 Defensive Methods
- 3 Proposed Algorithm
- 3.1 Preliminaries
- 3.2 Modeling the Feature Distribution
- 3.3 Likelihood Regularization for Features
- 4 Experiments
- 4.1 MNIST
- 4.2 CIFAR
- 4.3 Evolution of the Likelihood Regularization
- 4.4 Hyper-parameter Analysis
- 4.5 Adversaries for Training
- 5 Conclusion
- References
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.