
MultiMedia Modeling
Description
The two-volume set LNCS 12572 and 12573 constitutes the thoroughly refereed proceedings of the 27th International Conference on MultiMedia Modeling, MMM 2021, held in Prague, Czech Republic, in June 2021.
Of the 211 submitted regular papers, 40 papers were selected for oral presentation and 33 for poster presentation; 16 special session papers were accepted as well as 2 papers for a demo presentation and 17 papers for participation at the Video Browser Showdown 2021. The papers cover topics such as: multimedia indexing; multimedia mining; multimedia abstraction and summarization; multimedia annotation, tagging and recommendation; multimodal analysis for retrieval applications; semantic analysis of multimedia and contextual data; multimedia fusion methods; multimedia hyperlinking; media content browsing and retrieval tools; media representation and algorithms; audio, image, video processing, coding and compression; multimedia sensors and interaction modes; multimedia privacy, security and content protection; multimedia standards and related issues; advances in multimedia networking and streaming; multimedia databases, content delivery and transport; wireless and mobile multimedia networking; multi-camera and multi-view systems; augmented and virtual reality, virtual environments; real-time and interactive multimedia applications; mobile multimedia applications; multimedia web applications; multimedia authoring and personalization; interactive multimedia and interfaces; sensor networks; social and educational multimedia applications; and emerging trends.

Content
- Intro
- Preface
- Organization
- Contents - Part I
- Contents - Part II
- Crossed-Time Delay Neural Network for Speaker Recognition
- 1 Introduction
- 2 Baseline Models
- 3 Crossed-Time Delay Neural Network
- 3.1 Crossed-Time Delay Layer
- 3.2 Statistical Concatenation
- 4 Experiments
- 4.1 Preprocessing
- 4.2 Model Configuration
- 4.3 Training Parameters Settings
- 4.4 Embedding Extraction and Verification
- 5 Results
- 5.1 VoxCeleb1
- 5.2 Vcc2016
- 6 Conclusion
- References
- An Asymmetric Two-Sided Penalty Term for CT-GAN
- 1 Introduction
- 2 Background
- 2.1 WGAN
- 2.2 WGAN-GP
- 2.3 CT-GAN
- 3 Our Approach
- 3.1 Asymmetric Two-Sided Penalty
- 3.2 WGAN with Asymmetric Two-Sided Penalty
- 4 Experiments
- 4.1 Datasets and Evaluation
- 4.2 Results
- 5 Conclusion
- References
- Fast Discrete Matrix Factorization Hashing for Large-Scale Cross-Modal Retrieval
- 1 Introduction
- 2 Proposed Method
- 2.1 Problem Formulation
- 2.2 Fast Discrete Matrix Factorization Hashing
- 2.3 Optimization Algorithm
- 2.4 Out-of-Sample Extension
- 3 Experiment
- 3.1 Experiment Settings
- 3.2 Experimental Results
- 3.3 Parameter Sensitivity Analysis
- 3.4 Time Cost Analysis
- 4 Conclusion
- References
- Fast Optimal Transport Artistic Style Transfer
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Fast Style Transfer Framework
- 3.2 Learn to Style Transfer via Optimal Transport
- 3.3 Optimization Objectives
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Qualitative Analysis
- 4.3 Quantitative Analysis
- 4.4 Ablation Study
- 5 Conclusion
- References
- Stacked Sparse Autoencoder for Audio Object Coding
- 1 Introduction
- 2 Related Work
- 3 Proposed Approach
- 3.1 Structure of SSAE-SAOC
- 3.2 Architecture of Stacked Sparse Autoencoder
- 4 Experimental Evaluation
- 4.1 Experiments Conditions
- 4.2 SSAE Model Training
- 4.3 Test Results and Data Analysis
- 5 Conclusions
- References
- A Collaborative Multi-modal Fusion Method Based on Random Variational Information Bottleneck for Gesture Recognition
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Variational Information Bottleneck
- 3.2 Random Variational Information Bottleneck
- 4 Experiment
- 4.1 Data Processing
- 4.2 Experimental Analysis
- 5 Conclusion
- References
- Frame Aggregation and Multi-modal Fusion Framework for Video-Based Person Recognition
- 1 Introduction
- 2 Related Work
- 3 Our Framework
- 3.1 Overview
- 3.2 AttentionVLAD for Frame Aggregation
- 3.3 MLMA for Multi-modal Fusion
- 4 Experiments
- 4.1 Dataset
- 4.2 Results
- 4.3 Implementation Details
- 4.4 Ablation Study
- 5 Conclusion
- References
- An Adaptive Face-Iris Multimodal Identification System Based on Quality Assessment Network
- 1 Introduction
- 2 Proposed System
- 2.1 Preprocessing
- 2.2 Feature Extraction
- 2.3 Matching
- 2.4 FaceIrisQANet
- 2.5 Fusion and Decision
- 3 Experiments and Results
- 3.1 Face-Iris Multimodal Datasets
- 3.2 Comparison with Unimodal Biometrics
- 3.3 Comparison with Non Quality-Based Fusion Rules
- 3.4 Comparison with Other QA Approaches
- 4 Conclusion
- References
- Thermal Face Recognition Based on Multi-scale Image Synthesis
- 1 Introduction
- 2 Related Works
- 2.1 Feature Embedding
- 2.2 Image Transformation
- 3 Thermal Face Recognition
- 3.1 Baseline Model
- 3.2 Proposed Model
- 4 Evaluation
- 4.1 Dataset
- 4.2 Evaluation Protocol
- 4.3 Performance of Thermal Face Recognition
- 5 Conclusion
- References
- Contrastive Learning in Frequency Domain for Non-I.I.D. Image Classification
- 1 Introduction
- 2 Related Work
- 2.1 Non-I.I.D. Image Classification
- 2.2 Contrastive Learning
- 2.3 Learning in the Frequency Domain
- 3 Proposed Method
- 3.1 Contrastive Learning in Frequency Domain for Pre-training
- 3.2 Image Classification with Fine-Tuning
- 4 Experiment
- 4.1 Datasets Description
- 4.2 Experimental Setup
- 4.3 Experimental Results
- 5 Conclusion
- References
- Group Activity Recognition by Exploiting Position Distribution and Appearance Relation
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Framework
- 3.2 Position Distribution Network
- 3.3 Appearance Relation Network
- 4 Experiments
- 4.1 Datasets and Settings
- 4.2 Comparison to the State-of-the-Art
- 4.3 Ablation Studies
- 5 Conclusion
- References
- Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization
- 1 Introduction
- 2 Related Works
- 2.1 By End-to-End Feature Encoding
- 2.2 By Localization-Classification Subnetworks
- 3 Method
- 3.1 Attention Object Location Module (AOLM)
- 3.2 Attention Part Proposal Module (APPM)
- 3.3 Architecture of MMAL-Net
- 4 Experiments
- 4.1 Datasets
- 4.2 Implementation Details
- 4.3 Performance Comparison
- 4.4 Ablation Studies
- 4.5 Object Localization Performance
- 4.6 Model and Inference Time Complexity
- 4.7 Visualization of Object and Part Regions
- 5 Conclusion
- References
- Dense Attention-Guided Network for Boundary-Aware Salient Object Detection
- 1 Introduction
- 2 Related Work
- 3 Proposed DANet
- 3.1 Dense Attention-Guided Architecture
- 3.2 Residual Attention Module
- 3.3 Feature Aggregation Module
- 3.4 Loss Function
- 4 Experiments
- 4.1 Datasets and Evaluation Metrics
- 4.2 Implementation Details
- 4.3 Comparison with State-of-the-Arts
- 4.4 Ablation Study
- 5 Conclusion
- References
- Generative Image Inpainting by Hybrid Contextual Attention Network
- 1 Introduction
- 2 Related Work
- 3 Hybrid Contextual Attention Network (HCA-Net)
- 3.1 Architecture of HCA-Net
- 3.2 Hybrid Contextual Attention Module
- 3.3 Loss Function
- 4 Experiments
- 4.1 Datasets
- 4.2 Training Details
- 4.3 Performance Evaluation
- 4.4 Ablation Study
- 5 Conclusion and Future Work
- References
- Atypical Lyrics Completion Considering Musical Audio Signals
- 1 Introduction
- 2 Related Work
- 3 Lyrics-Audio Data
- 3.1 Bag-of-Audio-Words
- 4 Atypical Word Completion Model Considering Audio
- 4.1 Model Construction
- 5 Experiments
- 5.1 Comparison Methods
- 5.2 Settings
- 5.3 Results
- 5.4 Ratio of Suggested Atypical Words
- 6 Discussions
- 6.1 Effect of Multilayer Perceptron (MLP)
- 6.2 Examples of Suggested Words
- 7 Conclusion and Future Work
- References
- Improving Supervised Cross-modal Retrieval with Semantic Graph Embedding
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Problem Formulation
- 3.2 Framework of the Proposed Method
- 3.3 Implementation Details
- 4 Experiment
- 4.1 Datasets and Feature
- 4.2 Evaluation Metric
- 4.3 Comparison Results
- 4.4 Model Analysis
- 4.5 Impact of Different Components
- 5 Conclusion
- References
- Confidence-Based Global Attention Guided Network for Image Inpainting
- 1 Introduction
- 2 Related Work
- 2.1 Learning-Based Image Inpainting
- 2.2 Attention-Based Image Inpainting
- 3 Approach
- 3.1 Overview
- 3.2 Confidence-Based Global Attention Layer
- 3.3 Attention Guided Decoder
- 3.4 Multi-scale Gated Block
- 3.5 Loss Function
- 4 Experiments
- 4.1 Experiment Settings
- 4.2 Qualitative Comparisons
- 4.3 Quantitative Comparisons
- 4.4 Ablation Study
- 5 Conclusion
- References
- Multi-task Deep Learning for No-Reference Screen Content Image Quality Assessment
- 1 Introduction
- 2 The Proposed SCI-IQA Method
- 2.1 Patch-Level IQA Score Prediction
- 2.2 Image-Level IQA Score Generation
- 3 Experimental Results
- 3.1 Databases and Evaluation Methodology
- 3.2 Performance Comparison
- 4 Conclusion
- References
- Language Person Search with Pair-Based Weighting Loss
- 1 Introduction
- 2 Related Work
- 2.1 Language Person Search
- 2.2 Deep Metric Learning
- 3 The Proposed Approach
- 3.1 Baseline Framework
- 3.2 Pair-Based Weighting Loss
- 3.3 Objective Function
- 4 Experiment
- 4.1 Dataset and Implementation Details
- 4.2 Performance Comparison
- 4.3 Visualization of Retrieval Results
- 5 Conclusion
- References
- DeepFusion: Deep Ensembles for Domain Independent System Fusion
- 1 Introduction
- 2 Proposed Method
- 2.1 Dense Networks
- 2.2 Dense Networks with Attention Layers
- 2.3 Dense Networks with Cross-Space-Fusion Layers
- 3 Experimental Setup
- 3.1 Training Protocol
- 3.2 Data Sets
- 3.3 Evaluation
- 4 Results and Discussion
- 4.1 Ablation Studies
- 4.2 Results
- 5 Conclusions and Future Work
- References
- Illuminate Low-Light Image via Coarse-to-fine Multi-level Network
- 1 Introduction
- 2 Method
- 2.1 Coarse-to-fine Multi-level Decouple Network
- 2.2 Discriminate Attention Mechanism
- 2.3 Coarse-to-fine Multi-level Fusion Network
- 3 Experiment
- 3.1 Implementation Details
- 3.2 Comparison
- 3.3 Ablation Study
- 4 Conclusion
- References
- MM-Net: Learning Adaptive Meta-metric for Few-Shot Biometric Recognition
- 1 Introduction
- 2 Related Work
- 3 Proposed Approach
- 3.1 Task Formulation
- 3.2 Model Description
- 3.3 Network Architecture
- 3.4 Training Strategy
- 4 Experiments
- 4.1 Dataset
- 4.2 Experimental Settings
- 4.3 Experimental Results
- 4.4 Discussion
- 5 Conclusion
- References
- A Sentiment Similarity-Oriented Attention Model with Multi-task Learning for Text-Based Emotion Recognition
- 1 Introduction
- 2 Sentiment Similarity-Oriented Attention Model with Multi-task Learning
- 2.1 Sentence Encoder
- 2.2 Sentiment Similarity-Oriented Attention
- 2.3 Multi-task Learning
- 3 Experiments and Analysis
- 3.1 Database and Lexicon
- 3.2 Experimental Setup
- 3.3 Experimental Results and Analysis
- 4 Conclusion
- References
- Locating Visual Explanations for Video Question Answering
- 1 Introduction
- 2 Related Work
- 3 A New Dataset: Activity-QA
- 3.1 Dataset Collection
- 3.2 Dataset Analysis
- 4 The Proposed Model
- 4.1 Problem Formulation
- 4.2 Visual Encoder
- 4.3 Question Encoder
- 4.4 Multi-modal Fusion Module
- 4.5 Prediction Module
- 4.6 Multi-task Loss
- 5 Experiment
- 5.1 Training Sample
- 5.2 Evaluation Metric
- 5.3 Baseline Methods
- 5.4 Experiments on Activity-QA
- 5.5 Experiments on TVQA
- 5.6 Experiments on Traditional VideoQA Task
- 6 Conclusion
- References
- Global Cognition and Local Perception Network for Blind Image Deblurring
- 1 Introduction
- 2 Related Work
- 2.1 Image Deblurring
- 2.2 Attention Mechanism
- 3 Proposed Method
- 3.1 Network Architecture
- 3.2 Global Cognition Module
- 3.3 Local Perception Module
- 3.4 Loss
- 4 Experiments
- 4.1 Details
- 4.2 Metric
- 4.3 Evaluation
- 4.4 Ablation Study
- 5 Conclusion
- References
- Multi-grained Fusion for Conditional Image Retrieval
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Multi-grained Fusion Module
- 3.2 Online Groups Matching Loss
- 4 Experiments
- 4.1 Experimental Settings
- 4.2 Comparison with Other State-of-the-art Approaches
- 4.3 Ablation Study
- 5 Conclusion
- References
- A Hybrid Music Recommendation Algorithm Based on Attention Mechanism
- 1 Introduction
- 2 Related Works
- 2.1 Content-Based and Hybrid Music Recommendation
- 2.2 Attention-Based Recommendation System
- 3 Method
- 3.1 Generalized Matrix Factorization Layer
- 3.2 Multi-layer Perceptron Layer
- 3.3 Audio Attention Layer
- 3.4 Prediction Layer
- 4 Experiments
- 4.1 Dataset Descriptions
- 4.2 Experimental Settings
- 4.3 Performance Comparison (RQ1)
- 4.4 Whether the Attention Mechanism is Effective (RQ2)
- 5 Conclusions and Future Work
- References
- Few-Shot Learning with Unlabeled Outlier Exposure
- 1 Introduction
- 2 Background
- 2.1 Problem Definition
- 2.2 Prototypical Networks
- 2.3 Learning Vector Quantization
- 3 Few-Shot Learning with Unlabeled Outlier Exposure
- 3.1 Outlier Exposure Loss
- 3.2 Adaptive LVQ Mechanism
- 3.3 Total Loss
- 4 Experiments
- 4.1 Datasets
- 4.2 Implementation Details
- 4.3 Results
- 5 Conclusion
- References
- Fine-Grained Video Deblurring with Event Camera
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Formulation
- 3.2 Overall Architecture
- 3.3 Event Representations
- 3.4 Deblurring Model
- 4 Experiments
- 4.1 Implementation
- 4.2 Experimental Results
- 5 Conclusion
- References
- Discriminative and Selective Pseudo-Labeling for Domain Adaptation
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Notations and Problem Definition
- 3.2 Marginal Distributions Alignment
- 3.3 Discriminative Subspace Learning
- 3.4 Pseudo Labels Prediction
- 4 Experiments
- 4.1 Datasets
- 4.2 Experimental Setting
- 4.3 Experimental Results and Analysis
- 4.4 Parameter Sensitivity and Visualization Analysis
- 5 Conclusion
- References
- Multi-level Gate Feature Aggregation with Spatially Adaptive Batch-Instance Normalization for Semantic Image Synthesis
- 1 Introduction
- 2 Related Work
- 2.1 Normalization
- 2.2 Multi-level Feature Fusion
- 2.3 Gating Mechanism
- 3 The Proposed Method
- 3.1 Multi-level Gate Feature Aggregation
- 3.2 Spatially-Adaptive Batch-Instance Normalization
- 3.3 Loss Function
- 4 Experiment
- 4.1 Datasets
- 4.2 Evaluation Metrics
- 4.3 Baselines
- 4.4 Quantitative Comparisons
- 4.5 Qualitative Results
- 4.6 Human Evaluation
- 4.7 Ablation Study
- 5 Conclusion
- References
- Robust Multispectral Pedestrian Detection via Uncertainty-Aware Cross-Modal Learning
- 1 Introduction
- 2 Related Work
- 2.1 Multispectral Pedestrian Detection
- 2.2 Aleatoric Uncertainty
- 3 Proposed Method
- 3.1 Preliminaries
- 3.2 Uncertainty-Aware Cross-Modal Fusion (UCF)
- 3.3 Uncertainty-Aware Feature Learning (UFL)
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Performance Comparison
- 4.3 Effectiveness Comparison Between Prediction Score and Object Region Uncertainty
- 4.4 Ablation Studies
- 5 Conclusion
- References
- Time-Dependent Body Gesture Representation for Video Emotion Recognition
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Body Joints Marked
- 3.2 Body Gestures Representation
- 3.3 ACCNN Model
- 4 Experimental Results
- 4.1 Dataset
- 4.2 Experimental Setup
- 4.3 Experimental Results
- 5 Conclusion and Future Work
- References
- MusiCoder: A Universal Music-Acoustic Encoder Based on Transformer
- 1 Introduction
- 2 Related Work
- 3 MusiCoder Model
- 3.1 Input Representation
- 3.2 Transformer Encoder
- 3.3 Pre-training Objectives
- 3.4 MusiCoder Model Setting
- 4 Experiment Setup
- 4.1 Dataset Collection and Preprocess
- 4.2 Training Setup
- 5 Results
- 5.1 Music Genre Classification
- 5.2 Music Auto-tagging
- 6 Conclusion
- References
- DANet: Deformable Alignment Network for Video Inpainting
- 1 Introduction
- 2 Related Work
- 2.1 Image Inpainting
- 2.2 Video Inpainting
- 3 Proposed Method
- 3.1 Deformable Alignment Network Architecture
- 3.2 Loss Functions of Deformable Alignment Network
- 4 Experiments
- 4.1 Datasets Synthesis and Training Details
- 4.2 Ablation Study
- 4.3 Experiments on DAVIS Dataset
- 5 Conclusions and Future Work
- References
- Deep Centralized Cross-modal Retrieval
- 1 Introduction
- 2 Related Work
- 3 Our Approach
- 3.1 Problem Formulation
- 3.2 Network Architecture
- 3.3 Objective Function
- 3.4 Adaptive Learning Schema
- 3.5 Implementation Details
- 4 Experiment
- 4.1 Datasets and Features
- 4.2 Evaluation Metric
- 4.3 Comparison with State-of-the-Art Methods
- 4.4 Visualization of the Learned Representation
- 4.5 Ablation Study
- 5 Conclusion
- References
- Shot Boundary Detection Through Multi-stage Deep Convolution Neural Network
- 1 Introduction
- 2 Related Works
- 3 Methodology
- 3.1 Candidate Shot Boundary Detection
- 3.2 Abrupt Detection
- 3.3 Gradual Transition Detection
- 4 Experiments
- 4.1 Evaluation of Candidate Shot Boundary Detection
- 4.2 Evaluation of Abrupt Detection
- 4.3 Evaluation of Gradual Transition Detection
- 5 Conclusion
- References
- Towards Optimal Multirate Encoding for HTTP Adaptive Streaming
- 1 Introduction
- 2 Related Works
- 3 Single Reference Multirate Encoding
- 3.1 Different Single Reference Multirate Encoding Modes
- 3.2 Experimental Evaluation
- 4 Improved Encoding Mode
- 5 Conclusion
- References
- Fast Mode Decision Algorithm for Intra Encoding of the 3rd Generation Audio Video Coding Standard
- 1 Introduction
- 2 Acceleration Strategies
- 2.1 Progressive Rough Mode Search
- 2.2 Shared Intra Mode RDO Candidates List
- 2.3 RDO Count Constraints
- 3 Experimental Results and Analysis
- 3.1 Progressive Rough Mode Search
- 3.2 Shared Intra Mode RDO Candidates List
- 3.3 RDO Count Constraints
- 3.4 Overall Performance
- 4 Conclusion
- References
- Graph Structure Reasoning Network for Face Alignment and Reconstruction
- 1 Introduction
- 2 Proposed Method
- 2.1 Graph Structure Reasoning Network
- 2.2 Graph Represents Reasoning Loss Function
- 3 Experiments
- 3.1 Data Augmentation and Implementation Details
- 3.2 Comparative Evaluation
- 4 Conclusion
- References
- Game Input with Delay - A Model of the Time Distribution for Selecting a Moving Target with a Mouse
- 1 Introduction
- 2 Related Work
- 2.1 Models of User Input
- 2.2 Game Actions
- 3 Datasets
- 3.1 Game
- 3.2 Procedure
- 4 Modeling Selection Time
- 4.1 Pre-processing
- 4.2 Modeling
- 4.3 Player Skills
- 4.4 Validation
- 5 Evaluation
- 5.1 Player Performance Versus Delay
- 5.2 Win Rate Versus Delay - Skill
- 5.3 Win Rate Versus Delay - Target Speed
- 6 Limitations
- 7 Conclusion
- References
- Unsupervised Temporal Attention Summarization Model for User Created Videos
- 1 Introduction
- 2 Related Work
- 3 Problem Formulation
- 4 Main Components of Our Model
- 4.1 Feedforward Properties Reward (FP-R)
- 4.2 Bi-LSTM- and LSTM-Based Temporal Attention Model (Bi-LTAM)
- 4.3 Generating Video Summarization
- 4.4 Training and Optimization
- 5 Experiments
- 5.1 Implementation Details
- 5.2 Experiment Results
- 6 Conclusions
- References
- Learning from the Negativity: Deep Negative Correlation Meta-Learning for Adversarial Image Classification
- 1 Introduction
- 2 Deep Negative Correlation Meta-Learning Based Approach
- 2.1 Problem Setup
- 2.2 Network Formulation
- 2.3 Network Architecture
- 3 Experiments and Results
- 3.1 Experimental Settings
- 3.2 Comparison Experiment
- 3.3 Ablation Experiment
- 4 Conclusion
- References
- Learning 3D-Craft Generation with Predictive Action Neural Network
- 1 Introduction
- 2 Predictive Action Neural Network
- 2.1 Problem Definition
- 2.2 Ordered Geometry Encoder
- 2.3 Attentive Sequence Encoder
- 2.4 Late Feature Fusion
- 2.5 Two Stream Predictor
- 2.6 Training by Effective Sampling
- 3 Experiments
- 3.1 Dataset
- 3.2 Implementation Details
- 3.3 Order-Aware Generation Task
- 3.4 Order Recovery Task
- 3.5 Ablation Study
- 4 Conclusions
- References
- Unsupervised Multi-shot Person Re-identification via Dynamic Bi-directional Normalized Sparse Representation
- 1 Introduction
- 2 Proposed Method
- 2.1 Normalized Sparse Representation
- 2.2 Bi-directional NSR for Label Estimation
- 2.3 Dynamic Bi-directional Normalized Sparse Representation
- 2.4 Person Sequence Matching
- 3 Experimental Results
- 3.1 Implementation Details
- 3.2 Comparison with the State-of-the-Art Methods
- 3.3 Ablation Study
- 3.4 Analysis
- 4 Conclusion
- References
- Classifier Belief Optimization for Visual Categorization
- 1 Introduction
- 2 Related Work
- 3 Classifier Belief Optimization
- 3.1 Definitions and Strategies
- 3.2 Key Procedure of Our Approach
- 4 Experiments
- 4.1 Datasets and Settings
- 4.2 Results and Analysis
- 5 Conclusion
- References
- Fine-Grained Generation for Zero-Shot Learning
- 1 Introduction
- 2 Related Work
- 3 Our Proposed Approach
- 3.1 Definitions and Notations
- 3.2 Overall Procedure
- 3.3 Sample-Level Attribute Generation via GAN (SLA-GAN)
- 3.4 Fine-Grained Sample Generation via GAN (FGS-GAN)
- 4 Experiments
- 4.1 Experiment Settings
- 4.2 Implementation Details
- 4.3 Effects of the New Sample-Level Attributes
- 4.4 Effects of the New Fine-Grained Samples
- 4.5 Results Analysis
- 5 Conclusion
- References
- Fine-Grained Image-Text Retrieval via Complementary Feature Learning
- 1 Introduction
- 2 Related Works
- 3 Method
- 3.1 Problem Formulation
- 3.2 Complementary Features for Image
- 3.3 Complementary Features for Text
- 3.4 Pairwise Dictionary Alignment
- 3.5 Optimization
- 3.6 Score Function
- 4 Experiments
- 4.1 Datasets and Evaluation Metrics
- 4.2 Implementation Details
- 4.3 Comparisons with State-of-the-Art Methods
- 4.4 Instance-Specific Fine-Grained Image-Text Retrieval
- 4.5 Ablation Study
- 4.6 Analysis of Loss Function
- 4.7 Qualitative Results
- 5 Conclusion
- References
- Considering Human Perception and Memory in Interactive Multimedia Retrieval Evaluations
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 4 Implementation
- 5 Evaluation
- 6 Results
- 7 Conclusion and Outlook
- References
- Learning Multi-level Interaction Relations and Feature Representations for Group Activity Recognition
- 1 Introduction
- 2 Approach
- 2.1 The Overview of Our Model
- 2.2 Key-Actor Based Group Pooling Layer
- 2.3 Key-Actor Based Group Unpooling Layer
- 2.4 Multi-level Features of the Group Activity
- 3 Experiments
- 3.1 Datasets and Implementation Details
- 3.2 Ablation Studies
- 3.3 Comparison with the State-of-the-Art
- 4 Conclusion
- References
- A Structured Feature Learning Model for Clothing Keypoints Localization
- 1 Introduction
- 2 Related Works
- 2.1 Clothing Keypoint Localization
- 2.2 Structural Features Learning
- 3 Our Approach
- 3.1 Keypoints Localization Framework
- 3.2 SFLM Model
- 4 Experiments
- 4.1 Datasets and Evaluation
- 4.2 Experimental Details and Results
- 5 Conclusion
- References
- Automatic Pose Quality Assessment for Adaptive Human Pose Refinement
- 1 Introduction
- 2 Related Works
- 2.1 General Human Pose Estimation Methods
- 2.2 Human Pose Refinement Methods
- 3 Method
- 3.1 Human Pose Quality Assessment Model
- 3.2 Base Human Pose Refinement Model
- 3.3 Adaptive Human Pose Refinement
- 4 Experiments
- 4.1 Data Preprocessing
- 4.2 Human Pose Quality Assessment
- 4.3 Human Pose Refinement
- 5 Conclusion and Discussion
- References
- Deep Attributed Network Embedding with Community Information
- 1 Introduction
- 2 Related Work
- 3 Problem Definition
- 4 The Model
- 4.1 Construct Community Triplets
- 4.2 Framework of DNE
- 4.3 DNEC
- 5 Experiment
- 5.1 Experimental Settings
- 5.2 Results and Analysis
- 6 Conclusion
- References
- An Acceleration Framework for Super-Resolution Network via Region Difficulty Self-adaption
- 1 Introduction
- 2 Related Work
- 2.1 Single Image SR and Video SR
- 2.2 Network Acceleration
- 3 Motivation
- 4 Our Architecture
- 4.1 Lightweight Classification Network
- 4.2 Two-Branch SR Network
- 4.3 Training Phase
- 4.4 Inference Phase
- 5 Experiment
- 6 Conclusion
- References
- Spatial Gradient Guided Learning and Semantic Relation Transfer for Facial Landmark Detection
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Grad Loss
- 3.2 MLE for Offset Location
- 3.3 Multi-scale Feature Fusion
- 3.4 Regression Head Design
- 4 Experiments
- 4.1 Datasets
- 4.2 Evaluation Metrics
- 4.3 Implementation Details
- 4.4 Evaluation on Benchmarks
- 4.5 Quality Results
- 5 Ablation Study
- 6 Conclusion
- References
- DVRCNN: Dark Video Post-processing Method for VVC
- 1 Introduction
- 2 Related Works
- 2.1 In-Loop Filters
- 2.2 Post-processing Filters
- 3 Proposed Approach
- 3.1 Overall Framework of Our Proposed Method
- 3.2 Proposed Network Structure
- 3.3 Loss Function
- 4 Experiments and Discussion
- 4.1 Training Models
- 4.2 Comparison with VVC
- 4.3 Validation
- 5 Conclusion
- References
- An Efficient Image Transmission Pipeline for Multimedia Services
- 1 Introduction
- 2 Related Work
- 2.1 Users' Concerns about Data Usage
- 2.2 Image Compression Methods
- 2.3 Image Resampling Methods
- 3 Features in Image Resamplings
- 4 Image Transmission Pipeline
- 4.1 Gradient-Aware Image Resampling
- 4.2 High-Frequency Features Extraction
- 4.3 Quadtree-Based Adaptive Binning
- 4.4 Image Reconstruction
- 5 Evaluation
- 5.1 Default Settings
- 5.2 Image Quality Study
- 5.3 Data Usage Analysis
- 6 Conclusion
- References
- Gaussian Mixture Model Based Semi-supervised Sparse Representation for Face Recognition
- 1 Introduction
- 2 GMM Based Semi-supervised Sparse Representation
- 2.1 Proposed Semi-supervised Sparse Representation
- 2.2 Construction of Dictionary in GSSR
- 3 Experimental Results
- 3.1 Databases and Settings
- 3.2 Experiments on AR Database
- 3.3 Experiments on LFW Database
- 3.4 Experiments on PIE Database
- 4 Discussion and Conclusions
- References
- Correction to: Crossed-Time Delay Neural Network for Speaker Recognition
- Correction to: Chapter "Crossed-Time Delay Neural Network for Speaker Recognition" in: J. Lokoc et al. (Eds.): MultiMedia Modeling, LNCS 12572, https://doi.org/10.1007/978-3-030-67832-6_1
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" copy protection. This means that there are no technical restrictions to prevent illegal distribution. However, there is a personalised watermark embedded in the eBook that can be used to identify the purchaser of the eBook in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.