MultiMedia Modeling

Name: MultiMedia Modeling | 27th International Conference, MMM 2021, Prague, Czech Republic, June 22-24, 2021, Proceedings, Part II
Brand: Springer
Price: 53.49 EUR
Availability: OnlineOnly

Jakub Lokoc, Tomás Skopal, Klaus Schoeffmann, Vasileios Mezaris, Xirong Li, Stefanos Vrochidis, Ioannis Patras (Editors)

Springer (Publisher)

Published on 22 January 2021

XXV, 501 pages

E-Book

PDF with digital watermarking

System requirements

978-3-030-67835-7 (ISBN)

€53.49 incl. 7% VAT

E-Book Single Licence

Available for download

Description

The two-volume set LNCS 12572 and 12573 constitutes the thoroughly refereed proceedings of the 27th International Conference on MultiMedia Modeling, MMM 2021, held in Prague, Czech Republic, in June 2021.

Of the 211 submitted regular papers, 40 papers were selected for oral presentation and 33 for poster presentation; 16 special session papers were accepted as well as 2 papers for a demo presentation and 17 papers for participation at the Video Browser Showdown 2021. The papers cover topics such as: multimedia indexing; multimedia mining; multimedia abstraction and summarization; multimedia annotation, tagging and recommendation; multimodal analysis for retrieval applications; semantic analysis of multimedia and contextual data; multimedia fusion methods; multimedia hyperlinking; media content browsing and retrieval tools; media representation and algorithms; audio, image, video processing, coding and compression; multimedia sensors and interaction modes; multimedia privacy, security and content protection; multimedia standards and related issues; advances in multimedia networking and streaming; multimedia databases, content delivery and transport; wireless and mobile multimedia networking; multi-camera and multi-view systems; augmented and virtual reality, virtual environments; real-time and interactive multimedia applications; mobile multimedia applications; multimedia web applications; multimedia authoring and personalization; interactive multimedia and interfaces; sensor networks; social and educational multimedia applications; and emerging trends.

Content

Intro
Preface
Organization
Contents - Part II
Contents - Part I
MSCANet: Adaptive Multi-scale Context Aggregation Network for Congested Crowd Counting
1 Introduction
2 Related Work
3 Proposed Method
3.1 Multi-scale Context Aggregation Module
3.2 Multi-scale Context Aggregation Network
3.3 Compared to Other Context Modules
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Evaluation Metrics
4.4 Comparison with State-of-the-Arts
4.5 Ablation Study
5 Conclusion
References
Tropical Cyclones Tracking Based on Satellite Cloud Images: Database and Comprehensive Study
1 Introduction
2 The Proposed TCTSCI Database
2.1 Data Preprocessing
2.2 Annotation
2.3 Attributes
3 Evaluation
3.1 Evaluation Metric
3.2 Evaluated Trackers
3.3 Evaluation Results with OPE
3.4 Evaluation Results with EAO
3.5 Analysis
4 Conclusion
References
Image Registration Improved by Generative Adversarial Networks
1 Introduction
2 Proposed Method
2.1 Background
2.2 Proposed Network Structure
2.3 Loss Function
3 Experiments
3.1 Dataset
3.2 Implementation Details
3.3 Results
4 Conclusion
References
Deep 3D Modeling of Human Bodies from Freehand Sketching
1 Introduction
2 Related Work
3 Our Method
3.1 Intermediate Skeleton Construction
3.2 Joint-Wise Pose Regression
3.3 Loss
4 Experiments and Discussion
4.1 Dataset
4.2 Network Details and Training Settings
4.3 Results and Discussion
4.4 Body Modeling by Freehand Sketching
5 Conclusions
References
Two-Stage Real-Time Multi-object Tracking with Candidate Selection
1 Introduction
2 Related Work
2.1 Tracking-by-Detection Methods
2.2 Simultaneous Detection and Tracking Methods
3 Proposed Method
3.1 Backbone Network
3.2 Two Branches
3.3 Candidate Selection
3.4 Cascade Data Association
4 Experiments
4.1 Datasets and Metrics
4.2 Implementation Details
4.3 Experimental Results
5 Conclusion
References
Tell as You Imagine: Sentence Imageability-Aware Image Captioning
1 Introduction
2 Related Work
3 Image Captioning Considering Imageability
3.1 Data Augmentation
3.2 Sentence Imageability Calculation
3.3 Image Captioning
4 Evaluation
4.1 Environment
4.2 Analysis on the Sentence Imageability Scores
4.3 Evaluation of Image Captioning Results
4.4 Subjective Evaluation
5 Conclusion
References
Deep Face Swapping via Cross-Identity Adversarial Training
1 Introduction
2 Related Works
3 Our Approach
3.1 Network Architecture
3.2 Model Objective
4 Implementation
5 Experiments and Analysis
5.1 Qualitative Results
5.2 Quantitative Results
5.3 Ablation Study
5.4 Difficult Cases
6 Conclusion
References
Res2-Unet: An Enhanced Network for Generalized Nuclear Segmentation in Pathological Images
1 Introduction
2 Proposed Method
2.1 Pre-processing and Patch Extraction
2.2 Res2-Unet
2.3 Post-processing
2.4 Loss Function
3 Experiments
3.1 Dataset and Implementation Detail
3.2 Experiments on MoNuSeg
3.3 Experiments on TN-BC
3.4 Ablation Study
4 Conclusion
References
Automatic Diagnosis of Glaucoma on Color Fundus Images Using Adaptive Mask Deep Network
1 Introduction
2 Method
3 Experiments
3.1 Experimental Setup
3.2 Results and Discussion
4 Conclusion
References
Initialize with Mask: For More Efficient Federated Learning
1 Introduction
2 Related Work
3 Federated Mask
3.1 Motivation
3.2 Initialize with Fisher Information Matrix
3.3 Local Training with MMD Constraint
3.4 FedMask Implementation
4 Experiments
4.1 Experimental Setup
4.2 Results and Analysis
5 Conclusion
References
Unsupervised Gaze: Exploration of Geometric Constraints for 3D Gaze Estimation
1 Introduction
2 Related Work
3 Unsupervised Gaze Estimation Framework
4 Experiments
4.1 Dataset
4.2 Training Configuration
4.3 Qualitative Results and Analysis
4.4 Quantitative Results and Analysis
5 Conclusion
References
Median-Pooling Grad-CAM: An Efficient Inference Level Visual Explanation for CNN Networks in Remote Sensing Image Classification
1 Introduction
2 Gradient-Based Visual Explanation on Remote Sensing Image Classification
2.1 Introduction to Grad-CAM Techniques
2.2 Performance Analysis of Grad-CAM Techniques
3 Approach
3.1 Median-Pooling Grad-CAM
3.2 Computation Complexity
3.3 Evaluation Metric of Confidence Drop
4 Experiments
4.1 Datasets
4.2 Results on Remote Sensing Images
5 Conclusion
References
Multi-granularity Recurrent Attention Graph Neural Network for Few-Shot Learning
1 Introduction
2 Related Works
3 The Proposed Method
3.1 Model
3.2 Training
4 Experiments
4.1 Datasets
4.2 Experimental Setups
4.3 Few-Shot Classification
4.4 Ablation Studies
5 Conclusion
References
EEG Emotion Recognition Based on Channel Attention for E-Healthcare Applications
1 Introduction
2 Emotional EEG Databases
3 Method
3.1 Feature Extraction
3.2 Attention Mechanisms and Deep Residual Networks
4 Experiments
4.1 Experiment Setup
4.2 Performance Comparison Among Relevant Methods
4.3 Conclusion
References
The MovieWall: A New Interface for Browsing Large Video Collections
1 Introduction
2 Related Work
3 Pilot Study
4 Detailed User Study
4.1 Implementation and Study Design
4.2 Results and Discussion
5 Conclusion
References
Keystroke Dynamics as Part of Lifelogging
1 Introduction
2 Keystroke Dynamics
3 Collecting Keystroke Data
4 Data Analysis
5 Conclusions
References
HTAD: A Home-Tasks Activities Dataset with Wrist-Accelerometer and Audio Features
1 Introduction
2 Dataset Details
3 Feature Extraction
3.1 Accelerometer Features
3.2 Audio Features
4 Dataset Structure
5 Baseline Experiments
6 Conclusions
References
MNR-Air: An Economic and Dynamic Crowdsourcing Mechanism to Collect Personal Lifelog and Surrounding Environment Dataset. A Case Study in Ho Chi Minh City, Vietnam
1 Introduction
2 System Architecture and Hardware
3 The MNR-HCM Dataset
3.1 Data Collection
3.2 Data Collection Route
3.3 Data Description
4 Air Pollution and Traffic Risk Map
4.1 Motivation and Purposes
4.2 Methodology
4.3 AQI-T-RM Architecture
4.4 Discussion
5 Conclusion and Future Work
References
Kvasir-Instrument: Diagnostic and Therapeutic Tool Segmentation Dataset in Gastrointestinal Endoscopy
1 Introduction
2 Related Work
3 Kvasir-Instrument Dataset
3.1 Data Acquisition
3.2 Annotation Strategy
4 Benchmarking, Results and Discussion
4.1 Baseline Methods
4.2 Implementation Details
4.3 Evaluation Metrics
4.4 Quantitative and Qualitative Results
4.5 Discussion
5 Conclusion
References
CatMeows: A Publicly-Available Dataset of Cat Vocalizations
1 Introduction
2 Building the Dataset
2.1 Design Choices
2.2 Capturing Audio Signals
2.3 Post-processing
3 Composition of the Dataset
4 Application Scenarios
4.1 Proposed Scenarios
4.2 Example
5 Conclusions
References
Search and Explore Strategies for Interactive Analysis of Real-Life Image Collections with Unknown and Unique Categories
1 Introduction
2 Related Work
3 Proposed Method
4 Evaluation and Discussion
4.1 User Experiments
5 Conclusion
References
Graph-Based Indexing and Retrieval of Lifelog Data
1 Introduction
2 Related Work
3 Dataset
4 Graph Generation
4.1 Image to Graph
4.2 Query to Graph
5 Image Retrieval
5.1 Graph Embedding
5.2 Similarity Score
6 Experiments
7 Results and Discussion
8 Conclusion
References
On Fusion of Learned and Designed Features for Video Data Analytics
1 Introduction
1.1 Running Use-Case in Urban Settings
2 Related Work
2.1 Multi-modal Retrieval and Feature Fusion
2.2 Object Detection
2.3 Identification and Tracking
2.4 General Video Analytics and Retrieval
3 Framework
3.1 Database Indexing
3.2 Analytics over High-Level Features
4 Urban Use Case
5 System Architecture
5.1 Architecture for Real-Time Analytics
6 Conclusions
References
XQM: Interactive Learning on Mobile Phones
1 Introduction
2 Related Work
3 XQM Architecture
4 System Performance Evaluation
5 User Interface Evaluation
6 Conclusion
References
A Multimodal Tensor-Based Late Fusion Approach for Satellite Image Search in Sentinel 2 Images
1 Introduction
2 Related Work
3 Methodology
3.1 Late Fusion of Multiple Modalities
3.2 Visual Similarity Search
3.3 Visual Concept Search
3.4 Spatial and Temporal Search
4 Experiments
4.1 Dataset Description
4.2 Settings
4.3 Results
5 Conclusions
References
Canopy Height Estimation from Spaceborne Imagery Using Convolutional Encoder-Decoder
1 Introduction
2 Materials
3 Methodology
4 Experimental Results
4.1 Results
4.2 Estimation Error Analysis
5 Discussion
References
Implementation of a Random Forest Classifier to Examine Wildfire Predictive Modelling in Greece Using Diachronically Collected Fire Occurrence and Fire Mapping Data
1 Introduction
2 Study Area - Training Data Set
2.1 Study Area
2.2 Data Resources
3 Methodological Approach and Implementation
3.1 Data Archiving and Modelling
3.2 Feature Ranking
3.3 Random Forest (RF) Algorithm Implementation
4 Results
4.1 The Spearman's Correlation
4.2 The Chi-Squared Tests
4.3 Comparative Feature Ranking
4.4 The RF Classifier Results
5 Conclusion
References
Mobile eHealth Platform for Home Monitoring of Bipolar Disorder
1 Introduction
2 Related Work
3 The MoodRecord Application
3.1 Mood Detection from Voice
3.2 Mood Detection from Video
4 Pilot Testing
5 Conclusions and Future Work
References
Multimodal Sensor Data Analysis for Detection of Risk Situations of Fragile People in @home Environments
1 Introduction
2 State of the Art
3 Risk Definition Scenarios and Preliminary Risk Detection on LSC Dataset
3.1 Risk Definition Scenarios
3.2 Detecting Risk Situations Using Sensor Signals on a LSC Dataset
4 Data Collection for Risk Prevention
4.1 Data Recording Protocol
4.2 BIRDS Corpus Description
4.3 Corpus Annotation
5 Conclusion
References
Towards the Development of a Trustworthy Chatbot for Mental Health Applications
1 Introduction
2 Related Work
2.1 Chatbots and Mental Health
2.2 Human-Computer Trust
3 Development of Trustworthy Agents
3.1 Scenario
3.2 Dialogue Design
3.3 System Overview
4 Pilot Study
4.1 Study Setup
4.2 Results
4.3 Perspectives of Trustworthy Agents for Mental Health Applications
5 Conclusion
References
Fusion of Multimodal Sensor Data for Effective Human Action Recognition in the Service of Medical Platforms
1 Introduction
2 Related Work
3 Methodology
3.1 Inertial Sensors
3.2 Depth Sensors
3.3 Sensor Fusion
4 Experiments and Results
4.1 Dataset and Evaluation Description
4.2 Inertial Sensor Performance Analysis
4.3 Depth Sensor Performance Analysis
4.4 Sensor Fusion Performance Analysis
4.5 Comparison with State-of-the-Art
5 Conclusions
References
SpotifyGraph: Visualisation of User's Preferences in Music
1 Introduction
2 SpotifyGraph Application
2.1 Main View
2.2 Single Cluster View
3 Conclusions and Future Work
References
A System for Interactive Multimedia Retrieval Evaluations
1 Introduction
2 Related Work
3 DRES: System Overview
3.1 Capabilities
3.2 Architecture
3.3 Demonstration
4 Conclusion and Outlook
References
SQL-Like Interpretable Interactive Video Search
1 Introduction
2 Dual-Task Model for Real-Time Interactive Search
3 The SQL-Like Interface
4 The Vireo Video Search System
5 Conclusion
References
VERGE in VBS 2021
1 Introduction
2 Video Retrieval Framework
2.1 Visual Similarity Search
2.2 Concept-Based Retrieval
2.3 Text to Video Matching Module
2.4 Face Detection
2.5 Video Captioning - Caption-Based Search
2.6 Activity Recognition
2.7 Multimodal Fusion and Temporal Search
3 VERGE User Interface and Interaction Modes
4 Future Work
References
NoShot Video Browser at VBS2021
1 Introduction
2 NoShot Video Browser
3 Time Cache
4 The NoShot GUI
5 Improvements
6 Conclusion
References
Exquisitor at the Video Browser Showdown 2021: Relationships Between Semantic Classifiers
1 Introduction
2 Exquisitor
3 Operations on Semantic Classifier Rankings
4 Conclusions
References
VideoGraph - Towards Using Knowledge Graphs for Interactive Video Retrieval
1 Introduction
2 VideoGraph Construction
2.1 Wikidata
2.2 Semantic Video Metadata
2.3 Textual Semantic Information from Video
2.4 Visual Semantic Information from Video
2.5 Technical Video Metadata
3 VideoGraph Exploration
3.1 Query Formulation
3.2 Graph Exploration
3.3 Graph Extension
3.4 User Interaction
4 Conclusion
References
IVIST: Interactive Video Search Tool in VBS 2021
1 Introduction
2 Overall Architecture of IVIST
3 Main Functions in IVIST
3.1 Existing Capabilities
3.2 Action Recognition
3.3 Place Recognition
3.4 Description Searching
4 Conclusion
References
Video Search with Collage Queries
1 Introduction
2 System Overview
2.1 Additional Notes
3 Conclusions
References
Towards Explainable Interactive Multi-modal Video Retrieval with Vitrivr
1 Introduction
2 vitrivr
3 Temporal Querying
4 Index Structures for Similarity Search
5 Towards the Explainability of Search Results
6 Conclusion
References
Competitive Interactive Video Retrieval in Virtual Reality with vitrivr-VR
1 Introduction
2 VR Multimedia Retrieval Interfaces
3 System Overview
4 Querying Mechanisms
5 Interactive Retrieval Process in VR
5.1 Initial Query
5.2 Result Organisation
5.3 Refinement Queries
6 Conclusion
References
An Interactive Video Search Tool: A Case Study Using the V3C1 Dataset
1 Introduction
2 Search Tool Architecture
2.1 Video Dataset
2.2 Query Tasks
2.3 Image Representation Metadata
2.4 Storage and Indexing
2.5 Searching Approach
3 Conclusion
References
Less is More - diveXplore 5.0 at VBS 2021
1 Introduction
2 diveXplore 5.0
2.1 Architecture
2.2 Features
3 Conclusion
References
SOMHunter V2 at Video Browser Showdown 2021
1 Introduction
2 Newly Included Text Querying Options
2.1 Localized Text Queries
2.2 Text Query Vector Relocation
2.3 User Interface
3 Conclusion
References
W2VV++ BERT Model at VBS 2021
1 Introduction
2 System Overview
3 Context-Aware Query Ranker
4 Conclusion
References
VISIONE at Video Browser Showdown 2021
1 Introduction
2 VISIONE Video Search System
3 New VISIONE Functionalities for VBS 2021
4 Conclusion and Future Work
References
IVOS - The ITEC Interactive Video Object Search System at VBS2021
1 Introduction
2 Object-Based Exploratory Search
3 The IVOS User Interface
4 Summary
References
Video Search with Sub-Image Keyword Transfer Using Existing Image Archives
1 Introduction
2 Automatic Keywording
3 Improved Image Feature Vectors
4 Search Result Visualization
5 Search System
References
A VR Interface for Browsing Visual Spaces at VBS2021
1 Introduction
2 Related Systems
3 An Overview of EOLAS
3.1 Source Data
3.2 Search Engine
3.3 User Interaction
4 Conclusions
References
Correction to: SQL-Like Interpretable Interactive Video Search
Correction to: Chapter "SQL-Like Interpretable Interactive Video Search" in: J. Lokoc et al. (Eds.): MultiMedia Modeling, LNCS 12573, https://doi.org/10.1007/978-3-030-67835-7_34
Author Index
