
MultiMedia Modeling
Description
The two-volume set LNCS 12572 and 12573 constitutes the thoroughly refereed proceedings of the 27th International Conference on MultiMedia Modeling, MMM 2021, held in Prague, Czech Republic, in June 2021.
Of the 211 submitted regular papers, 40 papers were selected for oral presentation and 33 for poster presentation; 16 special session papers were accepted as well as 2 papers for a demo presentation and 17 papers for participation at the Video Browser Showdown 2021. The papers cover topics such as: multimedia indexing; multimedia mining; multimedia abstraction and summarization; multimedia annotation, tagging and recommendation; multimodal analysis for retrieval applications; semantic analysis of multimedia and contextual data; multimedia fusion methods; multimedia hyperlinking; media content browsing and retrieval tools; media representation and algorithms; audio, image, video processing, coding and compression; multimedia sensors and interaction modes; multimedia privacy, security and content protection; multimedia standards and related issues; advances in multimedia networking and streaming; multimedia databases, content delivery and transport; wireless and mobile multimedia networking; multi-camera and multi-view systems; augmented and virtual reality, virtual environments; real-time and interactive multimedia applications; mobile multimedia applications; multimedia web applications; multimedia authoring and personalization; interactive multimedia and interfaces; sensor networks; social and educational multimedia applications; and emerging trends.

Content
- Intro
- Preface
- Organization
- Contents - Part II
- Contents - Part I
- MSCANet: Adaptive Multi-scale Context Aggregation Network for Congested Crowd Counting
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Multi-scale Context Aggregation Module
- 3.2 Multi-scale Context Aggregation Network
- 3.3 Compared to Other Context Modules
- 4 Experiments
- 4.1 Datasets
- 4.2 Implementation Details
- 4.3 Evaluation Metrics
- 4.4 Comparison with State-of-the-Arts
- 4.5 Ablation Study
- 5 Conclusion
- References
- Tropical Cyclones Tracking Based on Satellite Cloud Images: Database and Comprehensive Study
- 1 Introduction
- 2 The Proposed TCTSCI Database
- 2.1 Data Preprocessing
- 2.2 Annotation
- 2.3 Attributes
- 3 Evaluation
- 3.1 Evaluation Metric
- 3.2 Evaluated Trackers
- 3.3 Evaluation Results with OPE
- 3.4 Evaluation Results with EAO
- 3.5 Analysis
- 4 Conclusion
- References
- Image Registration Improved by Generative Adversarial Networks
- 1 Introduction
- 2 Proposed Method
- 2.1 Background
- 2.2 Proposed Network Structure
- 2.3 Loss Function
- 3 Experiments
- 3.1 Dataset
- 3.2 Implementation Details
- 3.3 Results
- 4 Conclusion
- References
- Deep 3D Modeling of Human Bodies from Freehand Sketching
- 1 Introduction
- 2 Related Work
- 3 Our Method
- 3.1 Intermediate Skeleton Construction
- 3.2 Joint-Wise Pose Regression
- 3.3 Loss
- 4 Experiments and Discussion
- 4.1 Dataset
- 4.2 Network Details and Training Settings
- 4.3 Results and Discussion
- 4.4 Body Modeling by Freehand Sketching
- 5 Conclusions
- References
- Two-Stage Real-Time Multi-object Tracking with Candidate Selection
- 1 Introduction
- 2 Related Work
- 2.1 Tracking-by-Detection Methods
- 2.2 Simultaneous Detection and Tracking Methods
- 3 Proposed Method
- 3.1 Backbone Network
- 3.2 Two Branches
- 3.3 Candidate Selection
- 3.4 Cascade Data Association
- 4 Experiments
- 4.1 Datasets and Metrics
- 4.2 Implementation Details
- 4.3 Experimental Results
- 5 Conclusion
- References
- Tell as You Imagine: Sentence Imageability-Aware Image Captioning
- 1 Introduction
- 2 Related Work
- 3 Image Captioning Considering Imageability
- 3.1 Data Augmentation
- 3.2 Sentence Imageability Calculation
- 3.3 Image Captioning
- 4 Evaluation
- 4.1 Environment
- 4.2 Analysis on the Sentence Imageability Scores
- 4.3 Evaluation of Image Captioning Results
- 4.4 Subjective Evaluation
- 5 Conclusion
- References
- Deep Face Swapping via Cross-Identity Adversarial Training
- 1 Introduction
- 2 Related Works
- 3 Our Approach
- 3.1 Network Architecture
- 3.2 Model Objective
- 4 Implementation
- 5 Experiments and Analysis
- 5.1 Qualitative Results
- 5.2 Quantitative Results
- 5.3 Ablation Study
- 5.4 Difficult Cases
- 6 Conclusion
- References
- Res2-Unet: An Enhanced Network for Generalized Nuclear Segmentation in Pathological Images
- 1 Introduction
- 2 Proposed Method
- 2.1 Pre-processing and Patch Extraction
- 2.2 Res2-Unet
- 2.3 Post-processing
- 2.4 Loss Function
- 3 Experiments
- 3.1 Dataset and Implementation Detail
- 3.2 Experiments on MoNuSeg
- 3.3 Experiments on TN-BC
- 3.4 Ablation Study
- 4 Conclusion
- References
- Automatic Diagnosis of Glaucoma on Color Fundus Images Using Adaptive Mask Deep Network
- 1 Introduction
- 2 Method
- 3 Experiments
- 3.1 Experimental Setup
- 3.2 Results and Discussion
- 4 Conclusion
- References
- Initialize with Mask: For More Efficient Federated Learning
- 1 Introduction
- 2 Related Work
- 3 Federated Mask
- 3.1 Motivation
- 3.2 Initialize with Fisher Information Matrix
- 3.3 Local Training with MMD Constraint
- 3.4 FedMask Implementation
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Results and Analysis
- 5 Conclusion
- References
- Unsupervised Gaze: Exploration of Geometric Constraints for 3D Gaze Estimation
- 1 Introduction
- 2 Related Work
- 3 Unsupervised Gaze Estimation Framework
- 4 Experiments
- 4.1 Dataset
- 4.2 Training Configuration
- 4.3 Qualitative Results and Analysis
- 4.4 Quantitative Results and Analysis
- 5 Conclusion
- References
- Median-Pooling Grad-CAM: An Efficient Inference Level Visual Explanation for CNN Networks in Remote Sensing Image Classification
- 1 Introduction
- 2 Gradient-Based Visual Explanation on Remote Sensing Image Classification
- 2.1 Introduction to Grad-CAM Techniques
- 2.2 Performance Analysis of Grad-CAM Techniques
- 3 Approach
- 3.1 Median-Pooling Grad-CAM
- 3.2 Computation Complexity
- 3.3 Evaluation Metric of Confidence Drop
- 4 Experiments
- 4.1 Datasets
- 4.2 Results on Remote Sensing Images
- 5 Conclusion
- References
- Multi-granularity Recurrent Attention Graph Neural Network for Few-Shot Learning
- 1 Introduction
- 2 Related Works
- 3 The Proposed Method
- 3.1 Model
- 3.2 Training
- 4 Experiments
- 4.1 Datasets
- 4.2 Experimental Setups
- 4.3 Few-Shot Classification
- 4.4 Ablation Studies
- 5 Conclusion
- References
- EEG Emotion Recognition Based on Channel Attention for E-Healthcare Applications
- 1 Introduction
- 2 Emotional EEG Databases
- 3 Method
- 3.1 Feature Extraction
- 3.2 Attention Mechanisms and Deep Residual Networks
- 4 Experiments
- 4.1 Experiment Setup
- 4.2 Performance Comparison Among Relevant Methods
- 4.3 Conclusion
- References
- The MovieWall: A New Interface for Browsing Large Video Collections
- 1 Introduction
- 2 Related Work
- 3 Pilot Study
- 4 Detailed User Study
- 4.1 Implementation and Study Design
- 4.2 Results and Discussion
- 5 Conclusion
- References
- Keystroke Dynamics as Part of Lifelogging
- 1 Introduction
- 2 Keystroke Dynamics
- 3 Collecting Keystroke Data
- 4 Data Analysis
- 5 Conclusions
- References
- HTAD: A Home-Tasks Activities Dataset with Wrist-Accelerometer and Audio Features
- 1 Introduction
- 2 Dataset Details
- 3 Feature Extraction
- 3.1 Accelerometer Features
- 3.2 Audio Features
- 4 Dataset Structure
- 5 Baseline Experiments
- 6 Conclusions
- References
- MNR-Air: An Economic and Dynamic Crowdsourcing Mechanism to Collect Personal Lifelog and Surrounding Environment Dataset. A Case Study in Ho Chi Minh City, Vietnam
- 1 Introduction
- 2 System Architecture and Hardware
- 3 The MNR-HCM Dataset
- 3.1 Data Collection
- 3.2 Data Collection Route
- 3.3 Data Description
- 4 Air Pollution and Traffic Risk Map
- 4.1 Motivation and Purposes
- 4.2 Methodology
- 4.3 AQI-T-RM Architecture
- 4.4 Discussion
- 5 Conclusion and Future Work
- References
- Kvasir-Instrument: Diagnostic and Therapeutic Tool Segmentation Dataset in Gastrointestinal Endoscopy
- 1 Introduction
- 2 Related Work
- 3 Kvasir-Instrument Dataset
- 3.1 Data Acquisition
- 3.2 Annotation Strategy
- 4 Benchmarking, Results and Discussion
- 4.1 Baseline Methods
- 4.2 Implementation Details
- 4.3 Evaluation Metrics
- 4.4 Quantitative and Qualitative Results
- 4.5 Discussion
- 5 Conclusion
- References
- CatMeows: A Publicly-Available Dataset of Cat Vocalizations
- 1 Introduction
- 2 Building the Dataset
- 2.1 Design Choices
- 2.2 Capturing Audio Signals
- 2.3 Post-processing
- 3 Composition of the Dataset
- 4 Application Scenarios
- 4.1 Proposed Scenarios
- 4.2 Example
- 5 Conclusions
- References
- Search and Explore Strategies for Interactive Analysis of Real-Life Image Collections with Unknown and Unique Categories
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 4 Evaluation and Discussion
- 4.1 User Experiments
- 5 Conclusion
- References
- Graph-Based Indexing and Retrieval of Lifelog Data
- 1 Introduction
- 2 Related Work
- 3 Dataset
- 4 Graph Generation
- 4.1 Image to Graph
- 4.2 Query to Graph
- 5 Image Retrieval
- 5.1 Graph Embedding
- 5.2 Similarity Score
- 6 Experiments
- 7 Results and Discussion
- 8 Conclusion
- References
- On Fusion of Learned and Designed Features for Video Data Analytics
- 1 Introduction
- 1.1 Running Use-Case in Urban Settings
- 2 Related Work
- 2.1 Multi-modal Retrieval and Feature Fusion
- 2.2 Object Detection
- 2.3 Identification and Tracking
- 2.4 General Video Analytics and Retrieval
- 3 Framework
- 3.1 Database Indexing
- 3.2 Analytics over High-Level Features
- 4 Urban Use Case
- 5 System Architecture
- 5.1 Architecture for Real-Time Analytics
- 6 Conclusions
- References
- XQM: Interactive Learning on Mobile Phones
- 1 Introduction
- 2 Related Work
- 3 XQM Architecture
- 4 System Performance Evaluation
- 5 User Interface Evaluation
- 6 Conclusion
- References
- A Multimodal Tensor-Based Late Fusion Approach for Satellite Image Search in Sentinel 2 Images
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Late Fusion of Multiple Modalities
- 3.2 Visual Similarity Search
- 3.3 Visual Concept Search
- 3.4 Spatial and Temporal Search
- 4 Experiments
- 4.1 Dataset Description
- 4.2 Settings
- 4.3 Results
- 5 Conclusions
- References
- Canopy Height Estimation from Spaceborne Imagery Using Convolutional Encoder-Decoder
- 1 Introduction
- 2 Materials
- 3 Methodology
- 4 Experimental Results
- 4.1 Results
- 4.2 Estimation Error Analysis
- 5 Discussion
- References
- Implementation of a Random Forest Classifier to Examine Wildfire Predictive Modelling in Greece Using Diachronically Collected Fire Occurrence and Fire Mapping Data
- 1 Introduction
- 2 Study Area - Training Data Set
- 2.1 Study Area
- 2.2 Data Resources
- 3 Methodological Approach and Implementation
- 3.1 Data Archiving and Modelling
- 3.2 Feature Ranking
- 3.3 Random Forest (RF) Algorithm Implementation
- 4 Results
- 4.1 The Spearman's Correlation
- 4.2 The Chi-Squared Tests
- 4.3 Comparative Feature Ranking
- 4.4 The RF Classifier Results
- 5 Conclusion
- References
- Mobile eHealth Platform for Home Monitoring of Bipolar Disorder
- 1 Introduction
- 2 Related Work
- 3 The MoodRecord Application
- 3.1 Mood Detection from Voice
- 3.2 Mood Detection from Video
- 4 Pilot Testing
- 5 Conclusions and Future Work
- References
- Multimodal Sensor Data Analysis for Detection of Risk Situations of Fragile People in @home Environments
- 1 Introduction
- 2 State of the Art
- 3 Risk Definition Scenarios and Preliminary Risk Detection on LSC Dataset
- 3.1 Risk Definition Scenarios
- 3.2 Detecting Risk Situations Using Sensor Signals on a LSC Dataset
- 4 Data Collection for Risk Prevention
- 4.1 Data Recording Protocol
- 4.2 BIRDS Corpus Description
- 4.3 Corpus Annotation
- 5 Conclusion
- References
- Towards the Development of a Trustworthy Chatbot for Mental Health Applications
- 1 Introduction
- 2 Related Work
- 2.1 Chatbots and Mental Health
- 2.2 Human-Computer Trust
- 3 Development of Trustworthy Agents
- 3.1 Scenario
- 3.2 Dialogue Design
- 3.3 System Overview
- 4 Pilot Study
- 4.1 Study Setup
- 4.2 Results
- 4.3 Perspectives of Trustworthy Agents for Mental Health Applications
- 5 Conclusion
- References
- Fusion of Multimodal Sensor Data for Effective Human Action Recognition in the Service of Medical Platforms
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Inertial Sensors
- 3.2 Depth Sensors
- 3.3 Sensor Fusion
- 4 Experiments and Results
- 4.1 Dataset and Evaluation Description
- 4.2 Inertial Sensor Performance Analysis
- 4.3 Depth Sensor Performance Analysis
- 4.4 Sensor Fusion Performance Analysis
- 4.5 Comparison with State-of-the-Art
- 5 Conclusions
- References
- SpotifyGraph: Visualisation of User's Preferences in Music
- 1 Introduction
- 2 SpotifyGraph Application
- 2.1 Main View
- 2.2 Single Cluster View
- 3 Conclusions and Future Work
- References
- A System for Interactive Multimedia Retrieval Evaluations
- 1 Introduction
- 2 Related Work
- 3 DRES: System Overview
- 3.1 Capabilities
- 3.2 Architecture
- 3.3 Demonstration
- 4 Conclusion and Outlook
- References
- SQL-Like Interpretable Interactive Video Search
- 1 Introduction
- 2 Dual-Task Model for Real-Time Interactive Search
- 3 The SQL-Like Interface
- 4 The Vireo Video Search System
- 5 Conclusion
- References
- VERGE in VBS 2021
- 1 Introduction
- 2 Video Retrieval Framework
- 2.1 Visual Similarity Search
- 2.2 Concept-Based Retrieval
- 2.3 Text to Video Matching Module
- 2.4 Face Detection
- 2.5 Video Captioning - Caption-Based Search
- 2.6 Activity Recognition
- 2.7 Multimodal Fusion and Temporal Search
- 3 VERGE User Interface and Interaction Modes
- 4 Future Work
- References
- NoShot Video Browser at VBS2021
- 1 Introduction
- 2 NoShot Video Browser
- 3 Time Cache
- 4 The NoShot GUI
- 5 Improvements
- 6 Conclusion
- References
- Exquisitor at the Video Browser Showdown 2021: Relationships Between Semantic Classifiers
- 1 Introduction
- 2 Exquisitor
- 3 Operations on Semantic Classifier Rankings
- 4 Conclusions
- References
- VideoGraph - Towards Using Knowledge Graphs for Interactive Video Retrieval
- 1 Introduction
- 2 VideoGraph Construction
- 2.1 Wikidata
- 2.2 Semantic Video Metadata
- 2.3 Textual Semantic Information from Video
- 2.4 Visual Semantic Information from Video
- 2.5 Technical Video Metadata
- 3 VideoGraph Exploration
- 3.1 Query Formulation
- 3.2 Graph Exploration
- 3.3 Graph Extension
- 3.4 User Interaction
- 4 Conclusion
- References
- IVIST: Interactive Video Search Tool in VBS 2021
- 1 Introduction
- 2 Overall Architecture of IVIST
- 3 Main Functions in IVIST
- 3.1 Existing Capabilities
- 3.2 Action Recognition
- 3.3 Place Recognition
- 3.4 Description Searching
- 4 Conclusion
- References
- Video Search with Collage Queries
- 1 Introduction
- 2 System Overview
- 2.1 Additional Notes
- 3 Conclusions
- References
- Towards Explainable Interactive Multi-modal Video Retrieval with vitrivr
- 1 Introduction
- 2 vitrivr
- 3 Temporal Querying
- 4 Index Structures for Similarity Search
- 5 Towards the Explainability of Search Results
- 6 Conclusion
- References
- Competitive Interactive Video Retrieval in Virtual Reality with vitrivr-VR
- 1 Introduction
- 2 VR Multimedia Retrieval Interfaces
- 3 System Overview
- 4 Querying Mechanisms
- 5 Interactive Retrieval Process in VR
- 5.1 Initial Query
- 5.2 Result Organisation
- 5.3 Refinement Queries
- 6 Conclusion
- References
- An Interactive Video Search Tool: A Case Study Using the V3C1 Dataset
- 1 Introduction
- 2 Search Tool Architecture
- 2.1 Video Dataset
- 2.2 Query Tasks
- 2.3 Image Representation Metadata
- 2.4 Storage and Indexing
- 2.5 Searching Approach
- 3 Conclusion
- References
- Less is More - diveXplore 5.0 at VBS 2021
- 1 Introduction
- 2 diveXplore 5.0
- 2.1 Architecture
- 2.2 Features
- 3 Conclusion
- References
- SOMHunter V2 at Video Browser Showdown 2021
- 1 Introduction
- 2 Newly Included Text Querying Options
- 2.1 Localized Text Queries
- 2.2 Text Query Vector Relocation
- 2.3 User Interface
- 3 Conclusion
- References
- W2VV++ BERT Model at VBS 2021
- 1 Introduction
- 2 System Overview
- 3 Context-Aware Query Ranker
- 4 Conclusion
- References
- VISIONE at Video Browser Showdown 2021
- 1 Introduction
- 2 VISIONE Video Search System
- 3 New VISIONE Functionalities for VBS 2021
- 4 Conclusion and Future Work
- References
- IVOS - The ITEC Interactive Video Object Search System at VBS2021
- 1 Introduction
- 2 Object-Based Exploratory Search
- 3 The IVOS User Interface
- 4 Summary
- References
- Video Search with Sub-Image Keyword Transfer Using Existing Image Archives
- 1 Introduction
- 2 Automatic Keywording
- 3 Improved Image Feature Vectors
- 4 Search Result Visualization
- 5 Search System
- References
- A VR Interface for Browsing Visual Spaces at VBS2021
- 1 Introduction
- 2 Related Systems
- 3 An Overview of EOLAS
- 3.1 Source Data
- 3.2 Search Engine
- 3.3 User Interaction
- 4 Conclusions
- References
- Correction to: SQL-Like Interpretable Interactive Video Search
- Correction to: Chapter "SQL-Like Interpretable Interactive Video Search" in: J. Lokoc et al. (Eds.): MultiMedia Modeling, LNCS 12573, https://doi.org/10.1007/978-3-030-67835-7_34
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format displays each book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are inconvenient to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. This means that no technical restrictions prevent illegal distribution. However, a personalised watermark is embedded in the eBook; in the event of misuse, it can be used to identify the purchaser and to provide evidence for legal purposes.
For more information, see our eBook Help page.