Image and Video Analysis
DepthFisheye: Efficient Fine-Tuning of Depth Estimation Models for Fisheye Cameras.- DIMATrack: Dimension Aware Data Association for Multi-Object Tracking.- Efficient Transformer Network for Visible and Ultraviolet Object Tracking.- LightGR-Transformer: Light Grouped Residual Transformer for Multispectral Object Detection.- ADMMOA: Attribute-Driven Multimodal Optimization for Face Recognition Adversarial Attacks.- Training-Free Language-Guided Video Summarization via Multi-Grained Saliency Scoring.
Multimodal Learning
Reinforced Label Denoising for Weakly-Supervised Audio-Visual Video Parsing.- Bridging the Modality Gap: Advancing Multimodal Human Pose Estimation with Modality-Adaptive Pose Estimator and Novel Benchmark Datasets.- Momentum-Based Uni-Modal Soft-Label Alignment and Multi-Modal Latent Projection Networks for Optimizing Image-Text Retrieval.- Multi-Granularity and Multi-Modal Prompt Learning for Person Re-Identification.- Local and Global Feature Cross-attention Multimodal Place Recognition.- IML-CMM - A Multimodal Sentiment Analysis Framework Integrating Intra-Modal Learning and Cross-Modal Mixup Enhancement.
Geometrical Processing
MCFG with GUMAP: A Simple and Effective Clustering Framework on Grassmann Manifold.- Joint UMAP for Visualization of Time-Dependent Data.- Unsupervised Domain Adaptation on Point Cloud Classification via Imposing Structural Manifolds into Representation Space.
Applications
Learning Adaptive Basis Fonts to Fuse Content Features for Few-shot Font Generation.- TaiCrowd: A High-Performance Simulation Framework for Massive Crowd.- Feature Disentanglement and Fusion Model for Multi-Source Domain Adaptation with Domain-Specific Features.- A Trademark Retrieval Method Based on Self-Supervised Learning.- Weaken Noisy Feature: Boosting Semi-Supervised Learning by Noise Estimation.- Multi-Dimension Full Scene Integrated Visual Emotion Analysis Network.- Gap-KD: Bridging the Significant Capacity Gap Between Teacher and Student Model.