
Advanced Intelligent Computing Technology and Applications
Description
This 13-volume set LNCS 14862-14874 constitutes - in conjunction with the 6-volume set LNAI 14875-14880 and the two-volume set LNBI 14881-14882 - the refereed proceedings of the 20th International Conference on Intelligent Computing, ICIC 2024, held in Tianjin, China, during August 5-8, 2024.
A total of 863 regular papers were carefully reviewed and selected from 2,189 submissions.
This year, the conference concentrated mainly on the theories and methodologies as well as the emerging applications of intelligent computing. Its aim was to unify the picture of contemporary intelligent computing techniques as an integral concept that highlights the trends in advanced computational intelligence and bridges theoretical research with applications. Therefore, the theme for this conference was "Advanced Intelligent Computing Technology and Applications". Papers that focused on this theme were solicited, addressing theories, methodologies, and applications in science and technology.
Contents
- Intro
- Preface
- Organization
- Contents - Part XI
- Intelligent Computing in Computer Vision
- Priority Intra-model Adaptation for Traffic Sign Detection and Recognition
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Priority Intra-model Adaptation
- 3.2 TSDR Models
- 3.3 TT100K-FineNet and GTSDB-FineNet
- 4 Experiment
- 4.1 Datasets
- 4.2 Evaluation Metrics and Implementation Details
- 4.3 Experimental Results
- 5 Discussion
- 6 Conclusion
- References
- Adaptive Swin Transformers for Few-Shot Cross-Domain Silent Face Liveness Detection
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Overview
- 3.2 Network Architecture
- 3.3 Feature-Wise Transformation
- 4 Experiment
- 4.1 Datasets and Evaluation Metrics
- 4.2 Implementation Details
- 4.3 Cross-Domain Performance
- 4.4 Ablation Study
- 5 Conclusion
- References
- DSFormer: Leveraging Transformer with Cross-Modal Attention for Temporal Consistency in Low-Light Video Enhancement
- 1 Introduction
- 2 Related Work
- 2.1 Low-Light Video Enhancement
- 3 Method
- 3.1 DSFormer Architecture
- 3.2 Flow Cross-Attention (FCA)
- 3.3 Spatial-Channel Multi-head Self-Attention (SCMA)
- 3.4 Dual Path Feed-Forward Network (DPFN)
- 4 Experiment
- 4.1 Implementation Detail
- 4.2 Static Video Evaluation
- 4.3 Dynamic Video Evaluation
- 4.4 Ablation Study
- 5 Conclusion
- References
- Robot Control Using Hand Gestures of the Mexican Sign Language
- 1 Introduction
- 2 Proposed Method
- 2.1 Segmentation Techniques
- 2.2 Feature Extraction
- 2.3 Feature Selection
- 2.4 Classification Techniques
- 2.5 Dataset
- 3 Experimental Results
- 4 Control Robot Method
- 4.1 Movement Orders Selection and Implementation
- 5 Conclusions
- References
- Improved Channel-Wise Semantic Alignment for Few-Shot Object Detection
- 1 Introduction
- 2 Related Work
- 3 Problem Definition
- 3.1 Few-Shot Object Detection
- 3.2 Channel Attention
- 4 Our Method
- 4.1 Feature Purification
- 4.2 Sparse Channel Relation Distillation
- 5 Experiments
- 5.1 Datasets
- 5.2 Implementation Details
- 5.3 Comparison with the State-of-the-Arts
- 5.4 Ablation Study
- 6 Conclusion
- References
- Adapting Depth Distribution for 3D Object Detection with a Two-Stage Training Paradigm
- 1 Introduction
- 2 Related Work
- 2.1 Camera-Only 3D Object Detection
- 2.2 Depth Estimation
- 3 Preliminary
- 3.1 3D Object Detection
- 3.2 Multi-View Depth Estimation
- 3.3 LSS-Based 3D Object Detection Framework
- 4 Method
- 4.1 Two-Stage Training Paradigm
- 4.2 Depth Distribution Adaption
- 5 Experiment
- 5.1 Experimental Setup
- 5.2 Main Results
- 5.3 Ablation Study
- 5.4 Validation Test: The Impact of Depth Accuracy on Detection
- 6 Conclusion
- 6.1 Limitations
- References
- Domain Adaptive Object Detection with Dehazing Module
- 1 Introduction
- 2 Related Work
- 2.1 Image Dehazing
- 2.2 Object Detection
- 2.3 Domain Adaptive Object Detection
- 3 Methods
- 3.1 Network Overview
- 3.2 Dehazing Module
- 3.3 Domain Adaptation
- 4 Experimental Results
- 4.1 Datasets
- 4.2 Experiment of Fog Removal Module
- 4.3 Experiment of DefogDA-FasterRCNN
- 5 Conclusions
- References
- Improving Dynamic 3D Gaussian Splatting from Monocular Videos with Object Motion Information
- 1 Introduction
- 2 Related Work
- 2.1 Dynamic Scene Reconstruction
- 2.2 Depth Estimation
- 3 Preliminary
- 3.1 Problem Definition
- 3.2 3D Gaussian Splatting
- 3.3 Deformation Field
- 4 Method
- 4.1 Overview
- 4.2 Motion Segmentation
- 4.3 Three-Stage Training Strategy
- 4.4 Synthetic View Augmentation
- 5 Experiment
- 5.1 Setting
- 5.2 Comparisons
- 5.3 Ablation Study
- 6 Conclusion
- References
- Segmentation and Quality Assessment of Continuous Fitness Movements Based on Vision
- 1 Introduction
- 2 Related Work
- 3 Approach
- 3.1 WaveOptiSeg
- 3.2 TimeTransMLP
- 4 Experiments
- 4.1 Squat-Score Dataset
- 4.2 Evaluation Metrics
- 4.3 Implementation Details
- 4.4 Performance Comparison
- 5 Conclusion
- References
- Diagonal-Angle-Foreground IoU Loss Function for Small Object Detection
- 1 Introduction
- 2 Related Work
- 2.1 IoU Series Loss Functions for Bounding Box Regression
- 2.2 Summary
- 3 DAFIoU Loss Function
- 3.1 Angle-Based Loss Term
- 3.2 Diagonal-Based Loss Term
- 3.3 Foreground-Based Loss Term
- 3.4 DAFIoU (Diagonal, Angle, and Foreground Loss Function)
- 4 Experimental Results
- 4.1 Simulated Experiment
- 4.2 Ablation Experiment
- 4.3 YOLOv8s on Visdrone2019
- 4.4 YOLOv8s on SODA-D10
- 4.5 Faster R-CNN on Visdrone2019
- 4.6 Visualization of Detection Results
- 5 Conclusion
- References
- Enhancing Dense Object Counting in Occlusion with a Dual-Branch Network
- 1 Introduction
- 2 Related Works
- 2.1 Neural Networks for Counting
- 2.2 Optimization Method of Dense Object Counting
- 3 Bilateral Counting Network
- 3.1 Density Region Extraction
- 3.2 Multi-lateral Collaborative Counting Network
- 4 Experiments
- 4.1 Datasets
- 4.2 Experiment Settings
- 4.3 Experiment Results
- 5 Analysis
- 5.1 Ablation Studies
- 6 Limitations
- 7 Conclusion
- References
- Street Block Classification Based on Urban Satellite Images
- 1 Introduction
- 2 Dataset Building and Reprocessing
- 2.1 Dataset Building
- 2.2 Preprocessing of Public Datasets
- 3 Our Network Architecture
- 3.1 Feature Extractor
- 3.2 Adaptive Pyramid Pooling
- 3.3 Classifier
- 4 Experiments
- 4.1 Overall Accuracy
- 4.2 F1 Score
- 5 Conclusion
- References
- SRCFT: A Correlation Filter Tracker with Siamese Super-Resolution Network and Sample Reliability Awareness for Thermal Infrared Target Tracking
- 1 Introduction
- 2 Methodology
- 2.1 Algorithm Overview
- 2.2 Siamese Super-Resolution Network
- 2.3 Sample Reliability Awareness
- 3 Experiment
- 3.1 Implementation Details
- 3.2 Performance Comparison with State-of-the-Arts
- 4 Conclusions
- References
- Traffic Sign Detection and Recognition Using Gradient Training with an Improved YOLO Network
- 1 Introduction
- 2 Tri-modal Gradient Based Dataset Processing
- 2.1 First Gradient Dataset
- 2.2 Second Gradient Dataset
- 2.3 Third Gradient Dataset
- 3 IYOLO-TS
- 4 Experiments and Results
- 5 Summary and Outlook
- References
- Neural Radiation Fields via Accelerated and High Quality Parallel for Novel View Synthesis
- 1 Introduction
- 2 Related Work
- 2.1 Novel View Synthesis
- 2.2 Neural Radiance Fields
- 2.3 NeRFs with Explicit Volumetric Representations
- 3 Background and Motivation
- 4 Method
- 5 Experiments
- 5.1 Experiment Setup
- 5.2 Evaluation on Quality and Efficiency
- 5.3 Training on Consumer Devices
- 5.4 Comparison of Ablation Experiments
- 6 Conclusion
- References
- IOCSegFormer: Enhancing Wheat Ears Counting in Field Conditions Through Augmented Local Features
- 1 Introduction
- 2 Related Work
- 3 Methods
- 3.1 The Architecture
- 3.2 Local Segmentation Branch
- 3.3 Loss Function
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Datasets
- 4.3 Data Preprocessing
- 4.4 Results and Analysis
- 4.5 Ablation Studies
- 4.6 Visualizations
- 5 Conclusion
- References
- Stroke-Based Few-Shot Chinese Character Style Transfer
- 1 Introduction
- 2 Method
- 2.1 Dataset
- 2.2 Overall Pipeline
- 2.3 Cross-attention Module
- 2.4 Loss Functions
- 3 Result and Discussions
- 3.1 Evaluation Metrics
- 3.2 Generated Chinese Character Images Results
- 4 Conclusion
- References
- Computer Vision Drives the New Quality Productive Forces in Agriculture: A Method for Recognizing Farming Behavior on Edge Computing Devices
- 1 Introduction
- 2 Related Work
- 3 Approach
- 3.1 Employee Detection
- 3.2 Behavior Classification
- 4 Experiments and Results
- 4.1 Dataset
- 4.2 Implementation Details
- 4.3 Evaluation Metrics
- 4.4 Experimental Results
- 5 Conclusions and Future Works
- References
- PS-DeiT: A Part-Selection Based DeiT for Fine-Grained Classification
- 1 Introduction
- 2 Related Work
- 3 Part-Selection Based DeiT
- 3.1 DeiT Based Feature Extractor
- 3.2 Knowledge Distillation Model
- 3.3 Part Selection Module
- 3.4 Loss Function Design
- 4 Experimental Results and Analysis
- 4.1 Implementation Details
- 4.2 Performance Evaluation
- 4.3 Ablation Study
- 5 Conclusion
- References
- Text-Guided Multi-region Scene Image Editing Based on Diffusion Model
- 1 Introduction
- 2 Method
- 2.1 Overview
- 2.2 Mask Dilation Based Object Editing
- 2.3 OutwardLPF Based Background Coordination
- 3 Experimental Evaluation
- 3.1 Implementation Details
- 3.2 Main Results
- 3.3 Ablation Study
- 3.4 Scene Iterative Editing
- 4 Conclusion
- References
- MFANet: Multi-feature Aggregation Network for Domain Generalized Stereo Matching
- 1 Introduction
- 2 Related Work
- 2.1 Deep Stereo Matching
- 2.2 Domain Generalization
- 3 Method
- 3.1 Network Architecture
- 3.2 Multi-scale Adaptive Semantic Feature Aggregation
- 3.3 Multi-scale Texture Feature Aggregation
- 3.4 Recurrent Hourglass Aggregation Network
- 3.5 Disparity Regression and Loss Function
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Ablation Studies
- 4.3 Comparison with State-of-the-Art
- 4.4 Cross-Domain Generalization
- 5 Conclusion
- References
- DGAT-net: Dynamic Graph Attention for 3D Point Cloud Semantic Segmentation
- 1 Introduction
- 2 Related Works
- 2.1 Dynamic Convolution
- 2.2 Graph Convolution
- 3 Method
- 3.1 Overview
- 3.2 Dynamic Graph Feature Module (Dy-Graph)
- 3.3 Dynamic Graph Feature Enhancement
- 3.4 Loss Function
- 4 Experiments
- 4.1 Dataset and Metrics
- 4.2 Results on the S3DIS Dataset
- 4.3 Results on the SemanticKITTI Dataset
- 4.4 Ablation Study
- 5 Conclusion
- References
- FSCformernet: A Fourier-Transformer UNet for Efficient Semantic Segmentation of Plant Leaf
- 1 Introduction
- 2 Method
- 2.1 Architecture Overview
- 2.2 FSCI Transformer Block
- 2.3 Fourier Enhance Layer
- 2.4 Loss Function
- 2.5 Evaluation Metrics
- 3 Experiments
- 3.1 Dataset
- 3.2 Implementation Details
- 3.3 Experiment Results on KOMATSUNA Dataset
- 3.4 Experiment Results on MSU-PID Dataset
- 3.5 Ablation Study
- 4 Conclusion
- References
- Improved Real-Time Monitoring Lightweight Model for UAVs Based on YOLOv8
- 1 Introduction
- 2 Related Research
- 3 Method
- 3.1 RepNCSPELAN4
- 3.2 C2FiRMA
- 4 Experimental Design and Analysis of Results
- 4.1 Experimental Setup
- 4.2 Evaluation Indicators
- 4.3 Results
- 5 Conclusion
- References
- Aesthetics-Driven Active Reinforcement Learning for Color Enhancement
- 1 Introduction
- 2 Related Work
- 3 Problem Formulation
- 4 Method
- 4.1 Overall Architecture
- 4.2 Agent Network
- 4.3 Action Space
- 4.4 Reward
- 5 Experiments
- 5.1 Datasets and Training Details
- 5.2 Benchmark Evaluations
- 5.3 Ablation Study
- 6 Conclusion
- References
- Beyond the Limits: Tackling Extreme Overexposure with Diffusion Model
- 1 Introduction
- 2 Related Works
- 2.1 Exposure Correction
- 2.2 Image Inpainting
- 3 Method
- 3.1 Problem Formulation and Overall Architecture
- 3.2 Enhancement Network
- 3.3 Brightness Extraction Module
- 3.4 Semantically Inpainting Module
- 3.5 Loss Functions
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Comparisons with State-of-the-Art Methods
- 4.3 Ablation Study
- 5 Conclusion
- References
- Fuse and Calibrate: A Bi-directional Vision-Language Guided Framework for Referring Image Segmentation
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Text and Image Encoding
- 3.2 Vision-Guided Fusion
- 3.3 Language-Guided Calibration
- 3.4 Mask Decoding
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Comparison with State-of-the-Art Method
- 5 Ablation Study
- 6 Visualization
- 7 Conclusion
- References
- Efficient Local Imperceptible Random Search for Black-Box Adversarial Attacks
- 1 Introduction
- 2 Background and Related Work
- 2.1 Black-Box Adversarial Attack
- 2.2 Local Perturbation
- 2.3 Search Space and Query Reduction Method
- 3 Proposed Method
- 3.1 Objective Function
- 3.2 Location of Salient Region
- 3.3 Grouping and Ranking of Subregions
- 3.4 Imperceptible Attack with Random Search
- 4 Experimental Studies and Analysis
- 4.1 Results on MNIST
- 4.2 Results on CIFAR10
- 4.3 Results on ImageNet
- 5 Conclusion
- References
- COLORSHOP: Color Manipulation of Objects in Videos Using Diffusion Models
- 1 Introduction
- 2 Related Work
- 2.1 Image Editing
- 2.2 Image Editing
- 3 Method
- 3.1 Method Overview
- 3.2 Foreground Diffusion
- 3.3 Cross-Frame Spatial Feature Fusion
- 4 Experiment
- 4.1 Implementation Details
- 4.2 Metrics
- 4.3 Ablation Study
- 4.4 Quantitative Evaluation
- 4.5 Qualitative Evaluation
- 5 Conclusion
- References
- Utilizing Stable Diffusion to Enhance Car Parts Detection
- 1 Introduction
- 2 FGCP Dataset
- 3 Method
- 3.1 Data Augmentation with Diffusion Model
- 3.2 Diffusion-Based Detector
- 4 Experiment
- 4.1 Car Parts Detection
- 4.2 Car Parts Detection with Augmented Data
- 4.3 Comparison of Diffusion-Based Detector (DBD) and YOLOv8X
- 5 Conclusion
- References
- MonoRetNet: A Self-supervised Model for Monocular Depth Estimation with Bidirectional Half-Duplex Retention
- 1 Introduction
- 2 Related Work
- 2.1 Self-supervised Monocular Depth Estimation
- 2.2 Retention Mechanism
- 3 MonoRetNet
- 3.1 Framework Overview
- 3.2 Depth Network
- 3.3 Pose Network
- 3.4 Self-supervised Learning
- 4 Experiments
- 4.1 Dataset
- 4.2 Results
- 4.3 Ablation Study
- 5 Conclusion
- References
- Lightweight Human Pose Estimation Model for Industrial Scenarios
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Lightweight C2f Module
- 3.2 Head Structure Optimisation
- 4 Experiments Preparation
- 4.1 Dataset and Experimental Environment
- 4.2 Evaluation Criteria
- 5 Experimental Results
- 5.1 Ablation Experiment
- 5.2 Comparative Experiment
- 5.3 Visualizing Results
- 6 Conclusions
- References
- Improved YOLOv8 Method for Multi-scale Pothole Detection
- 1 Introduction
- 2 Improvement
- 2.1 Backbone
- 2.2 Neck
- 2.3 Loss Function
- 3 Experimental Analysis
- 3.1 Datasets and Experimental Environment
- 3.2 Evaluate Metrics
- 3.3 Experiment on the Effectiveness of the Small Object Detection Layer
- 3.4 Comparative Experiments on Horizontal Improvements of Some Modules
- 3.5 Ablation Experiments
- 3.6 Comparative Experiments Among Different Algorithms
- 3.7 Visualization
- 3.8 Conclusion
- References
- ST-CLIP: Spatio-Temporal Enhanced CLIP Towards Dense Video Captioning
- 1 Introduction
- 2 Method
- 2.1 Model Architecture
- 2.2 Cross-Frame Spatio-Temporal Attention Mechanism
- 2.3 Cross-Modal Feature Interaction Mechanism
- 2.4 Training
- 3 Experiments
- 3.1 Experimental Settings
- 3.2 Main Results
- 3.3 Ablation Studies
- 4 Conclusion and Future Work
- References
- SCW-YOLO: An Improved Algorithm for Fall Detection Based on Deep Learning
- 1 Introduction
- 2 Related Works
- 2.1 Wearable Sensor
- 2.2 Computer Vision
- 2.3 YOLOv8 Model
- 3 Methodology
- 3.1 C2fSCConv Module
- 3.2 CARAFE Upsampling
- 3.3 WIoU Loss Function
- 4 Experiment Design and Result Analysis
- 4.1 Dataset and Evaluation Metrics
- 4.2 Experimental Setup
- 4.3 Results Analysis
- 5 Conclusion
- References
- One Stage Near-Ground Pothole Object Detection
- 1 Introduction
- 2 Related Work
- 3 The Proposed Method
- 3.1 Image Acquisition and Processing
- 3.2 The Designed Network Framework
- 3.3 Anchor Design
- 3.4 NMS Post Processing
- 3.5 The Loss Function
- 3.6 Increase Receptive Field (N-rank)
- 3.7 High and Low Resolution Information Interaction (BI-FM)
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Performance Metrics
- 4.3 Results and Discussion
- 5 Conclusion
- References
- Multi-layered Stixels Prompted by Semantic Information
- 1 Introduction
- 2 Overview of the Algorithm
- 3 Our Stixels Model
- 3.1 Semantic Labels
- 3.2 Disparity Labels
- 3.3 Semantic Stixels
- 4 Experiment
- 4.1 Semantic Segmentation Network Training
- 4.2 Multi-layer Stixel-World Experiments
- 4.3 Results
- 5 Conclusion
- References
- A New Multi-task Network for Autonomous Driving: Efficientnetv1Unet
- 1 Introduction
- 2 Related Work
- 2.1 Object Detection
- 2.2 Segmentation
- 2.3 Height and Width Restriction Detection
- 2.4 Multi-task Learning
- 3 Methodology
- 3.1 Encoder
- 3.2 Decoder
- 3.3 Loss Function
- 4 Experiments
- 4.1 Details Setting
- 4.2 Experimental Results
- 5 Conclusion
- References
- Automatic Detection and Assessment of Corals in Shallow Sea Regions Based on Deep Learning Models
- 1 Introduction
- 2 Methodology
- 2.1 Overview
- 2.2 Yolov8 with Embedded Attention Module
- 2.3 Deep SORT Object Matching and Tracking Algorithms
- 3 Datasets
- 4 Experiments
- 4.1 Training Details
- 4.2 Evaluation Metrics
- 4.3 Experimental Evaluation
- 5 Coral Detection and Assessment
- 6 Conclusions
- References
- Rust Detection Network for Transmission Line Based on UAV Inspection
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 YOLOv8 Algorithm
- 3.2 Improved YOLOv8 Algorithm
- 4 Experimental Design and Results Analysis
- 4.1 Experimental Equipment
- 4.2 Evaluation Index
- 4.3 Results
- 4.4 Ablation Experiment
- 4.5 Visualisation of Experimental Results
- 5 Conclusion
- References
- ETVKT: Enhanced Training Vector for Knowledge Tracing
- 1 Introduction
- 2 Related Work
- 3 ETVKT Model
- 3.1 Problem Definition
- 3.2 Question Information Embedding
- 4 Experiment
- 4.1 Baselines
- 4.2 Analysis of Prediction Results
- 4.3 Training Cycle
- 4.4 Visualization of Learner Mastery Status
- 4.5 Visualization of Learner Answer Prediction
- 5 Conclusion
- References
- DGAP-YOLO: A Crack Detection Method Based on UAV Images and YOLO
- 1 Introduction
- 2 Related Work
- 2.1 Overview of Target Detection
- 2.2 Crack Detection Method
- 3 Method
- 3.1 Attention Module
- 3.2 Part LowAFPN
- 4 Experimental Design and Result Analysis
- 4.1 Experimental Settings
- 4.2 Evaluation Index
- 4.3 Experimental Results
- 5 Conclusion
- References
- Author Index
System Requirements
File format: PDF
Copy protection: watermark DRM (Digital Rights Management)
System requirements:
- Computer (Windows; macOS; Linux): To read, use the free Adobe Reader or Adobe Digital Editions software, or any other PDF viewer of your choice.
- Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app or the PocketBook app before downloading.
- E-book readers: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more.