
Advanced Intelligent Computing Technology and Applications
Description
This 13-volume set LNCS 14862-14874 constitutes - in conjunction with the 6-volume set LNAI 14875-14880 and the two-volume set LNBI 14881-14882 - the refereed proceedings of the 20th International Conference on Intelligent Computing, ICIC 2024, held in Tianjin, China, during August 5-8, 2024.
A total of 863 regular papers were carefully reviewed and selected from 2,189 submissions.
This year, the conference concentrated mainly on the theories and methodologies as well as the emerging applications of intelligent computing. Its aim was to unify the picture of contemporary intelligent computing techniques as an integral concept that highlights the trends in advanced computational intelligence and bridges theoretical research with applications. Therefore, the theme for this conference was "Advanced Intelligent Computing Technology and Applications". Papers that focused on this theme were solicited, addressing theories, methodologies, and applications in science and technology.
Contents
- Intro
- Preface
- Organization
- Contents - Part XI
- Intelligent Computing in Computer Vision
- Priority Intra-model Adaptation for Traffic Sign Detection and Recognition
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Priority Intra-model Adaptation
- 3.2 TSDR Models
- 3.3 TT100K-FineNet and GTSDB-FineNet
- 4 Experiment
- 4.1 Datasets
- 4.2 Evaluation Metrics and Implementation Details
- 4.3 Experimental Results
- 5 Discussion
- 6 Conclusion
- References
- Adaptive Swin Transformers for Few-Shot Cross-Domain Silent Face Liveness Detection
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Overview
- 3.2 Network Architecture
- 3.3 Feature-Wise Transformation
- 4 Experiment
- 4.1 Datasets and Evaluation Metrics
- 4.2 Implementation Details
- 4.3 Cross-Domain Performance
- 4.4 Ablation Study
- 5 Conclusion
- References
- DSFormer: Leveraging Transformer with Cross-Modal Attention for Temporal Consistency in Low-Light Video Enhancement
- 1 Introduction
- 2 Related Work
- 2.1 Low-Light Video Enhancement
- 3 Method
- 3.1 DSFormer Architecture
- 3.2 Flow Cross-Attention (FCA)
- 3.3 Spatial-Channel Multi-head Self-Attention (SCMA)
- 3.4 Dual Path Feed-Forward Network (DPFN)
- 4 Experiment
- 4.1 Implementation Detail
- 4.2 Static Video Evaluation
- 4.3 Dynamic Video Evaluation
- 4.4 Ablation Study
- 5 Conclusion
- References
- Robot Control Using Hand Gestures of the Mexican Sign Language
- 1 Introduction
- 2 Proposed Method
- 2.1 Segmentation Techniques
- 2.2 Feature Extraction
- 2.3 Feature Selection
- 2.4 Classification Techniques
- 2.5 Dataset
- 3 Experimental Results
- 4 Control Robot Method
- 4.1 Movement Orders Selection and Implementation
- 5 Conclusions
- References
- Improved Channel-Wise Semantic Alignment for Few-Shot Object Detection
- 1 Introduction
- 2 Related Work
- 3 Problem Definition
- 3.1 Few-Shot Object Detection
- 3.2 Channel Attention
- 4 Our Method
- 4.1 Feature Purification
- 4.2 Sparse Channel Relation Distillation
- 5 Experiments
- 5.1 Datasets
- 5.2 Implementation Details
- 5.3 Comparison with the State-of-the-Arts
- 5.4 Ablation Study
- 6 Conclusion
- References
- Adapting Depth Distribution for 3D Object Detection with a Two-Stage Training Paradigm
- 1 Introduction
- 2 Related Work
- 2.1 Camera-Only 3D Object Detection
- 2.2 Depth Estimation
- 3 Preliminary
- 3.1 3D Object Detection
- 3.2 Multi-View Depth Estimation
- 3.3 LSS-Based 3D Object Detection Framework
- 4 Method
- 4.1 Two-Stage Training Paradigm
- 4.2 Depth Distribution Adaption
- 5 Experiment
- 5.1 Experimental Setup
- 5.2 Main Results
- 5.3 Ablation Study
- 5.4 Validation Test: The Impact of Depth Accuracy on Detection
- 6 Conclusion
- 6.1 Limitations
- References
- Domain Adaptive Object Detection with Dehazing Module
- 1 Introduction
- 2 Related Work
- 2.1 Image Dehazing
- 2.2 Object Detection
- 2.3 Domain Adaptive Object Detection
- 3 Methods
- 3.1 Network Overview
- 3.2 Dehazing Module
- 3.3 Domain Adaptation
- 4 Experimental Results
- 4.1 Datasets
- 4.2 Experiment of Fog Removal Module
- 4.3 Experiment of DefogDA-FasterRCNN
- 5 Conclusions
- References
- Improving Dynamic 3D Gaussian Splatting from Monocular Videos with Object Motion Information
- 1 Introduction
- 2 Related Work
- 2.1 Dynamic Scene Reconstruction
- 2.2 Depth Estimation
- 3 Preliminary
- 3.1 Problem Definition
- 3.2 3D Gaussian Splatting
- 3.3 Deformation Field
- 4 Method
- 4.1 Overview
- 4.2 Motion Segmentation
- 4.3 Three-Stage Training Strategy
- 4.4 Synthetic View Augmentation
- 5 Experiment
- 5.1 Setting
- 5.2 Comparisons
- 5.3 Ablation Study
- 6 Conclusion
- References
- Segmentation and Quality Assessment of Continuous Fitness Movements Based on Vision
- 1 Introduction
- 2 Related Work
- 3 Approach
- 3.1 WaveOptiSeg
- 3.2 TimeTransMLP
- 4 Experiments
- 4.1 Squat-Score Dataset
- 4.2 Evaluation Metrics
- 4.3 Implementation Details
- 4.4 Performance Comparison
- 5 Conclusion
- References
- Diagonal-Angle-Foreground IoU Loss Function for Small Object Detection
- 1 Introduction
- 2 Related Work
- 2.1 IoU Series Loss Functions for Bounding Box Regression
- 2.2 Summary
- 3 DAFIoU Loss Function
- 3.1 Angle-Based Loss Term
- 3.2 Diagonal-Based Loss Term
- 3.3 Foreground-Based Loss Term
- 3.4 DAFIoU (Diagonal, Angle, and Foreground Loss Function)
- 4 Experimental Results
- 4.1 Simulated Experiment
- 4.2 Ablation Experiment
- 4.3 YOLOv8s on Visdrone2019
- 4.4 YOLOv8s on SODA-D10
- 4.5 Faster R-CNN on Visdrone2019
- 4.6 Visualization of Detection Results
- 5 Conclusion
- References
- Enhancing Dense Object Counting in Occlusion with a Dual-Branch Network
- 1 Introduction
- 2 Related Works
- 2.1 Neural Networks for Counting
- 2.2 Optimization Method of Dense Object Counting
- 3 Bilateral Counting Network
- 3.1 Density Region Extraction
- 3.2 Multi-lateral Collaborative Counting Network
- 4 Experiments
- 4.1 Datasets
- 4.2 Experiment Settings
- 4.3 Experiment Results
- 5 Analysis
- 5.1 Ablation Studies
- 6 Limitations
- 7 Conclusion
- References
- Street Block Classification Based on Urban Satellite Images
- 1 Introduction
- 2 Dataset Building and Reprocessing
- 2.1 Dataset Building
- 2.2 Preprocessing of Public Datasets
- 3 Our Network Architecture
- 3.1 Feature Extractor
- 3.2 Adaptive Pyramid Pooling
- 3.3 Classifier
- 4 Experiments
- 4.1 Overall Accuracy
- 4.2 F1 Score
- 5 Conclusion
- References
- SRCFT: A Correlation Filter Tracker with Siamese Super-Resolution Network and Sample Reliability Awareness for Thermal Infrared Target Tracking
- 1 Introduction
- 2 Methodology
- 2.1 Algorithm Overview
- 2.2 Siamese Super-Resolution Network
- 2.3 Sample Reliability Awareness
- 3 Experiment
- 3.1 Implementation Details
- 3.2 Performance Comparison with State-of-the-Arts
- 4 Conclusions
- References
- Traffic Sign Detection and Recognition Using Gradient Training with an Improved YOLO Network
- 1 Introduction
- 2 Tri-modal Gradient Based Dataset Processing
- 2.1 First Gradient Dataset
- 2.2 Second Gradient Dataset
- 2.3 Third Gradient Dataset
- 3 IYOLO-TS
- 4 Experiments and Results
- 5 Summary and Outlook
- References
- Neural Radiation Fields via Accelerated and High Quality Parallel for Novel View Synthesis
- 1 Introduction
- 2 Related Work
- 2.1 Novel View Synthesis
- 2.2 Neural Radiance Fields
- 2.3 NeRFs with Explicit Volumetric Representations
- 3 Background and Motivation
- 4 Method
- 5 Experiments
- 5.1 Experiment Setup
- 5.2 Evaluation on Quality and Efficiency
- 5.3 Training on Consumer Devices
- 5.4 Comparison of Ablation Experiments
- 6 Conclusion
- References
- IOCSegFormer: Enhancing Wheat Ears Counting in Field Conditions Through Augmented Local Features
- 1 Introduction
- 2 Related Work
- 3 Methods
- 3.1 The Architecture
- 3.2 Local Segmentation Branch
- 3.3 Loss Function
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Datasets
- 4.3 Data Preprocessing
- 4.4 Results and Analysis
- 4.5 Ablation Studies
- 4.6 Visualizations
- 5 Conclusion
- References
- Stroke-Based Few-Shot Chinese Character Style Transfer
- 1 Introduction
- 2 Method
- 2.1 Dataset
- 2.2 Overall Pipeline
- 2.3 Cross-attention Module
- 2.4 Loss Functions
- 3 Result and Discussions
- 3.1 Evaluation Metrics
- 3.2 Generated Chinese Character Images Results
- 4 Conclusion
- References
- Computer Vision Drives the New Quality Productive Forces in Agriculture: A Method for Recognizing Farming Behavior on Edge Computing Devices
- 1 Introduction
- 2 Related Work
- 3 Approach
- 3.1 Employee Detection
- 3.2 Behavior Classification
- 4 Experiments and Results
- 4.1 Dataset
- 4.2 Implementation Details
- 4.3 Evaluation Metrics
- 4.4 Experimental Results
- 5 Conclusions and Future Works
- References
- PS-DeiT: A Part-Selection Based DeiT for Fine-Grained Classification
- 1 Introduction
- 2 Related Work
- 3 Part-Selection Based DeiT
- 3.1 DeiT Based Feature Extractor
- 3.2 Knowledge Distillation Model
- 3.3 Part Selection Module
- 3.4 Loss Function Design
- 4 Experimental Results and Analysis
- 4.1 Implementation Details
- 4.2 Performance Evaluation
- 4.3 Ablation Study
- 5 Conclusion
- References
- Text-Guided Multi-region Scene Image Editing Based on Diffusion Model
- 1 Introduction
- 2 Method
- 2.1 Overview
- 2.2 Mask Dilation Based Object Editing
- 2.3 OutwardLPF Based Background Coordination
- 3 Experimental Evaluation
- 3.1 Implementation Details
- 3.2 Main Results
- 3.3 Ablation Study
- 3.4 Scene Iterative Editing
- 4 Conclusion
- References
- MFANet: Multi-feature Aggregation Network for Domain Generalized Stereo Matching
- 1 Introduction
- 2 Related Work
- 2.1 Deep Stereo Matching
- 2.2 Domain Generalization
- 3 Method
- 3.1 Network Architecture
- 3.2 Multi-scale Adaptive Semantic Feature Aggregation
- 3.3 Multi-scale Texture Feature Aggregation
- 3.4 Recurrent Hourglass Aggregation Network
- 3.5 Disparity Regression and Loss Function
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Ablation Studies
- 4.3 Comparison with State-of-the-Art
- 4.4 Cross-Domain Generalization
- 5 Conclusion
- References
- DGAT-net: Dynamic Graph Attention for 3D Point Cloud Semantic Segmentation
- 1 Introduction
- 2 Related Works
- 2.1 Dynamic Convolution
- 2.2 Graph Convolution
- 3 Method
- 3.1 Overview
- 3.2 Dynamic Graph Feature Module (Dy-Graph)
- 3.3 Dynamic Graph Feature Enhancement
- 3.4 Loss Function
- 4 Experiments
- 4.1 Dataset and Metrics
- 4.2 Results on the S3DIS Dataset
- 4.3 Results on the SemanticKITTI Dataset
- 4.4 Ablation Study
- 5 Conclusion
- References
- FSCformernet: A Fourier-Transformer UNet for Efficient Semantic Segmentation of Plant Leaf
- 1 Introduction
- 2 Method
- 2.1 Architecture Overview
- 2.2 FSCI Transformer Block
- 2.3 Fourier Enhance Layer
- 2.4 Loss Function
- 2.5 Evaluation Metrics
- 3 Experiments
- 3.1 Dataset
- 3.2 Implementation Details
- 3.3 Experiment Results on KOMATSUNA Dataset
- 3.4 Experiment Results on MSU-PID Dataset
- 3.5 Ablation Study
- 4 Conclusion
- References
- Improved Real-Time Monitoring Lightweight Model for UAVs Based on YOLOv8
- 1 Introduction
- 2 Related Research
- 3 Method
- 3.1 RepNCSPELAN4
- 3.2 C2FiRMA
- 4 Experimental Design and Analysis of Results
- 4.1 Experimental Setup
- 4.2 Evaluation Indicators
- 4.3 Results
- 5 Conclusion
- References
- Aesthetics-Driven Active Reinforcement Learning for Color Enhancement
- 1 Introduction
- 2 Related Work
- 3 Problem Formulation
- 4 Method
- 4.1 Overall Architecture
- 4.2 Agent Network
- 4.3 Action Space
- 4.4 Reward
- 5 Experiments
- 5.1 Datasets and Training Details
- 5.2 Benchmark Evaluations
- 5.3 Ablation Study
- 6 Conclusion
- References
- Beyond the Limits: Tackling Extreme Overexposure with Diffusion Model
- 1 Introduction
- 2 Related Works
- 2.1 Exposure Correction
- 2.2 Image Inpainting
- 3 Method
- 3.1 Problem Formulation and Overall Architecture
- 3.2 Enhancement Network
- 3.3 Brightness Extraction Module
- 3.4 Semantically Inpainting Module
- 3.5 Loss Functions
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Comparisons with State-of-the-Art Methods
- 4.3 Ablation Study
- 5 Conclusion
- References
- Fuse and Calibrate: A Bi-directional Vision-Language Guided Framework for Referring Image Segmentation
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Text and Image Encoding
- 3.2 Vision-Guided Fusion
- 3.3 Language-Guided Calibration
- 3.4 Mask Decoding
- 4 Experiments
- 4.1 Implementation Details
- 4.2 Comparison with State-of-the-Art Method
- 5 Ablation Study
- 6 Visualization
- 7 Conclusion
- References
- Efficient Local Imperceptible Random Search for Black-Box Adversarial Attacks
- 1 Introduction
- 2 Background and Related Work
- 2.1 Black-Box Adversarial Attack
- 2.2 Local Perturbation
- 2.3 Search Space and Query Reduction Method
- 3 Proposed Method
- 3.1 Objective Function
- 3.2 Location of Salient Region
- 3.3 Grouping and Ranking of Subregions
- 3.4 Imperceptible Attack with Random Search
- 4 Experimental Studies and Analysis
- 4.1 Results on MNIST
- 4.2 Results on CIFAR10
- 4.3 Results on ImageNet
- 5 Conclusion
- References
- COLORSHOP: Color Manipulation of Objects in Videos Using Diffusion Models
- 1 Introduction
- 2 Related Work
- 2.1 Image Editing
- 2.2 Image Editing
- 3 Method
- 3.1 Method Overview
- 3.2 Foreground Diffusion
- 3.3 Cross-Frame Spatial Feature Fusion
- 4 Experiment
- 4.1 Implementation Details
- 4.2 Metrics
- 4.3 Ablation Study
- 4.4 Quantitative Evaluation
- 4.5 Qualitative Evaluation
- 5 Conclusion
- References
- Utilizing Stable Diffusion to Enhance Car Parts Detection
- 1 Introduction
- 2 FGCP Dataset
- 3 Method
- 3.1 Data Augmentation with Diffusion Model
- 3.2 Diffusion-Based Detector
- 4 Experiment
- 4.1 Car Parts Detection
- 4.2 Car Parts Detection with Augmented Data
- 4.3 Comparison of Diffusion-Based Detector (DBD) and YOLOv8X
- 5 Conclusion
- References
- MonoRetNet: A Self-supervised Model for Monocular Depth Estimation with Bidirectional Half-Duplex Retention
- 1 Introduction
- 2 Related Work
- 2.1 Self-supervised Monocular Depth Estimation
- 2.2 Retention Mechanism
- 3 MonoRetNet
- 3.1 Framework Overview
- 3.2 Depth Network
- 3.3 Pose Network
- 3.4 Self-supervised Learning
- 4 Experiments
- 4.1 Dataset
- 4.2 Results
- 4.3 Ablation Study
- 5 Conclusion
- References
- Lightweight Human Pose Estimation Model for Industrial Scenarios
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 Lightweight C2f Module
- 3.2 Head Structure Optimisation
- 4 Experiments Preparation
- 4.1 Dataset and Experimental Environment
- 4.2 Evaluation Criteria
- 5 Experimental Results
- 5.1 Ablation Experiment
- 5.2 Comparative Experiment
- 5.3 Visualizing Results
- 6 Conclusions
- References
- Improved YOLOv8 Method for Multi-scale Pothole Detection
- 1 Introduction
- 2 Improvement
- 2.1 Backbone
- 2.2 Neck
- 2.3 Loss Function
- 3 Experimental Analysis
- 3.1 Datasets and Experimental Environment
- 3.2 Evaluate Metrics
- 3.3 Experiment on the Effectiveness of the Small Object Detection Layer
- 3.4 Comparative Experiments on Horizontal Improvements of Some Modules
- 3.5 Ablation Experiments
- 3.6 Comparative Experiments Among Different Algorithms
- 3.7 Visualization
- 3.8 Conclusion
- References
- ST-CLIP: Spatio-Temporal Enhanced CLIP Towards Dense Video Captioning
- 1 Introduction
- 2 Method
- 2.1 Model Architecture
- 2.2 Cross-Frame Spatio-Temporal Attention Mechanism
- 2.3 Cross-Modal Feature Interaction Mechanism
- 2.4 Training
- 3 Experiments
- 3.1 Experimental Settings
- 3.2 Main Results
- 3.3 Ablation Studies
- 4 Conclusion and Future Work
- References
- SCW-YOLO: An Improved Algorithm for Fall Detection Based on Deep Learning
- 1 Introduction
- 2 Related Works
- 2.1 Wearable Sensor
- 2.2 Computer Vision
- 2.3 YOLOv8 Model
- 3 Methodology
- 3.1 C2fSCConv Module
- 3.2 CARAFE Upsampling
- 3.3 WIoU Loss Function
- 4 Experiment Design and Result Analysis
- 4.1 Dataset and Evaluation Metrics
- 4.2 Experimental Setup
- 4.3 Results Analysis
- 5 Conclusion
- References
- One Stage Near-Ground Pothole Object Detection
- 1 Introduction
- 2 Related Work
- 3 The Proposed Method
- 3.1 Image Acquisition and Processing
- 3.2 The Designed Network Framework
- 3.3 Anchor Design
- 3.4 NMS Post Processing
- 3.5 The Loss Function
- 3.6 Increase Receptive Field (N-rank)
- 3.7 High and Low Resolution Information Interaction (BI-FM)
- 4 Experiments
- 4.1 Experimental Setup
- 4.2 Performance Metrics
- 4.3 Results and Discussion
- 5 Conclusion
- References
- Multi-layered Stixels Prompted by Semantic Information
- 1 Introduction
- 2 Overview of the Algorithm
- 3 Our Stixels Model
- 3.1 Semantic Labels
- 3.2 Disparity Labels
- 3.3 Semantic Stixels
- 4 Experiment
- 4.1 Semantic Segmentation Network Training
- 4.2 Multi-layer Stixel-World Experiments
- 4.3 Results
- 5 Conclusion
- References
- A New Multi-task Network for Autonomous Driving: Efficientnetv1Unet
- 1 Introduction
- 2 Related Work
- 2.1 Object Detection
- 2.2 Segmentation
- 2.3 Height and Width Restriction Detection
- 2.4 Multi-task Learning
- 3 Methodology
- 3.1 Encoder
- 3.2 Decoder
- 3.3 Loss Function
- 4 Experiments
- 4.1 Details Setting
- 4.2 Experimental Results
- 5 Conclusion
- References
- Automatic Detection and Assessment of Corals in Shallow Sea Regions Based on Deep Learning Models
- 1 Introduction
- 2 Methodology
- 2.1 Overview
- 2.2 Yolov8 with Embedded Attention Module
- 2.3 Deep SORT Object Matching and Tracking Algorithms
- 3 Datasets
- 4 Experiments
- 4.1 Training Details
- 4.2 Evaluation Metrics
- 4.3 Experimental Evaluation
- 5 Coral Detection and Assessment
- 6 Conclusions
- References
- Rust Detection Network for Transmission Line Based on UAV Inspection
- 1 Introduction
- 2 Related Work
- 3 Method
- 3.1 YOLOv8 Algorithm
- 3.2 Improved YOLOv8 Algorithm
- 4 Experimental Design and Results Analysis
- 4.1 Experimental Equipment
- 4.2 Evaluation Index
- 4.3 Results
- 4.4 Ablation Experiment
- 4.5 Visualisation of Experimental Results
- 5 Conclusion
- References
- ETVKT: Enhanced Training Vector for Knowledge Tracing
- 1 Introduction
- 2 Related Work
- 3 ETVKT Model
- 3.1 Problem Definition
- 3.2 Question Information Embedding
- 4 Experiment
- 4.1 Baselines
- 4.2 Analysis of Prediction Results
- 4.3 Training Cycle
- 4.4 Visualization of Learner Mastery Status
- 4.5 Visualization of Learner Answer Prediction
- 5 Conclusion
- References
- DGAP-YOLO: A Crack Detection Method Based on UAV Images and YOLO
- 1 Introduction
- 2 Related Work
- 2.1 Overview of Target Detection
- 2.2 Crack Detection Method
- 3 Method
- 3.1 Attention Module
- 3.2 Part LowAFPN
- 4 Experimental Design and Result Analysis
- 4.1 Experimental Settings
- 4.2 Evaluation Index
- 4.3 Experimental Results
- 5 Conclusion
- References
- Author Index
System Requirements
File format: PDF
Copy protection: watermark DRM (Digital Rights Management)
System requirements:
- Computer (Windows; macOS; Linux): To read, use the free Adobe Reader or Adobe Digital Editions software, or any other PDF viewer of your choice.
- Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app or the PocketBook app before downloading.
- E-book readers: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more.