
Document Analysis Systems
Description
The full papers presented were carefully reviewed and selected from numerous submissions addressing key techniques of document analysis.

Content
- Intro
- Preface
- Organization
- Contents
- Document Analysis Systems and Applications
- Font Shape-to-Impression Translation
- 1 Introduction
- 2 Related Work
- 2.1 Subjective Impression Analysis of Fonts
- 2.2 Objective Impression Analysis of Fonts
- 2.3 Transformer
- 3 Dataset and Local Descriptor
- 4 Shape-Impression Relationship Analysis by Multi-label Classification Approach
- 4.1 Transformer as a Multi-label Classifier
- 4.2 Implementation Details
- 4.3 Classification Examples
- 4.4 Shape-Impression Relation Analysis by Group-Based Occlusion Sensitivity
- 5 Shape-Impression Relationship Analysis by Translation Approach
- 5.1 Transformer as a Shape-to-Impression Translator
- 5.2 Implementation Details
- 5.3 Translation Examples
- 5.4 Shape-Impression Relation Analysis Using Integrated Gradients
- 6 Experimental Results
- 6.1 Quantitative Evaluation of the Trained Transformer
- 6.2 Analysis Results of the Shape-Impression Relationship
- 7 Conclusion and Future Work
- References
- TrueType Transformer: Character and Font Style Recognition in Outline Format
- 1 Introduction
- 2 Related Work
- 2.1 Transformer
- 2.2 Font Analysis by Using Vector Format
- 3 TrueType Transformer (T3)
- 3.1 Representation of Outline
- 3.2 Transformer Model for T3
- 4 Experiment
- 4.1 Dataset
- 4.2 Implementation Details
- 4.3 Quantitative Comparison Between Outline-Based and Image-Based Recognition Methods
- 4.4 Qualitative Comparison Between Outline-Based and Image-Based Recognition Methods
- 4.5 Analysis of Learned Attention
- 5 Conclusion
- References
- Unified Line and Paragraph Detection by Graph Convolutional Networks
- 1 Introduction
- 2 Related Work
- 2.1 Text Line Detection
- 2.2 Paragraph Detection
- 3 Proposed Method
- 3.1 Pure Bounding Box Input
- 3.2 Problem Statement
- 3.3 Main Challenge
- 3.4 β-skeleton Graph with 2-Hop Connections
- 3.5 GCN Predictions
- 3.6 Forming Lines
- 3.7 Forming Paragraphs
- 3.8 Overall System Pipeline
- 4 Limitations
- 4.1 Single-Line Paragraphs
- 4.2 Document Rotations
- 5 Experiments
- 5.1 PubLayNet Results
- 5.2 Real-World Evaluation Results
- 6 Conclusions and Future Work
- References
- The Winner Takes It All: Choosing the "best" Binarization Algorithm for Photographed Documents
- 1 Introduction
- 2 Quality-Time Evaluation Methods
- 3 Test Set
- 4 Results
- 4.1 Motorola Moto G9
- 4.2 Samsung A10S
- 4.3 Samsung S20
- 4.4 Apple iPhone SE
- 5 Conclusions
- References
- A Multilingual Approach to Scene Text Visual Question Answering
- 1 Introduction
- 2 Related Work
- 2.1 Word Embeddings
- 2.2 Scene Text Visual Question Answering
- 3 Methodology
- 3.1 Word Embeddings
- 3.2 Visual Question Answering Architecture
- 4 Experiments
- 4.1 Datasets
- 4.2 Evaluation Metrics
- 4.3 Implementation Details
- 4.4 VQA Experiments
- 5 Conclusions
- References
- Information Extraction and Applications
- Sequence-to-Sequence Models for Extracting Information from Registration and Legal Documents
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Questions and Answers
- 3.2 Compound QAs
- 3.3 Sentence IDs and Canonical Format
- 4 Experimental Setup
- 4.1 Models
- 4.2 Datasets
- 4.3 Training and Inference
- 5 Results
- 5.1 Experiments for Compound QAs
- 5.2 Experiments for Sentence IDs and Canonical Format
- 5.3 Comparison with BERT on a NER Task
- 6 Conclusion
- References
- Contrastive Graph Learning with Graph Convolutional Networks
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Graph Representation
- 3.2 Contrastive Graph Learning
- 4 Graph Convolution
- 4.1 Loss
- 5 Experiments
- 5.1 Datasets
- 5.2 Implementation Details
- 5.3 Baseline Methods
- 5.4 Comparison with Baseline Methods
- 5.5 Supervised vs. Semi-supervised Contrastive Graph Learning
- 5.6 Ablation Studies
- 6 Conclusion and Future Work
- References
- Improving Information Extraction on Business Documents with Specific Pre-training Tasks
- 1 Introduction
- 2 Related Work
- 2.1 Information Extraction
- 2.2 Pre-training
- 3 Models
- 3.1 Architecture
- 3.2 ConfOpt Post-processing
- 4 Pre-training
- 4.1 Numeric Ordering Task
- 4.2 Layout Inclusion Task
- 5 Datasets
- 5.1 Business Documents Collection
- 5.2 Business Documents Collection - Purchase Orders
- 5.3 ICDAR 2019 - Scanned Receipts
- 6 Experiments
- 6.1 Post-processing
- 6.2 Business Document-Specific Pre-training
- 7 Conclusion
- References
- How Confident Was Your Reviewer? Estimating Reviewer Confidence from Peer Review Texts
- 1 Introduction
- 2 Related Work
- 3 Methodology
- 3.1 Problem Statement
- 3.2 Framework
- 3.3 Baseline Models
- 4 Data Description and Experimental Setup
- 4.1 Implementation Details
- 5 Experimental Results and Discussion
- 5.1 Evaluation Criteria
- 5.2 Cross-Year Experiments
- 5.3 Ablation Study
- 5.4 Non Parametric (Levene's) Test
- 5.5 Observations
- 5.6 Error Analysis
- 6 Conclusion and Future Work
- References
- Historical Document Analysis + CSAWA
- Recognition and Information Extraction in Historical Handwritten Tables: Toward Understanding Early 20th Century Paris Census
- 1 Introduction
- 2 Corpus and Ground-Truthed Datasets
- 2.1 Presentation of the Census
- 2.2 Annotation of Two Datasets
- 3 Processing Pipeline
- 4 Layout Analysis and Information Extraction
- 4.1 Segmentation and Dewarping of Tables
- 4.2 Page Classification
- 4.3 Segmentation of Tables into Rows
- 5 Handwriting Recognition
- 5.1 Architecture of the Optical Model
- 5.2 Results of the Optical Model
- 5.3 Self-training
- 6 Leveraging Domain Knowledge
- 6.1 Language Models
- 6.2 Normalization and Logical Deductions
- 7 Processing Time
- 8 Conclusion
- References
- Importance of Textlines in Historical Document Classification
- 1 Introduction
- 2 Related Work
- 3 Datasets
- 4 Document Classification Systems
- 4.1 Loss Functions
- 4.2 Patch System
- 4.3 Textline System
- 4.4 System Fusion
- 5 Experiments
- 5.1 Experimental Setup
- 5.2 Results
- 6 Conclusion
- References
- Historical Map Toponym Extraction for Efficient Information Retrieval
- 1 Introduction
- 2 Related Work
- 3 Dataset
- 4 Toponym Processing Approach
- 4.1 Toponym Detection Methods
- 4.2 Toponym Classification Method
- 4.3 OCR Models
- 5 Experiments
- 5.1 Toponym Detection
- 5.2 Toponym Classification
- 5.3 OCR Results
- 6 Conclusions and Future Work
- References
- Information Extraction from Handwritten Tables in Historical Documents
- 1 Introduction
- 2 Related Work
- 3 HisClima Dataset
- 4 Proposed Approaches
- 4.1 Heuristic Geometric Information
- 4.2 Log-Linear Model
- 4.3 Graph Neural Network
- 5 Evaluation Criteria and Metrics
- 6 Experimental Framework and Results
- 6.1 Experimental Settings
- 6.2 Text Recognition Results
- 6.3 Information Extraction Results
- 7 Discussion
- 8 Reproducibility
- 9 Conclusions
- References
- Named Entity Linking on Handwritten Document Images
- 1 Introduction
- 2 Named Entity Linking
- 3 Dataset
- 3.1 Synthetic HW-AIDA-CoNLL
- 3.2 IAM-DB
- 3.3 George Washington
- 4 Baseline Approach
- 4.1 Handwriting Text Recognizer
- 4.2 Named Entity Linking
- 5 Experiments
- 5.1 Evaluation Protocol
- 5.2 Results
- 6 Conclusion
- References
- Pattern Analysis Software Tools (PAST) for Written Artefacts
- 1 Introduction
- 2 The Handwriting Analysis Tool (HAT)
- 2.1 Basic Functionality
- 2.2 Use Case
- 3 The Visual-Pattern Detector (VPD)
- 3.1 Basic Functionality
- 3.2 Use Case
- 4 The Line Detection Tool (LDT)
- 4.1 Basic Functionality
- 4.2 Use Case
- 5 The XRF-Data Analysis Tool (XRF-DAT)
- 5.1 Basic Functionality
- 5.2 Use Case
- 6 The Artefact-Feature Analysis Tool (AFAT)
- 6.1 Basic Functionality
- 6.2 Use Case
- 7 Conclusion
- References
- TEI-Based Interactive Critical Editions
- 1 Introduction
- 2 Preliminaries
- 3 Related Work
- 4 Critical Edition - Critical Texts of Cankam Literature
- 5 Transforming Documents into TEI
- 6 Interactive Critical Editions
- 7 Databasing on Demand
- 8 An Annotation System for the Humanities
- 9 Application and Results
- 10 Conclusion and Future Work
- References
- Handwriting Text Recognition
- Best Practices for a Handwritten Text Recognition System
- 1 Introduction
- 2 Related Work
- 3 Proposed HTR System
- 3.1 Preprocessing
- 3.2 Network Architecture
- 3.3 Training Scheme
- 4 Experimental Evaluation
- 4.1 Ablation Study
- 4.2 Comparison to State-of-the-Art Systems
- 5 Conclusions
- References
- Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes
- 1 Introduction
- 2 Related Work
- 3 Data
- 4 Methods
- 4.1 Preprocessing
- 4.2 Network-Architecture
- 4.3 Training
- 4.4 Inference
- 4.5 Language Model
- 4.6 Synthetic Line Generation
- 5 Results
- 5.1 Language Models
- 5.2 Pretraining on Synthetic Data
- 5.3 Training on Real Data
- 5.4 Enabling the Language Model
- 5.5 Varying the Beam Count
- 5.6 Ablation Study
- 5.7 Comparison with the State of the Art
- 6 Future Work
- References
- A Light Transformer-Based Architecture for Handwritten Text Recognition
- 1 Introduction
- 2 Related Works
- 2.1 Standard Approaches for HTR
- 2.2 Transformer-Based Architectures
- 3 Our Light Encoder-Decoder Transformer-Based Model
- 3.1 Summary of the Architecture
- 3.2 Network Encoder
- 3.3 Transformer Decoder
- 3.4 Hybrid Loss
- 4 Experiments and Results
- 4.1 Handwritten Text-line Data
- 4.2 Experimental Settings
- 4.3 Ablation Study of the Main Components of Our Network
- 4.4 Benefits of Using a Light Architecture
- 4.5 Interest of the Hybrid Loss
- 4.6 Comparison with the State of the Art
- 5 Conclusion and Future Works
- References
- Effective Crowdsourcing in the EDT Project with Probabilistic Indexes
- 1 Introduction
- 2 Preparing the PrIx System for the Crowdsourcing Platform
- 3 Description of the Collections
- 3.1 EDT Hungary
- 3.2 EDT Norway
- 3.3 EDT Portugal
- 3.4 EDT Spain
- 3.5 EDT Malta
- 4 HTR and PrIx Experiments and Results
- 5 Crowdsourcing Platform
- 6 Conclusions
- References
- Applications in Handwriting
- Paired Image to Image Translation for Strikethrough Removal from Handwritten Words
- 1 Introduction
- 2 Related Works
- 2.1 Strikethrough Processing
- 2.2 Paired Image to Image Translation
- 2.3 Strikethrough Datasets
- 3 Image to Image Translation Models for Strikethrough Removal
- 4 Experiment Setup
- 4.1 Datasets
- 4.2 Neural Network Training Protocol
- 4.3 Evaluation Protocol
- 5 Results and Analysis
- 5.1 Models Trained on IAMsynth
- 5.2 Models Trained on Individual Partitions of Draculasynth
- 5.3 Models Trained on the Aggregation of Partitions from Draculasynth
- 5.4 Qualitative Results
- 6 Conclusions
- A Dataset and Code Availability
- References
- Revealing Reliable Signatures by Learning Top-Rank Pairs
- 1 Introduction
- 2 Related Work
- 3 Learning Top-Rank Pairs
- 3.1 Feature Representation of Paired Samples
- 3.2 Optimization to Learn Top-Rank Pairs
- 3.3 Learning Top-Rank Pairs with Their Representation
- 3.4 Initial Features by SigNet
- 4 Experiments
- 4.1 Datasets
- 4.2 Experimental Settings
- 4.3 Evaluation Metrics
- 4.4 Quantitative and Qualitative Evaluations
- 5 Conclusion
- References
- On-the-Fly Deformations for Keyword Spotting
- 1 Introduction
- 2 Related Work
- 3 Reference KWS System
- 3.1 Preprocessing
- 3.2 Proposed Architecture
- 3.3 Training Process
- 3.4 Retrieval Application
- 4 On-the-Fly Deformation
- 4.1 Considered Deformations
- 4.2 Query-Based Deformation
- 4.3 Implementation Aspects
- 5 Experimental Evaluation
- 5.1 Ablation Study
- 5.2 Comparison to State-of-the-Art Systems
- 6 Conclusions and Future Work
- References
- Writer Identification and Writer Retrieval Using Vision Transformer for Forensic Documents
- 1 Introduction
- 2 Related Work
- 2.1 Methods with Enrollment
- 2.2 Methods Without Enrollment
- 3 WI/WR Using Vision Transformer
- 3.1 Preprocessing
- 3.2 ViT-Lite
- 3.3 Aggregation/Encoding
- 4 Experimental Setup
- 4.1 Datasets
- 4.2 Training Details
- 4.3 Evaluation
- 5 Results
- 5.1 CVL Dataset
- 5.2 ICDAR 2013 WI/WR Competition Dataset
- 5.3 WRITE Dataset
- 5.4 Comparison to State of the Art
- 6 Conclusion
- References
- Approximate Search for Keywords in Handwritten Text Images
- 1 Introduction
- 2 Probabilistic Indexing and Search
- 2.1 Multi-word Boolean Queries
- 3 Approximate-Spelling Queries
- 3.1 Algorithmics
- 4 Dataset, Assessment, Queries, and Empirical Settings
- 4.1 Dataset
- 4.2 Query Selection
- 4.3 Evaluation Protocols and Measures
- 4.4 Experimental Settings
- 5 Experiments and Results
- 5.1 Retrieval Performance
- 5.2 Computational Performance
- 5.3 Illustrative Retrieval Examples
- 6 Conclusion
- References
- Keyword Spotting with Quaternionic ResNet: Application to Spotting in Greek Manuscripts
- 1 Introduction
- 2 Quaternions in Neural Networks
- 2.1 Elementary Notions
- 2.2 Quaternionized Versions of Standard NN Layers
- 3 Why Are Quaternionic Layers Less Costly?
- 4 Proposed Model
- 5 Experiments
- 5.1 Datasets
- 5.2 Hyperparameters and Other Training Considerations
- 5.3 Results
- 6 Conclusion and Future Work
- References
- Open-Source Software and Benchmarking
- A Comprehensive Comparison of Open-Source Libraries for Handwritten Text Recognition in Norwegian
- 1 Introduction
- 2 Related Work
- 3 The Hugin-Munin Dataset for HTR in Norwegian
- 3.1 Overview
- 3.2 Dataset
- 3.3 Transcription Process
- 3.4 Language
- 4 HTR Libraries and Models
- 4.1 Selection of the Libraries
- 4.2 Description of the Selected Libraries
- 4.3 Training of the Models
- 5 Results
- 5.1 Random Split
- 5.2 Random Split by Writer with Unseen Writers
- 6 Conclusion
- References
- Open Source Handwritten Text Recognition on Medieval Manuscripts Using Mixed Models and Document-Specific Finetuning
- 1 Introduction
- 2 Related Work
- 3 Data Sets
- 4 Methods
- 5 Experiments
- 5.1 Determining the Best Starting Model
- 5.2 Iterative Document-Specific Training
- 6 Discussion
- 7 Conclusion and Future Work
- References
- A Comprehensive Study of Open-Source Libraries for Named Entity Recognition on Handwritten Historical Documents
- 1 Introduction
- 2 Related Work
- 3 Handwritten Historical Document Corpora
- 3.1 Nested Entities in HOME Corpus
- 4 Named Entity Recognition Libraries
- 5 Experiments
- 5.1 Evaluation Metrics
- 5.2 Hyperparameters and Model Training
- 6 Results
- 7 Conclusions and Future Work
- References
- A Benchmark of Named Entity Recognition Approaches in Historical Documents Application to 19th Century French Directories
- 1 Introduction
- 2 OCR and NER on Historical Texts
- 2.1 Optical Character Recognition of Historical Texts
- 2.2 Named Entity Recognition in Historical Texts
- 2.3 Pipeline Summary
- 3 Dataset
- 3.1 A Selection of Paris Trade Directories from 1798 to 1854
- 3.2 A Dataset for OCR and NER Evaluation
- 3.3 Metrics for OCR and NER Quality Assessment
- 4 OCR Benchmark
- 5 NER Sensibility to the Number of Training Examples
- 5.1 Training and Evaluation Protocol
- 5.2 Results and Discussion
- 6 Impact of OCR Noise on Named Entity Recognition
- 6.1 Training and Evaluation Protocol
- 6.2 Results and Discussion
- 7 Conclusion and Future Works
- References
- NCERT5K-IITRPR: A Benchmark Dataset for Non-textual Component Detection in School Books
- 1 Introduction
- 2 Related Datasets
- 3 NCERT5K-IITRPR Dataset
- 3.1 Source and Statistics
- 3.2 Label Categories
- 3.3 Annotation Method
- 4 Benchmarking
- 4.1 Models
- 4.2 Experimental Setup
- 4.3 Results and Analysis
- 5 Conclusion
- References
- Poster Session 1
- ReadOCR: A Novel Dataset and Readability Assessment of OCRed Texts
- 1 Introduction
- 2 Proposed Dataset
- 2.1 Document Collection
- 2.2 Proposed Text Corpus
- 2.3 Dataset Analysis
- 3 Readability Assessment
- 3.1 Methods
- 3.2 Experimental Results
- 4 Conclusions
- References
- Hard and Soft Labeling for Hebrew Paleography: A Case Study
- 1 Introduction
- 2 Related Work
- 3 Hebrew Paleography
- 4 VML-HP-ext Dataset Description
- 5 Case Study
- 5.1 Hard-Label Classification
- 5.2 Soft-Label Regression
- 5.3 Maximum Score Class Assignment
- 5.4 Nearest Neighbor Label Conversion
- 5.5 Comparison Between Soft and Hard-Label Classification
- 6 Conclusion and Further Research
- References
- AttentionHTR: Handwritten Text Recognition Based on Attention Encoder-Decoder Networks
- 1 Introduction
- 2 Related Work
- 3 Attention-Based Encoder-Decoder Network
- 4 Experimental Results
- 4.1 Datasets
- 4.2 Hyper-parameters
- 4.3 Results
- 4.4 Hyper-parameter Tuning
- 4.5 Ablation Study
- 4.6 Test Set Errors
- 5 Error Analysis
- 5.1 Character-Level Error Analysis
- 5.2 Bias-Variance Analysis
- 5.3 Visual Analysis of Images
- 6 Conclusion
- References
- HST-GAN: Historical Style Transfer GAN for Generating Historical Text Images
- 1 Introduction
- 2 Related Work
- 2.1 Datasets
- 2.2 Data Preparation
- 3 Method
- 3.1 Model Framework
- 3.2 Objective Function
- 4 Experiments
- 4.1 Style Transfer Evaluation
- 4.2 Data Augmentation Using Style Transfer
- 5 Conclusion
- References
- Challenging Children Handwriting Recognition Study Exploiting Synthetic, Mixed and Real Data
- 1 Introduction
- 1.1 Children Handwriting Recognition Context and Problematic
- 1.2 ScolEdit: A Small Real Children Handwriting Dataset
- 1.3 Investigating Variable Training Datasets Composition
- 2 Related Works for Latin Handwritten Text Recognition
- 3 Scoledit: A Real Children Handwriting Annotated Dataset
- 3.1 Line Cleaning
- 3.2 Words Detection
- 3.3 Words Annotation on IAM Format
- 4 HTR Architecture and Recognition Scenarios
- 4.1 Standard MDLSTM-RNN Word Transcriber
- 4.2 Mixing Real and Synthetic Data to Enhance the Recognition Rates
- 4.3 Scenarios for HTR Training and Data Preparation
- 5 Experiments and Results
- 5.1 First Scenario: Supervised Selection of Validation Datasets and Domain Transfer
- 5.2 Second Scenario: Training Focused on Dictation Words
- 5.3 Third Scenario: Large Lexicon Training with Transfer
- 6 Conclusions
- References
- Combining Image Processing Techniques, OCR, and OMR for the Digitization of Musical Books
- 1 Introduction
- 2 The Music in the Santo Domingo's Cathedral Book
- 3 Related Work
- 4 Methods
- 4.1 Block Detection
- 4.2 OCR
- 4.3 OMR
- 4.4 Data Storage
- 5 Discussion
- 6 Conclusions
- References
- Evaluation of Named Entity Recognition in Handwritten Documents
- 1 Introduction
- 2 Related Work
- 3 Framework
- 3.1 Characteristics of the Task
- 3.2 HTR and NER via a Coupled Model
- 3.3 Error Correction
- 4 Evaluation Metrics
- 4.1 Character and Word Error Rates
- 4.2 Precision, Recall and F1-Score
- 4.3 Entity CER and Entity WER
- 5 Experimental Method
- 5.1 Dataset
- 5.2 Implementation Details
- 5.3 Obtained Results
- 6 Conclusions
- References
- A Generic Image Retrieval Method for Date Estimation of Historical Document Collections
- 1 Introduction
- 2 Related Work
- 3 Datasets
- 4 Learning Objectives
- 5 Proposed Method
- 6 Application
- 6.1 Smooth-nDCG Human-in-the-Loop Architecture
- 6.2 Quantitative Evaluation
- 7 Conclusions
- References
- Combining Visual and Linguistic Models for a Robust Recipient Line Recognition in Historical Documents
- 1 Introduction
- 2 Related Work
- 3 Background: Handwritten Text Recognition with Transformers
- 4 Methodology
- 4.1 Semantic Segmentation of Recipient Lines
- 4.2 Handwritten Text Recognition and Recipient Line Classification with CNN
- 4.3 Joint Recipient Line Recognition and Transcription
- 4.4 Combination of Different Approaches
- 5 Experimental Evaluation
- 5.1 Nuremberg Letterbooks
- 5.2 Experiment Details
- 5.3 Metrics
- 5.4 Results
- 6 Discussion
- 7 Conclusion
- References
- Investigating the Effect of Using Synthetic and Semi-synthetic Images for Historical Document Font Classification
- 1 Introduction
- 2 Related Work
- 3 Dataset and Image Generation
- 3.1 Dataset
- 3.2 Semi-synthetic Image Generation Using DocCreator
- 3.3 Synthetic Image Generation Using Generative Adversarial Networks
- 4 Experiments
- 5 Results
- 6 Conclusion and Future Work
- References
- Poster Session 2
- 3D Modelling Approach for Ancient Floor Plans' Quick Browsing
- 1 Introduction
- 2 Related Work
- 2.1 Wall Detection
- 2.2 3D Modelling
- 3 Proposed Approach
- 3.1 Floor Plan Digitization
- 3.2 Wall Mask Generation
- 3.3 3D Modelling
- 4 Results and Evaluation
- 4.1 Dataset
- 4.2 Evaluation Protocol
- 4.3 Results
- 5 Perspectives and Conclusion
- References
- A Comparative Study of Information Extraction Strategies Using an Attention-Based Neural Network
- 1 Introduction
- 2 Related Works
- 2.1 Handwriting Recognition
- 2.2 Information Extraction
- 2.3 Our Statement
- 3 The Attention-Based Seq2seq Architecture
- 4 Strategies for Information Extraction
- 4.1 Comparing the Sequential and Joint Approaches
- 4.2 Exploring Additional Joint Learning Configurations
- 5 Experiments
- 5.1 Handwriting Recognition Using Seq2seq
- 5.2 Information Extraction Using Seq2seq
- 6 Conclusion
- References
- QAlayout: Question Answering Layout Based on Multimodal Attention for Visual Question Answering on Corporate Document
- 1 Introduction
- 2 Related Work
- 3 Problem Definition
- 4 Proposed Approach
- 4.1 Global Description
- 4.2 Encoder
- 4.3 Co-attention
- 5 Experiments
- 5.1 Dataset
- 5.2 Performance Evaluation
- 6 Conclusion and Future Work
- References
- Is Multitask Learning Always Better?
- 1 Introduction
- 2 Related Work
- 3 Datasets
- 4 Methodology
- 5 Evaluation
- 5.1 ResNet
- 5.2 Perceiver
- 6 Discussion
- 7 Conclusions
- References
- SciBERTSUM: Extractive Summarization for Scientific Documents
- 1 Introduction
- 2 Related Work
- 2.1 Summarization
- 2.2 Transformer Based Summarization
- 3 Method - SciBERTSUM
- 4 Language Model Architecture
- 4.1 Embedding Layer
- 4.2 Attention Mechanism
- 4.3 Transformer Layer
- 5 Sentence Extractor
- 5.1 Sentence Features
- 5.2 Document Embedding
- 5.3 Score Predictor
- 6 Reinforcement Learning
- 7 Experimental Results
- 7.1 Hardware
- 7.2 Experiments
- 8 Conclusions and Future Work
- References
- Using Multi-level Segmentation Features for Document Image Classification
- 1 Introduction
- 2 Related Work
- 3 Proposed Method
- 3.1 Integrated CNN Architecture
- 3.2 Implementation Details
- 4 Experiments
- 4.1 Datasets
- 4.2 Experimental Results
- 5 Conclusion
- References
- Eye Got It: A System for Automatic Calculation of the Eye-Voice Span
- 1 Introduction
- 2 Eye Got It
- 2.1 Eye Tracking
- 2.2 Speech Processing
- 2.3 EVS Computation
- 3 Experiment
- 3.1 Participants
- 3.2 Apparatus and Material
- 3.3 Procedure
- 3.4 Results
- 4 Discussion
- 4.1 Eye Tracking
- 4.2 Audio
- 5 Conclusion
- References
- Text Detection and Post-OCR Correction in Engineering Documents
- 1 Introduction
- 2 Related Work
- 2.1 Text Detection in Unconstrained Documents
- 2.2 Post-OCR Correction
- 3 Our Approach for Lexicon-Free Text Recognition
- 3.1 EAST-Based Text Detection
- 3.2 Open-Source Engine for Text Recognition
- 3.3 Post-OCR Correction of Tags and Lexicon-Free Words
- 4 Experimentation and Discussions
- 4.1 Dataset
- 4.2 Results
- 5 Conclusion
- References
- TraffSign: Multilingual Traffic Signboard Text Detection and Recognition for Urdu and English
- 1 Introduction
- 2 Related Work
- 2.1 Text Detection
- 2.2 Text Recognition
- 2.3 Standard Benchmark Datasets for Text Detection and Recognition
- 3 Dataset Preparation
- 3.1 Data Acquisition and Pre-processing
- 3.2 Multi-lingual Text Detection and Recognition
- 4 The Methodology
- 4.1 Multi-lingual Text Detection Architecture
- 4.2 Text Recognition Architecture
- 5 Experiments and Results
- 5.1 Evaluation of Multi-lingual Text Detection Methods
- 5.2 Evaluation of Multi-lingual Text Recognition Methods
- 5.3 Evaluation of Proposed Text-Detection and Recognition Models as an End-to-End Pipeline
- 6 Conclusions
- References
- Read While You Drive - Multilingual Text Tracking on the Road
- 1 Introduction
- 2 Related Work
- 2.1 Datasets for Text Spotting in Videos
- 2.2 Text Detection
- 2.3 Text Tracking
- 2.4 Multiple Object Tracking Metrics
- 3 RoadText-3K Dataset
- 3.1 Videos
- 3.2 Annotations
- 3.3 Analysis
- 4 Methodology
- 4.1 Text Detection
- 4.2 Text Tracking
- 4.3 CenterNet-Based Detection and Tracking
- 5 Results
- 5.1 Frame Level Text Detection
- 5.2 Tracking
- 5.3 Qualitative Analysis
- 6 Conclusions
- References
- A Fair Evaluation of Various Deep Learning-Based Document Image Binarization Approaches
- 1 Introduction
- 2 Overview of Evaluated Binarization Methods
- 2.1 Document Enhancement Generative Adversarial Network
- 2.2 SauvolaNet
- 2.3 Two-Stage GAN
- 2.4 Robin U-Net Model
- 2.5 DP-LinkNet
- 2.6 Selectional Auto-Encoder
- 2.7 DeepOtsu
- 3 Materials and Methods
- 3.1 Datasets
- 3.2 Metrics
- 3.3 Training
- 4 Evaluation
- 5 Conclusion
- References
- Correction to: How Confident Was Your Reviewer? Estimating Reviewer Confidence from Peer Review Texts
- Correction to: Chapter "How Confident Was Your Reviewer? Estimating Reviewer Confidence from Peer Review Texts" in: S. Uchida et al. (Eds.): Document Analysis Systems, LNCS 13237, https://doi.org/10.1007/978-3-031-06555-2_9
- Author Index
System requirements
File format: PDF
Copy protection: Watermark-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Use the free software Adobe Reader, Adobe Digital Editions, or any other PDF viewer of your choice (see eBook Help).
- Tablet/Smartphone (Android; iOS): Install the free app Adobe Digital Editions or another reading app for eBooks, e.g., PocketBook (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
PDF renders every book page identically on any hardware, which makes it well suited to complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Watermark-DRM, a "soft" form of copy protection. No technical restrictions prevent illegal distribution. Instead, a personalised watermark embedded in the eBook can be used to identify the purchaser in the event of misuse and to provide evidence for legal purposes.
For more information, see our eBook Help page.