toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
  Records Links
Author Wenwen Yu; Chengquan Zhang; Haoyu Cao; Wei Hua; Bohan Li; Huang Chen; Mingyu Liu; Mingrui Chen; Jianfeng Kuang; Mengjun Cheng; Yuning Du; Shikun Feng; Xiaoguang Hu; Pengyuan Lyu; Kun Yao; Yuechen Yu; Yuliang Liu; Wanxiang Che; Errui Ding; Cheng-Lin Liu; Jiebo Luo; Shuicheng Yan; Min Zhang; Dimosthenis Karatzas; Xing Sun; Jingdong Wang; Xiang Bai edit  url
openurl 
  Title ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images Type Conference Article
  Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume (down) 14188 Issue Pages 536–552  
  Keywords  
  Abstract Structured text extraction is one of the most valuable and challenging application directions in the field of Document AI. However, the scenarios of past benchmarks are limited, and the corresponding evaluation protocols usually focus on the submodules of the structured text extraction scheme. In order to eliminate these problems, we organized the ICDAR 2023 competition on Structured text extraction from Visually-Rich Document images (SVRD). We set up two tracks for SVRD including Track 1: HUST-CELL and Track 2: Baidu-FEST, where HUST-CELL aims to evaluate the end-to-end performance of Complex Entity Linking and Labeling, and Baidu-FEST focuses on evaluating the performance and generalization of Zero-shot/Few-shot Structured Text extraction from an end-to-end perspective. Compared to the current document benchmarks, our two tracks of competition benchmark enriches the scenarios greatly and contains more than 50 types of visually-rich document images (mainly from the actual enterprise applications). The competition opened on 30th December, 2022 and closed on 24th March, 2023. There are 35 participants and 91 valid submissions received for Track 1, and 15 participants and 26 valid submissions received for Track 2. In this report we will presents the motivation, competition datasets, task definition, evaluation protocol, and submission summaries. According to the performance of the submissions, we believe there is still a large gap on the expected information extraction performance for complex and zero-shot scenarios. It is hoped that this competition will attract many researchers in the field of CV and NLP, and bring some new thoughts to the field of Document AI.  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ YZC2023 Serial 3896  
Permanent link to this record
 

 
Author Wenwen Yu; Mingyu Liu; Mingrui Chen; Ning Lu; Yinlong We; Yuliang Liu; Dimosthenis Karatzas; Xiang Bai edit  url
openurl 
  Title ICDAR 2023 Competition on Reading the Seal Title Type Conference Article
  Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume (down) 14188 Issue Pages 522–535  
  Keywords  
  Abstract Reading seal title text is a challenging task due to the variable shapes of seals, curved text, background noise, and overlapped text. However, this important element is commonly found in official and financial scenarios, and has not received the attention it deserves in the field of OCR technology. To promote research in this area, we organized ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2). We constructed a dataset of 10,000 real seal data, covering the most common classes of seals, and labeled all seal title texts with text polygons and text contents. The competition opened on 30th December, 2022 and closed on 20th March, 2023. The competition attracted 53 participants and received 135 submissions from academia and industry, including 28 participants and 72 submissions for Task 1, and 25 participants and 63 submissions for Task 2, which demonstrated significant interest in this challenging task. In this report, we present an overview of the competition, including the organization, challenges, and results. We describe the dataset and tasks, and summarize the submissions and evaluation results. The results show that significant progress has been made in the field of seal title text reading, and we hope that this competition will inspire further research and development in this important area of OCR technology.  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ YLC2023 Serial 3897  
Permanent link to this record
 

 
Author Weijia Wu; Yuzhong Zhao; Zhuang Li; Jiahong Li; Mike Zheng Shou; Umapada Pal; Dimosthenis Karatzas; Xiang Bai edit   pdf
url  openurl
  Title ICDAR 2023 Competition on Video Text Reading for Dense and Small Text Type Conference Article
  Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume (down) 14188 Issue Pages 405–419  
  Keywords Video Text Spotting; Small Text; Text Tracking; Dense Text  
  Abstract Recently, video text detection, tracking and recognition in natural scenes are becoming very popular in the computer vision community. However, most existing algorithms and benchmarks focus on common text cases (e.g., normal size, density) and single scenario, while ignore extreme video texts challenges, i.e., dense and small text in various scenarios. In this competition report, we establish a video text reading benchmark, named DSText, which focuses on dense and small text reading challenge in the video with various scenarios. Compared with the previous datasets, the proposed dataset mainly include three new challenges: 1) Dense video texts, new challenge for video text spotter. 2) High-proportioned small texts. 3) Various new scenarios, e.g., ‘Game’, ‘Sports’, etc. The proposed DSText includes 100 video clips from 12 open scenarios, supporting two tasks (i.e., video text tracking (Task 1) and end-to-end video text spotting (Task2)). During the competition period (opened on 15th February, 2023 and closed on 20th March, 2023), a total of 24 teams participated in the three proposed tasks with around 30 valid submissions, respectively. In this article, we describe detailed statistical information of the dataset, tasks, evaluation protocols and the results summaries of the ICDAR 2023 on DSText competition. Moreover, we hope the benchmark will promise the video text research in the community.  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ WZL2023 Serial 3898  
Permanent link to this record
 

 
Author Stepan Simsa; Milan Sulc; Michal Uricar; Yash Patel; Ahmed Hamdi; Matej Kocian; Matyas Skalicky; Jiri Matas; Antoine Doucet; Mickael Coustaty; Dimosthenis Karatzas edit   pdf
url  openurl
  Title DocILE Benchmark for Document Information Localization and Extraction Type Conference Article
  Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume (down) 14188 Issue Pages 147–166  
  Keywords Document AI; Information Extraction; Line Item Recognition; Business Documents; Intelligent Document Processing  
  Abstract This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition. It contains 6.7k annotated business documents, 100k synthetically generated documents, and nearly 1M unlabeled documents for unsupervised pre-training. The dataset has been built with knowledge of domain- and task-specific aspects, resulting in the following key features: (i) annotations in 55 classes, which surpasses the granularity of previously published key information extraction datasets by a large margin; (ii) Line Item Recognition represents a highly practical information extraction task, where key information has to be assigned to items in a table; (iii) documents come from numerous layouts and the test set includes zero- and few-shot cases as well as layouts commonly seen in the training set. The benchmark comes with several baselines, including RoBERTa, LayoutLMv3 and DETR-based Table Transformer; applied to both tasks of the DocILE benchmark, with results shared in this paper, offering a quick starting point for future work. The dataset, baselines and supplementary material are available at https://github.com/rossumai/docile.  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ SSU2023 Serial 3903  
Permanent link to this record
 

 
Author George Tom; Minesh Mathew; Sergi Garcia Bordils; Dimosthenis Karatzas; CV Jawahar edit  url
openurl 
  Title ICDAR 2023 Competition on RoadText Video Text Detection, Tracking and Recognition Type Conference Article
  Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume (down) 14188 Issue Pages 577–586  
  Keywords  
  Abstract In this report, we present the final results of the ICDAR 2023 Competition on RoadText Video Text Detection, Tracking and Recognition. The RoadText challenge is based on the RoadText-1K dataset and aims to assess and enhance current methods for scene text detection, recognition, and tracking in videos. The RoadText-1K dataset contains 1000 dash cam videos with annotations for text bounding boxes and transcriptions in every frame. The competition features an end-to-end task, requiring systems to accurately detect, track, and recognize text in dash cam videos. The paper presents a comprehensive review of the submitted methods along with a detailed analysis of the results obtained by the methods. The analysis provides valuable insights into the current capabilities and limitations of video text detection, tracking, and recognition systems for dashcam videos.  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ TMG2023 Serial 3905  
Permanent link to this record
 

 
Author Ayan Banerjee; Sanket Biswas; Josep Llados; Umapada Pal edit  url
openurl 
  Title SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation Type Conference Article
  Year 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume (down) 14187 Issue Pages 307–325  
  Keywords  
  Abstract Instance-level segmentation of documents consists in assigning a class-aware and instance-aware label to each pixel of the image. It is a key step in document parsing for their understanding. In this paper, we present a unified transformer encoder-decoder architecture for en-to-end instance segmentation of complex layouts in document images. The method adapts a contrastive training with a mixed query selection for anchor initialization in the decoder. Later on, it performs a dot product between the obtained query embeddings and the pixel embedding map (coming from the encoder) for semantic reasoning. Extensive experimentation on competitive benchmarks like PubLayNet, PRIMA, Historical Japanese (HJ), and TableBank demonstrate that our model with SwinL backbone achieves better segmentation performance than the existing state-of-the-art approaches with the average precision of 93.72, 54.39, 84.65 and 98.04 respectively under one billion parameters. The code is made publicly available at: github.com/ayanban011/SwinDocSegmenter .  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ BBL2023 Serial 3893  
Permanent link to this record
 

 
Author Subhajit Maity; Sanket Biswas; Siladittya Manna; Ayan Banerjee; Josep Llados; Saumik Bhattacharya; Umapada Pal edit   pdf
url  doi
openurl 
  Title SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation Type Conference Article
  Year 2023 Publication 17th International Conference on Doccument Analysis and Recognition Abbreviated Journal  
  Volume (down) 14187 Issue Pages 342–360  
  Keywords  
  Abstract Document layout analysis is a known problem to the documents research community and has been vastly explored yielding a multitude of solutions ranging from text mining, and recognition to graph-based representation, visual feature extraction, etc. However, most of the existing works have ignored the crucial fact regarding the scarcity of labeled data. With growing internet connectivity to personal life, an enormous amount of documents had been available in the public domain and thus making data annotation a tedious task. We address this challenge using self-supervision and unlike, the few existing self-supervised document segmentation approaches which use text mining and textual labels, we use a complete vision-based approach in pre-training without any ground-truth label or its derivative. Instead, we generate pseudo-layouts from the document images to pre-train an image encoder to learn the document object representation and localization in a self-supervised framework before fine-tuning it with an object detection model. We show that our pipeline sets a new benchmark in this context and performs at par with the existing methods and the supervised counterparts, if not outperforms. The code is made publicly available at: this https URL  
  Address Document Layout Analysis; Document  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ MBM2023 Serial 3990  
Permanent link to this record
 

 
Author Stepan Simsa; Michal Uricar; Milan Sulc; Yash Patel; Ahmed Hamdi; Matej Kocian; Matyas Skalicky; Jiri Matas; Antoine Doucet; Mickael Coustaty; Dimosthenis Karatzas edit  url
doi  openurl
  Title Overview of DocILE 2023: Document Information Localization and Extraction Type Conference Article
  Year 2023 Publication International Conference of the Cross-Language Evaluation Forum for European Languages Abbreviated Journal  
  Volume (down) 14163 Issue Pages 276–293  
  Keywords Information Extraction; Computer Vision; Natural Language Processing; Optical Character Recognition; Document Understanding  
  Abstract This paper provides an overview of the DocILE 2023 Competition, its tasks, participant submissions, the competition results and possible future research directions. This first edition of the competition focused on two Information Extraction tasks, Key Information Localization and Extraction (KILE) and Line Item Recognition (LIR). Both of these tasks require detection of pre-defined categories of information in business documents. The second task additionally requires correctly grouping the information into tuples, capturing the structure laid out in the document. The competition used the recently published DocILE dataset and benchmark that stays open to new submissions. The diversity of the participant solutions indicates the potential of the dataset as the submissions included pure Computer Vision, pure Natural Language Processing, as well as multi-modal solutions and utilized all of the parts of the dataset, including the annotated, synthetic and unlabeled subsets.  
  Address Thessaloniki; Greece; September 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference CLEF  
  Notes DAG Approved no  
  Call Number Admin @ si @ SUS2023a Serial 3924  
Permanent link to this record
 

 
Author Albert Tatjer; Bhalaji Nagarajan; Ricardo Marques; Petia Radeva edit  url
openurl 
  Title CCLM: Class-Conditional Label Noise Modelling Type Conference Article
  Year 2023 Publication 11th Iberian Conference on Pattern Recognition and Image Analysis Abbreviated Journal  
  Volume (down) 14062 Issue Pages 3-14  
  Keywords  
  Abstract The performance of deep neural networks highly depends on the quality and volume of the training data. However, cost-effective labelling processes such as crowdsourcing and web crawling often lead to data with noisy (i.e., wrong) labels. Making models robust to this label noise is thus of prime importance. A common approach is using loss distributions to model the label noise. However, the robustness of these methods highly depends on the accuracy of the division of training set into clean and noisy samples. In this work, we dive in this research direction highlighting the existing problem of treating this distribution globally and propose a class-conditional approach to split the clean and noisy samples. We apply our approach to the popular DivideMix algorithm and show how the local treatment fares better with respect to the global treatment of loss distribution. We validate our hypothesis on two popular benchmark datasets and show substantial improvements over the baseline experiments. We further analyze the effectiveness of the proposal using two different metrics – Noise Division Accuracy and Classiness.  
  Address Alicante; Spain; June 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference IbPRIA  
  Notes MILAB Approved no  
  Call Number Admin @ si @ TNM2023 Serial 3925  
Permanent link to this record
 

 
Author Alejandro Ariza-Casabona; Bartlomiej Twardowski; Tri Kurniawan Wijaya edit  url
openurl 
  Title Exploiting Graph Structured Cross-Domain Representation for Multi-domain Recommendation Type Conference Article
  Year 2023 Publication European Conference on Information Retrieval – ECIR 2023: Advances in Information Retrieval Abbreviated Journal  
  Volume (down) 13980 Issue Pages 49–65  
  Keywords  
  Abstract Multi-domain recommender systems benefit from cross-domain representation learning and positive knowledge transfer. Both can be achieved by introducing a specific modeling of input data (i.e. disjoint history) or trying dedicated training regimes. At the same time, treating domains as separate input sources becomes a limitation as it does not capture the interplay that naturally exists between domains. In this work, we efficiently learn multi-domain representation of sequential users’ interactions using graph neural networks. We use temporal intra- and inter-domain interactions as contextual information for our method called MAGRec (short for Multi-dom Ain Graph-based Recommender). To better capture all relations in a multi-domain setting, we learn two graph-based sequential representations simultaneously: domain-guided for recent user interest, and general for long-term interest. This approach helps to mitigate the negative knowledge transfer problem from multiple domains and improve overall representation. We perform experiments on publicly available datasets in different scenarios where MAGRec consistently outperforms state-of-the-art methods. Furthermore, we provide an ablation study and discuss further extensions of our method.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ECIR  
  Notes LAMP Approved no  
  Call Number Admin @ si @ ATK2023 Serial 3933  
Permanent link to this record
Select All    Deselect All
 |   | 
Details

Save Citations:
Export Records: