toggle visibility Search & Display Options

Select All    Deselect All
 |   | 
Details
  Records Links
Author Qingshan Chen; Zhenzhen Quan; Yifan Hu; Yujun Li; Zhi Liu; Mikhail Mozerov edit  url
openurl 
  Title MSIF: multi-spectrum image fusion method for cross-modality person re-identification Type Journal Article
  Year (down) 2023 Publication International Journal of Machine Learning and Cybernetics Abbreviated Journal IJMLC  
  Volume Issue Pages  
  Keywords  
  Abstract Sketch-RGB cross-modality person re-identification (ReID) is a challenging task that aims to match a sketch portrait drawn by a professional artist with a full-body photo taken by surveillance equipment to deal with situations where the monitoring equipment is damaged at the accident scene. However, sketch portraits only provide highly abstract frontal body contour information and lack other important features such as color, pose, behavior, etc. The difference in saliency between the two modalities brings new challenges to cross-modality person ReID. To overcome this problem, this paper proposes a novel dual-stream model for cross-modality person ReID, which is able to mine modality-invariant features to reduce the discrepancy between sketch and camera images end-to-end. More specifically, we propose a multi-spectrum image fusion (MSIF) method, which aims to exploit the image appearance changes brought by multiple spectrums and guide the network to mine modality-invariant commonalities during training. It only processes the spectrum of the input images without adding additional calculations and model complexity, which can be easily integrated into other models. Moreover, we introduce a joint structure via a generalized mean pooling (GMP) layer and a self-attention (SA) mechanism to balance background and texture information and obtain the regional features with a large amount of information in the image. To further shrink the intra-class distance, a weighted regularized triplet (WRT) loss is developed without introducing additional hyperparameters. The model was first evaluated on the PKU Sketch ReID dataset, and extensive experimental results show that the Rank-1/mAP accuracy of our method is 87.00%/91.12%, reaching the current state-of-the-art performance. To further validate the effectiveness of our approach in handling cross-modality person ReID, we conducted experiments on two commonly used IR-RGB datasets (SYSU-MM01 and RegDB). The obtained results show that our method achieves competitive performance. These results confirm the ability of our method to effectively process images from different modalities.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP Approved no  
  Call Number Admin @ si @ CQH2023 Serial 3885  
Permanent link to this record
 

 
Author Olivier Penacchio; Xavier Otazu; Arnold J Wilkings; Sara M. Haigh edit  url
openurl 
  Title A mechanistic account of visual discomfort Type Journal Article
  Year (down) 2023 Publication Frontiers in Neuroscience Abbreviated Journal FN  
  Volume 17 Issue Pages  
  Keywords  
  Abstract Much of the neural machinery of the early visual cortex, from the extraction of local orientations to contextual modulations through lateral interactions, is thought to have developed to provide a sparse encoding of contour in natural scenes, allowing the brain to process efficiently most of the visual scenes we are exposed to. Certain visual stimuli, however, cause visual stress, a set of adverse effects ranging from simple discomfort to migraine attacks, and epileptic seizures in the extreme, all phenomena linked with an excessive metabolic demand. The theory of efficient coding suggests a link between excessive metabolic demand and images that deviate from natural statistics. Yet, the mechanisms linking energy demand and image spatial content in discomfort remain elusive. Here, we used theories of visual coding that link image spatial structure and brain activation to characterize the response to images observers reported as uncomfortable in a biologically based neurodynamic model of the early visual cortex that included excitatory and inhibitory layers to implement contextual influences. We found three clear markers of aversive images: a larger overall activation in the model, a less sparse response, and a more unbalanced distribution of activity across spatial orientations. When the ratio of excitation over inhibition was increased in the model, a phenomenon hypothesised to underlie interindividual differences in susceptibility to visual discomfort, the three markers of discomfort progressively shifted toward values typical of the response to uncomfortable stimuli. Overall, these findings propose a unifying mechanistic explanation for why there are differences between images and between observers, suggesting how visual input and idiosyncratic hyperexcitability give rise to abnormal brain responses that result in visual stress.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes NEUROBIT Approved no  
  Call Number Admin @ si @ POW2023 Serial 3886  
Permanent link to this record
 

 
Author Armin Mehri; Parichehr Behjati; Dario Carpio; Angel Sappa edit  url
doi  openurl
  Title SRFormer: Efficient Yet Powerful Transformer Network for Single Image Super Resolution Type Journal Article
  Year (down) 2023 Publication IEEE Access Abbreviated Journal ACCESS  
  Volume 11 Issue Pages  
  Keywords  
  Abstract Recent breakthroughs in single image super resolution have investigated the potential of deep Convolutional Neural Networks (CNNs) to improve performance. However, CNNs based models suffer from their limited fields and their inability to adapt to the input content. Recently, Transformer based models were presented, which demonstrated major performance gains in Natural Language Processing and Vision tasks while mitigating the drawbacks of CNNs. Nevertheless, Transformer computational complexity can increase quadratically for high-resolution images, and the fact that it ignores the original structures of the image by converting them to the 1D structure can make it problematic to capture the local context information and adapt it for real-time applications. In this paper, we present, SRFormer, an efficient yet powerful Transformer-based architecture, by making several key designs in the building of Transformer blocks and Transformer layers that allow us to consider the original structure of the image (i.e., 2D structure) while capturing both local and global dependencies without raising computational demands or memory consumption. We also present a Gated Multi-Layer Perceptron (MLP) Feature Fusion module to aggregate the features of different stages of Transformer blocks by focusing on inter-spatial relationships while adding minor computational costs to the network. We have conducted extensive experiments on several super-resolution benchmark datasets to evaluate our approach. SRFormer demonstrates superior performance compared to state-of-the-art methods from both Transformer and Convolutional networks, with an improvement margin of 0.1∼0.53dB . Furthermore, while SRFormer has almost the same model size, it outperforms SwinIR by 0.47% and inference time by half the time of SwinIR. The code will be available on GitHub.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ MBC2023 Serial 3887  
Permanent link to this record
 

 
Author Shiqi Yang; Yaxing Wang; Joost Van de Weijer; Luis Herranz; Shangling Jui; Jian Yang edit  url
doi  openurl
  Title Trust Your Good Friends: Source-Free Domain Adaptation by Reciprocal Neighborhood Clustering Type Journal Article
  Year (down) 2023 Publication IEEE Transactions on Pattern Analysis and Machine Intelligence Abbreviated Journal TPAMI  
  Volume 45 Issue 12 Pages 15883-15895  
  Keywords  
  Abstract Domain adaptation (DA) aims to alleviate the domain shift between source domain and target domain. Most DA methods require access to the source data, but often that is not possible (e.g., due to data privacy or intellectual property). In this paper, we address the challenging source-free domain adaptation (SFDA) problem, where the source pretrained model is adapted to the target domain in the absence of source data. Our method is based on the observation that target data, which might not align with the source domain classifier, still forms clear clusters. We capture this intrinsic structure by defining local affinity of the target data, and encourage label consistency among data with high local affinity. We observe that higher affinity should be assigned to reciprocal neighbors. To aggregate information with more context, we consider expanded neighborhoods with small affinity values. Furthermore, we consider the density around each target sample, which can alleviate the negative impact of potential outliers. In the experimental results we verify that the inherent structure of the target features is an important source of information for domain adaptation. We demonstrate that this local structure can be efficiently captured by considering the local neighbors, the reciprocal neighbors, and the expanded neighborhood. Finally, we achieve state-of-the-art performance on several 2D image and 3D point cloud recognition datasets.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes LAMP; MACO Approved no  
  Call Number Admin @ si @ YWW2023 Serial 3889  
Permanent link to this record
 

 
Author Jaykishan Patel; Alban Flachot; Javier Vazquez; David H. Brainard; Thomas S. A. Wallis; Marcus A. Brubaker; Richard F. Murray edit  url
openurl 
  Title A deep convolutional neural network trained to infer surface reflectance is deceived by mid-level lightness illusions Type Journal Article
  Year (down) 2023 Publication Journal of Vision Abbreviated Journal JV  
  Volume 23 Issue 9 Pages 4817-4817  
  Keywords  
  Abstract A long-standing view is that lightness illusions are by-products of strategies employed by the visual system to stabilize its perceptual representation of surface reflectance against changes in illumination. Computationally, one such strategy is to infer reflectance from the retinal image, and to base the lightness percept on this inference. CNNs trained to infer reflectance from images have proven successful at solving this problem under limited conditions. To evaluate whether these CNNs provide suitable starting points for computational models of human lightness perception, we tested a state-of-the-art CNN on several lightness illusions, and compared its behaviour to prior measurements of human performance. We trained a CNN (Yu & Smith, 2019) to infer reflectance from luminance images. The network had a 30-layer hourglass architecture with skip connections. We trained the network via supervised learning on 100K images, rendered in Blender, each showing randomly placed geometric objects (surfaces, cubes, tori, etc.), with random Lambertian reflectance patterns (solid, Voronoi, or low-pass noise), under randomized point+ambient lighting. The renderer also provided the ground-truth reflectance images required for training. After training, we applied the network to several visual illusions. These included the argyle, Koffka-Adelson, snake, White’s, checkerboard assimilation, and simultaneous contrast illusions, along with their controls where appropriate. The CNN correctly predicted larger illusions in the argyle, Koffka-Adelson, and snake images than in their controls. It also correctly predicted an assimilation effect in White's illusion. It did not, however, account for the checkerboard assimilation or simultaneous contrast effects. These results are consistent with the view that at least some lightness phenomena are by-products of a rational approach to inferring stable representations of physical properties from intrinsically ambiguous retinal images. Furthermore, they suggest that CNN models may be a promising starting point for new models of human lightness perception.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MACO; CIC Approved no  
  Call Number Admin @ si @ PFV2023 Serial 3890  
Permanent link to this record
 

 
Author Diego Velazquez; Pau Rodriguez; Alexandre Lacoste; Issam H. Laradji; Xavier Roca; Jordi Gonzalez edit  url
openurl 
  Title Evaluating Counterfactual Explainers Type Journal
  Year (down) 2023 Publication Transactions on Machine Learning Research Abbreviated Journal TMLR  
  Volume Issue Pages  
  Keywords Explainability; Counterfactuals; XAI  
  Abstract Explainability methods have been widely used to provide insight into the decisions made by statistical models, thus facilitating their adoption in various domains within the industry. Counterfactual explanation methods aim to improve our understanding of a model by perturbing samples in a way that would alter its response in an unexpected manner. This information is helpful for users and for machine learning practitioners to understand and improve their models. Given the value provided by counterfactual explanations, there is a growing interest in the research community to investigate and propose new methods. However, we identify two issues that could hinder the progress in this field. (1) Existing metrics do not accurately reflect the value of an explainability method for the users. (2) Comparisons between methods are usually performed with datasets like CelebA, where images are annotated with attributes that do not fully describe them and with subjective attributes such as ``Attractive''. In this work, we address these problems by proposing an evaluation method with a principled metric to evaluate and compare different counterfactual explanation methods. The evaluation method is based on a synthetic dataset where images are fully described by their annotated attributes. As a result, we are able to perform a fair comparison of multiple explainability methods in the recent literature, obtaining insights about their performance. We make the code public for the benefit of the research community.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes ISE Approved no  
  Call Number Admin @ si @ VRL2023 Serial 3891  
Permanent link to this record
 

 
Author Patricia Suarez; Henry Velesaca; Dario Carpio; Angel Sappa edit  url
openurl 
  Title Corn kernel classification from few training samples Type Journal
  Year (down) 2023 Publication Artificial Intelligence in Agriculture Abbreviated Journal  
  Volume 9 Issue Pages 89-99  
  Keywords  
  Abstract This article presents an efficient approach to classify a set of corn kernels in contact, which may contain good, or defective kernels along with impurities. The proposed approach consists of two stages, the first one is a next-generation segmentation network, trained by using a set of synthesized images that is applied to divide the given image into a set of individual instances. An ad-hoc lightweight CNN architecture is then proposed to classify each instance into one of three categories (ie good, defective, and impurities). The segmentation network is trained using a strategy that avoids the time-consuming and human-error-prone task of manual data annotation. Regarding the classification stage, the proposed ad-hoc network is designed with only a few sets of layers to result in a lightweight architecture capable of being used in integrated solutions. Experimental results and comparisons with previous approaches showing both the improvement in accuracy and the reduction in time are provided. Finally, the segmentation and classification approach proposed can be easily adapted for use with other cereal types.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference  
  Notes MSIAU Approved no  
  Call Number Admin @ si @ SVC2023 Serial 3892  
Permanent link to this record
 

 
Author Ayan Banerjee; Sanket Biswas; Josep Llados; Umapada Pal edit  url
openurl 
  Title SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation Type Conference Article
  Year (down) 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume 14187 Issue Pages 307–325  
  Keywords  
  Abstract Instance-level segmentation of documents consists in assigning a class-aware and instance-aware label to each pixel of the image. It is a key step in document parsing for their understanding. In this paper, we present a unified transformer encoder-decoder architecture for en-to-end instance segmentation of complex layouts in document images. The method adapts a contrastive training with a mixed query selection for anchor initialization in the decoder. Later on, it performs a dot product between the obtained query embeddings and the pixel embedding map (coming from the encoder) for semantic reasoning. Extensive experimentation on competitive benchmarks like PubLayNet, PRIMA, Historical Japanese (HJ), and TableBank demonstrate that our model with SwinL backbone achieves better segmentation performance than the existing state-of-the-art approaches with the average precision of 93.72, 54.39, 84.65 and 98.04 respectively under one billion parameters. The code is made publicly available at: github.com/ayanban011/SwinDocSegmenter .  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ BBL2023 Serial 3893  
Permanent link to this record
 

 
Author Wenwen Yu; Chengquan Zhang; Haoyu Cao; Wei Hua; Bohan Li; Huang Chen; Mingyu Liu; Mingrui Chen; Jianfeng Kuang; Mengjun Cheng; Yuning Du; Shikun Feng; Xiaoguang Hu; Pengyuan Lyu; Kun Yao; Yuechen Yu; Yuliang Liu; Wanxiang Che; Errui Ding; Cheng-Lin Liu; Jiebo Luo; Shuicheng Yan; Min Zhang; Dimosthenis Karatzas; Xing Sun; Jingdong Wang; Xiang Bai edit  url
openurl 
  Title ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images Type Conference Article
  Year (down) 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume 14188 Issue Pages 536–552  
  Keywords  
  Abstract Structured text extraction is one of the most valuable and challenging application directions in the field of Document AI. However, the scenarios of past benchmarks are limited, and the corresponding evaluation protocols usually focus on the submodules of the structured text extraction scheme. In order to eliminate these problems, we organized the ICDAR 2023 competition on Structured text extraction from Visually-Rich Document images (SVRD). We set up two tracks for SVRD including Track 1: HUST-CELL and Track 2: Baidu-FEST, where HUST-CELL aims to evaluate the end-to-end performance of Complex Entity Linking and Labeling, and Baidu-FEST focuses on evaluating the performance and generalization of Zero-shot/Few-shot Structured Text extraction from an end-to-end perspective. Compared to the current document benchmarks, our two tracks of competition benchmark enriches the scenarios greatly and contains more than 50 types of visually-rich document images (mainly from the actual enterprise applications). The competition opened on 30th December, 2022 and closed on 24th March, 2023. There are 35 participants and 91 valid submissions received for Track 1, and 15 participants and 26 valid submissions received for Track 2. In this report we will presents the motivation, competition datasets, task definition, evaluation protocol, and submission summaries. According to the performance of the submissions, we believe there is still a large gap on the expected information extraction performance for complex and zero-shot scenarios. It is hoped that this competition will attract many researchers in the field of CV and NLP, and bring some new thoughts to the field of Document AI.  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ YZC2023 Serial 3896  
Permanent link to this record
 

 
Author Wenwen Yu; Mingyu Liu; Mingrui Chen; Ning Lu; Yinlong We; Yuliang Liu; Dimosthenis Karatzas; Xiang Bai edit  url
openurl 
  Title ICDAR 2023 Competition on Reading the Seal Title Type Conference Article
  Year (down) 2023 Publication 17th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume 14188 Issue Pages 522–535  
  Keywords  
  Abstract Reading seal title text is a challenging task due to the variable shapes of seals, curved text, background noise, and overlapped text. However, this important element is commonly found in official and financial scenarios, and has not received the attention it deserves in the field of OCR technology. To promote research in this area, we organized ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2). We constructed a dataset of 10,000 real seal data, covering the most common classes of seals, and labeled all seal title texts with text polygons and text contents. The competition opened on 30th December, 2022 and closed on 20th March, 2023. The competition attracted 53 participants and received 135 submissions from academia and industry, including 28 participants and 72 submissions for Task 1, and 25 participants and 63 submissions for Task 2, which demonstrated significant interest in this challenging task. In this report, we present an overview of the competition, including the organization, challenges, and results. We describe the dataset and tasks, and summarize the submissions and evaluation results. The results show that significant progress has been made in the field of seal title text reading, and we hope that this competition will inspire further research and development in this important area of OCR technology.  
  Address San Jose; CA; USA; August 2023  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title LNCS  
  Series Volume Series Issue Edition  
  ISSN ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG Approved no  
  Call Number Admin @ si @ YLC2023 Serial 3897  
Permanent link to this record
Select All    Deselect All
 |   | 
Details

Save Citations:
Export Records: