Author Ruben Tito; Dimosthenis Karatzas; Ernest Valveny
  Title Hierarchical multimodal transformers for Multi-Page DocVQA Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 144 Issue Pages 109834  
  Keywords  
  Abstract Document Visual Question Answering (DocVQA) refers to the task of answering questions from document images. Existing work on DocVQA only considers single-page documents. However, in real scenarios documents are mostly composed of multiple pages that should be processed altogether. In this work we extend DocVQA to the multi-page scenario. For that, we first create a new dataset, MP-DocVQA, where questions are posed over multi-page documents instead of single pages. Second, we propose a new hierarchical method, Hi-VT5, based on the T5 architecture, that overcomes the limitations of current methods to process long multi-page documents. The proposed method is based on a hierarchical transformer architecture where the encoder summarizes the most relevant information of every page and then, the decoder takes this summarized information to generate the final answer. Through extensive experimentation, we demonstrate that our method is able, in a single stage, to answer the questions and provide the page that contains the relevant information to find the answer, which can be used as a kind of explainability measure.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 0031-3203 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.155; 600.121 Approved no  
  Call Number Admin @ si @ TKV2023 Serial 3825  
 

 
Author Souhail Bakkali; Zuheng Ming; Mickael Coustaty; Marçal Rusiñol; Oriol Ramos Terrades
  Title VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification Type Journal Article
  Year 2023 Publication Pattern Recognition Abbreviated Journal PR  
  Volume 139 Issue Pages 109419  
  Keywords  
  Abstract Multimodal learning from document data has achieved great success lately as it allows to pre-train semantically meaningful features as a prior into a learnable downstream approach. In this paper, we approach the document classification problem by learning cross-modal representations through language and vision cues, considering intra- and inter-modality relationships. Instead of merging features from different modalities into a common representation space, the proposed method exploits high-level interactions and learns relevant semantic information from effective attention flows within and across modalities. The proposed learning objective is devised between intra- and inter-modality alignment tasks, where the similarity distribution per task is computed by contracting positive sample pairs while simultaneously contrasting negative ones in the common feature representation space. Extensive experiments on public document classification datasets demonstrate the effectiveness and the generalization capacity of our model on both low-scale and large-scale datasets.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 0031-3203 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.140; 600.121 Approved no  
  Call Number Admin @ si @ BMC2023 Serial 3826  
 

 
Author Debora Gil; Jaume Garcia; Manuel Vazquez; Ruth Aris; Guillaume Houzeaux
  Title Patient-Sensitive Anatomic and Functional 3D Model of the Left Ventricle Function Type Conference Article
  Year 2008 Publication 8th World Congress on Computational Mechanics (WCCM8)/5th European Congress on Computational Methods in Applied Sciences and Engineering (ECCOMAS 2008) Abbreviated Journal  
  Volume Issue Pages  
  Keywords Left Ventricle; Electromechanical Models; Image Processing; Magnetic Resonance.  
  Abstract Early diagnosis and accurate treatment of Left Ventricle (LV) dysfunction significantly increase patient survival. Impairment of LV contractility due to cardiovascular diseases is reflected in its motion patterns. Recent advances in medical imaging, such as Magnetic Resonance (MR), have encouraged research on 3D simulation and modelling of the LV dynamics. Most of the existing 3D models consider just the gross anatomy of the LV and restore a truncated ellipse which deforms along the cardiac cycle. The contraction mechanics of any muscle strongly depends on the spatial orientation of its muscular fibers since the motion that the muscle undergoes mainly takes place along the fibers. It follows that such simplified models do not allow evaluation of the heart electro-mechanical function and coupling, which has recently risen as the key point for understanding the LV functionality. In order to thoroughly understand the LV mechanics it is necessary to consider the complete anatomy of the LV given by the orientation of the myocardial fibres in 3D space as described by Torrent Guasp. We propose developing a 3D patient-sensitive model of the LV integrating, for the first time, the ventricular band anatomy (fibers orientation), the LV gross anatomy and its functionality. Such a model will represent the LV function as a natural consequence of its own ventricular band anatomy. This might be decisive in restoring a proper LV contraction in patients undergoing pacemaker treatment. The LV function is defined as soon as the propagation of the contractile electromechanical pulse has been modelled. In our experiments we have used the wave equation for the propagation of the electric pulse. The electromechanical wave moves on the myocardial surface and should have a conductivity tensor oriented along the muscular fibers. Thus, whatever mathematical model for electric pulse propagation [4] we consider, the complete anatomy of the LV should be extracted. 
The LV gross anatomy is obtained by processing multi slice MR images recorded for each patient. Information about the myocardial fibers distribution can only be extracted by Diffusion Tensor Imaging (DTI), which cannot provide in vivo information for each patient. As a first approach, we have computed an average model of fibers from several DTI studies of canine hearts. This rough anatomy is the input for our electro-mechanical propagation model simulating LV dynamics. The average fiber orientation is updated until the simulated LV motion agrees with the experimental evidence provided by the LV motion observed in tagged MR (TMR) sequences. Experimental LV motion is recovered by applying image processing, differential geometry and interpolation techniques to 2D TMR slices [5]. The pipeline in figure 1 outlines the interaction between simulations and experimental data leading to our patient-tailored model.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Venezia (Italia) Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN B-31470-08 ISBN Medium  
  Area Expedition Conference  
  Notes IAM Approved no  
  Call Number IAM @ iam @ GGV2008c Serial 1521  
 

 
Author Enric Marti; Debora Gil; Marc Vivet; Carme Julia
  Title Aprendizaje Basado en Proyectos en la asignatura de Gráficos por Computador en Ingeniería Informática. Balance de cuatro años de experiencia Type Miscellaneous
  Year 2009 Publication 15th Jornadas de Enseñanza Universitaria de la Informatica Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Barcelona, Spain Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume 1 Series Issue Edition  
  ISSN 978-84-692-2758-9 ISBN Medium  
  Area Expedition Conference JENUI  
  Notes IAM;ADAS Approved no  
  Call Number IAM @ iam @ MGV2009 Serial 1596  
 

 
Author Sergio Escalera; Ralf Herbrich
  Title The NeurIPS’18 Competition: From Machine Learning to Intelligent Conversations Type Book Whole
  Year 2020 Publication The Springer Series on Challenges in Machine Learning Abbreviated Journal  
  Volume Issue Pages  
  Keywords  
  Abstract This volume presents the results of the Neural Information Processing Systems Competition track at the 2018 NeurIPS conference. The competition follows the same format as the 2017 competition track for NIPS. Out of 21 submitted proposals, eight competition proposals were selected, spanning the area of Robotics, Health, Computer Vision, Natural Language Processing, Systems and Physics. Competitions have become an integral part of advancing state-of-the-art in artificial intelligence (AI). They exhibit one important difference to benchmarks: Competitions test a system end-to-end rather than evaluating only a single component; they assess the practicability of an algorithmic solution in addition to assessing feasibility.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor Sergio Escalera; Ralf Herbrich  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 2520-1328 ISBN 978-3-030-29134-1 Medium  
  Area Expedition Conference  
  Notes HuPBA; no menciona Approved no  
  Call Number Admin @ si @ HeE2020 Serial 3328  
 

 
Author Albert Berenguel; Oriol Ramos Terrades; Josep Llados; Cristina Cañero
  Title Evaluation of Texture Descriptors for Validation of Counterfeit Documents Type Conference Article
  Year 2017 Publication 14th International Conference on Document Analysis and Recognition Abbreviated Journal  
  Volume Issue Pages 1237-1242  
  Keywords  
  Abstract This paper describes an exhaustive comparative analysis and evaluation of different existing texture descriptor algorithms to differentiate between genuine and counterfeit documents. We include in our experiments different categories of algorithms and compare them in different scenarios with several counterfeit datasets, comprising banknotes and identity documents. Computational time in the extraction of each descriptor is important because the final objective is to use it in a real industrial scenario. HoG and CNN-based descriptors stand out statistically over the rest in terms of the F1-score/time ratio performance.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 2379-2140 ISBN Medium  
  Area Expedition Conference ICDAR  
  Notes DAG; 600.061; 601.269; 600.097; 600.121 Approved no  
  Call Number Admin @ si @ BRL2017 Serial 3092  
 

 
Author M. Altillawi; S. Li; S.M. Prakhya; Z. Liu; Joan Serrat
  Title Implicit Learning of Scene Geometry From Poses for Global Localization Type Journal Article
  Year 2024 Publication IEEE Robotics and Automation Letters Abbreviated Journal ROBOTAUTOMLET  
  Volume 9 Issue 2 Pages 955-962  
  Keywords Localization; Localization and mapping; Deep learning for visual perception; Visual learning  
  Abstract Global visual localization estimates the absolute pose of a camera using a single image, in a previously mapped area. Obtaining the pose from a single image enables many robotics and augmented/virtual reality applications. Inspired by latest advances in deep learning, many existing approaches directly learn and regress 6 DoF pose from an input image. However, these methods do not fully utilize the underlying scene geometry for pose regression. The challenge in monocular relocalization is the minimal availability of supervised training data, which is just the corresponding 6 DoF poses of the images. In this letter, we propose to utilize these minimal available labels (i.e., poses) to learn the underlying 3D geometry of the scene and use the geometry to estimate the 6 DoF camera pose. We present a learning method that uses these pose labels and rigid alignment to learn two 3D geometric representations (X, Y, Z coordinates) of the scene, one in camera coordinate frame and the other in global coordinate frame. Given a single image, it estimates these two 3D scene representations, which are then aligned to estimate a pose that matches the pose label. This formulation allows for the active inclusion of additional learning constraints to minimize 3D alignment errors between the two 3D scene representations, and 2D re-projection errors between the 3D global scene representation and 2D image pixels, resulting in improved localization accuracy. During inference, our model estimates the 3D scene geometry in camera and global frames and aligns them rigidly to obtain pose in real-time. We evaluate our work on three common visual localization datasets, conduct ablation studies, and show that our method exceeds state-of-the-art regression methods' pose accuracy on all datasets.  
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 2377-3766 ISBN Medium  
  Area Expedition Conference  
  Notes ADAS Approved no  
  Call Number Admin @ si @ Serial 3857  
 

 
Author G. Thorvaldsen; Joana Maria Pujadas-Mora; T. Andersen; L. Eikvil; Josep Llados; Alicia Fornes; Anna Cabre
  Title A Tale of two Transcriptions Type Journal
  Year 2015 Publication Historical Life Course Studies Abbreviated Journal  
  Volume 2 Issue Pages 1-19  
  Keywords Nominative Sources; Census; Vital Records; Computer Vision; Optical Character Recognition; Word Spotting  
  Abstract non-indexed
This article explains how two projects implement semi-automated transcription routines: for census sheets in Norway and marriage protocols from Barcelona. The Spanish system was created to transcribe the marriage license books from 1451 to 1905 for the Barcelona area; one of the world’s longest series of preserved vital records. Thus, in the Project “Five Centuries of Marriages” (5CofM) at the Autonomous University of Barcelona’s Center for Demographic Studies, the Barcelona Historical Marriage Database has been built. More than 600,000 records were transcribed by 150 transcribers working online. The Norwegian material is cross-sectional as it is the 1891 census, recorded on one sheet per person. This format and the underlining of keywords for several variables made it more feasible to semi-automate data entry than when many persons are listed on the same page. While Optical Character Recognition (OCR) for printed text is scientifically mature, computer vision research is now focused on more difficult problems such as handwriting recognition. In the marriage project, document analysis methods have been proposed to automatically recognize the marriage licenses. Fully automatic recognition is still a challenge, but some promising results have been obtained. In Spain, Norway and elsewhere the source material is available as scanned pictures on the Internet, opening up the possibility for further international cooperation concerning automating the transcription of historic source materials. Like what is being done in projects to digitize printed materials, the optimal solution is likely to be a combination of manual transcription and machine-assisted recognition also for hand-written sources.
 
  Address  
  Corporate Author Thesis  
  Publisher Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 2352-6343 ISBN Medium  
  Area Expedition Conference  
  Notes DAG; 600.077; 602.006 Approved no  
  Call Number Admin @ si @ TPA2015 Serial 2582  
 

 
Author Carles Fernandez; Jordi Gonzalez; Joao Manuel R. S. Tavares; Xavier Roca
  Title Towards Ontological Cognitive System Type Book Chapter
  Year 2013 Publication Topics in Medical Image Processing and Computational Vision Abbreviated Journal  
  Volume 8 Issue Pages 87-99  
  Keywords  
  Abstract The increasing ubiquitousness of digital information in our daily lives has positioned video as a favored information vehicle, and given rise to an astonishing generation of social media and surveillance footage. This raises a series of technological demands for automatic video understanding and management, which together with the compromising attentional limitations of human operators, have motivated the research community to guide its steps towards a better attainment of such capabilities. As a result, current trends on cognitive vision promise to recognize complex events and self-adapt to different environments, while managing and integrating several types of knowledge. Future directions suggest to reinforce the multi-modal fusion of information sources and the communication with end-users.  
  Address  
  Corporate Author Thesis  
  Publisher Springer Netherlands Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 2212-9391 ISBN 978-94-007-0725-2 Medium  
  Area Expedition Conference  
  Notes ISE; 605.203; 302.018; 600.049 Approved no  
  Call Number Admin @ si @ FGT2013 Serial 2287  
 

 
Author J. Poujol; Cristhian A. Aguilera-Carrasco; E. Danos; Boris X. Vintimilla; Ricardo Toledo; Angel Sappa
  Title Visible-Thermal Fusion based Monocular Visual Odometry Type Conference Article
  Year 2015 Publication 2nd Iberian Robotics Conference ROBOT2015 Abbreviated Journal  
  Volume 417 Issue Pages 517-528  
  Keywords Monocular Visual Odometry; LWIR-RGB cross-spectral Imaging; Image Fusion.  
  Abstract The manuscript evaluates the performance of a monocular visual odometry approach when images from different spectra are considered, both independently and fused. The objective behind this evaluation is to analyze if classical approaches can be improved when the given images, which are from different spectra, are fused and represented in new domains. The images in these new domains should have some of the following properties: i) more robust to noisy data; ii) less sensitive to changes (e.g., lighting); iii) richer in descriptive information, among others. In particular, in the current work two different image fusion strategies are considered. Firstly, images from the visible and thermal spectrum are fused using a Discrete Wavelet Transform (DWT) approach. Secondly, a monochrome threshold strategy is considered. The obtained representations are evaluated under a visual odometry framework, highlighting their advantages and disadvantages, using different urban and semi-urban scenarios. Comparisons with both monocular-visible spectrum and monocular-infrared spectrum are also provided, showing the validity of the proposed approach.
 
  Address Lisboa; Portugal; November 2015  
  Corporate Author Thesis  
  Publisher Springer International Publishing Place of Publication Editor  
  Language Summary Language Original Title  
  Series Editor Series Title Abbreviated Series Title  
  Series Volume Series Issue Edition  
  ISSN 2194-5357 ISBN 978-3-319-27145-3 Medium  
  Area Expedition Conference ROBOT  
  Notes ADAS; 600.076; 600.086 Approved no  
  Call Number Admin @ si @ PAD2015 Serial 2663  