%0 Conference Proceedings
%T Synthetically generated semantic codebook for Bag-of-Visual-Words based word spotting
%A David Aldavert
%A Marçal Rusiñol
%B 13th IAPR International Workshop on Document Analysis Systems
%D 2018
%F David Aldavert2018
%O DAG; 600.084; 600.129; 600.121
%O exported from refbase (http://158.109.8.37/show.php?record=3105), last updated on Mon, 24 Jan 2022 13:34:47 +0100
%X Word-spotting methods based on the Bag-of-Visual-Words framework have demonstrated good retrieval performance even when used in a completely unsupervised manner. Although unsupervised approaches are suitable for large document collections due to the cost of acquiring labeled data, these methods also present some drawbacks. For instance, training a suitable “codebook” for a given dataset has a high computational cost. Therefore, in this paper we present a database-agnostic codebook that is trained from synthetic data. The aim of the proposed approach is to generate a codebook for which the only information required is the type of script used in the document. The use of synthetic data also makes it easy to incorporate semantic information into the codebook generation. Thus, the proposed method is able to determine which set of codewords has a semantic representation of the descriptor feature space. Experimental results show that the resulting codebook attains state-of-the-art performance while having a more compact representation.
%K Word Spotting
%K Bag of Visual Words
%K Synthetic Codebook
%K Semantic Information
%U http://158.109.8.37/files/AlR2018b.pdf
%U http://dx.doi.org/10.1109/DAS.2018.25
%P 223-228