TY - CONF
AU - David Aldavert
AU - Marçal Rusiñol
A2 - DAS
PY - 2018//
TI - Synthetically generated semantic codebook for Bag-of-Visual-Words based word spotting
BT - 13th IAPR International Workshop on Document Analysis Systems
SP - 223
EP - 228
KW - Word Spotting
KW - Bag of Visual Words
KW - Synthetic Codebook
KW - Semantic Information
N2 - Word-spotting methods based on the Bag-of-Visual-Words framework have demonstrated good retrieval performance even when used in a completely unsupervised manner. Although unsupervised approaches are suitable for large document collections due to the cost of acquiring labeled data, these methods also present some drawbacks. For instance, having to train a suitable “codebook” for a certain dataset has a high computational cost. Therefore, in this paper we present a database-agnostic codebook which is trained from synthetic data. The aim of the proposed approach is to generate a codebook where the only information required is the type of script used in the document. The use of synthetic data also makes it easy to incorporate semantic information in the codebook generation. Thus, the proposed method is able to determine which set of codewords has a semantic representation of the descriptor feature space. Experimental results show that the resulting codebook attains state-of-the-art performance while having a more compact representation.
L1 - http://158.109.8.37/files/AlR2018b.pdf
UR - http://dx.doi.org/10.1109/DAS.2018.25
N1 - DAG; 600.084; 600.129; 600.121
ID - David Aldavert2018
ER -