2022 |
|
Guillem Martinez, Maya Aghaei, Martin Dijkstra, Bhalaji Nagarajan, Femke Jaarsma, Jaap van de Loosdrecht, et al. (2022). Hyper-Spectral Imaging for Overlapping Plastic Flakes Segmentation. In 47th International Conference on Acoustics, Speech, and Signal Processing.
Abstract: In this paper, we propose a deformable convolution-based generative adversarial network (DCNGAN) for perceptual quality enhancement of compressed videos. DCNGAN is also adaptive to the quantization parameters (QPs). Compared with optical flows, deformable convolutions are more effective and efficient at aligning frames. Deformable convolutions can operate on multiple frames, thus leveraging more temporal information, which is beneficial for enhancing the perceptual quality of compressed videos. Instead of aligning frames in a pairwise manner, the deformable convolution can process multiple frames simultaneously, which leads to lower computational complexity. Experimental results demonstrate that the proposed DCNGAN outperforms other state-of-the-art compressed video quality enhancement algorithms.
Keywords: Hyper-spectral imaging; plastic sorting; multi-label segmentation; bitfield encoding
|
|
|
Guillermo Torres, Sonia Baeza, Carles Sanchez, Ignasi Guasch, Antoni Rosell, & Debora Gil. (2022). An Intelligent Radiomic Approach for Lung Cancer Screening. APPLSCI - Applied Sciences, 12(3), 1568.
Abstract: The efficiency of lung cancer screening for reducing mortality is hindered by the high rate of false positives. Artificial intelligence applied to radiomics could help discard benign cases early from the analysis of CT scans. The limited amount of available data and the fact that benign cases are a minority constitute a main challenge for the successful use of state-of-the-art methods (like deep learning), which can be biased, over-fitted and lack clinical reproducibility. We present a hybrid approach combining the potential of radiomic features to characterize nodules in CT scans and the generalization of feed-forward networks. In order to obtain maximal reproducibility with minimal training data, we propose an embedding of nodules based on the statistical significance of radiomic features for malignancy detection. This representation space of lesions is the input to a feed-forward network, whose architecture and hyperparameters are optimized using own-defined metrics of the diagnostic power of the whole system. Results of the best model on an independent set of patients achieve 100% sensitivity and 83% specificity (AUC = 0.94) for malignancy detection.
Keywords: Lung cancer; Early diagnosis; Screening; Neural networks; Image embedding; Architecture optimization
|
|
|
Hector Laria Mantecon, Yaxing Wang, Joost Van de Weijer, & Bogdan Raducanu. (2022). Transferring Unconditional to Conditional GANs With Hyper-Modulation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
Abstract: GANs have matured in recent years and are able to generate high-resolution, realistic images. However, the computational resources and the data required for the training of high-quality GANs are enormous, and the study of transfer learning of these models is therefore an urgent topic. Many of the available high-quality pretrained GANs are unconditional (like StyleGAN). For many applications, however, conditional GANs are preferable, because they provide more control over the generation process, despite often suffering more training difficulties. Therefore, in this paper, we focus on transferring from high-quality pretrained unconditional GANs to conditional GANs. This requires architectural adaptation of the pretrained GAN to perform the conditioning. To this end, we propose hyper-modulated generative networks that allow for shared and complementary supervision. To prevent the additional weights of the hypernetwork from overfitting, with subsequent mode collapse on small target domains, we introduce a self-initialization procedure that does not require any real data to initialize the hypernetwork parameters. To further improve the sample efficiency of the transfer, we apply contrastive learning in the discriminator, which works effectively with very limited batch sizes. In extensive experiments, we validate the efficiency of the hypernetworks, self-initialization and contrastive loss for knowledge transfer on standard benchmarks.
|
|
|
Henry Velesaca, Patricia Suarez, Angel Sappa, Dario Carpio, Rafael E. Rivadeneira, & Angel Sanchez. (2022). Review on Common Techniques for Urban Environment Video Analytics. In Anais do III Workshop Brasileiro de Cidades Inteligentes (pp. 107–118).
Abstract: This work compiles the different state-of-the-art computer vision-based approaches intended for video analytics in urban environments. The manuscript groups the different approaches according to the typical modules present in video analysis, including image preprocessing, object detection, classification, and tracking. The proposed pipeline serves as a basic guide for presenting the most representative video analysis approaches addressed in this work. Furthermore, the manuscript is not intended to be an exhaustive review of the most advanced approaches, but only a list of common techniques proposed to address recurring problems in this field.
Keywords: Video Analytics; Review; Urban Environments; Smart Cities
|
|
|
Henry Velesaca, Patricia Suarez, Dario Carpio, Rafael E. Rivadeneira, Angel Sanchez, & Angel Morera. (2022). Video Analytics in Urban Environments: Challenges and Approaches. In ICT Applications for Smart Cities (Vol. 224, pp. 101–121). ISRL. Springer.
Abstract: This chapter reviews state-of-the-art approaches generally present in the pipeline of video analytics on urban scenarios. A typical pipeline is used to cluster approaches in the literature, including image preprocessing, object detection, object classification, and object tracking modules. Then, a review of recent approaches for each module is given. Additionally, applications and datasets generally used for training and evaluating the performance of these approaches are included. This chapter is not intended to be an exhaustive review of state-of-the-art video analytics in urban environments but rather an illustration of some of the different recent contributions. The chapter concludes by presenting current trends in video analytics in the urban scenario field.
|
|
|
Hugo Bertiche, Meysam Madadi, & Sergio Escalera. (2022). Neural Cloth Simulation. ACMTGraph - ACM Transactions on Graphics, 41(6), 1–14.
Abstract: We present a general framework for the garment animation problem through unsupervised deep learning inspired by physically based simulation. Existing trends in the literature already explore this possibility. Nonetheless, these approaches do not handle cloth dynamics. Here, we propose the first methodology able to learn realistic cloth dynamics in an unsupervised manner, and hence, a general formulation for neural cloth simulation. The key to achieving this is to adapt an existing optimization scheme for motion from simulation-based methodologies to deep learning. Then, analyzing the nature of the problem, we devise an architecture able to automatically disentangle static and dynamic cloth subspaces by design. We show how this improves model performance. Additionally, this opens the possibility of a novel motion augmentation technique that greatly improves generalization. Finally, we show that it also allows controlling the level of motion in the predictions. This is a useful, never-before-seen tool for artists. We provide a detailed analysis of the problem to establish the bases of neural cloth simulation and guide future research into the specifics of this domain.
ACM Transactions on Graphics, Volume 41, Issue 6, December 2022, Article No.: 220, pp 1–
|
|
|
Hugo Jair Escalante, Heysem Kaya, Albert Ali Salah, Sergio Escalera, Yagmur Gucluturk, Umut Guçlu, et al. (2022). Modeling, Recognizing, and Explaining Apparent Personality from Videos. TAC - IEEE Transactions on Affective Computing, 13(2), 894–911.
Abstract: Explainability and interpretability are two critical aspects of decision support systems. Despite their importance, it is only recently that researchers are starting to explore these aspects. This paper provides an introduction to explainability and interpretability in the context of apparent personality recognition. To the best of our knowledge, this is the first effort in this direction. We describe a challenge we organized on explainability in first impressions analysis from video. We analyze in detail the newly introduced data set, evaluation protocol, proposed solutions and summarize the results of the challenge. We investigate the issue of bias in detail. Finally, derived from our study, we outline research opportunities that we foresee will be relevant in this area in the near future.
|
|
|
Iban Berganzo-Besga, Hector A. Orengo, Felipe Lumbreras, Paloma Aliende, & Monica N. Ramsey. (2022). Automated detection and classification of multi-cell Phytoliths using Deep Learning-Based Algorithms. JArchSci - Journal of Archaeological Science, 148, 105654.
Abstract: This paper presents an algorithm for automated detection and classification of multi-cell phytoliths, one of the major components of many archaeological and paleoenvironmental deposits. This identification, based on phytolith wave pattern, is made using a pretrained VGG19 deep learning model. This approach has been tested in three key phytolith genera for the study of agricultural origins in Near East archaeology: Avena, Hordeum and Triticum. Also, this classification has been validated at species-level using Triticum boeoticum and dicoccoides images. Due to the diversity of microscopes, cameras and chemical treatments that can influence images of phytolith slides, three types of data augmentation techniques have been implemented: rotation of the images at 45-degree angles, random colour and brightness jittering, and random blur/sharpen. The implemented workflow has resulted in an overall accuracy of 93.68% for phytolith genera, improving on previous attempts. The algorithm has also demonstrated its potential to automate the classification of phytolith species with an overall accuracy of 100%. The open code and platforms employed to develop the algorithm ensure the method's accessibility, reproducibility and reusability.
|
|
|
Idoia Ruiz. (2022). Deep Metric Learning for re-identification, tracking and hierarchical novelty detection (Joan Serrat, Ed.). Ph.D. thesis.
Abstract: Metric learning refers to the problem in machine learning of learning a distance or similarity measurement to compare data. In particular, deep metric learning involves learning a representation, also referred to as embedding, such that in the embedding space data samples can be compared based on the distance, directly providing a similarity measure. This step is necessary to perform several tasks in computer vision. It enables the classification of images, regions or pixels, re-identification, out-of-distribution detection, object tracking in image sequences and any other task whose solution requires computing a similarity score. This thesis addresses three specific problems that share this common requirement. The first one is person re-identification. Essentially, it is an image retrieval task that aims at finding instances of the same person according to a similarity measure. We first compare, in terms of accuracy and efficiency, classical metric learning to basic deep learning based methods for this problem. In this context, we also study network distillation as a strategy to optimize the trade-off between accuracy and speed at inference time. The second problem we contribute to is novelty detection in image classification. It consists of detecting samples of novel classes, i.e. never seen during training. However, standard novelty detection does not provide any information about the novel samples beyond the fact that they are unknown. Aiming at more informative outputs, we take advantage of the hierarchical taxonomies that are intrinsic to the classes. We propose a metric learning based approach that leverages the hierarchical relationships among classes during training and is able to predict the parent class for a novel sample in such a hierarchical taxonomy. Our third contribution is in multi-object tracking and segmentation. This joint task comprises classification, detection, instance segmentation and tracking. Tracking can be formulated as a retrieval problem to be addressed with metric learning approaches. We tackle an existing difficulty in academic research, namely the lack of annotated benchmarks for this task. To this end, we introduce the problem of weakly supervised multi-object tracking and segmentation, facing the challenge of not having ground truth available for instance segmentation. We propose a synergistic training strategy that benefits from the knowledge of the supervised tasks that are being learnt simultaneously.
|
|
|
Idoia Ruiz, & Joan Serrat. (2022). Hierarchical Novelty Detection for Traffic Sign Recognition. SENS - Sensors, 22(12), 4389.
Abstract: Recent works have made significant progress in novelty detection, i.e., the problem of detecting samples of novel classes, never seen during training, while classifying those that belong to known classes. However, the only information this task provides about novel samples is that they are unknown. In this work, we leverage hierarchical taxonomies of classes to provide informative outputs for samples of novel classes. We predict their closest class in the taxonomy, i.e., their parent class. We address this problem, known as hierarchical novelty detection, by proposing a novel loss, the Hierarchical Cosine Loss, which is designed to learn class prototypes along with an embedding of discriminative features consistent with the taxonomy. We apply it to traffic sign recognition, where we predict the parent class semantics for new types of traffic signs. Our model beats state-of-the-art approaches on two large-scale traffic sign benchmarks, Mapillary Traffic Sign Dataset (MTSD) and Tsinghua-Tencent 100K (TT100K), and performs similarly on natural image benchmarks (AWA2, CUB). For TT100K and MTSD, our approach is able to detect novel samples at the correct nodes of the hierarchy with 81% and 36% accuracy, respectively, at 80% known class accuracy.
Keywords: Novelty detection; hierarchical classification; deep learning; traffic sign recognition; autonomous driving; computer vision
|
|