Detecting Textual Information in Images from Onion Domains Using Text Spotting
- Pablo Blanco 1
- Eduardo Fidalgo 1
- Enrique Alegre 1
- Mhd Wesam Al-Nabki 1
1 Universidad de León
- Inés Tejado Balsera (coord.)
- Emiliano Pérez Hernández (coord.)
- Antonio José Calderón Godoy (coord.)
- Isaías González Pérez (coord.)
- Pilar Merchán García (coord.)
- Jesús Lozano Rogado (coord.)
- Santiago Salamanca Miño (coord.)
- Blas M. Vinagre Jara (coord.)
Publisher: Universidad de Extremadura
ISBN: 978-84-9749-756-5, 978-84-09-04460-3
Year of publication: 2018
Pages: 975-982
Congress: Jornadas de Automática (39. 2018. Badajoz)
Type: Conference paper
Abstract
Owing to the efforts of various authorities in the fight against illegal activities on the Tor network, traders have developed new ways of circumventing the monitoring tools used to obtain evidence of those activities. In particular, embedding textual content in graphical objects prevents text analysis based on Natural Language Processing (NLP) algorithms from being used to monitor such onion web content. In this paper, we present a Text Spotting framework dedicated to detecting and recognizing textual information within images hosted in onion domains. We found that the Connectionist Text Proposal Network (CTPN) and the Convolutional Recurrent Neural Network (CRNN), run as a combined pipeline, achieve an F-Measure of 0.57 on a subset of 100 manually labeled images from the TOIC dataset. We also identified the parameters that have a critical influence on the Text Spotting results. The proposed technique could support tools that help the authorities detect these activities.
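For readers unfamiliar with the metric, the F-Measure is conventionally the harmonic mean of precision and recall. The sketch below is a minimal illustration in Python, assuming that matches between spotted and ground-truth words have already been counted. The function name, its parameters, and the counts in the usage lines are hypothetical and are not taken from the paper, whose exact matching protocol is not described in this record.

```python
def f_measure(true_positives: int, false_positives: int,
              false_negatives: int, beta: float = 1.0) -> float:
    """Return the F-beta score from raw match counts (hypothetical helper).

    true_positives:  spotted words that match a ground-truth word
    false_positives: spotted words with no ground-truth match
    false_negatives: ground-truth words that were not spotted
    """
    detected = true_positives + false_positives   # everything the pipeline output
    expected = true_positives + false_negatives   # everything in the ground truth
    if true_positives == 0 or detected == 0 or expected == 0:
        return 0.0  # no correct match: precision and recall are both zero
    precision = true_positives / detected
    recall = true_positives / expected
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (assumed, not results from the paper).
score = f_measure(true_positives=2, false_positives=0, false_negatives=2)
print(round(score, 3))
```

With beta left at 1, precision and recall are weighted equally, which is the usual reading of an unqualified "F-Measure".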