Image preprocessing and OCR to improve smishing detection (original title: Preprocesado de imagen y OCR para mejorar detección de smishing)

Blanco Medina, Pablo; Carofilis, Andrés; Fidalgo, Eduardo; Alegre, Enrique

doi:10.17979/JA-CEA.2024.45.10955

Blanco Medina, Pablo ¹
Carofilis, Andrés ¹
Fidalgo, Eduardo ¹
Alegre, Enrique ¹

1 Universidad de León, León, Spain

ROR: https://ror.org/02tzt0b78

Journal:

Jornadas de Automática

Cruz Martín, Ana María (coord.)
Arévalo Espejo, V. (coord.)
Fernández Lozano, Juan Jesús (coord.)

ISSN: 3045-4093

Year of publication: 2024

Issue: 45

Type: Article

DOI: 10.17979/JA-CEA.2024.45.10955 (open access, publisher version)

Abstract

The globalization of communication technologies has led to an increase in the number of phishing scams. Computer Emergency Response Teams (CERTs) receive smartphone screenshots from citizens showing suspicious short messages. These SMS impersonate well-known companies and urge users to act immediately by following a URL, with the aim of stealing their data or making unauthorized charges to their bank accounts. Such messages are known as Smishing, and CERTs could benefit from tools that automatically extract the URLs from these screenshots so that each one can later be checked for phishing. In this work, we propose a pipeline for extracting Smishing URLs from the screenshots that CERTs may receive. We combine traditional computer vision techniques, such as preprocessing and morphological operations, with an OCR engine to recognize the suspicious URLs. We applied our pipeline to 117 screenshots of Smishing messages containing 121 URLs, achieving an accuracy of 61.16% in retrieving complete URLs from Smishing screenshots.
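The last two stages described in the abstract, pulling URLs out of the OCR output and scoring how many are recovered completely, can be illustrated with a minimal sketch. It assumes the OCR step has already produced plain text. The regular expression, the function names `extract_urls` and `complete_url_accuracy`, and the exact-match scoring rule are illustrative assumptions, not the authors' implementation.

```python
import re

# Hypothetical URL pattern (not taken from the paper): either an explicit
# "http(s)://" or "www." prefix, or a bare domain ending in a TLD of two or
# more letters, optionally followed by a path.
URL_PATTERN = re.compile(
    r"(?:https?://|www\.)[^\s<>\"']+"
    r"|\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}(?:/[^\s<>\"']*)?",
    re.IGNORECASE,
)


def extract_urls(ocr_text: str) -> list[str]:
    """Return URL-like substrings from OCR text, without trailing punctuation."""
    return [m.group(0).rstrip(".,;:!?)") for m in URL_PATTERN.finditer(ocr_text)]


def complete_url_accuracy(predicted: list[str], expected: list[str]) -> float:
    """Fraction of expected URLs recovered character for character.

    Assumed metric: a URL counts only if the extracted string matches exactly.
    """
    if not expected:
        return 0.0
    found = set(predicted)
    return sum(1 for url in expected if url in found) / len(expected)


if __name__ == "__main__":
    # Made-up OCR output for illustration only.
    sample = "Your parcel is on hold. Confirm at https://parcel-update.example.com/track?id=123."
    print(extract_urls(sample))
    # Arithmetic check: 74 of 121 URLs would give the reported percentage.
    print(round(100 * 74 / 121, 2))  # 61.16
```

As a reading aid, 61.16% of 121 URLs corresponds to 74 complete URLs, since 74/121 ≈ 0.6116; this count is inferred from the reported figures rather than stated in the abstract.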


Data source: Dialnet
