Corpus de interacciones digitales: sistematización de técnicas para recoger datos en WhatsApp
- Lucía Cantamutto
- Cristina Vela Delfa
ISSN: 0719-3661
Year of publication: 2023
Issue: 54
Pages: 117-139
Type: Article
Other publications in: Cuadernos.Info
Abstract
Collecting datasets of real interactions is an unavoidable step in much research that aims to understand language use. In the field of digital discourse analysis, data collection is complex because applications change rapidly and because of the ethical decisions involved. This work has two goals. First, we give an overview of the literature on datasets of digital exchanges on WhatsApp. Second, we systematize the sampling techniques used in previous research. To do so, we applied content analysis to 100 research articles and theses retrieved from open-access portals. We conducted a descriptive analysis covering, among other variables, the amount of data collected, the technique used to collect the data, the method used to contact participants, and whether the linguistic corpora are accessible online. The results show that some annotated and available corpora exist in languages other than Spanish. In addition, most of the literature combines different techniques to collect a wide range of linguistic and multimodal data. Finally, we systematize the main methodological alternatives for collecting data from digital interactions on WhatsApp, among which participant observation stands out.