Distant Supervised Construction and Evaluation of a Novel Dataset of Emotion‑Tagged Social Media Comments in Spanish

dc.rights.license https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ es_ES
dc.creator Tessore, Juan Pablo es_ES
dc.creator Esnaola, Leonardo Martín es_ES
dc.creator Lanzarini, Laura es_ES
dc.creator Baldassarri, Sandra es_ES
dc.date.accessioned 2021-07-26T14:44:02Z
dc.date.available info:eu-repo/date/embargoEnd/2022-01-17 es_ES
dc.date.available 2021-07-26T14:44:02Z
dc.date.issued 2021-01-18
dc.identifier.citation Tessore, J.P., Esnaola, L.M., Lanzarini, L. et al. Distant Supervised Construction and Evaluation of a Novel Dataset of Emotion-Tagged Social Media Comments in Spanish. Cogn Comput (2021). https://doi.org/10.1007/s12559-020-09800-x es_ES
dc.identifier.issn 1866-9964 es_ES
dc.identifier.issn 1866-9956 es_ES
dc.identifier.uri https://repositorio.unnoba.edu.ar/xmlui/handle/23601/142
dc.description.abstract Tagged language resources are an essential requirement for developing machine-learning text-based classifiers. However, manual tagging is extremely time-consuming, and the resulting datasets are rather small, containing only a few thousand samples. Basic emotion datasets are particularly difficult to classify manually because categorization is prone to subjectivity, and thus redundant classification is required to validate the assigned tag. Even though the number of emotion-tagged text datasets in Spanish has been growing in recent years, it cannot be compared with the number, size, and quality of the datasets in English. Quality is a particularly concerning issue, as not many datasets in Spanish include a validation step in the construction process. In this article, a dataset of social media comments in Spanish is compiled, selected, filtered, and presented. A sample of the dataset is reclassified by a group of psychologists and validated using the Fleiss' kappa inter-rater agreement measure. Error analysis is performed using the Sentic Computing tool BabelSenticNet. Results indicate that the agreement between the human raters and the automatically acquired tag is moderate, similar to other manually tagged datasets, with the advantages that the presented dataset contains several hundred thousand tagged comments and does not require extensive manual tagging. The agreement measured between human raters is very similar to the one between human raters and the original tag. Every measure presented falls in the moderate agreement zone, so the dataset is suitable for training classification algorithms in the sentiment analysis field. es_ES
dc.description.sponsorship Affiliation: Tessore, Juan Pablo. Universidad Nacional del Noroeste de la Provincia de Buenos Aires. Escuela de Tecnología. Instituto de Investigación y Transferencia en Tecnología, Centro Asociado CIC; Argentina es_ES
dc.description.sponsorship Affiliation: Tessore, Juan Pablo. Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires, Argentina es_ES
dc.description.sponsorship Affiliation: Esnaola, Leonardo Martín. Universidad Nacional del Noroeste de la Provincia de Buenos Aires. Escuela de Tecnología. Instituto de Investigación y Transferencia en Tecnología, Centro Asociado CIC; Argentina es_ES
dc.description.sponsorship Affiliation: Lanzarini, Laura. Facultad de Informática, Instituto de Investigación en Informática LIDI (Centro CICPBA), Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina es_ES
dc.description.sponsorship Affiliation: Baldassarri, Sandra. Departamento de Informática e Ingeniería de Sistemas, Universidad de Zaragoza, Zaragoza, Aragon, Spain es_ES
dc.description.sponsorship Affiliation: Baldassarri, Sandra. Instituto de Investigación en Ingeniería (I3A), Universidad de Zaragoza, Zaragoza, Aragon, Spain es_ES
dc.format application/pdf es_ES
dc.language.iso eng es_ES
dc.publisher Springer Science+Business Media LLC es_ES
dc.relation info:eu-repo/grantAgreement/UNNOBA/SIB2017/EXP 195/2017/AR. Buenos Aires/Tecnología y Aplicaciones de Sistemas de Software: Calidad e Innovación en procesos, productos y servicios es_ES
dc.rights info:eu-repo/semantics/embargoedAccess es_ES
dc.source Cognitive Computation es_ES
dc.subject Sentiment analysis es_ES
dc.subject Dataset construction es_ES
dc.subject Dataset validation es_ES
dc.subject Facebook es_ES
dc.subject Text mining es_ES
dc.title Distant Supervised Construction and Evaluation of a Novel Dataset of Emotion‑Tagged Social Media Comments in Spanish es_ES
dc.type info:eu-repo/semantics/article es_ES
dc.type info:ar-repo/semantics/artículo es_ES
dc.type info:eu-repo/semantics/acceptedVersion es_ES
dc.description.version Peer reviewed es_ES
dc.relation.publisherversion https://link.springer.com/article/10.1007/s12559-020-09800-x es_ES
dc.contributor.orcid 0000-0002-2111-0976 es_ES
dc.contributor.orcid 0000-0001-6298-9019 es_ES
dc.contributor.orcid 0000-0001-7027-7564 es_ES
dc.contributor.orcid 0000-0002-9315-6391 es_ES


This item appears in the following collection(s)

  • ITT - Artículos [67]
    Instituto de Investigación y Transferencia en Tecnología
