Name: LÍGIA IUNES VENTUROTT
Publication date: 25/10/2021
Advisor:
Name | Role |
---|---|
PATRICK MARQUES CIARELLI | Advisor * |
Examining board:
Name | Role |
---|---|
ELIAS SILVA DE OLIVEIRA | External Examiner * |
JORGE LEONID ACHING SAMATELO | External Examiner * |
PATRICK MARQUES CIARELLI | Advisor * |
Summary: In the last decade, online social networks have expanded rapidly. The main goal of these platforms is to enable communication between people from different backgrounds, religions, cultures and countries. However, this new form of contact, combined with the feeling of anonymity and impunity in the digital environment, has turned social networks into a favorable environment for disseminating hate speech, such as xenophobia, racism, sexism and homophobia, among others. Most platforms, such as Twitter and Facebook, explicitly forbid this kind of behaviour. However, the large volume of daily posts makes manual detection of hate speech an almost impossible task. In this context, there is a need for tools that automatically detect hate speech in social networks, but most works focus on detecting hateful content in English. This work develops a method for detecting hate speech in social networks focused on Portuguese, using deep neural networks as the main resource. To that end, we first identified the main issues in hate speech detection in Portuguese and observed a lack of labeled datasets for hate speech and offensive language in Portuguese. The few existing datasets contain few documents, which makes applying deep learning techniques difficult.
In order to mitigate this problem, we propose using data augmentation techniques. Three techniques were selected from the literature and applied in different scenarios, in which we tried to identify the cases where these techniques would be most beneficial. We concluded that the selected data augmentation techniques can be helpful when applied to very small datasets, ranging from 1,000 to 2,000 documents.
Keywords: Hate Speech, Social Networks, Convolutional Neural Networks, Recursive Neural Networks, Data Augmentation.