Name: Marta Talitha Carvalho Freire de Amorim
Type: PhD thesis
Publication date: 17/08/2020
Advisor:

Name                                 Role
Patrick Marques Ciarelli             Advisor *

Examining board:

Name                                 Role
Patrick Marques Ciarelli             Advisor *
Claudine Santos Badue Gonçalves      External Examiner *
Elias Silva de Oliveira              External Examiner *
Luiz Alberto Pinto                   External Examiner *
Adrião Duarte Dória Neto             External Examiner *

Summary: Social media plays an important role in detecting novelties or events, because it is widely available and allows various types of information to spread rapidly. However, the data are unstructured, which raises the challenge of mining events in a large and constantly growing mass of data. It is therefore necessary not only to identify information but also to detect the most relevant information. The relevance of a piece of information can depend on several aspects and characteristics of the application, such as time, audience, and context. Such applications seek to identify new or unfamiliar patterns in data sets; examples include news coverage, product trends, and suspicious behavior on social media for crime detection. This research proposes two new architectures for novelty detection in social media data, based on three pillars: data fusion, temporal windows, and a model to qualify the audience for the novelty. Both architectures share three pipeline stages: coding, fusion, and detection. Coding uses neural networks to represent unstructured data as dense vectors. Fusion either transforms different data structures into a single structure or combines the scores (or probabilities) at the classifiers' output. Detection uses unsupervised algorithms to identify novelties. The main difference between the two architectures lies in the fusion: the first fuses data at the input of the unsupervised algorithms, whereas the second fuses their outputs. The following unsupervised algorithms are used: HBOS, Feature Bagging, Isolation Forest, Autoencoders, and unsupervised versions of kNN and LSTM. The main contribution of this work is an approach based on deep neural networks that performs data fusion and novelty detection on unstructured and heterogeneous data. The innovations of the work are a new architecture for detecting novelties in social networks, a new fusion of unstructured data, a database of unstructured data for the novelty detection task, and a context-free method for defining what a novelty is. The experiments showed that the proposed fusion improved novelty detection by approximately 11% compared with not using fusion.
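The difference between fusing at the input and fusing at the output of the detectors can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes an unsupervised kNN distance score as the detector, random vectors standing in for the neural-network encodings of two data sources, and a simple average of min-max normalized scores as the output fusion rule. All function names and data below are hypothetical.

```python
import math
import random


def knn_score(train, x, k=3):
    """Novelty score: mean Euclidean distance from x to its k nearest training vectors."""
    dists = sorted(math.dist(x, t) for t in train)
    return sum(dists[:k]) / k


def min_max(scores):
    """Rescale scores to [0, 1] so detectors on different sources are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def early_fusion_scores(train_a, train_b, test_a, test_b, k=3):
    """Fuse at the detector input: concatenate both encodings, then score once."""
    train = [a + b for a, b in zip(train_a, train_b)]
    test = [a + b for a, b in zip(test_a, test_b)]
    return [knn_score(train, x, k) for x in test]


def late_fusion_scores(train_a, train_b, test_a, test_b, k=3):
    """Fuse at the detector output: score each source separately, then average.

    Averaging normalized scores is an assumed combination rule for illustration.
    """
    scores_a = min_max([knn_score(train_a, x, k) for x in test_a])
    scores_b = min_max([knn_score(train_b, x, k) for x in test_b])
    return [(sa + sb) / 2 for sa, sb in zip(scores_a, scores_b)]


def random_vectors(n, dim, shift=0.0):
    """Synthetic stand-in for dense encodings (hypothetical data)."""
    return [[random.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


random.seed(0)

# Two sources (e.g. text and metadata encodings) observed in a past time window.
train_a = random_vectors(50, 4)
train_b = random_vectors(50, 3)

# Current window: five ordinary items followed by one shifted (novel) item.
test_a = random_vectors(5, 4) + random_vectors(1, 4, shift=5.0)
test_b = random_vectors(5, 3) + random_vectors(1, 3, shift=5.0)

early = early_fusion_scores(train_a, train_b, test_a, test_b)
late = late_fusion_scores(train_a, train_b, test_a, test_b)

print("early fusion flags item", early.index(max(early)))
print("late fusion flags item", late.index(max(late)))
```

In this sketch, early fusion lets a single detector see both sources at once, while late fusion keeps one detector per source and only requires their scores to be put on a common scale before combining.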

© 2013 Universidade Federal do Espírito Santo. All rights reserved.
Av. Fernando Ferrari, 514 - Goiabeiras, Vitória - ES | CEP 29075-910