Doctoral Thesis
DOI
https://doi.org/10.11606/T.11.2021.tde-12112021-114459
Document
Author
Full name
Pórtya Piscitelli Cavalcanti
E-mail
Knowledge Area
Date of Defense
Publication
Piracicaba, 2021
Advisor
Committee
Dias, Carlos Tadeu dos Santos (President)
Ferreira, Eric Batista
Hongyu, Kuang
Title in English
Archetypal analysis as an imputation method and multivariate data augmentation
Keywords in English
Missing data
Multivariate statistics
Simulation study
Unsupervised method
Abstract in English
Multivariate statistics studies the relationships among a set of random variables and how to analyze them simultaneously. In multivariate statistics, archetypes are extreme elements capable of rewriting all observations of a sample, or population, by means of linear combinations. Archetypal Analysis (AA), a multivariate technique that aims to reduce the dimensionality of observations, makes it possible to find and select these archetypes, which are themselves convex combinations of the data. AA can be applied in several areas of knowledge, with different uses for the archetypes. In this thesis we propose two uses of AA in multivariate contexts: as a sample augmentation method and as an imputation method. The first approach was applied to samples of bivariate correlated normal random variables with different covariance structures, and a simulation study was carried out to evaluate three proposed algorithms and compare them with traditional methods. Regardless of the correlation structure between the variables, it was possible to increase the sample size by up to 20%. The second approach evaluated the use of archetypes to impute values by single and multiple imputation in a multivariate dataset with simulated missing data. A simulation study was also conducted to evaluate the proposed methods and compare them with traditional ones. The results were promising, and the imputed values were very similar to the originals. Therefore, in both approaches discussed in this work, the results point to the ability of the archetypes to represent the dataset, and thus to express it as new data or to fill in possible missing values satisfactorily.
Title in Portuguese
Análise de Arquétipos como método de imputação e aumento de dados multivariados
Keywords in Portuguese
Estudo de simulação