Master's Dissertation
DOI
https://doi.org/10.11606/D.104.2017.tde-12092017-083813
Document
Author
Full name
Marco Henrique de Almeida Inácio
USP Unit
Area of Knowledge
Date of Defense
Publisher
São Carlos, 2017
Advisor
Committee
Izbicki, Rafael (President)
Lopes, Danilo Lourenço
Prates, Marcos Oliveira
Title in English
Comparing two populations using Bayesian Fourier series density estimation
Keywords in English
Density estimation
Discrete sampling
Fourier series
Orthogonal series
Stan
Abstract in English
Given two samples from two populations, one could ask how similar the populations are, that is, how close their probability distributions are. For absolutely continuous distributions, one way to measure the proximity of such populations is to use a distance (metric) between their probability density functions, which are unknown because only samples are observed. In this work, we use the integrated squared distance as the metric. To quantify the uncertainty of the integrated squared distance, we first model the uncertainty of each probability density function using a nonparametric Bayesian method. The method consists of estimating the probability density function f (or its logarithm) using Fourier series {f_0, f_1, ..., f_I}. Assigning a prior distribution to f is then equivalent to assigning a prior distribution to the coefficients of this series. We use the prior suggested by Scricciolo (2006) (sieve prior), which places a prior not only on these coefficients but also on I itself, so that in effect we work with a Bayesian mixture of finite-dimensional models. To obtain posterior samples from this mixture, we marginalize out the discrete model index parameter I and use the statistical software Stan. We conclude that the Bayesian Fourier series method performs well when compared to kernel density estimation, although both methods often have problems estimating the probability density function near the boundaries. Lastly, we show how the Fourier series methodology can be used to assess the uncertainty regarding the similarity of two samples. In particular, we apply this method to a dataset of patients with Alzheimer's disease.
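As an illustration of the integrated squared distance mentioned in the abstract, the following is a minimal sketch and not the code used in the dissertation. It assumes that each density is written directly (not on the log scale) as a finite series in the orthonormal cosine basis on [0, 1]; that basis, the coefficient values, and all function names are hypothetical choices made for this example. Under that assumption, the integral of (f - g)^2 equals the sum of squared coefficient differences (Parseval's identity), which the sketch checks against a numerical integral.

```python
import math

def cosine_basis(i, x):
    # Assumed orthonormal cosine basis on [0, 1]:
    # phi_0(x) = 1, phi_i(x) = sqrt(2) * cos(i * pi * x) for i >= 1.
    return 1.0 if i == 0 else math.sqrt(2.0) * math.cos(i * math.pi * x)

def series_density(coeffs, x):
    # Evaluate f(x) = sum_i coeffs[i] * phi_i(x).
    return sum(c * cosine_basis(i, x) for i, c in enumerate(coeffs))

def integrated_squared_distance(coeffs_f, coeffs_g, n_grid=2000):
    # Midpoint-rule approximation of the integral over [0, 1] of (f - g)^2.
    h = 1.0 / n_grid
    total = 0.0
    for k in range(n_grid):
        x = (k + 0.5) * h
        diff = series_density(coeffs_f, x) - series_density(coeffs_g, x)
        total += diff * diff
    return total * h

def parseval_distance(coeffs_f, coeffs_g):
    # For an orthonormal basis, the same integral is the sum of squared
    # coefficient differences; the shorter series is padded with zeros.
    n = max(len(coeffs_f), len(coeffs_g))
    a = list(coeffs_f) + [0.0] * (n - len(coeffs_f))
    b = list(coeffs_g) + [0.0] * (n - len(coeffs_g))
    return sum((u - v) ** 2 for u, v in zip(a, b))

# Hypothetical coefficients; the leading 1.0 makes each series integrate to 1.
coeffs_f = [1.0, 0.3, -0.1]
coeffs_g = [1.0, -0.2, 0.05, 0.1]

isd_numeric = integrated_squared_distance(coeffs_f, coeffs_g)
isd_exact = parseval_distance(coeffs_f, coeffs_g)
print(isd_numeric, isd_exact)
```

In a Bayesian setting such as the one described, applying a function like `parseval_distance` to each pair of posterior draws of the coefficients would yield a sample from the posterior distribution of the distance; whether the dissertation computes it this way is not stated in the abstract.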
Title in Portuguese
Comparação de duas populações utilizando estimação bayesiana de densidades por séries de Fourier
Keywords in Portuguese
Amostragem discreta
Séries de Fourier
Séries ortogonais
Stan
Abstract in Portuguese