Um modelo Bayesiano de meta-análise para dados de ChIP-Seq

Andrade, Pablo de Morais

doi:10.11606/T.95.2019.tde-04102019-141931

Doctoral Thesis

DOI

https://doi.org/10.11606/T.95.2019.tde-04102019-141931

Document

Doctoral Thesis

Author

Andrade, Pablo de Morais (Catálogo USP)

Full name

Pablo de Morais Andrade

Institute/School/College

Interunidades em Bioinformática

Knowledge Area

Bio-informatics

Date of Defense

2017-04-17

Published

São Paulo, 2017

Supervisor

Pereira, Carlos Alberto de Braganca (Catálogo USP)

Committee

Pereira, Carlos Alberto de Braganca (President)
Brentani, Helena Paula
Nishiyama Junior, Milton Yutaka
Setubal, João Carlos
Wechsler, Sergio

Title in Portuguese

Um modelo Bayesiano de meta-análise para dados de ChIP-Seq

Keywords in Portuguese

ChIP-Seq
Estatística Bayesiana
Meta-análise

Abstract in Portuguese

Com o desenvolvimento do sequenciamento em larga escala, novas tecnologias surgiram para auxiliar o estudo de sequências de ácidos nucleicos (DNA e cDNA); como consequência, o desenvolvimento de novas ferramentas para analisar o grande volume de dados gerados fez-se necessário. Entre essas novas tecnologias, uma, em particular, chamada Imunoprecipitação de Cromatina seguida de sequenciamento de DNA em larga escala ou ChIP-Seq, tem recebido muita atenção nos últimos anos. Esta tecnologia tornou-se um método usado amplamente para mapear sítios de ligação de proteínas de interesse no genoma. A análise de dados resultantes de experimentos de ChIP-Seq é desafiadora porque o mapeamento das sequências no genoma apresenta diferentes formas de viés. Os métodos existentes usados para encontrar picos em dados de ChIP-Seq apresentam limitações relacionadas ao número de amostras de controle e tratamento usadas, e em relação à forma como essas amostras são combinadas. Nessa tese, mostramos que métodos baseados em testes estatísticos de hipótese tendem a encontrar um número muito maior de picos à medida que aumentamos o tamanho da amostra, o que os torna pouco confiáveis para análise de um grande volume de dados. O presente estudo descreve um método estatístico Bayesiano, que utiliza meta-análise para encontrar sítios de ligação de proteínas de interesse no genoma resultante de experimentos de ChIP-Seq. Esse método foi chamado Meta-Analysis Bayesian Approach ou MABayApp. Nós mostramos que o nosso método é robusto e pode ser utilizado com diferentes números de amostras de controle e tratamentos, assim como quando comparando amostras provenientes de diferentes tratamentos.

Title in English

A meta-analysis Bayesian model for ChIP-Seq data

Keywords in English

Bayesian Model
ChIP-Seq peak calling
Meta-Analysis

Abstract in English

With the development of high-throughput sequencing, new technologies have emerged for the study of nucleic acid sequences (DNA and cDNA), and with them the need for tools to analyse large volumes of data. Among these technologies, one in particular, Chromatin Immunoprecipitation followed by massively parallel DNA sequencing, or ChIP-Seq, has received considerable attention in recent years. This technology has become a widely used method for mapping the binding sites of a given protein in the genome. Analysing data from ChIP-Seq experiments is challenging because different sources of bias can arise during the sequencing and mapping of reads to the genome. Current methods used to find peaks in ChIP-Seq data have limitations regarding the number of treatment and control samples used and how these samples should be combined. In this thesis we show that, since most of these methods are based on traditional statistical hypothesis tests, the number of peaks considered significant changes considerably as the sample size increases. This study describes a Bayesian statistical method that uses meta-analysis to discover binding sites of a protein of interest based on peaks of reads found in ChIP-Seq data. We call it Meta-Analysis Bayesian Approach, or MABayApp. We show that our method is robust and can be used with different numbers of control and treatment samples, as well as when comparing samples under different treatments.
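
The abstract's point about hypothesis tests and sample size can be illustrated with a minimal sketch. This is not MABayApp and not any specific peak caller. It assumes a hypothetical region whose treatment read count is a fixed 1.1 times its control count, and tests it with a simple normal-approximation binomial test. The function names, the fold enrichment, and the read depths are all illustrative assumptions. With the enrichment ratio held fixed, the p-value shrinks as read depth grows, so the same modest enrichment eventually crosses any fixed significance threshold.

```python
import math


def one_sided_p_value(treatment_reads: int, control_reads: int) -> float:
    """Approximate p-value for 'treatment rate > control rate' in one region.

    Assumption: under the null hypothesis both counts share the same Poisson
    rate, so conditional on the total, treatment ~ Binomial(total, 0.5).
    A normal approximation to that binomial is used here.
    """
    total = treatment_reads + control_reads
    if total == 0:
        return 1.0
    # (T - total/2) / sqrt(total/4) simplifies to (T - C) / sqrt(total).
    z = (treatment_reads - control_reads) / math.sqrt(total)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


def p_values_by_depth(fold_enrichment=1.1,
                      control_depths=(100, 1_000, 10_000, 100_000)):
    """Return (control_depth, p_value) pairs for a fixed, hypothetical enrichment."""
    results = []
    for control in control_depths:
        treatment = round(control * fold_enrichment)
        results.append((control, one_sided_p_value(treatment, control)))
    return results


if __name__ == "__main__":
    for depth, p in p_values_by_depth():
        print(f"control reads = {depth:>7}  p = {p:.3g}")
```

In this toy setting the region is not significant at the lowest depth and becomes overwhelmingly significant at the highest, even though the underlying enrichment never changes.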

WARNING - Viewing this document is conditioned on your acceptance of the following terms of use:
This document is only for private use in research and teaching activities. Reproduction for commercial use is forbidden. These rights cover all data about this document as well as its contents. Any use or copy of this document, in whole or in part, must include the author's name.

MABayAppFinal_PhD_Disseration_PablodeMoraisAndrade.pdf (12.27 Mbytes)

Publishing Date

2019-10-07
