Doctoral Thesis
DOI
10.11606/T.45.2012.tde-12092012-230830
Document
Author
Full name
Francisco Eloi Soares de Araujo
E-mail
Institute/School/College
Knowledge Area
Date of Defense
Published
São Paulo, 2012
Supervisor
Committee
Soares, Jose Augusto Ramos (President)
Dias, Zanoni
Ferreira, Carlos Eduardo
Martinez, Fabio Henrique Viduani
Rozante, Luiz Carlos da Silva
Title in Portuguese
Alinhamentos e comparação de sequências
Keywords in Portuguese
alinhamento de sequências
alinhamento de várias sequências.
alinhamento estendido
custo normalizado de alinhamentos
distância de edição
matrizes equivalentes
métrica
Abstract in Portuguese
Title in English
Alignment and comparison of sequences
Keywords in English
edit distance
equivalent matrices
extended alignment
metric
multiple sequence alignments.
normalized alignment cost
sequence alignment
Abstract in English
Comparison of finite sequences is a tool used to solve problems in several areas. In order to compare sequences, we infer which are the edit operations of substitution, insertion and deletion of symbols that transform one sequence into another. Scoring matrices are a widely used structure to define a cost for each type of edit operation. A scoring matrix G is indexed by symbols of an alphabet. The entry in G in row A and column B measures the cost of the edit operation for replacing symbol A by symbol B. Scoring matrices induce functions that assign a score for a set of edit operations. Some of these functions for comparing two and multiple sequences are studied in this thesis. If each symbol is edited exactly once for transforming a sequence into another, the set of edit operations can be represented by a structure called alignment. We describe a structure to represent the set of edit operations that cannot be represented by a conventional alignment and we design an algorithm to find the cost of an optimal sequence of edit operations by using a known algorithm to find the cost of an optimal alignment. Considering three different kinds of induced scoring functions, we characterize, for each one of them, the class of matrices for which the induced scoring functions are metrics on sequences. Given two scoring matrices G and G', we say they are equivalent for a given function that is induced by a scoring matrix and that evaluates the quality of an alignment if, for any two alignments A and B of two sequences, we have the following: alignment A is ``better'' than B considering scoring matrix G if and only if A is ``better'' than B considering scoring matrix G'. In this work, we determine necessary and sufficient conditions for scoring matrices to be equivalent. Finally, we define three new criteria for scoring alignments of several sequence. Every criterion considers the length of the alignment and the edit operations represented by it. An algorithm for each criterion is studied and the corresponding decision problem is shown to be NP-complete.

WARNING - Viewing this document is conditioned on your acceptance of the following terms of use:
This document is only for private use for research and teaching activities. Reproduction for commercial use is forbidden. This rights cover the whole data about this document as well as its contents. Any uses or copies of this document in whole or in part must include the author's name.
Publishing Date
2012-09-13

WARNING: Learn what derived works are clicking here.
All rights of the thesis/dissertation are from the authors
Centro de Informática de São Carlos