• JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
 
  Bookmark and Share
 
 
Master's Dissertation
DOI
https://doi.org/10.11606/D.45.2013.tde-28112013-185051
Document
Author
Full name
Michel Oleynik
E-mail
Institute/School/College
Knowledge Area
Date of Defense
Published
São Paulo, 2013
Supervisor
Committee
Finger, Marcelo (President)
Lago, Alair Pereira do
Schulz, Stefan Paul
Title in Portuguese
Extração de informações de narrativas clínicas
Keywords in Portuguese
classificação de texto
laudos de anatomia patológica
processamento de linguagem natural
Abstract in Portuguese
Narrativas clínicas são normalmente escritas em linguagem natural devido a seu poder descritivo e facilidade de comunicação entre os especialistas. Processar esses dados para fins de descoberta de conhecimento e coleta de estatísticas exige técnicas de extração de informações, com alguns resultados já apresentados na literatura para o domínio jornalístico, mas ainda raras no domínio médico. O presente trabalho visa desenvolver um classificador de laudos de anatomia patológica que seja capaz de inferir a topografia e a morfologia de um câncer na Classificação Internacional de Doenças para Oncologia (CID-O). Dados fornecidos pelo A.C. Camargo Cancer Center em São Paulo foram utilizados para treinamento e validação. Técnicas de processamento de linguagem natural (PLN) aliadas a classificadores bayesianos foram exploradas na busca de qualidade da recuperação da informação, avaliada por meio da medida-F2. Valores acima de 74% para o grupo topográfico e de 61% para o grupo morfológico são relatados, com pequena contribuição das técnicas de PLN e suavização. Os resultados corroboram trabalhos similares e demonstram a necessidade de retreinamento das ferramentas de PLN no domínio médico.
Title in English
Clinical reports information retrieval
Keywords in English
natural language processing
pathology reports
text classication
Abstract in English
Clinical reports are usually written in natural language due to its descriptive power and ease of communication among specialists. Processing data for knowledge discovery and statistical analysis requires information retrieval techniques, already established for newswire texts, but still rare in the medical subdomain. The present work aims at developing an automated classifier of pathology reports, which should be able to infer the topography and the morphology classes of a cancer using codes of the International Classification of Diseases for Oncology (ICD-O). Data provided by the A.C. Camargo Cancer Center located in Sao Paulo was used for training and validation. Techniques of natural language processing (NLP) and Bayes classifiers were used in search for information retrieval quality, evaluated by F2-score. Measures upper than 74% in the topographic group and 61% in the morphologic group are reported, with small contribution from NLP or smoothing techniques. The results agree with similar studies and show that a retraining of NLP tools in the medical domain is necessary.
 
WARNING - Viewing this document is conditioned on your acceptance of the following terms of use:
This document is only for private use for research and teaching activities. Reproduction for commercial use is forbidden. This rights cover the whole data about this document as well as its contents. Any uses or copies of this document in whole or in part must include the author's name.
Publishing Date
2014-01-03
 
WARNING: Learn what derived works are clicking here.
All rights of the thesis/dissertation are from the authors
CeTI-SC/STI
Digital Library of Theses and Dissertations of USP. Copyright © 2001-2024. All rights reserved.