Extração de conhecimento utilizando aprendizado de máquina

Baranauskas, José Augusto

doi:10.11606/T.59.2016.tde-03082023-170549

DOI

https://doi.org/10.11606/T.59.2016.tde-03082023-170549

Document

Habilitation Thesis

Author

Baranauskas, José Augusto (Catálogo USP)

Full name

José Augusto Baranauskas

Institute/School/College

Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto

Knowledge Area

Artificial Intelligence

Date of Defense

2016-09-05

Published

Ribeirão Preto, 2016

Committee

Liang, Zhao (President)
Batista, Gustavo Enrique de Almeida Prado Alves
Carvalho, André Carlos Ponce de Leon Ferreira de
Finger, Marcelo
Guilherme, Ivan Rizzo

Title in Portuguese

Extração de conhecimento utilizando aprendizado de máquina

Keywords in Portuguese

Aprendizado de máquina
Extração de conhecimento
Modelos simbólicos

Abstract in Portuguese

Em fevereiro de 1898, logo após uma tempestade, uma amostra de pardais foi levada ao Laboratório de Anatomia do professor Hermon Carey Bumpus na Universidade de Brown, Rhode Island, EUA. Ele encontrou 136 pardais (Passer domesticus) machucados no chão e decidiu coletar um conjunto de características físicas dos mesmos, dentre elas o comprimento total, comprimento das asas, comprimentos do bico e cabeça, comprimento do úmero, comprimento da envergadura do esterno, dentre outros. Posteriormente, Bumpus observou que apenas 72 pardais sobreviveram e viu neste fato a oportunidade de analisar o processo de seleção natural no qual os pardais que sobreviveram o fizeram porque eles possuíam certas características físicas; por outro lado, os pardais que pereceram, pereceram não por acidente, mas porque eram fisicamente desqualificados. Diante da observação de processos naturais ou artificiais por meio da coleta de informações há duas perguntas que surgem: (i) É possível criar alguma hipótese a partir dos dados coletados? (ii) Dada uma hipótese, como saber se ela generaliza para dados futuros? As respostas a estas perguntas podem ser dadas pela pesquisa em Aprendizado de Máquina (AM) que tem como objetivo desenvolver algoritmos capazes de adquirir conhecimento de forma automática, baseando-se em experiências acumuladas por meio da solução bem sucedida em problemas anteriores. Os algoritmos de AM podem ser categorizados de acordo com o grau de compreensibilidade proporcionado ao ser humano em: (a) sistemas tipo caixa-preta que desenvolvem sua própria representação do conceito, isto é, sua representação interna pode não ser facilmente interpretada por humanos e (b) sistemas orientados a conhecimento que têm como objetivo a criação de estruturas simbólicas que sejam compreensíveis por humanos. 
De especial interesse neste trabalho estão os sistemas de aprendizado simbólico (orientados a conhecimento) que buscam aprender construindo representações de um conceito tipicamente na forma de alguma expressão lógica, árvore de decisão, regras ou rede semântica. Assim, este trabalho se concentra em sistemas que contribuem para a compreensão dos dados em contraste com indutores que visam apenas uma grande precisão. Um exemplo típico é o desenvolvimento de sistemas especialistas nos quais é importante que especialistas humanos possam, fácil e confiavelmente, verificar o conhecimento extraído e relacioná-lo ao seu próprio conhecimento. Além disso, algoritmos de aprendizado que induzem estruturas compreensíveis, contribuindo para a compreensão do domínio considerado, podem produzir conhecimento novo.

Title in English

Knowledge discovery using machine learning

Keywords in English

Knowledge discovery
Machine learning
Symbolic models

Abstract in English

In February 1898, after an uncommonly severe storm, a sample of sparrows was brought to the Anatomical Laboratory of Brown University, Rhode Island, USA, led by Professor Hermon Carey Bumpus. He found 136 sparrows (Passer domesticus) injured on the ground and decided to record a set of their physical characteristics, including total length, wing length, beak length, head length, humeral length, and the length of the sternum wingspan. Later, Bumpus noted that only 72 sparrows survived and saw in this fact an opportunity to analyze the process of natural selection: the sparrows that survived did so because they had certain physical characteristics, whereas those that perished did so not by accident but because they were physically unfit. When natural or artificial processes are observed by collecting information, two questions arise: (i) Is it possible to create a hypothesis from the collected data? (ii) Given a hypothesis, how can we know whether it generalizes to future data? These questions can be answered by Machine Learning (ML) research, which aims to develop algorithms able to acquire knowledge automatically, based on experience gained through the successful solution of previous problems. ML algorithms can be categorized according to the degree of comprehensibility they offer to humans: (a) black-box systems, which develop their own representation of the concept, that is, an internal representation that cannot be easily interpreted by humans, and (b) knowledge-oriented systems, which aim to create symbolic structures that humans can understand. Of particular interest in this work are symbolic (knowledge-oriented) learning systems, which learn by building representations of a concept, typically in the form of logical expressions, decision trees, rules, or semantic networks.
Thus, the work described here focuses on systems that contribute to the understanding of the data, in contrast to classifiers that target only high accuracy. A typical example is the development of expert systems, in which it is important that human experts can easily and reliably verify the extracted knowledge and relate it to their own domain knowledge. In addition, learning algorithms that induce understandable structures, and thereby contribute to the understanding of the domain under consideration, can produce new knowledge.
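
As a rough illustration of the two questions raised in the abstract (creating a hypothesis from data, then checking whether it generalizes), the sketch below induces a one-rule symbolic model, a decision stump, and evaluates it on held-out examples. It is not the method of the thesis. The feature names, the helper functions (learn_stump, predict, describe, accuracy), and all numeric values are hypothetical and invented for illustration; they are not Bumpus's measurements.

```python
from typing import List, Tuple

# One example = (feature values, label); label 1 = survived, 0 = perished.
Example = Tuple[List[float], int]
# A rule = (feature index, threshold, label if value <= threshold, training errors).
Rule = Tuple[int, float, int, int]


def predict(rule: Rule, x: List[float]) -> int:
    """Apply a single-threshold rule to one feature vector."""
    j, t, label_le, _ = rule
    return label_le if x[j] <= t else 1 - label_le


def learn_stump(examples: List[Example], n_features: int) -> Rule:
    """Exhaustively search for the one-threshold rule with fewest training errors."""
    best = None
    for j in range(n_features):
        values = sorted({x[j] for x, _ in examples})
        # Candidate thresholds lie midway between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for label_le in (0, 1):
                candidate = (j, t, label_le, 0)
                errors = sum(predict(candidate, x) != y for x, y in examples)
                if best is None or errors < best[3]:
                    best = (j, t, label_le, errors)
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best


def describe(rule: Rule, names: List[str]) -> str:
    """Render the rule in a form a human expert can read and verify."""
    j, t, label_le, _ = rule
    return f"IF {names[j]} <= {t:g} THEN class={label_le} ELSE class={1 - label_le}"


def accuracy(rule: Rule, examples: List[Example]) -> float:
    """Fraction of examples the rule classifies correctly."""
    return sum(predict(rule, x) == y for x, y in examples) / len(examples)


# Synthetic toy data (ASSUMPTION: invented values, not real measurements).
names = ["total_length", "humerus_length"]
train = [([154.0, 18.5], 1), ([156.0, 18.7], 1), ([157.0, 18.3], 1),
         ([161.0, 18.6], 0), ([163.0, 18.4], 0), ([165.0, 18.8], 0)]
held_out = [([155.0, 18.6], 1), ([164.0, 18.5], 0)]

rule = learn_stump(train, len(names))   # question (i): hypothesis from data
print(describe(rule, names))            # symbolic, human-readable model
print(accuracy(rule, held_out))         # question (ii): estimate on unseen data
```

The induced rule can be read and checked directly by a domain expert, which is the property that separates knowledge-oriented systems from black-box ones; accuracy on held-out examples is only one simple way to estimate generalization.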

WARNING - Viewing this document is conditioned on your acceptance of the following terms of use:
This document is for private use in research and teaching activities only. Reproduction for commercial use is forbidden. These rights cover all data about this document as well as its contents. Any use or copy of this document, in whole or in part, must include the author's name.

002794764.pdf (32.78 Mbytes)

Publishing Date

2023-08-10
