Método para execução de redes neurais convolucionais em FPGA.

Sousa, Mark Cappello Ferreira de

doi:10.11606/D.3.2019.tde-14082019-110912

DOI

https://doi.org/10.11606/D.3.2019.tde-14082019-110912

Document

Master's Dissertation

Author

Sousa, Mark Cappello Ferreira de (Catálogo USP)

Full name

Mark Cappello Ferreira de Sousa

Institute/School/College

Escola Politécnica

Knowledge Area

Microelectronics

Date of Defense

2019-04-26

Published

São Paulo, 2019

Supervisor

Hernandez, Emílio Del Moral (Catálogo USP)

Committee

Hernandez, Emílio Del Moral (President)
Giorno, Fernando Antonio de Castro
Pumarica, Julio César Saldaña

Title in Portuguese

Método para execução de redes neurais convolucionais em FPGA.

Keywords in Portuguese

AlexNet
FPGA
Reconhecimento de imagem
Reconhecimento embarcado de padrões
Redes neurais
Sistema-em-um-chip

Abstract in Portuguese

Redes Neurais Convolucionais têm sido utilizadas com sucesso para reconhecimento de padrões em imagens. Porém, o seu alto custo computacional e a grande quantidade de parâmetros envolvidos dificultam a execução em tempo real deste tipo de rede neural artificial em aplicações embarcadas, onde o poder de processamento e a capacidade de armazenamento de dados são restritos. Este trabalho estudou e desenvolveu um método para execução em tempo real em FPGAs de uma Rede Neural Convolucional treinada, aproveitando o poder de processamento paralelo deste tipo de dispositivo. O foco deste trabalho consistiu na execução das camadas convolucionais, pois estas camadas podem contribuir com até 99% da carga computacional de toda a rede. Nos experimentos, um dispositivo FPGA foi utilizado conjugado com um processador ARM dual-core em um mesmo substrato de silício. Apenas o dispositivo FPGA foi utilizado para executar as camadas convolucionais da Rede Neural Convolucional AlexNet. O método estudado neste trabalho foca na distribuição eficiente dos recursos do FPGA por meio do balanceamento do pipeline formado pelas camadas convolucionais, uso de buffers para redução e reutilização de memória para armazenamento dos dados intermediários (gerados e consumidos pelas camadas convolucionais) e uso de precisão numérica de 8 bits para armazenamento dos kernels e aumento da vazão de leitura dos mesmos. Com o método desenvolvido, foi possível executar todas as cinco camadas convolucionais da AlexNet em 3,9 ms, com a frequência máxima de operação de 76,9 MHz. Também foi possível armazenar todos os parâmetros das camadas convolucionais na memória interna do FPGA, eliminando possíveis gargalos de acesso à memória externa.

Title in English

A method for execution of convolutional neural networks in FPGA.

Keywords in English

AlexNet
Convolutional neural networks
Embedded pattern recognition
FPGA
Image recognition
System-on-chip

Abstract in English

Convolutional Neural Networks have been used successfully for pattern recognition in images. However, their high computational cost and large number of parameters make it difficult to run this type of artificial neural network in real time in embedded applications, where processing power and data storage capacity are restricted. This work studied and developed a method for real-time execution of a trained Convolutional Neural Network on FPGAs, taking advantage of the parallel processing power of this type of device. The focus of this work was the execution of the convolutional layers, since these layers can account for up to 99% of the computational load of the entire network. In the experiments, an FPGA device was used together with a dual-core ARM processor on the same silicon substrate. Only the FPGA was used to execute the convolutional layers of the AlexNet Convolutional Neural Network. The method studied in this work focuses on the efficient distribution of FPGA resources through three measures: balancing the pipeline formed by the convolutional layers; using buffers to reduce and reuse the memory that stores intermediate data (generated and consumed by the convolutional layers); and using 8-bit numerical precision to store the kernels and increase the throughput of reading them. With the developed method, it was possible to execute all five AlexNet convolutional layers in 3.9 ms at the maximum operating frequency of 76.9 MHz. It was also possible to store all the parameters of the convolutional layers in the internal memory of the FPGA, eliminating possible external memory access bottlenecks.
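
To give a rough sense of the pipeline-balancing idea mentioned in the abstract, the following Python sketch estimates the multiply-accumulate (MAC) load of each AlexNet convolutional layer and splits a multiplier budget in proportion to that load. It is an illustration only, not the procedure used in the dissertation: the layer shapes are those of the commonly published AlexNet with grouped convolutions (an assumption, since the record does not list them), and the names `ALEXNET_CONV`, `balance`, and the budget value are hypothetical.

```python
# Illustrative sketch only. Layer shapes follow the commonly published AlexNet
# (grouped convolutions in conv2, conv4, conv5); the dissertation's exact
# variant and its balancing procedure may differ.

# (name, output width/height, number of kernels, kernel size, input channels per kernel)
ALEXNET_CONV = [
    ("conv1", 55, 96, 11, 3),
    ("conv2", 27, 256, 5, 48),
    ("conv3", 13, 384, 3, 256),
    ("conv4", 13, 384, 3, 192),
    ("conv5", 13, 256, 3, 192),
]


def layer_weights(kernels, k, in_ch):
    """Number of kernel weights in one layer (biases not counted)."""
    return kernels * k * k * in_ch


def layer_macs(out_size, kernels, k, in_ch):
    """MAC operations for one layer: one full kernel per output element."""
    return out_size * out_size * layer_weights(kernels, k, in_ch)


def balance(multiplier_budget):
    """Split a hypothetical multiplier budget in proportion to each layer's MACs,
    so that every pipeline stage needs a similar number of clock cycles."""
    macs = {name: layer_macs(o, n, k, c) for name, o, n, k, c in ALEXNET_CONV}
    total = sum(macs.values())
    return {name: max(1, round(multiplier_budget * m / total))
            for name, m in macs.items()}


def cycles(latency_s, freq_hz):
    """Clock cycles elapsed during a given latency at a given clock frequency."""
    return round(latency_s * freq_hz)


total_macs = sum(layer_macs(o, n, k, c) for _, o, n, k, c in ALEXNET_CONV)
total_weights = sum(layer_weights(n, k, c) for _, _, n, k, c in ALEXNET_CONV)
allocation = balance(200)  # 200 multipliers is an arbitrary example budget
```

Under these assumed shapes, the five layers hold about 2.3 million kernel weights, so 8-bit storage needs roughly 2.3 MB, which fits the abstract's claim that the parameters can be kept in on-chip memory. The reported 3.9 ms at 76.9 MHz corresponds to `cycles(3.9e-3, 76.9e6)`, about 299,910 clock cycles.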

WARNING - Viewing this document is conditioned on your acceptance of the following terms of use:
This document is for private use only, in research and teaching activities. Reproduction for commercial use is forbidden. These rights cover all data about this document as well as its contents. Any use or copy of this document, in whole or in part, must include the author's name.

MarkCappelloFerreiradeSousaCorr19.pdf (3.07 Mbytes)

Publishing Date

2019-08-22
