|Article title||HARDWARE IMPLEMENTATION OF A CONVOLUTIONAL NEURAL NETWORK IN FPGA BASED ON FIXED POINT CALCULATIONS|
|Authors||R. A. Solovyev, A. G. Kustov, V. S. Ruhlov, A. N. Shchelokov, D. V. Puzyrkov|
|Section||SECTION IV. COMPUTER SCIENCE AND ELECTRONICS|
|Month, Year||07, 2017|
|Abstract||Recent research has shown that neural networks perform well on a variety of tasks involving the classification and processing of images, audio and video data. The dimensionality and computational complexity of classification are so great that even powerful general-purpose CPUs handle these computations poorly. For serious work with modern neural networks, powerful and therefore expensive GPUs (graphics cards) are usually used. This is especially true for real-time video processing. Some neural network architectures that achieve very high image classification accuracy have properties that transfer easily to a hardware platform. Since the hardware requirements of neural networks keep growing, dedicated hardware units for VLSI and FPGA implementations need to be developed. In this work, we propose a set of hardware blocks and methods for implementing neural networks on FPGAs, in order to speed up both the computation itself and the development of hardware for image classification tasks. We assume that a pre-trained neural network is already available as input, with both its structure and its weights given, and the task is to build a device that performs the classification operation. We propose a technique for converting a floating-point model to fixed-point arithmetic without loss of classification accuracy. A hardware implementation of the basic blocks of a test neural network is presented. Experimental results are reported for a test convolutional neural network trained on the MNIST image set.|
|Keywords||Convolutional neural nets; FPGA; fixed-point calculations; 2D convolution.|
|References||1. Very Deep Convolutional Networks for Large-Scale Image Recognition. Available at: https://arxiv.org/pdf/1409.1556v6.pdf.
2. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Available at: http://arxiv.org/abs/1502.01852.
3. Going Deeper with Convolutions. Available at: http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf.
4. Deep Residual Learning for Image Recognition. Available at: http://arxiv.org/abs/1512.03385.
5. U-Net: Convolutional Networks for Biomedical Image Segmentation. Available at: http://arxiv.org/abs/1505.04597.
6. Iandola F. et al. DenseNet: Implementing Efficient ConvNet Descriptor Pyramids, arXiv preprint arXiv:1404.1869, 2014.
7. Liu W. et al. SSD: Single Shot MultiBox Detector, European Conference on Computer Vision. Springer, Cham, 2016, pp. 21-37.
8. Benchmarks for popular CNN models. Available at: https://github.com/jcjohnson/cnn-benchmarks.
9. Krizhevsky A., Sutskever I., Hinton G.E. ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
10. Yosinski J., Clune J., Bengio Y., Lipson H. How transferable are features in deep neural networks? Available at: https://arxiv.org/abs/1411.1792.
11. Keras: Deep Learning library for Theano and TensorFlow. Available at: https://keras.io/.
12. LeCun Y., Bottou L., Bengio Y., Haffner P. Gradient-based learning applied to document recognition, Proceedings of the IEEE, November 1998, Vol. 86 (11), pp. 2278-2324.
13. Nair V., Hinton G.E. Rectified linear units improve restricted Boltzmann machines, Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807-814.
14. Govindu G. et al. Analysis of high-performance floating-point arithmetic on FPGAs, Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th IEEE International. IEEE, 2004, pp. 149.
15. Graham B. Fractional max-pooling, arXiv preprint arXiv:1412.6071, 2014.
16. He K., Zhang X., Ren S., Sun J. Deep Residual Learning for Image Recognition, arXiv preprint arXiv:1512.03385. Available at: https://arxiv.org/pdf/1512.03385.pdf.
17. Oudjida A.K., Liacha A., Bakiri M., Chaillet N. Multiple Constant Multiplication Algorithm for High Speed and Low Power Design, IEEE Trans. on Circuits and Systems (TCAS II), February 2016, Vol. 63, No. 2, pp. 176-180.
18. Liacha A. et al. A variable RADIX-2^r algorithm for single constant multiplication, New Circuits and Systems Conference (NEWCAS), 2017 15th IEEE International. IEEE, 2017.
19. Liacha A. et al. Design of High-Speed, Low-Power, and Area-Efficient FIR Filters, IET Circuits, Devices & Systems, 2017.
20. Zeinolabedin S.M.A., Zhou J., Kim T.T.H. A Power and Area Efficient Ultra-Low Voltage Laplacian Pyramid Processing Engine With Adaptive Data Compression, IEEE Transactions on Circuits and Systems I: Regular Papers, 2016, Vol. 63, No. 10, pp. 1690-1700.|