Spanish National Research Council · University of Seville
In all publications
Author: Velasco Montero, Delia
Year: Since 2002
All publications
Towards a Simplified Procedure for CNN Performance Prediction on Embedded Platforms
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - Workshop on the Architecture of Smart Cameras WASC 2019
[abstract]
Vision is arguably the technical field benefiting the most from the renaissance of artificial intelligence in the last few years. In particular, the convergence of massive datasets for training, boosted computational power, and enhanced machine learning techniques has given rise to highly accurate vision algorithms based on convolutional neural networks (CNNs), which even outperform humans in certain tasks. The potential of these algorithms has attracted attention from many parties, both in academia and industry, spurring the development of a myriad of hardware platforms and software frameworks. The challenge now is how to efficiently leverage and integrate this variety of components in practical realizations, taking into account that CNN models keep evolving at a rapid pace. With this scenario in mind, we have been working on a simplified procedure to predict the performance of CNNs running on embedded platforms in terms of throughput and power consumption. The objective is to facilitate the evaluation of the aforementioned components and CNN models prior to actually implementing them, thereby speeding up the deployment of optimal solutions. In this talk, we will describe key aspects of the proposed procedure. Specifically, we will elaborate on SweepNet, a deep neural network tailored for meaningful per-layer characterization. The performance models extracted from SweepNet for a hardware platform make it possible to accurately predict, layer by layer, the execution time and energy consumption of any other CNN running on that platform.
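As a rough illustration of the layer-by-layer idea in this abstract, the sketch below adds up per-layer time and energy estimates to obtain a whole-network prediction. The abstract does not state the form of the models extracted from SweepNet, so the linear cost model (cost growing with the number of multiply-accumulate operations), the coefficient values, and all names (`LayerModel`, `predict_network`, `MODELS`, `NETWORK`) are hypothetical placeholders rather than the authors' method.

```python
from dataclasses import dataclass


@dataclass
class LayerModel:
    """Hypothetical per-layer-type model: cost = offset + slope * MMACs."""
    time_offset_ms: float
    time_per_mmac_ms: float
    energy_offset_mj: float
    energy_per_mmac_mj: float

    def predict(self, mmacs):
        # Assumed linear dependence on millions of multiply-accumulates.
        time_ms = self.time_offset_ms + self.time_per_mmac_ms * mmacs
        energy_mj = self.energy_offset_mj + self.energy_per_mmac_mj * mmacs
        return time_ms, energy_mj


def predict_network(layers, models):
    """Sum per-layer predictions over a list of (layer_type, mmacs) pairs."""
    total_time_ms = 0.0
    total_energy_mj = 0.0
    for layer_type, mmacs in layers:
        time_ms, energy_mj = models[layer_type].predict(mmacs)
        total_time_ms += time_ms
        total_energy_mj += energy_mj
    return total_time_ms, total_energy_mj


# Made-up coefficients for one platform; not measured values.
MODELS = {
    "conv": LayerModel(1.0, 0.5, 2.0, 1.0),
    "fc": LayerModel(0.5, 0.25, 1.0, 0.5),
}

# Made-up network description: (layer type, MMACs per layer).
NETWORK = [("conv", 100.0), ("conv", 50.0), ("fc", 4.0)]

time_ms, energy_mj = predict_network(NETWORK, MODELS)
print(f"predicted time: {time_ms} ms, predicted energy: {energy_mj} mJ")
```

In such a scheme the per-layer models would be fitted once per platform, and any new network would only need to be described by its layer types and sizes.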

On the Correlation of CNN Performance and Hardware Metrics for Visual Inference on a Low-Cost CPU-based Platform
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - International Conference on Systems, Signals and Image Processing IWSSIP 2019
[abstract]
While providing the same functionality, the various Deep Learning software frameworks available these days do not provide similar performance when running the same network model on a particular hardware platform. On the contrary, we show that the different coding techniques and underlying acceleration libraries have a great impact on the instantaneous throughput and CPU utilization when carrying out the same inference with Caffe, OpenCV, TensorFlow and Caffe2 on an ARM Cortex-A53 multi-core processor. Direct modelling of this dissimilar performance is not practical, mainly because of the complexity and rapid evolution of the toolchains. Alternatively, we examine how the hardware resources are distinctly exploited by the frameworks. We demonstrate that there is a strong correlation between inference performance - including power consumption - and critical parameters associated with memory usage and instruction flow control. This identified correlation is a preliminary step for the development of a simple empirical model. The objective is to facilitate selection and further performance tuning among the ever-growing zoo of deep neural networks and frameworks, as well as the exploration of new network architectures.
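The kind of correlation analysis mentioned in this abstract can be sketched with a plain Pearson coefficient between a hardware metric and a performance metric. The abstract does not say which correlation measure or which hardware counters were used, so the choice of Pearson correlation, the function name `pearson`, and the sample values below are assumptions made purely for illustration; the numbers are not measurements.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values, one entry per (hypothetical) framework run.
cache_misses_millions = [1.0, 2.0, 3.0, 4.0]
inference_time_ms = [10.0, 20.0, 30.0, 40.0]

r = pearson(cache_misses_millions, inference_time_ms)
print(f"correlation: {r:.3f}")
```

A coefficient close to 1 or -1 would suggest that the hardware metric is a useful input for a simple empirical model, while a value near 0 would not.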

Optimum Selection of DNN Model and Framework for Edge Inference
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and Á. Rodríguez-Vázquez
Journal Paper - IEEE Access, vol. 6, pp. 51680-51692, 2018
IEEE    DOI: 10.1109/ACCESS.2018.2869929    ISSN: 2169-3536
[abstract]
This paper describes a methodology to select the optimum combination of deep neural network and software framework for visual inference on embedded systems. As a first step, benchmarking is required. In particular, we have benchmarked six popular network models running on four deep learning frameworks implemented on a low-cost embedded platform. Three key performance metrics have been measured and compared across the resulting 24 combinations: accuracy, throughput, and power consumption. Then, application-level specifications come into play. We propose a figure of merit enabling the evaluation of each network/framework pair in terms of the relative importance of the aforementioned metrics for a targeted application. We prove through numerical analysis and meaningful graphical representations that only a reduced subset of the combinations must actually be considered for real deployment. Our approach can be extended to other networks, frameworks, and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of embedded deep learning technology.
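To make the idea of a weighted figure of merit concrete, the sketch below ranks network/framework pairs by a weighted sum of normalized accuracy, throughput, and power consumption. The abstract does not give the paper's actual formula, so the min-max normalization, the weighted-sum form, the names `normalize` and `rank`, and the candidate values are all assumptions for illustration only.

```python
def normalize(values, higher_is_better=True):
    """Min-max scale values to [0, 1]; invert when lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def rank(candidates, w_acc, w_thr, w_pow):
    """Rank candidates by an assumed weighted-sum figure of merit.

    candidates maps a name to (accuracy, throughput_fps, power_w).
    Weights are assumed non-negative and express relative importance.
    """
    names = list(candidates)
    acc = normalize([candidates[n][0] for n in names])
    thr = normalize([candidates[n][1] for n in names])
    pwr = normalize([candidates[n][2] for n in names], higher_is_better=False)
    scores = {
        n: w_acc * a + w_thr * t + w_pow * p
        for n, a, t, p in zip(names, acc, thr, pwr)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder candidates; the values are not benchmark results.
CANDIDATES = {
    "net_A/framework_X": (0.70, 5.0, 2.0),
    "net_B/framework_Y": (0.60, 10.0, 1.5),
    "net_C/framework_X": (0.65, 2.0, 3.0),
}

best_for_accuracy = rank(CANDIDATES, 1.0, 0.0, 0.0)[0][0]
best_for_throughput = rank(CANDIDATES, 0.0, 1.0, 0.0)[0][0]
print(best_for_accuracy, best_for_throughput)
```

Sweeping the three weights over a grid and recording which candidate wins at each point is one way to see that only a few combinations are ever optimal.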

On-The-Fly Deployment of Deep Neural Networks on Heterogeneous Hardware in a Low-Cost Smart Camera
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - ACM International Conference on Distributed Smart Cameras ICDSC 2018
[abstract]
This demo showcases a low-cost smart camera where different hardware configurations can be selected to perform image recognition with deep neural networks. Both the hardware configuration and the network model can be changed at any time on the fly. Up to 24 hardware-model combinations are possible, enabling dynamic reconfiguration according to prescribed application requirements.

Optimum Network/Framework Selection from High-Level Specifications in Embedded Deep Learning Vision Applications
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper - Lecture Notes in Computer Science LNCS, vol. 11182, pp. 369-379, 2018
SPRINGER    DOI: 10.1007/978-3-030-01449-0_31    ISSN: 0302-9743
[abstract]
This paper benchmarks 16 combinations of popular Deep Neural Networks and Deep Learning frameworks on an embedded platform. A Figure of Merit based on high-level specifications is introduced. By sweeping the relative weight of accuracy, throughput and power consumption on global performance, we demonstrate that only a reduced set of the analyzed combinations must actually be considered for real deployment. We also report the optimum network/framework selection for all possible application scenarios defined in those terms, i.e. weighted balance of the aforementioned parameters. Our approach can be extended to other networks, frameworks and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of Deep Learning technology.

Optimum Network/Framework Selection from High-Level Specifications in Embedded Deep Learning Vision Applications
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - Advanced Concepts for Intelligent Vision Systems ACIVS 2018
[abstract]
This paper benchmarks 16 combinations of popular Deep Neural Networks and Deep Learning frameworks on an embedded platform. A Figure of Merit based on high-level specifications is introduced. By sweeping the relative weight of accuracy, throughput and power consumption on global performance, we demonstrate that only a reduced set of the analyzed combinations must actually be considered for real deployment. We also report the optimum network/framework selection for all possible application scenarios defined in those terms, i.e. weighted balance of the aforementioned parameters. Our approach can be extended to other networks, frameworks and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of Deep Learning technology.

Performance Analysis of Real-Time DNN Inference on Raspberry Pi
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - SPIE Real-Time Image and Video Processing 2018
[abstract]
Deep Neural Networks (DNNs) have emerged as the reference processing architecture for the implementation of multiple computer vision tasks. They achieve much higher accuracy than traditional algorithms based on shallow learning. However, this comes at the cost of a substantial increase in computational resources. This constitutes a challenge for embedded vision systems performing edge inference as opposed to cloud processing. In such a demanding scenario, several open-source frameworks have been developed, e.g. Caffe, OpenCV, TensorFlow, Theano, Torch or MXNet. All of these tools enable the deployment of various state-of-the-art DNN models for inference, though each one relies on particular optimization libraries and techniques resulting in different performance behavior. In this paper, we present a comparative study of some of these frameworks in terms of power consumption, throughput and precision for some of the most popular Convolutional Neural Network (CNN) models. The benchmarking system is the Raspberry Pi 3 Model B, a low-cost embedded platform with limited resources. We highlight the advantages and limitations associated with the practical use of the analyzed frameworks. Some guidelines are provided for suitable selection of a specific tool according to prescribed application requirements.
