IMSE Publications

Found results matching for:

Author: Jorge Fernández Berni
Year: Since 2002

Journal Papers


A Pipelining-Based Heterogeneous Scheduling and Energy-Throughput Optimization Scheme for CNNs Leveraging Apache TVM
D. Velasco-Montero, B. Goossens, J. Fernández-Berni, Á. Rodríguez-Vázquez and W. Philips
Journal Paper · IEEE Access, 2023
abstract      doi      

Extracting information of interest from continuous video streams is a strongly demanded computer vision task. For the realization of this task at the edge using the current de-facto standard approach, i.e., deep learning, it is critical to optimize key performance metrics such as throughput and energy consumption according to prescribed application requirements. This allows achieving timely decision-making while extending the battery lifetime as much as possible. In this context, we propose a method to boost neural-network performance based on a co-execution strategy that exploits hardware heterogeneity on edge platforms. The enabling tool is Apache TVM, a highly efficient machine-learning compiler compatible with a diversity of hardware back-ends. The proposed approach solves the problem of network partitioning and distributes the workloads to make concurrent use of all the processors available on the board following a pipeline scheme. We conducted experiments on various popular CNNs compiled with TVM on the Jetson TX2 platform. The experimental results based on measurements show a significant improvement in throughput with respect to a single-processor execution, ranging from 14% to 150% over all tested networks. Power-efficient configurations were also identified, accomplishing energy reductions above 10%.

Impact of Thermal Throttling on Long-Term Visual Inference in a CPU-Based Edge Device
T. Benoit-Cattin, D. Velasco-Montero and J. Fernández-Berni
Journal Paper · Electronics, vol. 9, no. 12, article 2106, 2020
abstract      doi      pdf

Many application scenarios of edge visual inference, e.g., robotics or environmental monitoring, eventually require long periods of continuous operation. In such periods, the processor temperature plays a critical role to keep a prescribed frame rate. Particularly, the heavy computational load of convolutional neural networks (CNNs) may lead to thermal throttling and hence performance degradation in few seconds. In this paper, we report and analyze the long-term performance of 80 different cases resulting from running five CNN models on four software frameworks and two operating systems without and with active cooling. This comprehensive study was conducted on a low-cost edge platform, namely Raspberry Pi 4B (RPi4B), under stable indoor conditions. The results show that hysteresis-based active cooling prevented thermal throttling in all cases, thereby improving the throughput up to approximately 90% versus no cooling. Interestingly, the range of fan usage during active cooling varied from 33% to 65%. Given the impact of the fan on the power consumption of the system as a whole, these results stress the importance of a suitable selection of CNN model and software components. To assess the performance in outdoor applications, we integrated an external temperature sensor with the RPi4B and conducted a set of experiments with no active cooling in a wide interval of ambient temperature, ranging from 22 °C to 36 °C. Variations up to 27.7% were measured with respect to the maximum throughput achieved in that interval. This demonstrates that ambient temperature is a critical parameter in case active cooling cannot be applied.

Fixed Pattern Noise Analysis for Feature Descriptors in CMOS APS Images
J. Zapata-Pérez, G. Domenech-Asensi, R. Ruiz-Merino, J.J. Martínez-Álvarez, J. Fernández-Berni and R. Carmona-Galán
Journal Paper · Sensing and Imaging, vol. 21, article 14, 2020
abstract      doi      pdf

This paper provides a comparative performance evaluation of local features for images from CMOS APS sensors affected by fixed pattern noise for different combinations of common detectors and descriptors. Although numerous studies report comparisons of local features designed for ordinary visual images, their performance on images with fixed pattern noise is far less assessed. The goal of this work is to develop a tool that allows to evaluate the performance of computer vision algorithms and their implementations subject to deviations of the physical parameters of the CMOS sensor. This tool will facilitate the quantification of the high-level effects produced by circuit random noise, enabling the optimization of the sensor during the design flow with specifications much closer to the application scope. Likewise, this tool will provide the electronic designer with a relationship between high-level algorithm accuracy and maximum fixed pattern noise. Thus the contribution is double: (1) to evaluate the performance of both local float type and more recent binary type detectors and descriptors when combined under a variety of image transformations, and (2) to extract relevant information from circuit-level simulation and to develop a basic noise model to be employed in the design of the feature descriptor evaluation. The utility of this approach is illustrated by the evaluation of the effect of column-wise and pixel-wise fixed pattern noise at the sensor on the performance of different local feature descriptors.

Comparison between Digital Tone-Mapping Operators and a Focal-Plane Pixel-Parallel Circuit
G.M.S. Nunes, F.D.V.R. Oliveira, M.C.Q. Farias, J.G. Gomes, A. Petraglia, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · Signal Processing: Image Communication, vol. 88, article 115937, 2020
abstract      doi      pdf

In a previous work, we proposed a color extension of an existing focal-plane tone-mapping operator (FPTMO) and introduced circuit modifications that led to smaller sensing array area and simpler off-chip white-balance operations. The proposed FPTMO chip was fabricated and its experimental test setup is under development. In the present work, we systematically compare the previously proposed FPTMOs with conventional digital tone-mapping operators (DTMOs), both in terms of overall image quality and execution time. To assess the image quality, we use the Tone-Mapped image Quality Index (TMQI) metric, the Blind Tone-Mapped Quality Index (BTMQI) metric, and an image colorfulness metric. By statistically comparing the results given by both metrics, we verify that the proposed FPTMOs achieve results similar to those obtained with DTMOs, occasionally outperforming them. To compare the tone-mapping operators execution times, we perform a detailed complexity analysis. Results show that system-level software FPTMO descriptions (FPTMO functionality described in software) rank among the fastest DTMOs and that the hardware FPTMO (described in software) speed-up depends on the target frame rate. For 30-frames-per-second color image generation, the fastest FPTMO is 15 times faster than the DTMO with highest TMQI and BTMQI metric scores, and 170 times faster than the DTMO with best colorfulness metric score.

PreVIous: A Methodology for Prediction of Visual Inference Performance on IoT Devices
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galan and A. Rodríguez-Vázquez
Journal Paper · IEEE Internet of Things Journal, vol. 7, no. 10, pp 9227-9240, 2020
abstract      doi      pdf

This paper presents PreVIous, a methodology to predict the performance of convolutional neural networks (CNNs) in terms of throughput and energy consumption on vision-enabled devices for the Internet of Things. CNNs typically constitute a massive computational load for such devices, which are characterized by scarce hardware resources to be shared among multiple concurrent tasks. Therefore, it is critical to select the optimal CNN architecture for a particular hardware platform according to prescribed application requirements. However, the zoo of CNN models is already vast and rapidly growing. To facilitate a suitable selection, we introduce a prediction framework that allows to evaluate the performance of CNNs prior to their actual implementation. The proposed methodology is based on PreVIousNet, a neural network specifically designed to build accurate per-layer performance predictive models. PreVIousNet incorporates the most usual parameters found in state-of-the-art network architectures. The resulting predictive models for inference time and energy have been tested against comprehensive characterizations of seven well-known CNN models running on two different software frameworks and two different embedded platforms. To the best of our knowledge, this is the most extensive study in the literature concerning CNN performance prediction on low-power low-cost devices. The average deviation between predictions and real measurements is remarkably low, ranging from 3% to 10%. This means state-of-the-art modeling accuracy. As an additional asset, the fine-grained a priori analysis provided by PreVIous could also be exploited by neural architecture search engines.

Performance Assessment of Deep Learning Frameworks through Metrics of CPU Hardware Exploitation on an Embedded Platform
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · International Journal of Electrical and Computer Engineering Systems, vol. 11, no. 1, pp 1-11, 2020
abstract      doi      pdf

In this paper, we analyze heterogeneous performance exhibited by some popular deep learning software frameworks for visual inference on a resource-constrained hardware platform. Benchmarking of Caffe, OpenCV, TensorFlow, and Caffe2 is performed on the same set of convolutional neural networks in terms of instantaneous throughput, power consumption, memory footprint, and CPU utilization. To understand the resulting dissimilar behavior, we thoroughly examine how the resources in the processor are differently exploited by these frameworks. We demonstrate that a strong correlation exists between hardware events occurring in the processor and inference performance. The proposed hardware-aware analysis aims to find limitations and bottlenecks emerging from the joint interaction of frameworks and networks on a particular CPU-based platform. This provides insight into introducing suitable modifications in both types of components to enhance their global performance. It also facilitates the selection of frameworks and networks among a large diversity of these components available these days for visual understanding.

Optimum Selection of DNN Model and Framework for Edge Inference
D.Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and Á. Rodríguez-Vázquez
Journal Paper · IEEE Access, vol. 6, pp 51680-51692, 2018
abstract      doi      pdf

This paper describes a methodology to select the optimum combination of deep neural network and software framework for visual inference on embedded systems. As a first step, benchmarking is required. In particular, we have benchmarked six popular network models running on four deep learning frameworks implemented on a low-cost embedded platform. Three key performance metrics have been measured and compared with the resulting 24 combinations: accuracy, throughput, and power consumption. Then, application-level specifications come into play. We propose a figure of merit enabling the evaluation of each network/framework pair in terms of relative importance of the aforementioned metrics for a targeted application. We prove through numerical analysis and meaningful graphical representations that only a reduced subset of the combinations must actually be considered for real deployment. Our approach can be extended to other networks, frameworks, and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of embedded deep learning technology.

Optimum Network/Framework Selection from High-Level Specifications in Embedded Deep Learning Vision Applications
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · Lecture Notes in Computer Science LNCS, vol. 11182, pp 369-379, 2018
abstract      doi      pdf

This paper benchmarks 16 combinations of popular Deep Neural Networks and Deep Learning frameworks on an embedded platform. A Figure of Merit based on high-level specifications is introduced. By sweeping the relative weight of accuracy, throughput and power consumption on global performance, we demonstrate that only a reduced set of the analyzed combinations must actually be considered for real deployment. We also report the optimum network/framework selection for all possible application scenarios defined in those terms, i.e. weighted balance of the aforementioned parameters. Our approach can be extended to other networks, frameworks and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of Deep Learning technology.

Guest editorial special issue on computational image sensors and smart camera hardware
J. Fernández-Berni, R. Carmona-Galán, G. Sicard and A. Dupret
Journal Paper · International Journal of Circuit Theory and Applications, Computational Image Sensors and Smart Camera Hardware, vol. 46, no. 9, pp 1577-1579, 2018
abstract      doi      

Recent advances in both software and hardware technologies are enabling the emergence of vision as a key sensorial modality in various application scenarios. Concerning hardware, all of the components along the signal chain play a significant role when it comes to implementing smart vision-enabled systems. At the front end, new circuit structures for sensing, processing, and signal conditioning are adding functionalities in CMOS imagers beyond the mere generation of 2-D intensity maps. Moreover, the development of vertical integration technologies is facilitating monolithic realizations of visual sensors where the incorporation of computational capabilities has no impact at all on image quality. Typically, the outcome of the front-end device in a smart camera will be a preprocessed flow of information ready for further efficient analysis. At this point, specific ICs known as vision processing units can be inserted to accelerate the processing flow according to the targeted application. On the other hand, reconfigurability is a valuable asset in the ever-changing field of vision. FPGAs leverage cutting-edge digital technologies to offer flexible hardware for exploration of different memory arrangements, data flows, and processing parallelization. It is precisely parallelization for which GPUs constitute an interesting alternative in smart cameras when massive pixel-level operation is required. This is the case of state-of-the-art vision algorithms based on convolutional neural networks. At higher level, DSPs and multicore CPUs make software development notably easier at the cost of losing hardware specificity. Overall, this special issue aims at covering some of the latest research works in the vast ecosystem of hardware for artificial vision.

CMOS Vision Sensors: Embedding Computer Vision at Imaging Front-Ends
A. Rodríguez-Vázquez, J. Fernández-Berni, J.A. Leñero-Bardallo, I. Vornicu and R. Carmona-Galán
Journal Paper · IEEE Circuits and Systems Magazine, vol. 18, no. 2, pp 90-107, 2018
abstract      doi      pdf

CMOS Image Sensors (CIS) are key for imaging technologies. These chips are conceived for capturing optical scenes focused on their surface, and for delivering electrical images, commonly in digital format. CISs may incorporate intelligence; however, their smartness basically concerns calibration, error correction and other similar tasks. The term CVISs (CMOS VIsion Sensors) defines other class of sensor front-ends which are aimed at performing vision tasks right at the focal plane. They have been running under names such as computational image sensors, vision sensors and silicon retinas, among others. CVIS and CISs are similar regarding physical implementation. However, while inputs of both CIS and CVIS are images captured by photo-sensors placed at the focal-plane, CVISs primary outputs may not be images but either image features or even decisions based on the spatial-temporal analysis of the scenes. We may hence state that CVISs are more ‘intelligent’ than CISs as they focus on information instead of on raw data. Actually, CVIS architectures capable of extracting and interpreting the information contained in images, and prompting reaction commands thereof, have been explored for years in academia, and industrial applications are recently ramping up. One of the challenges of CVISs architects is incorporating computer vision concepts into the design flow. The endeavor is ambitious because imaging and computer vision communities are rather disjoint groups talking different languages. The Cellular Nonlinear Network Universal Machine (CNNUM) paradigm, proposed by Profs. Chua and Roska, defined an adequate framework for such conciliation as it is particularly well suited for hardware-software co-design. This paper overviews CVISs chips that were conceived and prototyped at IMS E Vision Lab over the past twenty years. Some of them fit the CNNUM paradigm while others are tangential to it. All of them employ per-pixel mixed-signal processing circuitry to achieve sensor-processing concurrency in the quest of fast operation with reduced energy budget.

Special issue on computational image sensors and smart camera hardware
J. Fernández-Berni, R. Carmona-Galán, G. Sicard A. Dupret
Journal Paper · International Journal of Circuit Theory and Applications, vol. 45, no. 6, pp 729-730, 2017
abstract      doi      pdf

Embedded computer vision is becoming a disruptive technological component for key market drivers of the semiconductor industry like smartphones, the Internet of Things or automotive. The recent incorporation of advanced artificial intelligence techniques into robust and precise inference schemes is underpinning this disruptiveness, along with the increase of on-chip computational power and the development of tools for rapid prototyping and experimentation. As a result, visual sensing is being embedded in all kinds of products and services, with different degrees of scene understanding. These products either did not exist before or are ousting existing ones. In this scenario, vision-enabled systems are expected to have ever-growing need for lower power consumption, lower cost, smaller form factor, higher image resolution and higher throughput. These burdensome requirements demand new approaches when it comes to dealing with the massive flow of information associated with the visual stimulus. In particular, early vision stands out as the critical stage where raw pixels are transformed into useful features for the targeted task. In order to effectively cope with this stage, front-end hardware resources play a major role. The incorporation of advanced sensing and computational capabilities in image sensors allows exploiting parallelism and distributed memory from the very beginning of the signal processing chain. Sensing structures can be designed to produce multiple sensorial modalities, e.g. 2D/depth. Circuit blocks can be tuned to adapt the response of the sensor to distinct specifications of the subsequent processing stage, where parallelization and memory management will be also critical. An adequate dataflow organization in FPGAs, GPUs, DSPs, etc. is of utmost importance in order to preserve the goodness of previous performance boosters and further increase the throughput. All in all, this special issue focuses on those aspects of embedded vision systems having an impact on their degree of integrated intelligence.

Gaussian Pyramid: Comparative Analysis of Hardware Architectures
F.D.V.R. Oliveira, J.G.R.C. Gomes, J. Fernandez-Berni, R. Carmona-Galan, R. del Rio and A. Rodriguez-Vazquez
Journal Paper · IEEE Transactions on Circuits and Systems I-Regular Papers, vol. 64, no. 9, pp 2308-2321, 2017
abstract      doi      

This paper addresses a comparison of architectures for the hardware implementation of Gaussian image pyramids. Main differences between architectural choices are in the sensor front-end. One side is for architectures consisting of a conventional sensor that delivers digital images and which is followed by digital processors. The other side is for architectures employing a non-conventional sensor with per-pixel embedded preprocessing structures for Gaussian spatial filtering. This later choice belongs to the general category of " artificial retina" sensors which have been for long claimed as potentially advantageous for enhancing throughput and reducing energy consumption of vision systems. These advantages are very important in the internet of things context, where imaging systems are constantly exchanging information. This paper attempts to quantify these potential advantages within a design space in which the degrees of freedom are the number and type of ADCs, single-slope, SAR, cyclic, Sigma Delta, and pipeline, and the number of digital processors. Results show that speed and energy advantages of preprocessing sensors are not granted by default and are only realized through proper architectural design. The methodology presented for the comparison between focal-plane and digital approaches is a useful tool for imager design, allowing for the assessment of focal-plane processing advantages.

TFET-based Well Capacity Adjustment in Active Pixel Sensor for Enhanced High Dynamic Range
J. Fernández-Berni, M. Niemier, X.S. Hu, H. Lu, W. Li, P. Fay, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · Electronics Letters, vol.53, no. 9, pp 622-624, 2017
abstract      doi      pdf

A tunnel field-effect transistor (TFET)-based pixel circuit for well capacity adjustment that does not require subthreshold operation on the part of the reset transistor is presented. In CMOS, this subthreshold operation leads to temporal noise, distortion and fixed pattern noise, becoming a primary limiting performance factor. In the proposed circuit, the asymmetric conduction associated with TFETs is exploited. This property, arising from the inherent physical structure of the device, provides the selective well adjustments during photo-integration which are demanded for achieving high dynamic range. A GaN-based heterojunction TFET has been designed according to the specific requirements for this application.

Low-Power CMOS Vision Sensor for Gaussian Pyramid Extraction
M. Suárez, V.M. Brea, J. Fernández-Berni, R. Carmona-Galán, D. Cabello and A. Rodríguez-Vázquez
Journal Paper · IEEE Journal of Solid-State Circuits, vol. 52, no. 2, pp 483-495, 2017
abstract      doi      pdf

This paper introduces a CMOS vision sensor chip in a standard 0.18 μm CMOS technology for Gaussian pyramid extraction. The Gaussian pyramid provides computer vision algorithms with scale invariance, which permits having the same response regardless of the distance of the scene to the camera. The chip comprises 176 x 120 photosensors arranged into 88 x 60 processing elements (PEs). The Gaussian pyramid is generated with a double-Euler switched capacitor (SC) network. Every PE comprises four photodiodes, one 8 b single-slope analog-to-digital converter, one correlated double sampling circuit, and four state capacitors with their corresponding switches to implement the double-Euler SC network. Every PE occupies 44 x 44 μm^2. Measurements from the chip are presented to assess the accuracy of the generated Gaussian pyramid for visual tracking applications. Error levels are below 2% full-scale output, thus making the chip feasible for these applications. Also, energy cost is 26.5 nJ/px at 2.64 Mpx/s, thus outperforming conventional solutions of imager plus microprocessor unit.

Special Section Guest Editorial: Advances on Distributed Smart Cameras
J. Fernández-Berni, F. Berry and C. Micheloni
Journal Paper · Journal of Electronic Imaging, vol. 25, no. 4, pp 1-2, 2016
abstract      doi      pdf

This special section was aimed at bringing together the latest contributions to the exciting multidisciplinary field of distributed smart cameras. An open call for papers was issued, with relevant topics ranging from vision chips and dedicated real-time image processing hardware to high-level information processing and smart camera networks. Invitations were also issued for extended versions of selected works from the 2015 edition of the flagship academic event in this field, the International Conference on Distributed Smart Cameras, where the guest editors served as technical program chairs. A total of 20 manuscripts were submitted, out of which 12 papers were finally accepted -60% acceptance rate- after peer review by at least two experts in the field.

Image Sensing Scheme Enabling Fully-Programmable Light Adaptation and Tone Mapping with a Single Exposure
J. Fernández-Berni, F.D.V.R. Oliveira, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · IEEE Sensors Journal, vol. 16, no. 13, pp. 5121-5122, 2016
abstract      doi      pdf

This letter presents new insights into a high dynamic range (HDR) technique recently reported. We demonstrate that two intertwined photodiodes per pixel can perform tone mapping under unconstrained illumination conditions with a single exposure. Experimental results attained from a prototype chip confirm the proposed theoretical framework. It opens the door to the realization of imagers providing HDR images free of artifacts without requiring any digital post-processing at all.

Single-Exposure HDR Technique Based on Tunable Balance Between Local and Global Adaptation
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 63, no. 5, pp. 488-492, 2016
abstract      doi      pdf

This brief describes a high-dynamic-range technique that compresses wide ranges of illuminations into the available signal range with a single exposure. An online analysis of the image histogram provides the sensor with the necessary feedback to dynamically accommodate changing illumination conditions. This adaptation is accomplished by properly weighing the influence of local and global illuminations on each pixel response. The main advantages of this technique with respect to similar approaches previously reported are as follows: 1) standard active-pixel-sensor circuitry can be used to render the pixel values and 2) the resulting compressed image representation is ready either for readout or for early vision processing at the very focal plane without requiring any additional peripheral circuit block. Experimental results from a prototype smart image sensor achieving a dynamic range of 102 dB are presented.

Bottom-up performance analysis of focal-plane mixed-signal hardware for Viola-Jones early vision tasks
J. Fernández-Berni, R. Carmona-Galán, R. del Río and A. Rodríguez-Vázquez
Journal Paper · International Journal of Circuit Theory and Applications, vol. 43, no. 8, pp 1063-1079, 2015
abstract      doi      pdf

Focal-plane mixed-signal arrays have traditionally been designed according to the general claim that moderate accuracy in processing is affordable. The performance of their circuitry has been analyzed in these terms without a comprehensive study of the ultimate consequences of such moderate accuracy. In this paper, for the first time to the best of our knowledge, we do carry out this study. We move expectable performance of mixed-signal image processing hardware directly into the vision algorithm making use of it. This permits to close a wider design loop, enabling a more aggressive design of this kind of hardware provided that the algorithm, at the highest level -semantic interpretation of the scene-, can afford it. Thus, we present a thorough analysis of the non-idealities associated with the implementation of a QVGA array tailored for the distinctive characteristics of the Viola-Jones processing framework. The resulting deviation models are then introduced in the processing flow of this framework provided by the OpenCV library. We have found, contrary to what could be expected, that these deviations do not necessarily degrade the performance of the Viola-Jones algorithm. They could be even beneficial for certain high-level specifications. Additionally, we demonstrate the architectural advantages of our approach: exploitation of focal-plane distributed memory and ultra-low-power operation.

High dynamic range adaptation for ROI tracking based on reconfigurable concurrent dual-sensing
J. Fernández-Berni, R. Carmona-Galán, R. del Río and A. Rodríguez-Vázquez
Journal Paper · Electronics Letters, vol. 50, no. 24, pp 1832-1834, 2014
abstract      doi      pdf

A single-exposure technique to extend the dynamic range of vision sensors is presented. It is particularly suitable for vision algorithms requiring region-of-interest (ROI) tracking under varying illumination conditions. The operation is supported by two intertwined photodiodes at pixel level and two digital registers at the periphery of the pixel matrix. These registers divide the focal plane into independent regions within which automatic concurrent adjustment of the integration time takes place for each frame. At pixel level, one of the photodiodes senses the pixel value itself, whereas the other, in collaboration with its counterparts in every prescribed ROI, senses the mean illumination of that specific ROI. An additional circuitry interconnecting both photodiodes asynchronously determines the integration period for each ROI according to its mean illumination. The experimental results for a quarter video graphics array prototype CMOS vision sensor are reported.

Focal-plane sensing-processing: A power-efficient approach for the implementation of privacy-aware networked visual sensors
J. Fernandez-Berni, R. Carmona-Galan, R. del Rio, R. Kleihorst, W. Philips and A. Rodriguez-Vazquez
Journal Paper · Sensors, vol. 14, no. 8, pp. 15203-15226, 2014
abstract      doi      pdf

The capture, processing and distribution of visual information is one of the major challenges for the paradigm of the Internet of Things. Privacy emerges as a fundamental barrier to overcome. The idea of networked image sensors pervasively collecting data generates social rejection in the face of sensitive information being tampered by hackers or misused by legitimate users. Power consumption also constitutes a crucial aspect. Images contain a massive amount of data to be processed under strict timing requirements, demanding high-performance vision systems. In this paper, we describe a hardware-based strategy to concurrently address these two key issues. By conveying processing capabilities to the focal plane in addition to sensing, we can implement privacy protection measures just at the point where sensitive data are generated. Furthermore, such measures can be tailored for efficiently reducing the computational load of subsequent processing stages. As a proof of concept, a full-custom QVGA vision sensor chip is presented. It incorporates a mixed-signal focal-plane sensing-processing array providing programmable pixelation of multiple image regions in parallel. In addition to this functionality, the sensor exploits reconfigurability to implement other processing primitives, namely block-wise dynamic range adaptation, integral image computation and multi-resolution filtering. The proposed circuitry is also suitable to build a granular space, becoming the raw material for subsequent feature extraction and recognition of categorized objects.

Special Issue on Advances in sensing and communication circuits (ICECS 2012)
A. Rodríguez-Vázquez, J. Fernández-Berni and J.M. de la Rosa
Journal Paper · Analog Integrated Circuits and Signal Processing , vol. 77, no. 3, pp 315-317, 2013
abstract      doi      

Abstract not available

A hierarchical vision processing architecture oriented to 3D integration of smart camera chips
R. Carmona-Galán, Á. Zarándy, C. Rekeczky, P. Földesy, A. Rodríguez-Pérez, C. Domínguez-Matas, J. Fernández-Berni, G. Liñán-Cembrano, B. Pérez-Verdú, Z. Kárász, M. Suárez-Cambre, V. Brea-Sánchez, T. Roska, Á. Rodríguez-Vázquez
Journal Paper · Journal of Systems Architecture, vol. 59, no. 10 part A, pp 908-919, 2013
abstract      doi      pdf

This paper introduces a vision processing architecture that is directly mappable on a 3D chip integration technology. Due to the aggregated nature of the information contained in the visual stimulus, adapted architectures are more efficient than conventional processing schemes. Given the relatively minor importance of the value of an isolated pixel, converting every one of them to digital prior to any processing is inefficient. Instead of this, our system relies on focal-plane image filtering and key point detection for feature extraction. The originally large amount of data representing the image is now reduced to a smaller number of abstracted entities, simplifying the operation of the subsequent digital processor. There are certain limitations to the implementation of such hierarchical scheme. The incorporation of processing elements close to the photo-sensing devices in a planar technology has a negative influence in the fill factor, pixel pitch and image size. It therefore affects the sensitivity and spatial resolution of the image sensor. A fundamental tradeoff needs to be solved. The larger the amount of processing conveyed to the sensor plane, the larger the pixel pitch. On the contrary, using a smaller pixel pitch sends more processing circuitry to the periphery of the sensor and tightens the data bottleneck between the sensor plane and the memory plane. 3D integration technologies with a high density of through-silicon-vias can help overcome these limitations. Vertical integration of the sensor plane and the processing and memory planes with a fully parallel connection eliminates data bottlenecks without compromising fill factor and pixel pitch. A case study is presented: a smart vision chip designed on a 3D integration technology provided by MIT Lincoln Labs, whose base process is 0.15 μm FD-SOI. Simulation results advance performance improvements with respect to the state-of-the-art in smart vision chips.

Ultralow-power processing array for image enhancement and edge detection
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 59, no. 11, pp 751-755, 2012
abstract      doi      pdf

This paper presents a massively parallel processing array designed for the 0.13-μm 1.5-V standard CMOS base process of a commercial 3-D through-silicon via stack. The array, which will constitute one of the fundamental blocks of a smart CMOS imager currently under design, implements isotropic Gaussian filtering by means of a MOS-based RC network. Alternatively, this filtering can be turned into anisotropic by a very simple voltage comparator between neighboring nodes whose output controls the gate of the elementary MOS resistor. Anisotropic diffusion enables image enhancement by removing noise and small local variations while preserving edges. A binary edge image can also be attained by combining the output of the voltage comparators. In addition to these processing capabilities, the simulations have confirmed the robustness of the array against process variations and mismatch. The power consumption extrapolated for VGA-resolution array processing images at 30 fps is 570 μW.

Early forest fire detection by vision-enabled wireless sensor networks
J. Fernández-Berni, R. Carmona-Galán, J.F. Martínez-Carmona and A. Rodríguez-Vázquez
Journal Paper · International Journal of Wildland Fire, vol. 21, no. 8, pp 938-949, 2012
abstract      doi      pdf

Wireless sensor networks constitute a powerful technology particularly suitable for environmental monitoring. With regard to wildfires, they enable low-cost fine-grained surveillance of hazardous locations like wildland-urban interfaces. This paper presents work developed during the last 4 years targeting a vision-enabled wireless sensor network node for the reliable, early on-site detection of forest fires. The tasks carried out ranged from devising a robust vision algorithm for smoke detection to the design and physical implementation of a power-efficient smart imager tailored to the characteristics of such an algorithm. By integrating this smart imager with a commercial wireless platform, we endowed the resulting system with vision capabilities and radio communication. Numerous tests were arranged in different natural scenarios in order to progressively tune all the parameters involved in the autonomous operation of this prototype node. The last test carried out, involving the prescribed burning of a 9520-m shrub plot, confirmed the high degree of reliability of our approach in terms of both successful early detection and a very low false-alarm rate. Journal compilation.

CMOS-3D Smart Imager Architectures for Feature Detection
V.M. Brea, J. Fernández-Berni, R. Carmona-Galán, G. Liñán, D. Cabello and A. Rodríguez-Vázquez
Journal Paper · IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 2, no. 4, pp 723-736, 2012
abstract      doi      pdf

This paper reports a multi-layered smart image sensor architecture for feature extraction based on detection of interest points. The architecture is conceived for 3-D integrated circuit technologies consisting of two layers (tiers) plus memory. The top tier includes sensing and processing circuitry aimed to perform Gaussian filtering and generate Gaussian pyramids in fully concurrent way. The circuitry in this tier operates in mixed-signal domain. It embeds in-pixel correlated double sampling, a switched-capacitor network for Gaussian pyramid generation, analog memories and a comparator for in-pixel analog-to-digital conversion. This tier can be further split into two for improved resolution; one containing the sensors and another containing a capacitor per sensor plus the mixed-signal processing circuitry. Regarding the bottom tier, it embeds digital circuitry entitled for the calculation of Harris, Hessian, and difference-of-Gaussian detectors. The overall system can hence be configured by the user to detect interest points by using the algorithm out of these three better suited to practical applications. The paper describes the different kind of algorithms featured and the circuitry employed at top and bottom tiers. The Gaussian pyramid is implemented with a switched-capacitor network in less than 50 μs, outperforming more conventional solutions.

All-MOS implementation of RC networks for time-controlled Gaussian spatial filtering
J. Fernández-Berni and R. Carmona-Galán
Journal Paper · International Journal of Circuit Theory and Applications,vol. 40, no. 8, pp 859-876, 2012
abstract      doi      pdf

This paper addresses the design and VLSI implementation of MOS-based RC networks capable of performing time-controlled Gaussian filtering. In these networks, all the resistors are substituted one by one by a single MOS transistor biased in the ohmic region. The design of this elementary transistor is carefully realized according to the value of the ideal resistor to be emulated. For a prescribed signal range, the MOSFET in triode region delivers an interval of instantaneous resistance values. We demonstrate that, for the elementary 2-node network, establishing the design equation at a particular point within this interval guarantees minimum error. This equation is then corroborated for networks of arbitrary size by analyzing them from a stochastic point of view. Following the design methodology proposed, the error committed by an MOS-based grid when compared with its equivalent ideal RC network is, despite the intrinsic nonlinearities of the transistors, below 1% even under mismatch conditions of 10%. In terms of image processing, this error hardly affects the outcome, which is perceptually equivalent to that of the ideal network. These results, extracted from simulation, are verified in a prototype vision chip with QCIF resolution manufactured in the AMS 0.35Âμm CMOS-OPTO process. This prototype incorporates a focal-plane MOS-based RC network that performs fully programmable Gaussian filtering. Copyright © 2011 John Wiley & Sons, Ltd. This paper addresses the design and VLSI implementation of all-MOS RC networks capable of performing timecontrolled Gaussian filtering. Following the design methodology proposed, the error committed by a MOS-based grid when compared to its equivalent ideal RC networks is, despite the intrinsic nonlinearities of the transistors, below 1% even under mismatch conditions of 10%. These results, extracted from simulation, are verified in a prototype vision chip with QCIF resolution manufactured in the AMS 0.35Âμm CMOS-OPTO process. Copyright © 2011 John Wiley & Sons, Ltd. Copyright © 2011 John Wiley & Sons, Ltd.

FLIP-Q: a QCIF resolution focal-plane array for low-power image processing
J. Fernández-Berni, R. Carmona-Galán and L. Carranza-González
Journal Paper · IEEE Journal of Solid-State Circuits, vol. 46,  no. 3, pp 669-680, 2011
abstract      doi      pdf

This paper reports a 176x144-pixel smart image sensor designed and fabricated in a 0.35 mu m CMOS-OPTO process. The chip implements a massively parallel focal-plane processing array which can output different simplified representations of the scene at very low power. The array is composed of pixel-level processing elements which carry out analog image processing concurrently with photosensing. These processing elements can be grouped into fully-programmable rectangular-shape areas by loading the appropriate interconnection patterns into the registers at the edge of the array. The targeted processing can be thus performed block-wise. Readout is done pixel-by-pixel in a random access fashion. On-chip 8b ADC is provided. The image processing primitives implemented by the chip, experimentally tested and fully functional, are scale space and Gaussian pyramid generation, fully-programmable multiresolution scene representation-including foveation-and block-wise energy-based scene representation. The power consumption associated to the capture, processing and A/D conversion of an image flow at 30 fps, with full-frame processing but reduced frame size output, ranges from 2.7 mW to 5.6 mW, depending on the operation to be performed.

On the implementation of linear diffusion in transconductance-based cellular nonlinear networks
J. Fernández-Berni and R. Carmona-Galán
Journal Paper · International Journal of Circuit Theory and Applications, vol. 37, no. 4, pp 543-567, 2009
abstract      doi      

In theory, cellular nonlinear networks (CNN) are well capable of implementing discrete-space linear diffusion by means of the appropriate templates. In practice, good results have not been demonstrated with transconductance-based circuits. In this paper, we prove that inherent mismatch to very large scale integration implementation is the reason. Although previous works consider that the small perturbations of the network parameters lead to small deviations from the ideal behavior, we consider that this is over optimistic. When interactions between nodes are supported by unidirectional building blocks, originally balanced current paths are realized by mismatched elements. In the case of linear diffusion, the singular location of natural frequencies of the system implies that a small perturbation of balanced current paths renders qualitatively different network dynamics. We analyze and compare a set of linear templates performing unconstrained and constrained linear diffusion in transconductance-based CNN hardware. Several numerical examples are also presented to visualize the consequences of mismatch on the processing. Finally, in order to emphasize the importance of having balanced current paths, we tested the influence of mismatch in a grid built with MOS transistors. In spite of their nonlinearity, the resulting network is much more robust to mismatch. This last result coincides with previous studies. Copyright 2008 John Wiley & Sons, Ltd.

Conferences


Accurate Face Recognition on Highly Compressed Samples
A. Khan, J. Fernández-Berni and R. Carmona-Galán
Conference · International Conference on Signal Image Technology and Internet Based Systems SITIS 2022
abstract     

Compressive sensing is an emerging field for lowdimensional data acquisition. Samples are acquired in the compressed domain and utilized for signal reconstruction or as input features for a classifier. In this work, hardware-aware face recognition using compressed samples was investigated. A linear support vector machine (SVM) classifier was exploited with compressed samples as input features; Faces can be reliably recognized with high average accuracy (up to 99%). To assess the robustness of the proposed scheme, three image datasets covering different facial and illumination conditions were analyzed. Random (binary) and structured (Haar-transform-based) measurement matrices were employed for generating compressed samples. For one of the datasets, Extended Yale B, and using a random binary measurement matrix, the proposed scheme achieved 82% accuracy from as few as 15 compressed samples, which means a 1/20480 sensing ratio. Accuracy and compression are also remarkably high with respect to the state-of-the-art for the other two datasets.

An Architecture for On-Chip Face Recognition in a Compressive Image Sensor
A. Khan, J. Fernandez-Berni and R. Carmona-Galan
Conference · IEEE International System-On-Chip Conference SOCC 2022
abstract     

Compressive sensing has been widely explored for image reconstruction; however, compressed samples themselves contain relevant signal information and, hence, could be exploited for inference purposes. Many previous studies investigated image recognition on compressed samples, but few of them considered on-chip realization. In this study, an architecture for face recognition that exploits compressed samples was investigated. We found that by using a linear support-vector-machine (SVM) classifier with such samples as input, faces can be recognized with higher performance than in previous works for the level of compression expected in a system-on-chip (SoC) implementation. To compare our results with those of existing works, a figure of merit is proposed. Three image datasets were analyzed to cover diverse aspects, such as variation in illumination, different poses, aging effect, and changing backgrounds. The compression scheme shows robustness under this variety of input signals. A pseudo-diagonal measurement matrix and an architecture suited for in-sensor on-chip implementation are proposed. The resulting inference framework is suitable for an SoC implementation encompassing an image sensor to perform CS acquisition and a DSP to run the SVM.

Visual and Acoustic Inference on Low-Cost IoT Systems
J. Fernández-Berni
Conference · Midterm Summer School MENELAOS 2021
abstract     

The irruption of deep learning (DL) into the fields of artificialvision and audio recognition in the last few years has led to a paradigmchange with respect to classical hand-crafted algorithms. These days,accurate but computationally heavy neural networks are ubiquitouslyemployed. The demanding processing and storage requirements of suchnetworks come into conflict with the limited hardware resources availablein IoT devices. Meanwhile, a great deal of technological components at bothhardware and software levels have been released to support DL-baseddeployments. Each of these components claims specific advantages interms of performance, system integration, energy consumption, etc. Thistalk will provide insight to navigate this complex scenario in an effectivemanner. Concepts, figures of merit, procedures, and methodologies to assiston the selection of components for inference on edge devices will bepresented. Practical examples in the application framework of remoteanimal monitoring will be described. In short, a variety of theoretical andpractical guidelines will be set out in this presentation about theimplementation of visual and acoustic inference on cyber-physical systemswith scarce computational and memory resources.

Pixel Circuitry for HDR and Tone Mapping vs. Digital Tone-Mapping Operators
J. Fernández-Berni
Conference · Huawei Workshop on Future Technologies for ISP 2021
abstract     

Abstract not available

High-Level Inference On-Chip Enabled by Compressed Sensing
A. Khan, J. Fernández-Berni and R. Carmona-Galán
Conference · Conference on Design of Circuits and Integrated Systems DCIS 2021
abstract     

Compressed sensing (CS) proposes an alternative approach for image acquisition. The traditional method implies acquiring an image and then compressing it. In CS, the image is directly acquired in the compressed domain. Learning can be performed in the compressed domain, known as compressed learning. When it comes to conducting inference on a CS-based image, compressed learning eliminates the need of reconstructing the compressively acquired image. Thus, in this work, compressed learning is suggested for high-level inference. Our ultimate goal is to implement a CMOS smart sensor-processor chip exploiting CS for on-chip inference based on the hypotheses that working in the compressed domain allows implementing high-level inference under severe restrictions on computational and power resources.

VersaTile Convolutional Neural Network Mapping on FPGAs
A. Muñío-Gracia, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2020
abstract      pdf

Convolutional Neural Networks (ConvNets) are directed acyclic graphs with node transitions determined by a set of configuration parameters. In this paper, we describe a dynamically configurable hardware architecture that enables data allocation strategy adjustment according to ConvNets layer characteristics. The proposed flexible scheduling solution allows the accelerator design to be portable across various scenarios of computation and memory resources availability. For instance, FPGA block-RAM resources can be properly balanced for optimization of data distribution and minimization of off-chip memory accesses. We explore the selection of tailored scheduling policies that translate into efficient on-chip data reuse and hence lower energy consumption. The system can autonomously adapt its behavior with no need of platform reconfiguration nor user supervision. Experimental results are presented and compared with state-of-the-art accelerators.

Demo: CNN Performance Prediction on a CPU-based Edge Platform
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez Vázquez
Conference · International Conference on Sustainable Development in Civil Engineering ICSDC 2019
abstract     

The implementation of algorithms based on Dee p Learning at edge visual systems is currently a challenge. In addition to accuracy, the network architecture also has an impact on inference performance in terms of throughput and power consumption. This demo showcases per layer inference performance of various convolut ional neural networks running at a low cost edge platform . Furthermore, a n empirical model is applied to predict processing time and power consumption prior to actually running the networks A comparison between the prediction from our model and the actual inference performance is displayed in real time.

PhD Forum: Impact of CNNs Pooling Layer Implementation on FPGAs Accelerator Design
A. Muñío-Gracia, J. Fernández-Berni, R. Carmona-Galán and Á. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICSDC 2019
abstract     

Convolutional Neural Networks have demonstrated their competence in extracting information from data, especially in the field of computer vision. Their computational complexity prompts for hardware acceleration. The challenge in the design of hardware accelerators for CNNs is providing a sustained throughput with low power consumption, for what FPGAs have captured community attention. In CNNs pooling layers are introduced to reduce model spatial dimensions. This work explores the influence of pooling layers modification in some state-of-the-art CNNs, namely AlexNet and SqueezeNet. The objective is to optimize hardware resources utilization without negative impact on inference accuracy.

Towards a Simplified Procedure for CNN Performance Prediction on Embedded Platforms
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2019
abstract     

Vision is arguably the technical field benefiting the most from the renaissance of artificial intelligence in the last few years. In particular, the convergence of massive datasets for training, boosted computational power, and enhanced machine learning techniques has given rise to highly accurate vision algorithms -even outperforming humans in certain tasks- based on convolutional neural networks (CNNs). The potential of these algorithms has attracted attention from many parties, both in academia and industry, spurring the development of a myriad of hardware platforms and software frameworks. The challenge now is how to efficiently leverage and integrate this variety of components in practical realizations, taking also into account that CNN models keep evolving at a rapid pace. With this scenario in mind, we have been working on a simplified procedure to predict the performance of CNNs running on embedded platforms in terms of throughput and power consumption. The objective is to facilitate the evaluation of the aforementioned components and CNN models prior to actually implementing them, thereby speeding up the deployment of optimal solutions. In this talk, we will describe key aspects of the proposed procedure. Specifically, we will elaborate on SweepNet, a deep neural network tailored for meaningful per-layer characterization. The performance models extracted from SweepNet for a hardware platform allow to accurately predict layer by layer the execution time and energy consumption of any other CNN running on that platform.

On the Balanced Allocation of Convolutional Neural Network Models on FPGAs
A. Muñío-Gracia, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2019
abstract     

Deep Learning (DL) algorithms have demonstrated their competence in accurately extracting information from data, especially in the field of computer vision. DL has emerged as an end-to-end approach based on learned multi-level scene representations. A number of open-source frameworks have been created to describe convolutional neural network (CNN) models -a class of the deep neural networks (DNNs) that support DL. Their computational complexity prompts for hardware acceleration. The challenge in the design of hardware accelerators for CNNs is providing a sustained throughput with low power consumption. In order to test our architectural proposals, we will be employing FPGAs. They are reconfigurable, efficient, and have adjustable precision. FPGAs permit architectural exploration with shorter development time and lower cost than ASICs. This work introduces an scalable, frameworkagnostic, architecture whose behavior self-adapts to the selected CNN configuration. A design space analysis is performed for some state-of-the-art CNNs, namely VGG-16, Tiny DarkNet, and SqueezeNet. The objective is a balanced allocation of resources. For this, tiling parameterization will be optimized attending to decisive performance criteria such as the number of memory accesses, data movement policy and throughput.

On the Correlation of CNN Performance and Hardware Metrics for Visual Inference on a Low-Cost CPU-based Platform
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · International Conference on Systems, Signals and Image Processing IWSSIP 2019
abstract      pdf

While providing the same functionality, the various Deep Learning software frameworks available these days do not provide similar performance when running the same network model on a particular hardware platform. On the contrary, we show that the different coding techniques and underlying acceleration libraries have a great impact on the instantaneous throughput and CPU utilization when carrying out the same inference with Caffe, OpenCV, TensorFlow and Caffe2 on an ARM Cortex-A53 multi-core processor. Direct modelling of this dissimilar performance is not practical, mainly because of the complexity and rapid evolution of the toolchains. Alternatively, we examine how the hardware resources are distinctly exploited by the frameworks. We demonstrate that there is a strong correlation between inference performance - including power consumption - and critical parameters associated with memory usage and instruction flow control. This identified correlation is a preliminary step for the development of a simple empirical model. The objective is to facilitate selection and further performance tuning among the ever-growing zoo of deep neural networks and frameworks, as well as the exploration of new network architectures.

Gaussian Pyramid: Comparative Analysis of Hardware Architectures
F.D.V.R. Oliveira, J.G.R.C. Gomes, J. Fernandez-Berni, R. Carmona-Galan, R. del Rio and A. Rodriguez-Vazquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2018
abstract     

A comparison of architectures for hardware implementation of Gaussian image pyramids is addressed. Architectures consisting of a conventional sensor followed by digital processors are compared to architectures employing per-pixel embedded pre-processing structures. The later is potentially advantageous for enhancing throughput and reducing energy consumption, important features in the IoT context. These advantages are quantified considering different numbers of digital processors and ADCs, and different ADCs types. Results show that the advantages of pre-processing sensors are not granted by default, requiring proper architectural design. The methodology presented for comparing focal-plane and digital approaches allows for the assessment of focal-plane processing advantages.

Results of 'iCaVeats', a project on the integration of architectures and components for embedded vision
R. Carmona-Galán, J. Fernández-Berni, A. Rodríguez-Vázquez, P. López-Martínez, V.M. Brea-Sánchez, D. Cabello-Ferrer, G. Domenech-Asensi, R. Ruiz-Merino and J. Zapata-Pérez
Conference · International Conference on Distributed Smart Cameras ICDSC 2018
abstract      pdf

iCaveats is a Project on the integration of components and architectures for embedded vision in transport and security applications. A compact and efficient implementation of autonomous vision systems is difficult to be accomplished by using the conventional image processing chain. In this project we have targeted alternative approaches, that exploit the inherent parallelism in the visual stimulus, and hierarchical multilevel optimization. A set of demos showcase the advances at sensor level, in adapted architectures for signal processing and in power management and energy harvesting.

On the characterization of light sources irradiation profiles with an HDR image sensor
J.A. Leñero-Bardallo, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2018
abstract     

We demonstrate how light emissions of very bright light sources can be rendered with an HDR image sensor with linear operation. We showcase the device usefulness to study transient variations of very high illumination levels and to determine the irradiance profile of light sources. The sensor can track transient illumination changes at video rates, preserving details of darker regions within the visual scene.

On-The-Fly Deployment of Deep Neural Networks on Heterogeneous Hardware in a Low-Cost Smart Camera
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galan and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2018
abstract     

This demo showcases a low-cost smart camera where different hardware configurations can be selected to perform image recognition on deep neural networks. Both the hardware configuration and the network model can be changed any time on the fly. Up to 24 hardware-model combinations are possible, enabling dynamic reconfiguration according to prescribed application requirements.

Optimum Network/Framework Selection from High-Level Specifications in Embedded Deep Learning Vision Applications
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Advanced Concepts for Intelligent Vision Systems ACIVS 2018
abstract     

This paper benchmarks 16 combinations of popular Deep Neural Networks and Deep Learning frameworks on an embedded platform. A Figure of Merit based on high-level specifications is introduced. By sweeping the relative weight of accuracy, throughput and power consumption on global performance, we demonstrate that only a reduced set of the analyzed combinations must actually be considered for real deployment. We also report the optimum network/framework selection for all possible application scenarios defined in those terms, i.e. weighted balance of the aforementioned parameters. Our approach can be extended to other networks, frameworks and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of Deep Learning technology.

Live Demonstration: Low-Power Low-Cost Cyber-Physical System for Bird Monitoring
A. García-Rodríguez, R. Rodríguez-Sakamoto, J. Fernández-Berni, R. del Río, J. Marín, M. Baena, J. Bustamante, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2018
abstract      pdf

This live demonstration showcases a cyber-physical system tailored for inexpensive remote bird monitoring. A comprehensive analysis of the application requirements along with a tight system integration have given rise to a smart autonomous nest-box ready for deployment. This nest-box includes radiofrequency identification (RFID), a weighing scale, two temperature sensors, passive infrared devices (PIR), massive data storage and internet connection via mobile infrastructure. It is powered through a solar panel. The bill of materials has been diminished 77% with respect to the previous version of the nest-box whereas the power consumption has been reduced 84%.

Color Tone-Mapping Circuit for a Focal-Plane Implementation
G.M.S. Nunes, F.D.V.R. Oliveira, J.G. Gomes, A. Petraglia, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2018
abstract      pdf

In this article, we present a review of the driving principles and parameters of a previously reported focal-plane tone-mapping operator. We then extend it in order to include color information processing. The signal processing operations required for handling color images are white balance and demosaicing. Neither white balance nor demosaicing are carried out in the focal plane, in order to avoid increasing circuit size and complexity. Since, in this case, white balance is carried out after tone mapping, multiplication of red and blue channels by constant gains may lead to wrong color results. An alternative approach is proposed, in which different gains are assigned for every red and blue pixel of the matrix. Because of the introduction of color, a modification in the original circuit is proposed, which affects the integration time of red and blue pixels. This modification leads to a reduction in the number of photodiodes required in the pixel array, and hence to a reduction of the sensing circuit area. The results produced by the operator are compared to those obtained from two other digital tone-mapping operators.

Concurrent focal-plane generation of compressed samples from time-encoded pixel values
M. Trevisi, H.C. Bandala, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Design Automation and Test in Europe DATE 2018
abstract      pdf

Compressive sampling allows wrapping the relevant content of an image in a reduced set of data. It exploits the sparsity of natural images. This principle can be employed to deliver images over a network under a restricted data rate and still receive enough meaningful information. An efficient implementation of this principle lies in the generation of the compressed samples right at the imager. Otherwise, i. e. digitizing the complete image and then composing the compressed samples in the digital plane, the required memory and processing resources can seriously compromise the budget of an autonomous camera node. In this paper we present the design of a pixel architecture that encodes light intensity into time, followed by a global strategy to pseudo-randomly combine pixel values and generate, on-chip and on-line, the compressed samples.

Performance Analysis of Real-Time DNN Inference on Raspberry Pi
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · SPIE Real-Time Image and Video Processing 2018
abstract      pdf

Deep Neural Networks (DNNs) have emerged as the reference processing architecture for the implementation of multiple computer vision tasks. They achieve much higher accuracy than traditional algorithms based on shallow learning. However, it comes at the cost of a substantial increase of computational resources. This constitutes a challenge for embedded vision systems performing edge inference as opposed to cloud processing. In such a demanding scenario, several open-source frameworks have been developed, e.g. Ca e, OpenCV, TensorFlow, Theano, Torch or MXNet. All of these tools enable the deployment of various state-of-the-art DNN models for inference, though each one relies on particular optimization libraries and techniques resulting in di erent performance behavior. In this paper, we present a comparative study of some of these frameworks in terms of power consumption, throughput and precision for some of the most popular Convolutional Neural Networks (CNN) models. The benchmarking system is Raspberry Pi 3 Model B, a low-cost embedded platform with limited resources. We highlight the advantages and limitations associated with the practical use of the analyzed frameworks. Some guidelines are provided for suitable selection of a speci c tool according to prescribed application requirements.

Gaussian Pyramid: Comparative Analysis of Hardware Architectures
F.D.V.R. Oliveira, J.G.R.C. Gomes, J. Fernandez-Berni, R. Carmona-Galan, R. del Rio and A. Rodriguez-Vazquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2017
abstract     

Abstract not avaliable

TCAD Simulation of Electrical Crosstalk in 4T-Active Pixel Sensors
J.M. López-Martínez, R. Carmona-Galán, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2017
abstract     

CMOS image sensors (CIS) are widely used nowadays in consumer electronics as well as in high-end applications. This is mainly due to their advantages regarding low dark current and low noise characteristics of the pinned photodiode (PPD). Much effort has been put into better understanding key electrical properties of PPDs, like full well capacity, photodiode´s capacitance or pinning voltage. Another important source of sensitivity degradation is crosstalk (CTK). It has been assessed for CCDs and some CMOS devices. However, addressing CTK in CMOS 4T-APS pixels at the design phase is not easy, mainly due to the unavailability of CIS technology parameters.an additional problem is the computational cost of TCAD simulation; e.g., a five pixel linear array like the one shown in Fig. 1, already introduce long periods of computing due to the complexity of the structure. Crosstalk occurs when the charge generated by photon incident on a pixel are finally sensed by a neighboring pixel. CTK degrades performance, cutting down spatial resolution, reducing the overall sensitivity, degrading color separation, and increasing image noise. Crosstalk is defined as the percentage of the total charge generated by incident light that is diverted to non-illuminated pixels in the neighborhood. There are two components in CTK. Optical crosstalk is related to illumination, reflection, refraction and scattering of photons in the different layers of the material that cover the photodiode. This generates stray photons that are absorbed in the neighborhood. The second component is electrical, and it involves the diffusion of photo-generated carriers between adjacent devices. The characterization of electrical CTK in 4T-APS can be achieved using TCAD tools. Particularly, the relation between CKT and quantum efficiency (QE) can be explored and linked to the thickness of the epitaxial layer.

In the quest of vision-sensors-on-chip: Pre-processing sensors for data reduction
A. Rodríguez-Vázquez, R. Carmona-Galán, J. Fernández-Berni, V. Brea, J.A. Leñero-Bardallo
Conference · IS&T International Symposium on Electronic Imaging 2017
abstract     

This paper shows that the implementation of vision systems benefits from the usage of sensing front-end chips with embedded pre-processing capabilities -called CVIS. Such embedded pre-processors reduce the number of data to be delivered for ulterior processing. This strategy, which is also adopted by natural vision systems, relaxes system-level requirements regarding data storage and communications and enables highly compact and fast vision systems. The paper includes several proof-o-concept CVIS chips with embedded pre-processing and illustrate their potential advantages.

Demo: Image Sensing Scheme Enabling Fully-Programmable Light Adaptation and Tone Mapping with a Single Exposure
J. Fernández-Berni, F.D.V.R. Oliveira, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2016
abstract     

This demo showcases a High Dynamic Range (HDR) technique recently reported. We demonstrate that two intertwined photodiodes per pixel can perform tone mapping under unconstrained illumination conditions with a single exposure. The proposed technique has been implemented on a prototype smart image sensor achieving a dynamic range of 102dB. It opens the door to the realization of smart cameras and vision sensors capable of rendering HDR images free of artifacts without requiring any digital post-processing at all.

Image dynamic range extension by using stacked (unmatched) photodiodes in CMOS
R. Carmona-Galán, J. A. Leñero-Bardallo, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2016
abstract     

Capturing images containing unevenly illuminated areas within the same frame is very useful in application fields like surveillance, assisted driving, intelligent transportation, or industrial applications with high intra-scene contrast. Without the appropriate dynamic range to allocate these diverse illumination values, obtaining a detailed view of the brightest zones can easily obscure other elements in the scene. In order to increase the image dynamic range within the same frame, different techniques have been developed: using a sensor with a companding scheme, providing the means to avoid saturation, or employing multiple image captures. The problem with multiple captures is that uncorrelation between the different integration times can generate inexistent edges and distort the interpretation of the scene. In order to realize multiple captures in parallel, we need to be simultaneously sensitive to different illumination ranges. CMOS technology offers a variety of devices to capture light in the visible and near infrared range. If a deep-n-well is available, these structures can be stacked so spatial alignment is obtained by construction (Fig. 1a). The conversion gain of the different photodiodes is defined by their capacitance per unit area (Fig. 1b); therefore each of them will render a different voltage for the same light intensity. This discrepancy in the response can be exploited to extract information from different illumination ranges simultaneously. In this way, light can be sensed in parallel with different conversion gains and the resulting output voltages can then be digitized and combined into a single digital word with a larger number of bits. This mechanism for dynamic range extension does not depend on the difference of exposure times, so artifacts related with unmatched dynamics in the sensor and the scene can be avoided.

Pixel-wise parameter adaptation for single-exposure extension of the image dynamic range
R. Carmona-Galán, J.A. Leñero-Bardallo, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2016
abstract      pdf

High dynamic range imaging is central in application fields like surveillance, intelligent transportation and advanced driving assistance systems. In some scenarios, methods for dynamic range extension based on multiple captures have shown limitations in apprehending the dynamics of the scene. Artifacts appear that can put at risk the correct segmentation of objects in the image. We have developed several techniques for the on-chip implementation of single-exposure extension of the dynamic range. We work on the upper extreme of the range, i. e. administering the available full-well capacity. Parameters are adapted pixel-wise in order to accommodate a high intra-scene range of illuminations.

Experimental Evidence of Power Efficiency due to Architecture in Cellular Processor Array Chips
R. Carmona-Galán, J. Fernández Berni and A. Rodríguez-Vázquez
Conference · International Workshop on Cellular Nanoscale Networks and their Applications CNNA 2016
abstract      pdf

Speeding up algorithm execution can be achieved by increasing the number of processing cores working in parallel. Of course, this speedup is limited by the degree to which the algorithm can be parallelized. Equivalently, by lowering the operating frequency of the elementary processors, the algorithm can be realized in the same amount of time but with measurable power savings. An additional result of parallelization is that using a larger number of processors results in a more efficient implementation in terms of GOPS/W. We have found experimental evidence for this in the study of massively parallel array processors, mainly dedicated to image processing. Their distributed architecture reduces the energy overhead dedicated to data handling, thus resulting in a power efficient implementation.

Hardware-Aware Performance Evaluation for the Co-Design of Image Sensors and Vision Algorithms
C. Villegas-Pachón, R. Carmona-Galán, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference · Int. Conf. on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design SMACD 2016
abstract      pdf

The top-down approach to system design allows obtaining separate specifications for each subsystem. In the case of vision systems, this means propagating system-level specifications down to particular specifications for e. g. the image sensor, the image processor, etc. This permits to adopt different design strategies for each one of them, as long as they meet their own specifications. This approach can lead to over-design, which is not always affordable. Conversely, if higher-level specifications are too tight, they can lead to impossible specifications at the lower levels. This is certainly the case for embedded vision systems in which high-performance needs to be paired with a very restricted power budget. In order to explore alternative architectures, we need tools that allow for simultaneous optimization of different blocks. However, the link between low-level non-idealities and high-level performance is missing. CAD tools for the design and verification of analog and mixed-signal integrated circuits are not well suited for the simulation of higher-level functionalities. Our approach is to extract relevant data from circuit-level simulation and to build an OpenCV model to be employed in the design of the algorithm. The utility of this approach is illustrated by the evaluation of the effect of column-wise and pixel-wise FPN at the sensor on the performance of Viola-Jones face detection.

Live Demonstration: Single-Exposure HDR Image Acquisition Based on Tunable Balance Between Local and Global Adaptation
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems, ISCAS 2016
abstract      pdf

This live demonstration showcases a high dynamic range technique that compresses wide ranges of illuminations into the available signal range with a single exposure. In order to accomplish such compression, concurrent sensing-processing takes place at the focal plane, weighing the influence of local and global illumination on each pixel response during the image capture. This process is driven by an on-line analysis of the image histogram that also enables the dynamic accommodation of changing illumination conditions. The proposed technique has been implemented on a prototype smart image sensor achieving a dynamic range of 102dB.

Focal-plane scale space generation with a 6T pixel architecture
F. Oliveira, J.G. Gomes, R. Carmona-Galán, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference · IS&T International Symposium on Electronic Imaging 2016
abstract      pdf

Fill factor and focal-plane implementation of instrumental image processing steps, we propose a simple modification in a standard pixel architecture in order to allow for charge redistribution among neighboring pixels. As a result, averaging operations may be performed at the focal plane, and image smoothing based on Gaussian filtering may thus be implemented. By averaging neighboring pixel values, it is also possible to generate intermediate data structures that are required for the computation of Haar-like features. To show that the proposed hardware is suitable for computer vision applications, we present a systemlevel comparison in which the scale-invariant feature transform (SIFT) algorithm is executed twice: first, on data obtained with a classical Gaussian filtering approach, and then on data generated from the proposed approach. Preliminary schematic and extracted layout pixel simulations are also presented.

High-Level Performance Evaluation of Object Detection Based on Massively Parallel Focal-Plane Acceleration Requiring Minimum Pixel Area Overhead
E. Parra-Barrero, J. Fernández-Berni, F.D.V.R. Oliveira, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · International Conference on Computer Vision Theory and Applications VISAPP 2016
abstract      pdf

Smart CMOS image sensors can leverage the inherent data-level parallelism and regular computational flow of early vision by incorporating elementary processors at pixel level. However, it comes at the cost of extra area having a strong impact on the sensor sensitivity, resolution and image quality. In this scenario, the fundamental challenge is to devise new strategies capable of boosting the performance of the targeted vision pipeline while minimally affecting the sensing function itself. Such strategies must also feature enough flexibility to accommodate particular application requirements. From these high-level specifications, we propose a focal-plane processing architecture tailored to speed up object detection via the Viola-Jones algorithm. This architecture is supported by only two extra transistors per pixel and simple peripheral digital circuitry that jointly make up a massively parallel reconfigurable processing lattice. A performance evaluation of the proposed scheme in terms of accuracy and acceleration for face detection is reported.

On the Design of a Sparsifying Dictionary for Compressive Image Feature Extraction
M. Trevisi, R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference · IEEE International Conference on Electronics Circuits and Systems ICECS 2015
abstract     

Although there are some works reported on feature extraction from compressed samples, none of them consider the implementation of the feature extractor as a part of the sensor itself. Our approach is to introduce a sparsifying dictionary, feasibly implementable at the focal plane, which describes the image in terms of features. This allows a standard reconstruction algorithm to directly recover the interesting image features, discarding the irrelevant information. In order to validate the approach, we have integrated a Harris-Stephens corner detector into the compressive sampling process. We have evaluated the accuracy of the reconstructed corners compared to applying the detector to a reconstructed image.

CMOS Image Sensor Architecture for Focal Plane Early Vision Processing
F.D.V.R. de Oliveira, J.G.R.C. Gomes, R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2015
abstract     

This paper presents a pixel architecture that aims at validating the idea that with a small change in the pixel it is possible to perform important image processing computations at the focal-plane without significantly affecting the fill factor. An overview of two algorithms that may benefit from this new pixel architecture is given, namely the SIFT for object recognition and the Viola-Jones for face detection. A brief discussion of the limitations of the computations performed inside the pixel matrix and the future work is also presented.

Compressive Feature Extraction
M. Trevisi, R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2015
abstract     

Compressive sensing (CS) provides an alternative to Nyquist-Shannon sampling when the signal to acquire is known to be sparse or compressible. A sparse signal has a small number of nonzero components compared to its total length. This property can either exist in the sampling domain of the signal or with respect to other basis. Representing a signal in a transform basis involves the choice of a dictionary, a set of elementary signals, used to decompose the signal. When performing analysis of complex data one of the major problems stems from the number of variables involved. Feature extraction is used to reduce the amount of resources required to describe a large set of data. A given feature is often represented by a set of parameters to be evaluated. This set has relevant values only in correspondence of said features. We can consider sets derived this way as being sparse. This sparked the idea to merge a feature extraction algorithm with the compressive sensing theory. To do so we try to adapt one of the most practical CS reconstruction algorithms, the Nesterov algorithm applied to CS (NESTA) to extract the features of one of the simplest corner detection algorithms, the Harris and Stephens algorithm. Our aim is to compare the performance of the combined Harris-NESTA algorithm over the application of a Harris algorithm on a NESTA reconstructed image and to do so we devised a test that takes into account four different parameters.

Assessment of circuit non-idealities' effect on algorithm performance via OpenCV modeling
C. Villegas-Pachón, R. Carmona-Galán and J. Fernández-Berni
Conference · Workshop on the Architecture of Smart Cameras WASC 2015
abstract     

CAD tools for the design and verification of analog and mixed-signal integrated circuits are not well suited for the simulation of higher-level functionalities. The evaluation of the effect of circuit non-idealities on the algorithm performance is precluded for the image sensor design team. For the conventional model in computer vision, in which image capture and processing are completely separated tasks, this does not represent a problem. Chip designers will work for a particular set of specifications, i. e. spatial and temporal resolution, power consumption, etc. At the other end, computer scientists will take care of the algorithm once they receive their pictures fitting to the prescribed specifications. The result of this mindset is an architecture that is theoretically universal, although may not be capable of solving every problem when timing and power requirements are taken into consideration. In those applications fields in which smart camera chips can help overcoming these limitations, a different approach needs to be taken in order to come out with optimal solutions. Let us point out here that, in smart camera architectures, computational efficiency is generally provided by appropriate partition of algorithm tasks, parallelization of heavy loads and using distributed and close-to-sensor resources. Sometimes these actions will require the design of specific circuit blocks and ad-hoc image sensing strategies. All of this needs to be worked out at transistor level, but at the same time, their effect in the overall performance of the algorithm needs to be quickly and accurately evaluated in order to guide the design flow. Our proposal is to make use of the flexibility and versatility of an environment like OpenCV to incorporate hardware non-idealities to the evaluation of the algorithm performance. One of the major attractions of this approach is that computer vision experts will be able to consider lower-level deviations when designing and fine-tuning their vision algorithms without having to develop any expertise in chip design and IC CAD tools.

Live demonstration: Gaussian pyramid extraction with a CMOS vision sensor
M. Suarez, V.M. Brea, J. Fernandez-Berni, R. Carmona-Galan, D. Cabello and A. Rodriguez-Vazquez
Conference · IEEE International Symposium on Circuits and Systems, ISCAS 2015
abstract     

This live demonstration is related to ISCAS track 'Imagers and Vision Processing'. It showcases the Gaussian pyramid with a CMOS vision sensor with a 176 × 120 pixel array in standard 0.18 μm CMOS technology. The sensing elements are 3T-APS with in-pixel ADC and CDS. The Gaussian pyramid is extracted concurrently with a double-Euler switched-capacitor network on the same substrate, giving RMSE errors below 1.2% of FSO. The chip provides a Gaussian pyramid of 3 octaves with 6 scales each with an energy cost of 26.5 nJ/px at 2.64 Mpx/s.

Automatic DR and spatial sampling rate adaptation for secure and privacy-aware ROI tracking based on focal-plane image processing
R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference · International Image Sensor Workshop IISW 2015
abstract     

Embedded camera systems for the consumer mobile and wearable application market need to operate in a tight power budget. They need to cope with a vast range of illumination conditions, and at the same time, they need to incorporate enough intelligence to implement security and privacy-protection directives. The incorporation of image signal processing at the focal-plane can help reducing the necessary resources to implement tasks like DR adaptation and privacy-aware ROI tracking. In this paper we present a vision sensor that is able to perform single-exposure HDR imaging and ROI obfuscation on-chip, with the help of a reduced set of focal-plane processing elements.

Real-time single-exposure ROI-driven HDR adaptation based on focal-plane reconfiguration
J. Fernández-Berni, R. Carmona-Galán, R. del Río, R. Kleihorst, W. Philips and Á. Rodríguez-Vázquez
Conference · SPIE Real-Time Image and Video Processing 2015
abstract      pdf

This paper describes a prototype smart imager capable of adjusting the photo-integration time of multiple regions of interest concurrently, automatically and asynchronously with a single exposure period. The operation is supported by two intertwined photo-diodes at pixel level and two digital registers at the periphery of the pixel matrix. These registers divide the focal-plane into independent regions within which automatic concurrent adjustment of the integration time takes place. At pixel level, one of the photo-diodes senses the pixel value itself whereas the other, in collaboration with its counterparts in a particular ROI, senses the mean illumination of that ROI. Additional circuitry interconnecting both photo-diodes enables the asynchronous adjustment of the integration time for each ROI according to this sensed illumination. The sensor can be reconfigured on-the-fly according to the requirements of a vision algorithm.

Using 3-D Technologes for Form Factor Improvement of Low-Power Vision Sensors
J. Fernández-Berni, S. Vargas, J.A. Leñero and B. Pérez-Verdú
Conference · IEEE Latin American Symposium on Circuits and Systems LASCAS 2014
abstract     

While conventional CMOS active pixel sensors embed only the circuitry required for photo-detection, pixel addressing and voltage buffering, smart pixels incorporate also circuitry for data processing, data storage and control of data interchange. This additional circuitry enables data processing be realized concurrently with the acquisition of images which is instrumental to reduce the number of data needed to carry to information contained into images. This way, more efficient vision systems can be built at the cost of larger pixel pitch. Vertically-integrated 3D technologies enable to keep the advnatges of smart pixels while improving the form factor of smart pixels.

Live Demo: Real-time Focal-plane Face Obfuscation through Programmable Pixelation
J. Fernández-Berni, R. Carmona-Galán, R. del Río, J.A. Leñero-Bardallo, R. Kleihorsty, W. Philipsy and Á. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2014
abstract     

Privacy concerns are hindering the introduction of smart camera networks in application scenarios like retailing analytics, factories or elderly care. Indeed, there is usually no need of dealing with sensitive data when it comes to carrying out a meaningful visual analysis in these scenarios. Time spent by customers in front of a showcase, trajectories of workers around a manufacturing site or fall detection in a nursing home are three examples where video analytics can be performed without compromising privacy. But still the idea of networked cameras pervasively collecting data generates social rejection in the face of sensitive information being tampered by hackers or misused by legitimate users. New strategies must be developed in order to ensure privacy from the very point where sensitive data are generated: the sensors. Protection measures embedded on-chip at the front-end sensor of each network node significantly reduce the number of trusted system components as well as the impact of potential software flaws. In this demonstration, we present a full-custom QVGA vision sensor that can be reconfigured to implement programmable pixelation of image regions at the focal plane. According to the literature, pixelation provides the best performance in terms of balance between privacy protection and intelligibility of the surveyed scene.

Demo: A prototype vision sensor for real-time focal-plane obfuscation through tunable pixelation
J. Fernández-Berni, R. Carmona-Galán, R. del Río, R. Kleihorst, W. Philips and A. Rodríguez-Vázquez
Conference · IEEE/ACM Int. Conference on Distributed Smart Cameras ICDSC 2014
abstract      pdf

Privacy concerns are hindering the introduction of smart camera networks in prospective application scenarios like retail analytics, factory monitoring or elderly care. The idea of networked cameras pervasively collecting data generates social rejection in the face of sensitive information being tampered by hackers or misused by legitimate users. New strategies must be developed in order to ensure privacy from the very point where sensitive data are generated: the sensors. Protection measures embedded on-chip at the front-end sensor of each network node significantly reduce the number of trusted system components as well as the impact of potential software flaws. In this demonstration, we present a full-custom QVGA vision sensor that can be recongured to implement programmable pixelation of image regions at the focal plane. In particular, we show on-the-fly focal-plane face obfuscation supported by the Viola-Jones frontal face detector provided by OpenCV.

A 26.5 nJ/px 2.64Mpx/s CMOS vision sensor for gaussian pyramid extraction
M. Suárez-Cambre, V. Brea, J. Fernández-Berni, R. Carmona-Galán, D. Cabello and A. Rodríguez-Vázquez
Conference · European Solid-State Circuits Conference ESSCIRC 2014
abstract      pdf

This paper introduces a CMOS vision sensor to extract the Gaussian pyramid with an energy cost of 26.5 nJ/px at 2.64 Mpx/s, thus outperforming conventional solutions employing an imager and a separate digital processor. The chip, manufactured in a 0.18 μm CMOS technology, consists of an arrangement of 88×60 processing elements (PEs) which captures images of 176×120 resolution and performs concurrent parallel processing right at pixel level. The Gaussian pyramid is generated by using a switched-capacitor network. Every PE includes four photodiodes, four MiM capacitors, one 8-bit single-slope ADC and one CDS circuit, occupying 44x44 μm2 . Suitability of the chip is assessed by using metrics pertaining to visual tracking.

Fire detection with a frame-less vision sensor working in the NIR band
J.A. Leñero-Bardallo, J. Fernández-Berni, R. Carmona-Galán, P. Häfliger and Á. Rodríguez-Vázquez
Conference · International Conference on Forest Fire Research ICFFR 2014
abstract      doi      pdf

This paper draws the attention of the community about the capabilities of an emerging generation of bio-inspired vision sensors to be used in fire detection systems. Their principle of operation will be described. Moreover experimental results showing the performance of an event-based vision sensor will be provided. The sensor was intended to monitor flames activity without using optic filters. In this article, we will also extend this preliminary work and explore how its outputs can be processed to detect fire in the environment.

Review of ADCs for imaging
J.A. Leñero-Bardallo, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference · IS&T International Symposium on Electronic Imaging 2014
abstract      pdf

The aim of this article is to guide image sensors designers to optimize the analog-to-digital conversion of pixel outputs. The most common ADCs topologies for image sensors are presented and discussed. The ADCs specific requirements for these sensors are analyzed and quantified. Finally, we present relevant recent contributions of specific ADCs for image sensors and we compare them using a novel FOM.

A QVGA Vision Sensor with Multi-functional Pixels for Focal-Plane Programmable Obfuscation
J. Fernández-Berni, R. Carmona Galán, R. del Río and Á. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2014
abstract      pdf

Privacy awareness constitutes a critical aspect for smart camera networks. An ideal awless protection of sensitive information would boost their application scenarios. However, it is still far from being achieved. Numerous challenges arise at diferent levels, from hardware security to subjective perception. Generally speaking, it can be stated that the closer to the image sensing device the protection measures take place, the higher the privacy and security attainable. Likewise, the integration of heterogeneous camera components becomes simpler since most of them will not require to consider privacy issues. The ultimate objective would be to incorporate complete protection directly into a smart image sensor in such a way that no sensitive data would be delivered off-chip while still permitting the targeted video analytics. This paper presents a 320x240-px prototype vision sensor embedding processing capabilities useful for accomplishing this objective. It is based on recongurable focal-plane sensing-processing that can provide programmable obfuscation. Pixelation of tunable granularity can be applied to multiple image regions in parallel. In addition to this functionality, the sensor exploits reconfigurability to implement other processing primitives, namely block-wise high dynamic range, integral image computation and Gaussian filtering. Its power consumption ranges from 42.6mW for high dynamic range operation to 55.2mW for integral image computation at 30fps. It has been fabricated in a standard 0.18μm CMOS process.

Towards an ultra-low-power low-cost wireless visual sensor node for fine-grain detection of forest fires
J. Fernández-Berni, R. Carmona-Galán, J.A. Leñero-Bardallo, R. Kleihorst and Á. Rodríguez-Vázquez
Conference · International Conference on Forest Fire Research ICFFR 2014
abstract      pdf

Advances in electronics, sensor technologies, embedded hardware and software are boosting the application scenarios of wireless sensor networks. Specifically, the incorporation of visual capabilities into the nodes means a milestone, and a challenge, in terms of the amount of information sensed and processed by these networks. The scarcity of resources -power, processing and memory- imposes strong restrictions on the vision hardware and algorithms suitable for implementation at the nodes. Both, hardware and algorithms must be adapted to the particular characteristics of the targeted application. This permits to achieve the required performance at lower energy and computational cost. We have followed this approach when addressing the detection of forest fires by means of wireless visual sensor networks. From the development of a smoke detection algorithm down to the design of a low-power smart imager, every step along the way has been influenced by the objective of reducing power consumption and computational resources as much as possible. Of course, reliability and robustness against false alarms have also been crucial requirements demanded by this specific application. All in all, we summarize in this paper our experience in this topic. In addition to a prototype vision system based on a full-custom smart imager, we also report results from a vision system based on ultra-low-power low-cost commercial imagers with a resolution of 30x30 pixels. Even for this small number of pixels, we have been able to detect smoke at around 100 meters away without false alarms. For such tiny images, smoke is simply a moving grey stain within a blurry scene, but it features a particular spatio-temporal dynamics. As described in the manuscript, the key point to succeed with so low resolution thus falls on the adequate encoding of that dynamics at algorithm level.

Gaussian pyramid extraction with a CMOS vision sensor
M. Suárez, V.M. Brea, J. Fernández-Berni, R. Carmona-Galán, D. Cabello and A. Rodríguez-Vázquez
Conference · International Workshop on Cellular Nanoscale Networks and their Applications CNNA 2014
abstract      pdf

This paper addresses a CMOS vision sensor with 176x120 pixels in standard 0.18 μm CMOS technology that computes the Gaussian pyramid. The Gaussian pyramid is extracted with a double-Euler switched-capacitor network, giving RMSE errors below 1.2% of full-scale value. The chip provides a Gaussian pyramid of 3 octaves with 6 scales each with an energy cost of 26.5 nJ at 2.64 Mpx/s.

Comparative Analysis of Compressive Sensing Strategies for Smart Compressive Image Sensors
M. Trevisi, R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2014
abstract     

Compressive sensing (CS) first appeared eight years ago as a new kind of signal processing theory. Since then, only a few steps have been taken to turn this theory into feasible practice. We have identified two important gaps that stand behind the lag on practical compressive image sensors. The first one is that, technologically speaking, there is not yet any imager based on CS that is able to overrule, at least in resolution/decryption-speed ratio, the capabilities of current standard imagers. The second one is that none of the published reports on compressive image sensors mention how the compressive sensing strategy is passed from the sensor, which is delivering image samples, to the image reconstruction algorithm at the reception side. In order to capture compressed image samples, it is necessary that the imagers implement a compressive strategy, which has the form of a matrix that convolves the original signal. There are two sets of methods that are primarily implemented nowadays to build a compressive strategy, the first one is to pick each element from a random distribution, preferably Gaussian, and the second is to arrange in random order the rows of an incoherent orthobasis matrix, preferably Fourier matrix. The selection of the compressive sensing strategy has an incidence on the physical implementation, for instance it is easier to implement a binary mask on each pixel than to multiply its value by a real number. In order to analyze the effect of this selection in the reconstruction from the samples delivered by the compressive image sensor, we have simulated the process with a MATLAB test bench and compared the reconstruction times and the RMSE vs. the number of samples delivered of three different sensing strategies using a 64×64 image of Lena as test image and a Total Variation NESTA algorithm as reconstruction algorithm.

Parallel Processing Architectures and Power Efficiency in Smart Camera Chips
R. Carmona-Galán, J. Fernández-Berni, M. Trevisi and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2014
abstract     

Because of the massive amount of data, image and video processing represents a huge computational demand. Providing the necessary resources in embedded systems is not an easy task. Providing them on-chip needs a serious reconsideration of the processing architecture. In order to speed up processing, the number of processors/cores operating in parallel can be increased. This is an intuitive result, but there are two drawbacks. First, this speedup is limited by the degree to which the algorithm can be parallelized, what is known as Amdahl's law. Second, the more hardware operating at the same time the higher the power consumption. However, image processing and, in general, the processing visual information that keeps a retinotopic topology, affords an inherent parallelism that can be exploited to a great extent. Furthermore, there is an additional and less intuitive result of parallelization, which is that using a larger number of processors renders a more efficient implementation in terms of GOPS/W. Evidence of this can be found in massively parallel array processors. Their distributed architecture is adapted to the nature of the visual stimulus to the point that the amount of energy dedicated to data transmission and memory operations is largely reduced. The result is a power efficient implementation, perfectly suited for autonomous embedded vision systems working on a restricted power budget. This trend is observed in multi- and many-core processor chips, GPUs and different types of single-instruction multiple-data (SIMD) arrays containing from tens to hundreds of processing elements (PE). In the case of analog array processors, a similar relation is observed despite the fact that the disparity between design techniques, signal representation and the computation of the effective number of OPS advises against any sort of comparison.

Form Factor Improvement of Smart-Pixels for Vision Sensors through 3-D Vertically-Integrated Technologies
A. Rodríguez-Vázquez, R. Carmona-Galán, J. Fernández Berni, S. Vargas, J.A. Leñero, M. Suárez, V. Brea and B. Pérez-Verdú
Conference · IEEE Latin American Symposium on Circuits and Systems LASCAS 2014
abstract      pdf

While conventional CMOS active pixel sensors embed only the circuitry required for photo-detection, pixel addressing and voltage buffering, smart pixels incorporate also circuitry for data processing, data storage and control of data interchange. This additional circuitry enables data processing be realized concurrently with the acquisition of images which is instrumental to reduce the number of data needed to carry to information contained into images. This way, more efficient vision systems can be built at the cost of larger pixel pitch. Vertically-integrated 3D technologies enable to keep the advnatges of smart pixels while improving the form factor of smart pixels.

Smart imaging for power-efficient extraction of Viola-Jones local descriptors
J. Fernández-Berni, R. Carmona-Galán, R. del Río, J.A. Leñero-Bardallo, M. Suárez-Cambre and A. Rodríguez-Vázquez
Conference · IS&T International Symposium on Electronic Imaging 2014
abstract      pdf

In computer vision, local descriptors permit to summarize relevant visual cues through feature vectors. These vectors constitute inputs for trained classifiers which in turn enable diferent high-level vision tasks. While local descriptors certainly alleviate the computation load of subsequent processing stages by preventing them from handling raw images, they still have to deal with individual pixels. Feature vector extraction can thus become a major limitation for conventional embedded vision hardware. In this paper, we present a power-eficicient sensing-processing array conceived to provide the computation of integral images at diferent scales. These images are intermediate representations that speed up feature extraction. In particular, the mixed-signal array operation is tailored for extraction of Haar-like features. These features feed the cascade of classifiers at the core of the Viola-Jones framework. The processing lattice has been designed for the standard UMC 0.18μm 1P6M CMOS process. In addition to integral image computation, the array can be reprogrammed to deliver other early vision tasks: concurrent rectangular area sum, block-wise HDR imaging, Gaussian pyramids and image pre-warping for subsequent reduced kernel filtering.

A 176x120 Pixel CMOS Vision Chip for Gaussian Filtering with Massivelly Parallel CDS and A/D-Conversion
M. Suárez, V.M. Brea, D. Cabello, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · European Conference on Circuit Theory and Design ECCTD 2013
abstract      pdf

This paper conveys a proof-of-concept chip for Gaussian pyramid generation for image feature detectors. Gaussian filtering and image resizing are performed with a switched-capacitor (SC) network. The chip is conceived as the mapping of a CMOS-3D architecture for feature detectors onto a conventional technology, with some functionality removed, and the corresponding area overhead with respect to that of a CMOS-3D architecture, but preserving masivelly parallel Correlated Double Sampling (CDS) and A/D conversion. The chip has been fabricated on a die of 5×5 mm2 with 0.18 μm CMOS technology, achieving an array of 176×120 sensing elements (pixels). The pixels are arranged in Processing Elements (PEs). Every PE comprises four photodiodes, four SC nodes, one CDS circuit, and local circuitry for one ADC. Every PE occupies an area of 44×44 μm2. The chip senses an image and computes the Gaussian pyramid with an average power consumption lower than 75 nW/pixel at 30 frames/s.

An Ultra-Low-Power Voltage-Mode Asynchronous WTA-LTA Circuit
J. Fernández-Berni, R. Carmona-Galán and A. Rodríquez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2013
abstract      pdf

This paper presents an asynchronous mixed-signal WTA-LTA circuit conceived to carry out local minimum maximum indexing in massively parallel image processing arrays. The hardware is focused on energy-efficient operation. We describe a realization for the standard CMOS base process of a commercial 3-D TSV stack featuring a power consumption of only 20pW per elementary cell at 30fps. The proposed block is also capable of resolving small voltage differences without requiring any external reference. This leads to a hit percentage greater than 90% even when taking into account global process variations and mismatch conditions.

Real-Time Remote Reporting of Motion Analysis with Wi-Flip
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Int. Workshop on Cellular Nanoscale Networks and their Applications CNNA 2012
abstract      pdf

This paper describes a real-time application programmed into Wi-FLIP, a wireless smart camera resulting from the integration of FLIP-Q, a prototype mixed-signal focal-plane array processor, and Imote2, a commercial WSN platform. The application consists in scanning the whole scene by sequentially analyzing small regions. Within each region, motion is detected by background subtraction. Subsequently, information related to that motion - intensity and location - is radio-propagated in order to remotely account for it. By aggregating this information along time, a motion map of the scene is built. This map permits to visualize the different activity patterns taking place. It also provides an elaborated representation of the scene for further remote analysis, preventing raw images from being transmitted. In particular, the scene inspected in this demo corresponds to vehicular traffic in a motorway. The remote representation progressively built enables the assessment of the traffic density.

Design of a smart camera SoC in a 3D-IC technology
R. Carmona-Galán, J. Fernández-Berni, S. Vargas-Sierra, G. Liñán-Cembrano, A. Rodríguez-Vázquez, V. Brea-Sánchez, M. Suárez-Cambre and D. Cabello-Ferrer
Conference · Workshop on Architecture of Smart Camera, 2012
abstract      pdf

Conventional digital signal processing architectures introduce data bottlenecks and are inefficient when dealing with multidimensional sensory signals; Architectures adapted to the nature of the stimulus are more efficient in terms of power consumption per operation but¿;Concurrent sensing, processing and memory in planar technologies introduces serious limitations to image resolution and image size via the penalties in fill factor and pixel pitch; 3D integrated circuit technologies with a dense TSV distribution permits eliminating data bottlenecks without degrading image resolution and size.

Power-efficient focal-plane image representation for extraction of enriched Viola-Jones features
J. Fernández-Berni, L. Acasandrei, R. Carmona-Galán, A. Barriga-Barrios and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2012
abstract      pdf

This paper describes the use of a reconfigurable focal-plane processing array in order to achieve an image representation which dramatically reduces the computational load of the Viola-Jones object detection framework. Additionally, such representation provides richer information than the simple sum of pixels within rectangular regions originally defined in this framework. As a result, more elaborated features could be devised to speed up the execution of the subsequent attentional cascade, boosting thus the performance of the whole algorithm. The proposed circuitry has been successfully implemented in a CMOS prototype smart imager. Experimental results are given, demonstrating the suitability of the approach presented to efficiently deliver enriched Viola-Jones features.

Image filtering by reduced kernels exploiting kernel structure and focal-plane averaging
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · European Conference on Circuit Theory and Design ECCTD 2011
abstract      pdf

Incorporating multi-resolution capabilities into imagers renders additional power saving mechanisms in the subsequent image processing. In this paper, we show how, by exploiting a certain mask structure, 3 × 3 kernels can be reduced to 2 × 2 kernels if charge redistribution is provided at the focal plane of the imaging device. More precisely, by averaging and shifting a half-resolution pixel grid, we will have a pre-processed image, subsampled by a factor of 2 on each dimension, that can be filtered with a mask of a reduced size. Very useful image filtering kernels, like a 3 × 3 Gaussian kernel for image smoothing, or the well-known Sobel operators, fall into this category of reducible kernels. Operating onto the pre-processed image with one of these reduced kernels represents a smaller number of operations per pixel than realizing all the multiply-accumulate operations needed to apply a 3 × 3 kernel. Memory accesses are reduced in the same fraction. Concerning the difficulties of providing this pre-processed image representation, we propose a methodology for obtaining it at a very low power cost. It requires the implementation of user definable image subdivision and subsampling at the focal plane. Experimental results are given, obtained from measurements on a CMOS imager prototype chip incorporating these multi-resolution capabilities. © 2011 IEEE.

Demo: Real-time remote reporting of active regions with Wi-FLIP
J. Fernández-Berni, R. Carmona-Galán, G. Liñán-Cembrano, A. Zarándy and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2011
abstract      pdf

This paper describes a real-time application programmed into Wi-FLIP, a wireless smart camera resulting from the integration of FLIP-Q, a focal-plane low-power image processor, and Imote2, a commercial WSN platform. The application, though simple, shows the potentiality of the reduced scene representations achievable at FLIP-Q to speed up the processing. It consists of detecting the active regions within the scene being surveyed, that is, those regions undergoing thresholded variations with respect to the background. If an activity pattern is prescribed, FLIP-Q enables the reconfigurability of the image plane accordingly, making its detection and tracking easier. For each frame, the number of active regions is calculated and wirelessly reported in real time. A base station picks up the radio signal and sends the information to a PC via USB, also in real time. Frame rates up to around 10fps have been achieved, although it greatly depends on the light conditions and the image plane division grid. © 2011 IEEE.

Wi-FLIP: A wireless smart camera based on a focal-plane low-power image processor
J. Fernández-Berni, R. Carmona-Galán, G. Liñán-Cembrano, A. Zarándy and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2011
abstract      pdf

This paper presents Wi-FLIP, a vision-enabled WSN node resulting from the integration of FLIP-Q, a prototype vision chip, and Imotel, a commercial WSN platform. In Wi-FLIP, image processing is not only constrained to the digital domain like in conventional architectures. Instead, its image sensor - the FLIP-Q prototype - incorporates pixel-level processing elements (PEs) implemented by analog circuitry. These PEs are interconnected, rendering a massively parallel SIMD-based focal-plane array. Low-level image processing tasks fit very well into this processing scheme. They feature a heavy computational load composed of pixel-wise repetitive operations which can be realized in parallel with moderate accuracy. In such circumstances, analog circuitry, not very precise but faster and more area- and power-efficient than its digital counterpart, has been extensively reported to achieve better performance. The Wi-FLIP's image sensor does not therefore output raw but pre-processed images that make the subsequent digital processing much lighter. The energy cost of such pre-processing is really low - 5.6mW for the worst-case scenario. As a result, for the configuration where the Imote2's processor works at minimum clock frequency, the maximum power consumed by our prototype represents only the 5.2% of the whole system power consumption. This percentage gets even lower as the clock frequency increases. We report experimental results for different algorithms, image resolutions and clock frequencies. The main drawback of this first version of Wi-FLIP is the low frame rate reachable due to the non-standard GPIO-based FLIPQ-to-Imote2 interface. © 2011 IEEE.

Multi-resolution low-power gaussian filtering by reconfigurable focal-plane binning
J. Fernández-Berni, R. Carmona-Galán, F. Pozas-Flores, A. Zarándy and A. Rodríguez-Vázquez
Conference · SPIE Microtechnologies for the New Millennium 2011
abstract      pdf

Gaussian filtering is a basic tool for image processing. Noise reduction, scale-space generation or edge detection are examples of tasks where different Gaussian filters can be successfully utilized. However, their implementation in a conventional digital processor by applying a convolution kernel throughout the image is quite inefficient. Not only the value of every single pixel is taken into consideration sucessively, but also contributions from their neighbors need to be taken into account. Processing of the frame is serialized and memory access is intensive and recurrent. The result is a low operation speed or, alternatively, a high power consumption. This inefficiency is specially remarkable for filters with large variance, as the kernel size increases significantly. In this paper, a different approach to achieve Gaussian filtering is proposed. It is oriented to applications with very low power budgets. The key point is a reconfigurable focal-plane binning. Pixels are grouped according to the targeted resolution by means of a division grid. Then, two consecutive shifts of this grid in opposite directions carry out the spread of information to the neighborhood of each pixel in parallel. The outcome is equivalent to the application of a 3x3 binomial filter kernel, which in turns is a good approximation of a Gaussian filter, on the original image. The variance of the closest Gaussian filter is around 0.5. By repeating the operation, Gaussian filters with larger variances can be achieved. A rough estimation of the necessary energy for each repetition until reaching the desired filter is below 20nJ for a QCIF-size array. Finally, experimental results of a QCIF proof-of-concept focal-plane array manufactured in 0.35 mu m CMOS technology are presented. A maximum RMSE of only 1.2% is obtained by the on-chip Gaussian filtering with respect to the corresponding equivalent ideal filter implemented off-chip.

Design of a smart SiPM based on focal-plane processing elements for improved spatial resolution in PET
F. Pozas-Flores, R. Carmona-Galán, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference · SPIE Microtechnologies for the New Millennium 2011
abstract      pdf

Single-photon avalanche diodes are compatible with standard CMOS. It means that photo-multipliers for scintillation detectors in nuclear medicine (i. e. PET, SPECT) can be built in inexpensive technologies. These silicon photo-multipliers consist in arrays of, usually passively-quenched, SPADs whose output current is sensed by some analog readout circuitry. In addition to the implementation of photosensors that are sensitive to single-photon events, analog, digital and mixed-signal processing circuitry can be included in the same CMOS chip. For instance, the SPAD can be employed as an event detector, and with the help of some in-pixel circuitry, a digitized photo-multiplier can be built in which every single-photon detection event is summed up by a counter. Moreover, this concurrent processing circuitry can be employed to realize low level image processing tasks. They can be efficiently implemented by this architecture given their intrinsic parallelism. Our proposal is to operate onto the light-induced signal at the focal plane in order to obtain a more elaborated record of the detection. For instance, by providing some characterization of the light spot. Information about the depth-of-interaction, in scintillation detectors, can be derived from the position and shape of the scintillation light distribution. This will ultimately have an impact on the spatial resolution that can be achieved. We are presenting the design in CMOS of an array of detector cells. Each cell contains a SPAD, an MOS-based passive quenching circuit and drivers for the column and row detection lines.

Focal-plane generation of multi-resolution and multi-scale image representation for low-power vision applications
J. Fernández-Berni, R. Carmona-Galán, L. Carranza-González, A. Zarandy and A. Rodríguez-Vázquez
Conference · SPIE Infrared Technology and Applications XXXVII, 2011
abstract      pdf

Early vision stages represent a considerably heavy computational load. A huge amount of data needs to be processed under strict timing and power requirements. Conventional architectures usually fail to adhere to the specifications in many application fields, especially when autonomous vision-enabled devices are to be implemented, like in lightweight UAVs, robotics or wireless sensor networks. A bioinspired architectural approach can be employed consisting of a hierarchical division of the processing chain, conveying the highest computational demand to the focal plane. There, distributed processing elements, concurrent with the photosensitive devices, influence the image capture and generate a pre-processed representation of the scene where only the information of interest for subsequent stages remains. These focal-plane operators are implemented by analog building blocks, which may individually be a little imprecise, but as a whole render the appropriate image processing very efficiently. As a proof of concept, we have developed a 176x144-pixel smart CMOS imager that delivers lighter but enriched representations of the scene. Each pixel of the array contains a photosensor and some switches and weighted paths allowing reconfigurable resolution and spatial filtering. An energy-based image representation is also supported. These functionalities greatly simplify the operation of the subsequent digital processor implementing the high level logic of the vision algorithm. The resulting figures, 5.6mW@30fps, permit the integration of the smart image sensor with a wireless interface module (Imote2 from Memsic Corp.) for the development of vision-enabled

On-site forest fire smoke detection by low-power autonomous vision sensor
J. Fernández-Berni, R. Carmona-Galán, L. Carranza-González, A. Cano-Rojas, J. F. Martínez-Carmona, A. Rodríguez-Vázquez and S. Morillas-Castillo
Conference · International Conference on Forest Fire Research ICFFR 2010
abstract      pdf

Early detection plays a crucial role to prevent forest fires from spreading. Wireless vision sensor networks deployed throughout high-risk areas can perform fine-grained surveillance and thereby very early detection and precise location of forest fires. One of the fundamental requirements that need to be met at the network nodes is reliable low-power on-site image processing. It greatly simplifies the communication infrastructure of the network as only alarm signals instead of complete images are transmitted, anticipating thus a very competitive cost. As a first approximation to fulfill such a requirement, this paper reports the results achieved from field tests carried out in collaboration with the Andalusian Fire-Fighting Service (INFOCA). Two controlled burns of forest debris were realized (www.youtube.com/user/vmoteProject). Smoke was successfully detected on-site by the EyeRISTM v1.2, a general-purpose autonomous vision system, built by AnaFocus Ltd., in which a vision algorithm was programmed. No false alarm was triggered despite the significant motion other than smoke present in the scene. Finally, as a further step, we describe the preliminary laboratory results obtained from a prototype vision chip which implements, at very low energy cost, some image processing primitives oriented to environmental monitoring.

Robust focal-plane analog processing hardware for dynamic texture segmentation
J. Fernández-Berni and R. Carmona-Galán
Conference · International Workshop on Cellular Nanoscale Networks and their Applications CNNA 2010
abstract      pdf

Cellular Nonlinear Networks (CNN) establish a theoretical framework in which programmable focal-plane image processing arrays can be developed. The conventional support for its analog programmability in VLSI is the implementation of transconductor-based multiplication of the input, output and state variables times the corresponding template elements. However, some distributions of weights can be greatly affected by the intrinsic nonidealities of the physical implementation. This is exactly the case when implementing linear diffusion within a transconductor-based CNN implementation. In this paper we propose an alternative implementation: a resistive grid based on MOSFETs operating in the triode region to realize linear diffusion of the input image, considered as the initial state of the network. In addition, these MOS-resistors can be employed as switches in order to sub-divide the image into bins, sized to track features on the appropriate scale. Thus, by simply controlling the size of the binning and for how long the pixel voltages will diffuse, it will be possible to segment and track dynamic textures along an image flow. Each frame of the flow is described by a smaller image in which each pixel represents the energy of the corresponding image bin, once the non-relevant spatial frequency components have been filtered out. We will demonstrate that the resulting low-resolution representation of the scene is very robust to the different sources of nonidealities in a standard CMOS technology. © 2010 IEEE.

Low-power focal-plane dynamic texture segmentation based on programmable image binning and diffusion hardware
J. Ferández-Berni and R. Carmona-Galán
Conference · SPIE Microtechnologies: Bioengineered and Bioinspired Systems IV, 2009
abstract      pdf

Stand-alone applications of vision are severely constrained by their limited power budget. This is one of the main reasons why vision has not yet been widely incorporated into wireless sensor networks. For them, image processing should be suscribed to the sensor node in order to reduce network traffic and its associated power consumption. In this scenario, operating the conventional acquisition-digitization-processing chain is unfeasible under tight power limitations. A bio-inspired scheme can be followed to meet the timing requirements while maintaining a low power consumption. In our approach, part of the low-level image processing is conveyed to the focal-plane thus speeding up system operation. Moreover, if a moderate accuracy is permissible, signal processing is realized in the analog domain, resulting in a highly efficient implementation. In this paper we propose a circuit to realize dynamic texture segmentation based on focal-plane spatial bandpass filtering of image subdivisions. By the appropriate binning, we introduce some constrains into the spatial extent of the targeted texture. By running time-controlled linear diffusion within each bin, a specific band of spatial frequencies can be highlighted. Measuring the average energy of the components in that band at each image bin the presence of a targeted texture can be detected and quantified. The resulting low-resolution representation of the scene can be then employed to track the texture along an image flow. An application specific chip, based on this analysis, is being developed for natural spaces monitoring by means of a network of low-power vision systems. ©2009 SPIE.

Accurate design of a MOS-based resistive network for time-controlled diffusion filtering
J. Fernández-Berni and R. Carmona-Galán
Conference · European Conference on Circuit Theory and Design ECCTD 2009
abstract      pdf

This paper analyses a MOS-based resistive network suitable for massively parallel image processing. The inclusion of MOS transistors biased in the ohmic region instead of true resistors permits certain control over the underlying spatial filtering while reducing the required area for VLSI implementation. However, it also leads to nonlinearities and thereby to errors with respect to an ideal resistive grid. By studying an elementary network composed of only two nodes we determine the guidelines to be followed in order to minimize the error according to the selected signal range. These guidelines are then extrapolated to larger networks demonstrating that pretty accurate networks can be achieved even for relatively wide signal ranges. Simulations are employed to validate the extrapolated results. The numerical examples will also allow to visualize how the insertion of MOS transistors affects the spatial filtering carried out by the grid.

A VLSI-oriented and power-efficient approach for dynamic texture recognition applied to smoke detection
J. Fernández-Berni, R. Carmona-Galán and L. Carranza-González
Conference · International Conference on Computer Vision Theory and Applications VISAPP 2009
abstract      pdf

The recognition of dynamic textures is fundamental in processing image sequences as they are very common in natural scenes. The computation of the optic flow is the most popular method to detect, segment and analyse dynamic textures. For weak dynamic textures, this method is specially adequate. However, for strong dynamic textures, it implies heavy computational load and therefore an important energy consumption. In this paper, we propose a novel approach intented to be implemented by very low-power integrated vision devices. It is based on a simple and flexible computation at the focal plane implemented by power-efficient hardware. The first stages of the processing are dedicated to remove redundant spatial information in order to obtain a simplified representation of the original scene. This simplified representation can be used by subsequent digital processing stages to finally decide about the presence and evolution of a certain dynamic texture in the scene. As an application of the proposed approach, we present the preliminary results of smoke detection for the development of a forest fire detection system based on a wireless vision sensor network.

A vision-based monitoring system for very early automatic detection of forest fires
J. Fernández-Berni, R. Carmona-Galán and L. Carranza-González
Conference · International Conference on Modelling, Monitoring and Management of Forest Fires, 2008
abstract      doi      pdf

This paper describes a system capable of detecting smoke at the very beginning of a forest fire with a precise spatial resolution. The system is based on a wireless vision sensor network. Each sensor monitors a small area of vegetation by running on-site a tailored vision algorithm to detect the presence of smoke. This algorithm examines chromaticity changes and spatio-temporal patterns in the scene that are characteristic of the smoke dynamics at early stages of propagation. Processing takes place at the sensor nodes and, if that is the case, an alarm signal is transmitted through the network along with a reference to the location of the triggered zone - without requiring complex GIS systems. This method improves the spatial resolution on the surveilled area and reduces the rate of false alarms. An energy efficient implementation of the sensor/processor devices is crucial as it determines the autonomy of the network nodes. At this point, we have developed an ad hoc vision algorithm, adapted to the nature of the problem, to be integrated into a single-chip sensor/processor. As a first step to validate the feasibility of the system, we applied the algorithm to smoke sequences recorded with commercial cameras at real-world scenaRíos that simulate the working conditions of the network nodes. The results obtained point to a very high reliability and robustness in the detection process.

Books


Visual Inference for IoT Systems: A Practical Approach
D. Velasco-Montero, J. Fernández-Berni and A. Rodríguez-Vázquez
Book · 145 p, 2022
abstract      link      

This book presents a systematic approach to the implementation of Internet of Things (IoT) devices achieving visual inference through deep neural networks. Practical aspects are covered, with a focus on providing guidelines to optimally select hardware and software components as well as network architectures according to prescribed application requirements.
The monograph includes a remarkable set of experimental results and functional procedures supporting the theoretical concepts and methodologies introduced. A case study on animal recognition based on smart camera traps is also presented and thoroughly analyzed. In this case study, different system alternatives are explored and a particular realization is completely developed.
Illustrations, numerous plots from simulations and experiments, and supporting information in the form of charts and tables make Visual Inference and IoT Systems: A Practical Approach a clear and detailed guide to the topic. It will be of interest to researchers, industrial practitioners, and graduate students in the fields of computer vision and IoT.

Low-power smart imagers for vision-enabled sensor networks
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Book · 156 p, 2012
abstract      link      pdf

This book presents a comprehensive, systematic approach to the development of vision system architectures that employ sensory-processing concurrency and parallel processing to meet the autonomy challenges posed by a variety of safety and surveillance applications. Coverage includes a thorough analysis of resistive diffusion networks embedded within an image sensor array. This analysis supports a systematic approach to the design of spatial image filters and their implementation as vision chips in CMOS technology. The book also addresses system-level considerations pertaining to the embedding of these vision chips into vision-enabled wireless sensor networks. Describes a system-level approach for designing of vision devices and embedding them into vision-enabled, wireless sensor networks; Surveys state-of-the-art, vision-enabled WSN nodes; Includes details of specifications and challenges of vision-enabled WSNs; Explains architectures for low-energy CMOS vision chips with embedded, programmable spatial filtering capabilities; Includes considerations pertaining to the integration of vision chips into off-the-shelf WSN platforms.

Book Chapters


Image feature extraction acceleration
J. Fernández-Berni, M. Suárez-Cambre, R. Carmona-Galán, V. Brea, R. del Río, D. Cabello and Á. Rodríguez-Vázquez
Book Chapter · Image Feature Detectors and Descriptors, SCI, vol. 630, pp 109-132, 2016
abstract      doi      pdf

Image feature extraction is instrumental for most of the best-performing algorithms in computer vision. However, it is also expensive in terms of computational and memory resources for embedded systems due to the need of dealing with individual pixels at the earliest processing levels. In this regard, conventional system architectures do not take advantage of potential exploitation of parallelism and distributed memory from the very beginning of the processing chain. Raw pixel values provided by the front-end image sensor are squeezed into a high-speed interface with the rest of system components. Only then, after deserializing this massive dataflow, parallelism, if any, is exploited. This chapter introduces a rather different approach from an architectural point of view. We present two Application-Specific Integrated Circuits (ASICs) where the 2-D array of photo-sensitive devices featured by regular imagers is combined with distributed memory supporting concurrent processing. Custom circuitry is added per pixel in order to accelerate image feature extraction right at the focal plane. Specifically, the proposed sensing-processing chips aim at the acceleration of two flagships algorithms within the computer vision community: the Viola-Jones face detection algorithm and the Scale Invariant Feature Transform (SIFT). Experimental results prove the feasibility and benefits of this architectural solution.

Focal-plane dynamic texture segmentation by programmable binning and scale extraction
J. Fernández-Berni and R. Carmona-Galán
Book Chapter · Focal-Plane Sensor-Processor Chips, pp 105-124, 2011
abstract      doi      pdf

Dynamic textures are spatially repetitive time-varying visual patterns that present, however, some temporal stationarity within their constituting elements. In addition, their spatial and temporal extents are a priori unknown. This kind of pattern is very common in nature; therefore, dynamic texture segmentation is an important task for surveillance and monitoring. Conventional methods employ optic flow computation, though it represents a heavy computational load. Here, we describe texture segmentation based on focal-plane space-scale generation. The programmable size of the subimages to be analysed and the scales to be extracted encode sufficient information from the texture signature to warn its presence. A prototype smart imager has been designed and fabricated in 0.35 μm CMOS, featuring a very low-power scale-space representation of used-defined subimages.

Other publications


No results

  • Journals584
  • Conferences1170
  • Books30
  • Book chapters81
  • Others9
  • 20243
  • 202335
  • 202281
  • 202183
  • 2020103
  • 201977
  • 2018106
  • 2017111
  • 2016104
  • 2015111
  • 2014104
  • 201380
  • 2012108
  • 2011102
  • 2010120
  • 200977
  • 200867
  • 200770
  • 200665
  • 200578
  • 200468
  • 200362
  • 200259
RESEARCH