CMOS realization of a 2-layer CNN universal machine chip

Proceedings of the 2002 7th IEEE International Workshop on Cellular Neural Networks and Their Applications

https://doi.org/10.1109/CNNA.2002.1035082

Abstract

Some of the features of the biological retina can be modelled by a cellular neural network (CNN) composed of two dynamically coupled layers of locally connected elementary nonlinear processors. In order to explore the possibilities of these complex spatio-temporal dynamics in image processing, a prototype chip has been developed implementing this CNN model with analog signal processing blocks. This chip has been designed in a 0.5 µm CMOS technology. Design challenges, trade-offs and the building blocks of such a high-complexity system (0.5 x 10^6 transistors, most of them operating in analog mode) are presented in this paper.

R. CARMONA, F. JIMÉNEZ-GARRIDO, R. DOMÍNGUEZ-CASTRO, S. ESPEJO AND A. RODRÍGUEZ-VÁZQUEZ

Instituto de Microelectrónica de Sevilla-CNM-CSIC, Avda. Reina Mercedes s/n, 41012 Sevilla (SPAIN). Tel.: +34 955056666. Fax: +34 955056686. E-mail: rcarmona@imse.cnm.es

1 CNN-UM chip architecture

1.1 CNN-based analogy of the biological retina

The vertebrate retina is composed of several layers of horizontal and amacrine cells [1]. These layers, coupled by means of bipolar cells, end, on one side, in a layer of photodetectors and, on the other, in a layer of ganglion cells. The photodetectors capture the visual stimuli and translate them into activation patterns. The ganglion cells, at the other end of the retina, convert the continuous activation signals into pulse-like action potential signals that can be transmitted over longer distances by the nervous system. The activation signals in the retina are weighted and averaged to bias the photodetectors and to inhibit the vertical pathway. Patterns of activity are formed dynamically by the presence or absence of visual stimuli.
In this description, similarities can be found with CNNs [2]: not only in the topology, but also in that we have 2-D aggregations of continuous signals, local connectivity between elementary nonlinear processors and analog weighted interactions between them. Motivated by these coincidences, a model for the operations of the biological retina based on CNNs has been developed [3]. It contains two coupled CNN layers plus an additional layer incorporating analog arithmetic to combine the outputs of the dynamically linked layers. This can be realized by a CNN Universal Machine (CNN-UM) architecture in which each cell contains two first-order cores, common local analog and logic memories (LAMs and LLMs) and common logic and communication units (LLU and LAOU). The evolution law of each cell, $C(i, j)$, is given by two coupled equations:

$$
\begin{aligned}
\tau_1 \frac{dx_{1,ij}(t)}{dt} &= -g[x_{1,ij}(t)] + \sum_{k=-r_1}^{r_1} \sum_{l=-r_1}^{r_1} a_{11,kl}\, y_{1,(i+k)(j+l)} + a_{12}\, y_{2,ij} + b_{11,00}\, u_{1,ij} + z_{1,ij} \\
\tau_2 \frac{dx_{2,ij}(t)}{dt} &= -g[x_{2,ij}(t)] + \sum_{k=-r_2}^{r_2} \sum_{l=-r_2}^{r_2} a_{22,kl}\, y_{2,(i+k)(j+l)} + a_{21}\, y_{1,ij} + b_{22,00}\, u_{2,ij} + z_{2,ij}
\end{aligned}
\tag{1}
$$

where the nonlinear losses term and the output function in each layer are those of the Full-Signal-Range (FSR) CNN model [5], which, having a limitation on the cell state voltage, allows for identifying state and output:

$$
g(x_{n,ij}) = \lim_{m \to \infty}
\begin{cases}
m\, x_{n,ij} & \text{if } x_{n,ij} > 1 \\
x_{n,ij} & \text{if } |x_{n,ij}| \le 1 \\
-m\, |x_{n,ij}| & \text{if } x_{n,ij} < -1
\end{cases}
\tag{2}
$$

a. This work has been supported by Project IST-1999-19007, by ONR-NICOP Grant N-00014-00-1-0429, and by the Spanish CICYT Project TIC-1999-0826.

1.2 Prototype chip floorplan

The proposed chip consists of an analog parallel array processor (APAP) of 32 x 32 identical cells (Fig. 4). It is surrounded by the circuits implementing the boundary conditions for the CNN dynamics. There is also an I/O interface, a timing and control unit and a program memory. The I/O interface consists of a serializing-deserializing analog multiplexer.
The program memory is composed of 24 blocks of SRAM of 64 bytes each: 1 kB is dedicated to the analog program and 0.5 kB to the logic program. In addition, the analog instructions and reference signals need to be transmitted to every cell in the network in the form of analog voltages. Thus, a bank of D/A converters interfaces the analog program memory with the processing array. Finally, the timing unit is composed of an internal clock/counter and a set of finite-state machines that generate the internal signals enabling image uploading and downloading and program memory accesses.

1.3 Basic cell scheme

The elementary processor of the CNN includes two coupled continuous-time cores (Fig. 1(a)). Each one belongs to one of the two different layers of the network. The synaptic connections between processing elements of the same or different layers are represented by arrows in the diagram. The basic processor also contains the LLU, and the LAMs and LLMs that store intermediate results. All the blocks in the cell communicate via an intra-cell data bus, which is multiplexed to the array I/O interface. Control bits and switch configuration are passed to the cell directly from the global programming unit.

The internal structure of each CNN core is depicted in the diagram of Fig. 1(b). Each core receives contributions from the rest of the processing nodes in the neighbourhood, and these contributions are summed and integrated in the state capacitor. The two layers differ in that the first layer has a scalable time constant, controlled by the appropriate binary code, while the second layer has a fixed time constant. The evolution of the state variable is also driven by self-feedback and by the feedforward action of the stored input and bias patterns. There is a voltage limiter for implementing the FSR CNN model.
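A minimal numerical sketch of one such cell may help fix ideas. It integrates the two coupled first-order cores of Eq. 1 with a forward-Euler step and models the FSR voltage limiter as a hard clip of the state to [-1, 1]. Lumping the neighbourhood feedback, input and bias of each layer into a single `drive` term is a simplification, and all numeric values (time constants, coupling weights, step size) are assumptions chosen for illustration, not chip data.

```python
def fsr_limit(x):
    """Hard clip standing in for the FSR voltage limiter (state kept in [-1, 1])."""
    return max(-1.0, min(1.0, x))


def step_cell(x1, x2, drive1, drive2, a12, a21, tau1, tau2, dt):
    """One forward-Euler step of the two coupled cores of a single cell.

    Inside the linear range the FSR loss term is g(x) = x and the output
    equals the state, so the inter-layer terms use x1 and x2 directly.
    drive1 and drive2 are hypothetical lumped terms (feedback + input + bias).
    """
    dx1 = (-x1 + drive1 + a12 * x2) / tau1
    dx2 = (-x2 + drive2 + a21 * x1) / tau2
    return fsr_limit(x1 + dt * dx1), fsr_limit(x2 + dt * dx2)


def simulate_cell(t_end=64.0, dt=0.01):
    """Integrate one cell from rest; layer 1 is assumed 16 times slower than layer 2."""
    tau1, tau2 = 16.0, 1.0
    x1, x2 = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        x1, x2 = step_cell(x1, x2, drive1=2.0, drive2=0.0,
                           a12=0.0, a21=0.5, tau1=tau1, tau2=tau2, dt=dt)
    return x1, x2


final_x1, final_x2 = simulate_cell()
```

With these assumed values the slow core is driven into the limiter and stays at the upper bound, while the fast core settles close to `a21 * x1`.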
The state variable is transmitted in voltage form to the synaptic blocks, in the periphery of the cell, where the weighted contributions to the neighbours are generated. There is also a current memory that is employed to cancel the offset of the synaptic blocks.

Figure 1. (a) Conceptual diagram of the basic cell and (b) the CNN layers' nodes.

Initialization of the state, input and/or bias voltages is done through a mesh of multiplexing analog switches that connect to the cell's internal data bus.

2 Analog building blocks for the basic cell

2.1 Single-transistor synapse

The synapse is a four-quadrant analog multiplier. Its inputs are the cell state $V_x$ (identified with the cell output in the FSR model) or the input, and the weight voltage $V_w$, while the output $I_o$ is the cell's current contribution to a neighbouring cell. It can be achieved by a single transistor biased in the ohmic region [6]. For a PMOS with gate voltage $V_X = V_{x0} + V_x$, and the p-diffusion terminals at $V_W = V_{w0} + V_w$ and $V_{w0}$, where $V_{x0}$ and $V_{w0}$ are the reference central values for the state and weight voltages that allow the signals $V_x$ and $V_w$ to have either sign, the drain-to-source current is:

$$
I_o = -\beta_p V_w V_x - \beta_p V_w \left( V_{x0} + |V_{Tp}| - V_{w0} - \frac{V_w}{2} \right)
\tag{4}
$$

which is a four-quadrant multiplier with an offset term that is time-invariant, at least during the evolution of the network, and does not depend on the cell state. This offset can be eliminated by a calibration step, with the help of a current memory.

2.2 Current conveyor and level shifting

For the synapse to operate properly, the input node of the CNN core must be kept at a constant voltage, independently of the current entering it. This is achieved by a current conveyor (Fig. 2(a)). Any difference between the voltage at this node and the reference $V_{w0}$ is amplified and the negative feedback corrects the deviation.
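To illustrate why a fixed reference at this node matters, the single-transistor synapse of Section 2.1 can be sketched with the first-order square-law model of a PMOS in the ohmic region, with one diffusion terminal held at `v_w0` as the current conveyor is meant to do. The device parameters below (`beta`, `v_x0`, `v_w0`, `v_tp`) are hypothetical values picked so that the transistor stays in the ohmic region for small signals; they are not taken from the chip. The second function mimics the calibration step: the current obtained with the state input at its reference is subtracted, leaving only the product term.

```python
def synapse_current(v_x, v_w, beta=20e-6, v_x0=1.0, v_w0=2.5, v_tp=0.8):
    """Current of a PMOS in the ohmic region, sign convention of Eq. 4.

    Gate at v_x0 + v_x, diffusion terminals at v_w0 + v_w and v_w0.
    First-order square-law model; all default parameters are illustrative.
    """
    v_sg = (v_w0 + v_w) - (v_x0 + v_x)  # source-to-gate voltage
    v_sd = v_w                           # source-to-drain voltage
    return beta * ((v_sg - v_tp) * v_sd - 0.5 * v_sd ** 2)


def calibrated_current(v_x, v_w, **params):
    """Subtract the offset measured with the state input at its reference (v_x = 0)."""
    return synapse_current(v_x, v_w, **params) - synapse_current(0.0, v_w, **params)


# Four-quadrant check: after calibration only -beta * v_w * v_x remains.
quadrants = [(vx, vw) for vx in (-0.3, 0.3) for vw in (-0.3, 0.3)]
products = [calibrated_current(vx, vw) for vx, vw in quadrants]
```

In this model the offset depends only on `v_w`, so a single memorized value per weight setting is enough to cancel it, which is consistent with the calibration argument in the text.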
Notice that a voltage offset in the amplifier will result in an error of the same order. An offset cancellation mechanism is provided (Fig. 2(b)). Signal $\phi_{oc}$ shorts the inputs of the Operational Transconductance Amplifier (OTA) and enables diode-mode operation of transistor $M$, which will conduct a current $I_M$ such as to cancel out the offset current $I_{os}$. Once $\phi_{oc}$ is turned off, the total current injected into the load capacitor is offset-free:

$$
I_o = I_d + I_{os} - I_M = g_m v_d
\tag{5}
$$

Figure 2. (a) Current conveyor and (b) OTA realization with offset-correction mechanism.

2.3 S3I current memory

As mentioned, the offset term of the synapse current must be removed for its output current to represent the result of a four-quadrant multiplication. For this purpose all the synapses are reset to $V_X = V_{x0}$. Then, the resulting current, which is the sum of the offset currents of all the synapses concurrently connected to the same node, is memorized. This value will be subtracted on-line from the input current when the CNN loop is closed, resulting in a one-step cancellation of the errors of all the synapses.

The validity of this method relies on the accuracy of the current memory. For instance, in this chip, the sum of all the contributions will range, for the applications for which it has been designed, from 18 µA to 46 µA. On the other hand, the maximum signal to be handled is 1 µA. If a signal resolution of 8 b is intended, then 0.5 LSB = 2 nA. Thus, our current memory must be able to distinguish 2 nA out of 46 µA. This represents an equivalent resolution of 14.5 b.

In order to achieve such an accuracy level, an S3I current memory is used. It is composed of three stages (Fig. 3), each one consisting of a switch, a capacitor and a transistor. The input current of the block is the current to be memorized. After memorization the only error left corresponds to the last stage. The former stages do not contribute to the error in the memorized current.
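The resolution requirement quoted above for the current memory can be checked with a few lines of arithmetic. The sketch below assumes, as the 2 nA figure suggests, that one LSB is the 1 µA maximum signal divided by 2^8; the function names are only illustrative.

```python
import math


def half_lsb(full_scale_amps, bits):
    """Half of one least-significant bit for a given full-scale current."""
    return 0.5 * full_scale_amps / 2 ** bits


def equivalent_bits(max_current_amps, smallest_step_amps):
    """Number of bits needed to resolve smallest_step out of max_current."""
    return math.log2(max_current_amps / smallest_step_amps)


step = half_lsb(1e-6, 8)                      # about 1.95 nA, quoted as 2 nA in the text
memory_bits = equivalent_bits(46e-6, step)    # about 14.5 b
```

Using the rounded 2 nA value instead of the exact half LSB changes the result only in the second decimal place, so both readings agree with the 14.5 b figure.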
If the S3I block is designed so as to store the most significant bits in the first capacitor, and the less significant bits in the last one, this error can be made quite small.

2.4 Time-constant scaling

The differential equation that governs the evolution of the network, Eq. 1, can be written as a sum of current contributions injected into the state capacitor. Scaling this sum of currents up or down is equivalent to scaling the capacitor and, thus, speeding up or slowing down the network dynamics. Therefore, scaling the input current with the help of a current mirror, for instance, will have the effect of scaling the time constant. A circuit for continuously adjusting the current gain of a mirror can be designed based on a regulated-cascode current mirror in the ohmic region. But the strong dependence of the ohmic-region biased transistors on the power rail voltage causes mismatches in $\tau$ between cells in the same layer. An alternative to this is a binary programmable current mirror. It trades resolution in $\tau$ for robustness; hence, the mismatch between the time constants of the different cells is considerably attenuated.

A new problem arises, though, because of current scaling. If the input current is allowed to be reshaped to a 16-times smaller waveform, then the current memory is obliged to operate over a wider dynamic range. But, if designed to operate on large currents, the current memory will not work for the tiny currents of the scaled version of the input. Conversely, if it is designed to run on small input currents, long transistors will be needed, and the operation will be unreliable for the larger currents. One way of avoiding this situation is to make the S3I memory work on the original unscaled version of the input current. Therefore, the adjustable-time-constant CNN core consists of a current conveyor, followed by the S3I current memory and then the binary weighted current mirror.
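The equivalence between scaling the summed current and scaling the time constant, on which this section relies, can be checked numerically. The sketch below integrates a single first-order node whose input current passes through a mirror of gain `gain` and measures the time needed to reach 1 - 1/e of the final value. The list of gains (powers of two down to 1/16) is an assumption suggested by the 16-times range mentioned in the text, not the actual code set of the chip.

```python
import math


def time_to_63_percent(gain, tau=1.0, dt=1e-3):
    """Forward-Euler integration of tau * dx/dt = gain * (1 - x), starting at x = 0.

    Returns the time at which x first reaches 1 - 1/e, i.e. the effective
    time constant of the node when its input current is scaled by `gain`.
    """
    target = 1.0 - math.exp(-1.0)
    x, steps = 0.0, 0
    while x < target:
        x += dt * gain * (1.0 - x) / tau
        steps += 1
    return steps * dt


# Hypothetical binary-weighted mirror gains.
GAINS = [1.0, 0.5, 0.25, 0.125, 0.0625]
effective_tau = {gain: time_to_63_percent(gain) for gain in GAINS}
```

In this simple model the measured time constant is `tau / gain`, so a mirror that divides the current by 16 slows the node down by the same factor without touching the capacitor.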
The problem now is that the offsets introduced by the scaling block add to the signal, and the required accuracy levels can be lost. Our proposal is depicted in Fig. 3. It consists of placing the scaling block (programmable mirror) between the current conveyor and the current memory. In this way, any offset error will be cancelled in the auto-zeroing phase. In the picture, the voltage reference generated with the current conveyor, the regulated-cascode current mirrors and the S3I memory can be easily identified. The inverter, $A_i$, driving the gates of the transistors of the current memory is required for stability. Without it, the output node will diverge from the equilibrium.

Figure 3. Input block with current scaling, S3I memory and offset-corrected OTA schematic.

Figure 4. Prototype chip photograph.

3 Chip data and simulations

A prototype chip has been designed and fabricated in a 0.5 µm single-poly triple-metal CMOS technology. Its dimensions are 9.27 x 8.45 mm² (photograph in Fig. 4). The cell density achieved is 29.24 cells/mm². The programmable dynamics of the chip permit the observation of different phenomena such as wave propagation, pattern generation, etc. Fig. 5 displays the evolution of the state variable in a reduced network of 1 x 8 cells, in which the propagation of a wave front in 1-D has been programmed. It is triggered by a marker in the first layer of a single cell and induced in the second layer, as can be seen. By controlling the network dynamics and combining the results with the help of the built-in local logic and arithmetic operators, rather involved image processing tasks can be programmed, for instance, grayscale contour detection, skeletonization, etc.
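A behavioural sketch in the spirit of the 1 x 8 experiment is given below. It is not the simulation behind Fig. 5: the templates (`a_self`, `a_nbr`, `bias`, `a21`), the fixed -1 boundary, the ratio between the two time constants and the choice of which layer is the slower one are all assumptions chosen so that a front triggered in the first cell travels along layer 1 and is copied into layer 2. The states are hard-limited to [-1, 1] as in the FSR model.

```python
def clip(x):
    """FSR-style hard limit of the state to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def simulate_wave(n_cells=8, tau1=1.0, tau2=1.0 / 16, dt=1e-3, t_end=40.0):
    """Forward-Euler simulation of a 1-D two-layer array with assumed templates."""
    a_self, a_nbr, bias = 2.0, 1.0, 1.5   # hypothetical layer-1 template and bias
    a21 = 2.0                             # hypothetical layer-1 to layer-2 coupling
    x1 = [-1.0] * n_cells
    x1[0] = 1.0                           # marker that triggers the wave front
    x2 = [-1.0] * n_cells
    arrival = [None] * n_cells            # time at which each layer-1 state crosses zero
    arrival[0] = 0.0
    for n in range(int(round(t_end / dt))):
        padded = [-1.0] + x1 + [-1.0]     # fixed boundary cells at -1
        new_x1 = []
        for i in range(n_cells):
            drive = a_self * x1[i] + a_nbr * (padded[i] + padded[i + 2]) + bias
            new_x1.append(clip(x1[i] + dt * (-x1[i] + drive) / tau1))
        new_x2 = [clip(x2[i] + dt * (-x2[i] + a21 * x1[i]) / tau2)
                  for i in range(n_cells)]
        x1, x2 = new_x1, new_x2
        for i in range(n_cells):
            if arrival[i] is None and x1[i] > 0.0:
                arrival[i] = (n + 1) * dt
    return x1, x2, arrival


layer1, layer2, arrival_times = simulate_wave()
```

With these assumed templates a cell at -1 only starts to rise once its left neighbour is well above zero, which is what makes the front advance one cell at a time.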
4 Conclusions

The proposed approach is a promising alternative to conventional digital image processing for applications related to early vision and low-level focal-plane image processing. Based on a simple but precise model of part of the real biological system, a feasible and efficient implementation of an artificial vision device has been designed. The peak operation speed of the chip will outdo its digital counterparts due to the fully parallel nature of the processing, which is, once more, based on the analogy, not on the simulation.

Figure 5. 1-D wave propagation: (a) slow CNN layer, (b) fast CNN layer. Horizontal axis: time (seconds), x 10^-4.

References

1. F. Werblin, Synaptic Connections, Receptive Fields and Patterns of Activity in the Tiger Salamander Retina, Inv. Oph. and Vis. Sc. 32 (3), 459 (1991).
2. F. Werblin, T. Roska and L. O. Chua, The Analogic Cellular Neural Network as a Bionic Eye, Int. J. Circ. Theor. and App. 23 (6), 541 (Wiley, Boston, 1995).
3. Cs. Rekeczky, T. Serrano-Gotarredona, T. Roska and A. Rodríguez-Vázquez, A Stored Program 2nd Order/3-Layer Complex Cell CNN-UM, Proc. 6th Int. W. Cel. Neur. Net. Apps., 219 (Catania, 2000).
4. T. Roska and L. O. Chua, The CNN Universal Machine: An Analogic Array Computer, IEEE Trans. Circ. Syst. II: Anal. Dig. Sign. Proc. 40 (3), 163 (1993).
5. S. Espejo, R. Carmona, R. Domínguez-Castro and A. Rodríguez-Vázquez, A VLSI Oriented Continuous-Time CNN Model, Int. J. Circ. Theor. Apps. 24 (3), 341 (Wiley, Boston, 1996).
6. R. Domínguez-Castro, A. Rodríguez-Vázquez, S. Espejo and R. Carmona, Four-Quadrant One-Transistor Synapse for High Density CNN Implementations, Proc. 5th Int. W. Cel. Neur. Net. and Apps., 243 (London, 1998).
