Academia.eduAcademia.edu

Analog integrated circuits and signal processing

1996, Analog Integrated Circuits and Signal Processing

https://doi.org/10.1007/S10470-012-9991-8

Abstract

Your article is protected by copyright and all rights are held exclusively by Springer Science +Business Media New York. This e-offprint is for personal use only and shall not be selfarchived in electronic repositories. If you wish to self-archive your article, please use the accepted manuscript version for posting on your own website. You may further deposit the accepted manuscript version in any repository, provided it is only made publicly available 12 months after official publication or later and provided acknowledgement is given to the original source of publication and a link is inserted to the published article on Springer's website. The link must be accompanied by the following text: "The final publication is available at link.springer.com".

Key takeaways
sparkles

AI

  1. The proposed Hamming weight comparator operates at 1 GHz with 1.136 mW power consumption for N=16.
  2. The design utilizes a novel switched-capacitor array to enable concurrent accumulation of bit-vector weights.
  3. Capacitance mismatches minimally impact performance, reducing resolution margin from 22.7 to 22.3 mV for N=16.
  4. Circuit adaptability supports variable input vector sizes and achieves low latency through four sequential events.
  5. The architecture integrates both analog and digital components for high-speed, low-power Hamming weight comparison.
High-speed and low-power scalable Hamming weight comparator based on a non-weighted switched-capacitor array Saleh Abdel-hafeez & Behrooz Parhami Analog Integrated Circuits and Signal Processing An International Journal ISSN 0925-1030 Volume 75 Number 3 Analog Integr Circ Sig Process (2013) 75:417-434 DOI 10.1007/s10470-012-9991-8 1 23 Your article is protected by copyright and all rights are held exclusively by Springer Science +Business Media New York. This e-offprint is for personal use only and shall not be selfarchived in electronic repositories. If you wish to self-archive your article, please use the accepted manuscript version for posting on your own website. You may further deposit the accepted manuscript version in any repository, provided it is only made publicly available 12 months after official publication or later and provided acknowledgement is given to the original source of publication and a link is inserted to the published article on Springer's website. The link must be accompanied by the following text: "The final publication is available at link.springer.com”. 1 23 Author's personal copy Analog Integr Circ Sig Process (2013) 75:417–434 DOI 10.1007/s10470-012-9991-8 High-speed and low-power scalable Hamming weight comparator based on a non-weighted switched-capacitor array Saleh Abdel-hafeez • Behrooz Parhami Received: 8 March 2012 / Revised: 4 August 2012 / Accepted: 16 November 2012 / Published online: 7 December 2012  Springer Science+Business Media New York 2012 Abstract Many practical applications require a comparison of the Hamming weights of two N-bit binary vectors. This comparison can be performed in a fully digital manner or by a mix of analog and digital techniques. In this paper, we propose a design in the latter category that exhibits advantages in speed and power dissipation compared with the best previous designs. The proposed design comprises of two switched-capacitor arrays associated with two comparators placed in parallel, collectively providing a complete comparison outcome of ‘‘[’’, ‘‘\’’, or ‘‘=’’. The switchedcapacitor array circuit is composed of uniform capacitances, thus associating identical charges with all bits, independent of their positions in the input bit-vectors. Once charge accumulation has occurred based on the asserted inputs, the two comparators release the final decision concurrently. The structure is shown to support wide input vectors on the order of 64 bits, while requiring a small silicon area for capacitor array structures, CMOS switches, and latched comparators to compute and store the comparison outcome. HSPICE simulation shows a total power consumption of 1.136 mW, evaluated at the operating frequency of 1 GHz (based on input-to-decision delay) for 16 bit input vectors under 0.15 lm TSMC technology. Keywords Analog summing  Comparator  GHz-class circuit  Hamming weight  Low-power electronic circuit  S. Abdel-hafeez Jordan University of Science and Technology, Irbid 22110, Jordan e-mail: [email protected] B. Parhami (&) Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106-9560, USA e-mail: [email protected] Mixed analog/digital design  Population count  Switched-capacitor array 1 Introduction Comparing the Hamming weights of two bit-vectors of common length N finds applications in neural networks [1], threshold voting circuits [2, 3], digital filtering [4, 5], pattern matching/recognition [6–8], and encoding for data compression [9–11]. Accordingly, many researchers and hardware designers have proposed designs for such comparators [12–18], exploring trade offs between operating frequency, Hamming vector size, power consumption, and area requirements to satisfy design specifications. An early design [12], which required eight cascaded logic stages, was based on deriving a complete Boolean expression for comparing weighted vectors of size 2. The operating frequency resulting from this approach would be unsuitable for large vector sizes. Subsequent modifications [13] were aimed at extending the application of this method to wider vectors. But the extension is accomplished at the expense of a large number of comparators, which imply unacceptably high power consumption. Power can be saved by converting binary inputs to time delay behavior [14, 15], for example, by assigning a variable delay element to every bit in input bit-vectors. With this approach, the circuit achieves its power economy at the expense of long latency, compared with the design of [13]. A subsequent design by Pedroni [16] reduces the delay somewhat, but the presence of N cascaded switches makes the design too slow for wide inputs. A further speed enhancement method [17] minimizes the number of cascaded switches by using sorting networks [9], but the multistage structure of the design would still be problematic for 123 Author's personal copy 418 Analog Integr Circ Sig Process (2013) 75:417–434 wide inputs. Parhami’s design [18] uses arithmetic operations for counting and comparing the number of 1s through high-speed parallel up/down counters, with the vector bits viewed as positive or negative in the process of accumulation. However, sequential clocking precludes high-speed parallel operation, with speed more of a problem for wider input bit-vectors. In this paper, we present a novel architecture that uses a mix of analog charge accumulation and digital switching of weighted bits value. Because the charges associated with binary digits 0 and 1 do not have a positional weighting, the accumulating switched-capacitor array comprises of uniform capacitance values, in contrast with conventional switched-capacitor arrays used for A/D and D/A conversion [19–22]. Once the two input Hamming bit-vectors have been accumulated through the differential switched-capacitor array, differential comparators produce the desired result. A capsule summary of the main attributes and contributions of our design is captured in the following five points: 1. 2. 3. 4. 5. All bits in the input bit-vectors are accumulated concurrently, regardless of the bit-width. Thus, the circuit is attractive for high-speed and broad-scale applications. Small capacitances are used in the circuit to hold charges based on asserted bit values, in much the same way as dynamic low-power memory, making it suitable for low-power design. The circuit is adaptable to variable and unmatched vector sizes, or to comparison between an input vector and a threshold value. A variable size can simply be added on two sides of differential switched-capacitor array, where empty bit positions can be replaced by 0s. Full ternary comparison function (‘‘[’’, ‘‘\’’, and ‘‘=’’) is made possible by integrating two comparators with two switched-capacitor arrays. The circuit is reliable for continued technology scaling and provides enough resolution at the comparator inputs for wide vectors and advanced technologies with low supply voltage. The rest of this paper is organized as follows. Section 2 illustrates the comparison principle for our Hamming weight circuit based on a switched-capacitor array. Section 3 provides CMOS circuit design with related equations and event timings. Section 4 contains sensitivity analysis and effect on resolutions against the input vector size, speed, and continued reduction of the supply voltage. Section 5 reports on the results of our HSPICE simulation-based on 0.15 lm and 1.5 V integrated-circuit technology. Section 6 compares the simulation results against previous efforts having similar objectives to ours (low cost, high-speed, low power, and variable application scale). Section 7 concludes the paper. 123 2 Comparison principle Switched-capacitor arrays constitute common mechanisms for analog-to-digital or digital-to-analog conversion. In these applications, the capacitance values C0, C1,…,CN–1 within the array, and thus the charges they hold, are proportional to successive powers of 2 (Fig. 1). Such switchedcapacitor arrays can be modified to assign equal charges to all 1s in a bit-vector. This modification is appropriate for Hamming weight comparison [23], where all positions in the bit-vectors being compared are equally weighted. Thus, the non-weighted switched-capacitor array is a modified form of the conventional binary weighted variant, with all capacitor values chosen to be the same, that is, C0 = C1= _ = CN-1. Such an array provides a total charge equal to the non-weighted sum of the capacitor charges. In this way, comparing the Hamming weights of two bit-vectors is reduced to comparing the sum of charges they induce on their corresponding capacitors within the array, or comparing voltages resulting from these charges. The non-weighted switched-capacitor array of Fig. 1 functions as follows. The Hamming bit-vectors (Hx, Hy) are fed through a pair of Hamming registers (RHx, RHy) as inputs to the array. The array accumulates the total charges associated with the Hamming bit-vectors, furnishing the Hamming weights (WHx, WHy). These weight-related charge totals go through a series of evaluation steps based on system clock (CLK) phases, providing the comparator input voltages (VHx, VHy). Once the voltages have been established, the comparator (COMP) releases the decision of differential outputs (Voutp, Voutn), indicating a greater than (WHx [ WHy) or less than (WHx \ WHy) outcome. Another comparator is employed to establish the equality outcome (WHx = WHy), in a manner that will become apparent in Sects. 2, 3. The complete direct realization of the Hamming weight comparison is illustrated by the examples in Fig. 2, where the Hamming input values have been chosen for clarity in drawing and ease of discussing the operational principles. There are two Hamming switched-capacitor arrays, associated with two comparators, with their labels and parameters distinguished through the use of the indices ‘‘1’’ and ‘‘2’’, respectively. For ease of reference, Table 1 summarizes all notations related to our Hamming switchedcapacitor array circuit and the associated descriptions. The Hamming bit-vectors Hx and Hy are common inputs to the two switched-capacitor arrays, meaning that Hx1 = Hx2 and Hy1 = Hy2 at the primary input. However, we differentiate the inputs by the indices ‘‘1’’ and ‘‘2’’ due to the different shifting operations applied to them by each circuit. More specifically, the first circuit (comparator structure) applies a shift operation on Hx to form Hx1, while the second circuit applies shifting on Hy to derive Hy2. Author's personal copy Analog Integr Circ Sig Process (2013) 75:417–434 Fig. 1 Schematic diagram of a modified switched-capacitor array to compare the Hamming weights of two bit-vectors Hx and Hy, presented in Hamming registers RHx and RHy 419 CLK Hy Hamming Register-2 (RHy) Control Switches CN-1 Control Switches VHm C1 C0 Voutp VHy VHx Voutn CN-1 C1 C0 Control Switches Hamming Register-1 (RHx) Hx The two Hamming circuits are identical in their design parameters and operational principles. The first one applies an extra KC0 weight, which is less than a unit-Hamming weight, on input Hx1; we refer to this input as the ‘‘right-shifted’’ or simply ‘‘shifted’’ input. Likewise, the second Hamming weight circuit applies the same additional weight, but on the Hamming input Hy2. We next describe Fig. 2(a), which illustrates the Hamming comparison process when WHx = WHy = 2. The right-shifting of Hx1 through the first Hamming weightedcapacitor array implies VHx1 [ VHy1. Thus, the first comparator output value is ‘‘high’’; viz., Voutp1 = Vdd. Concurrently, the right-shifting of Hy2 using the circuit of the second Hamming switched-capacitor array produces VHy2 [ VHx2. Hence, the second comparator output value is ‘‘low’’; viz., Voutp2 = 0 V. Therefore, when the two inputs have equal Hamming weights, the two comparators produce the outputs ‘‘10’’. Similar reasoning can be applied to the cases in Fig. 2(b), (c), yielding the comparator outputs ‘‘00’’ and ‘‘11,’’ respectively. We call this novel structure for handling Hamming inputs (Hx, Hy) through a pair of switched-capacitor arrays with uniform capacitances and an inherent shift with comparator decision, a ‘‘Hamming switched-capacitor array’’. For proper operation, the shift amount must be smaller than a unit-Hamming weight voltage difference at comparator inputs (VHx1, VHy1, VHx2, VHy2), that is, when the Hamming weights of two bit-vectors differ by 1. 3 Circuit descriptions The proposed Hamming circuit, realized in Fig. 3, utilizes fully differential comparators, Hamming switched-capacitor arrays, CMOS switches, and basic registers to hold the Hamming inputs. The comparator is similar to those described in [24, 25] for a given technology scaling of 0.15 lm CMOS process [26]. There are two identical branches at the inputs of the comparator, each is based on a set of uniform C0 capacitors for a total of N Hamming bit-vector width with an addition of extra weighted capacitances KC0. The Hamming decision structure operates in four events within the system clock periods (TCLK). In this way, the CMOS switches of the Hamming circuit are toggled through these events in order to evaluate the voltages, VHx and VHy, at the inputs of comparators. The four signals (PHM, PHR, PHE, PHC), shown in Fig. 4, are generated from the system CLK by a simple finite-state machine (i.e. Mealy), where each signal presents a particular event within CLK, but tailored to different position of duty cycles. Consequently, each phase entails particular event operations where charges are carried to next phase, until the final accumulated charges take hold and the comparison decision is drawn. During the first event, beginning at time (n-1)T and shown in Fig. 4 with the dashed circle 1, both branches of the first and second comparators’ inputs (VHx1, VHy1, VHx2, VHy2) are set at the power supply voltage VHm = 123 Author's personal copy Fig. 2 Three examples of Hamming weight comparisons, with the relationship between WHy and WHx being: a equal, b greater than, and c less than Analog Integr Circ Sig Process (2013) 75:417–434 VHy1 VHy1,VHx1 420 Voutp1 COMP1 VHx1 (Shift Right) Voutn1 VHy1 At the input of first comparator Voutp1 > Voutn1 COMP1 = 1 Hamming Weight Assume: WHy = WHx = 2 At the input of the Hamming switched-capacitor array VHy2 (Shift Right) VHx2 1 VHy2,VHx2 Voutp1 COMP2 Voutn1 VHx1 2 VHx2 3 4 N VHy2 At the input of second comparator Voutp1 < Voutn1 COMP2 = 0 Hamming Weight 1 2 3 4 N VHy1 VHy1,VHx1 (a) Case 1: WHy = WHx Voutp1 COMP1 VHx1 (Shift Right) Voutn1 VHx1 At the input of first comparator Voutp1 < Voutn1 COMP1 = 0 Hamming Weight Assume: WHy = 2 and WHx = 1 At the input of the Hamming switched-capacitor array VHy2 (Shift Right) COMP2 VHx2 1 VHy2,VHx2 Voutp1 Voutn1 VHy1 2 VHx2 3 4 N VHy2 At the input of second comparator Voutp2 < Voutn2 COMP2 = 0 Hamming Weight 1 2 3 4 N VHy1 VHy1,VHx1 (b) Case 2: WHy > WHx Voutp1 COMP1 VHx1 (Shift Right) Voutn1 VHy1 At the input of first comparator Voutp1 > Voutn1 COMP1 = 1 Hamming Weight Assume: WHy= 1 and WHx = 2 At the input of the Hamming switched-capacitor array Voutp1 COMP2 VHx2 Voutn1 1 VHy2,VHx2 VHy2 (Shift Right) VHx1 2 VHy2 3 4 N VHx2 At the input of second comparator Voutp2 > Voutn2 COMP2 = 1 Hamming Weight 1 2 3 4 N (c) Case 3: WHy < WHx Vdd through the switch SHm (shown in Fig. 3). At the same time, the outer plates of all capacitors are tied to Vdd through switches SHxj and SHyj, which are controlled by the event signal PHM, independent of the values of the stored bits RHxj and RHyj (0 B j B n-1) in Hamming registers RHx and RHy, respectively. Accordingly, the total charge during the first event is evaluated using the charge balance equation QTx1 ¼ fVdd ½ðn  1ÞT   Vdd g ðR0  j  N1 Cj þ KC0 Þ ð1Þ 123 where QTx1 donates the total charge in the first comparator structure associated with the Hamming input Hx, and (n-1)T refers to the first event’s timing as depicted in Fig. 4. Because both comparators structure are initialized to the same voltages during this first event, and since all capacitors have the same capacitance value C0, Eq. (1) yields: QTx1 ¼ QTy1 ¼ QTx2 ¼ QTy2 ¼ 0: ð2Þ During the second event, beginning at time (n-1/2)T, the SHm switch is opened, while the capacitors’ outer Author's personal copy Analog Integr Circ Sig Process (2013) 75:417–434 421 Symbol Definition plates are connected to Vdd/2. This is controlled by the event signal PHR independent of the Hamming input bits value. Thus, charge conservation implies: Vdd Power supply voltage fVHx1 ½ðn  1=2ÞT   Vdd=2g ðR0  j  N1 Cj þ KC0 Þ ¼ 0 C0 Unit capacitance K Shifting ratio factor N Input bit-vector length Hxi [Hyi] First [second] Hamming input of the ith Hamming circuit RHx [RHy] Hamming register for storing the first [second] Hamming input First [second] Hamming weightrelated to the ith Hamming circuit Table 1 Terminology for the proposed Hamming switched-capacitor array (i = 1 or 2 is the index of the Hamming circuit) WHxi [WHyi] COMPi ð3Þ Differential comparator in the ith Hamming circuit VHxi [VHyi] First [second] input voltage for COMPi VHm Common Hamming mode voltage equal to Vdd Voutpi [Voutni] Positive [negative] output voltage for COMPi Fig. 3 The proposed Hamming switched-capacitor array circuit Using the same assumption of equal capacitance values, and due to the symmetry of comparator structures, we have VHx1 ½ðn  1=2ÞT  ¼ VHy1 ½ðn  1=2ÞT  ¼ Vdd=2 VHx2 ½ðn  1=2ÞT  ¼ VHy2 ½ðn  1=2ÞT  ¼ Vdd=2 ð4Þ Note that up to this point, the CMOS switches operate independently of the input values. During event 3, beginning at time instant nT, the circuit switches to the evaluation mode, where the outer capacitor plates are connected to input voltages, whose values depend on the corresponding bits of the Hamming registers RHx and RHy: Vdd for an input bit 1 and Vdd/2 for an input bit 0, leading to the voltage value (1 ? b)Vdd/2 for an Vdd/2 Vdd SHyN-1 SHy0 C0 KC0 Voutp1 SHm VHy1 COMP1 VHm VHx1 Voutn1 C0 SHxN-1 C0 KC0 SHx0 First Comparator Structure C0 Shifting Hx by inserting Vdd to KC0 Vdd Vdd/2 Vdd/2 Shifting Hy by inserting Vdd to KC0 Vdd SHyN-1 SHy0 C0 KC0 Voutp2 SHm VHy2 VHm COMP2 VHx2 Voutn2 C0 SHxN-1 C0 SHx0 KC0 Second Comparator Structure C0 Vdd Vdd/2 123 Author's personal copy 422 Fig. 4 Timing events for the proposed Hamming N-bit vector circuit Analog Integr Circ Sig Process (2013) 75:417–434 RESET (n-1)T (n-1/2)T nT (n+1)T (n+1/2)T CLK PHM PHR PHE PHC Event Event Event Event 1 2 3 4 New cycle with new start for new Hamming measurement Cycle 1 input bit value b. During event 3, the evaluation voltage VHx1 is ‘‘right-shifted’’ by applying Vdd to the outer plate of the capacitor labeled KC0 in Fig. 3, with charge preservation implying: R0  j  N1 C0 fVHx1 ½ðn  1=2ÞT   Vdd=2g þ KC0 fVHx1 ½ðn 1=2ÞT   Vdd=2g  ¼ R0  j  N1 C0 VHx1 ½nT   1 þ Hxj Vdd=2 þ KC0 fVHx1 ½nT   Vdd g ð5Þ Note that the first line of Eq. (5) represents the total charge at time (n-1/2)T and the second line constitutes the total charge at time nT, with the latter being a function of the bit values Hxj or their associated applied voltages (1 ? Hxj)Vdd/2. Plugging (4) into (5) and simplifying, we have: PN1 ðN2 þ KÞ þ j¼0 Hxj =2 VHx1 ½nT ¼ Vdd ð6Þ NþK Concurrently, VHy1 is also evaluated during event 3, but without the right-shifting, that is, the voltage Vdd/2 to outer plate of the capacitor labeled KC0 in Fig. 3. Thus, the counterparts to Eqs. (5) and (6) are: 123 R0  j  N1 C0 fVHy1 ½ðn  1=2ÞT   Vdd=2g þ KC0 fVHy1 ½ðn 1=2ÞT   Vdd=2g  ¼ R0  j  N1 C0 VHy1 ½nT   1 þ Hyj Vdd=2 þ KC0 fVHy1 ½nT   Vdd g  PN1 N K j¼0 Hyj =2 2 þ 2 þ VHy1 ½nT ¼ Vdd NþK ð7Þ ð8Þ Notice that the required right-shifting is performed by simply applying a different voltage value to the outer plates of capacitances KC0. This strategy renders the Hamming switched-capacitor structures on both branches identical, thus ensuring balance in the geometry of capacitance areas on both inputs to the comparator. This reduces the mismatch error. Following the same process as above, the voltages associated with the second comparator structure are readily seen to be: PN1 Hxj =2 ðN2 þ K2 Þ þ i¼0 VHx2 ½nT ¼ Vdd NþK PN1 Hyj =2 ðN2 þ KÞ þ i¼0 VHy2 ½nT ¼ Vdd NþK ð9Þ ð10Þ Author's personal copy Analog Integr Circ Sig Process (2013) 75:417–434 423 We note that if the Hamming inputs Hx and Hy are both the all 0s vector, then Eqs. (6), (8), (9), and (10) reduce to:   Vdd K 1þ VHx1 ½nT ¼ &VHy1 ½nT ¼ Vdd=2 2 NþK   Vdd Vdd K & VHy2 ½nT ¼ 1þ : VHx2 ½nT ¼ 2 2 NþK ð11Þ ð12Þ Consequently, for equal Hamming weights, Eq. (11) suggests that the shift amount is K/2 towards Hx, while in (Eq. 12) the shift amount is K/2 towards Hy. Thus, these equations confirm the comparison principle derived in Sect. 2. The final event 4, beginning at time instant (n ? 1/2)T, completes the operation of the comparators by providing output values at Voutp1 and Voutp2, based on the weighted sum of the comparators’ inputs (VHx1, VHy1, VHx2, VHy2). The comparator is realized with latched output that holds the current comparison results during the next new comparison cycle. Table 2 provides the resulting voltages and comparison results for Hamming weight comparison of 4 bit input vectors Hx and Hy in a number of cases involving all possible distances. The entries in Table 2 are derived from Eqs. (6), (8), (9), and (10) by assuming K = 1/4. We will see in the next section that the value of K is limited to the range [0, 1/2]. 4 Accuracy analysis In this section, characteristics of the proposed circuit are identified based on the behavior of switched-capacitor array. These characteristics resemble those of analog/ digital and digital/analog switched-capacitor structures Table 2 Hamming weight comparison between two 4 bit input vectors Hx and Hy, based on the proposed Hamming switched-capacitor array of Fig. 3 WHx, WHy [19–22, 24, 25], where similar definitions and theory are applicable. Assume ideal conditions, in the absence of noise and any imperfections. For N-bit inputs, our structure requires N equal C0 capacitors in the form of a uniform charge redistribution array, along with a shifting capacitor KC0. The lower bound of the shifting factor is based on the requirement for proper operation in the case when two Hamming weights are equal. Using (11) and (12), we have: C0 ðK þ N=2ÞVdd C0 ðN þ K ÞVdd=2 [ ; C0 ðN þ K Þ C 0 ðN þ K Þ implies K [ 0: ð13Þ On the other hand, the upper bound is based on the requirement for proper operation when the two Hamming weights differ by 1. Using Eqs. (6–10), we have: C0 ðN=2 þ K ÞVdd C0 ðN þ K þ 1=2ÞVdd=2 \ ; C0 ðN þ K Þ C0 ðN þ K Þ implies K\1=2: ð14Þ Furthermore, the minimum resolution between the input branches of the comparators can be expressed in a worstcase scenario when the two Hamming weights are equal as derived in (11) and (12), with the only difference between them resulting from the shift ratio:     Vdd K Vdd Vdd K 1þ ¼ Vdiff ¼  : 2 NþK 2 2 NþK ð15Þ Thus, the minimum comparators’ resolution is proportional to Vdd and inversely proportional to the input vector width N. This will make the design of comparators more challenging with gradual increase in bitwidth and continued technology scaling. An optimal value for K is obtained by striking a balance between the two constraints Eqs. (14) and (15): Kopt ¼ 0:5: COMP1 ð16Þ Voutp1, Voutp2 COMP2 Result VHx1/Vdd VHy1/Vdd VHx2/Vdd VHy2/Vdd 0, 0 0.529 0.500 0.500 0.529 1, 0 = 1, 0 0.647 0.500 0.618 0.529 1, 1 [ 2, 0 0.765 0.500 0.735 0.529 1, 1 [ 3, 0 0.882 0.500 0.853 0.529 1, 1 [ 4, 0 0, 1 1.000 0.500 0.500 0.647 0.971 0.529 0.529 0.618 1, 1 0, 0 [ \ 0, 2 0.500 0.765 0.529 0.735 0, 0 \ 0, 3 0.500 0.882 0.529 0.853 0, 0 \ 0, 4 0.500 1.000 0.529 0.971 0, 0 \ 123 Author's personal copy 424 Analog Integr Circ Sig Process (2013) 75:417–434 The ideal transfer characteristics from the difference of two Hamming weights to their accumulated charges at the input branches of the comparators (|VHx1-VHy1|, |VHx2VHy2|) is illustrated in Fig. 5. The offset error is commonly known as the constant deviation of the actual output from ideal output when the ideal output should be zero. Gain error is characterized by a slope change in the transfer characteristics. This type of error, depicted in Fig. 5(a), is usually due to process variability or the power supply voltage differences at the input branches of the comparators [20–25]. Another type of deviation is a measure of the non-linearity error after the offset and gain errors have been removed, as shown in Fig. 5(b). Assuming the X-axis is the Hamming input vectors bits and the Y-axis is the accumulated weights at the given X-axis bits. The accumulated weights are measured based on either Eq. (8) or (9) which applies on nonshifting inputs of first and second comparator structure (Fig. 3). Consider an example where the 8 bit Hx and Hy have 4 and 5 bits asserted, respectively, with the remaining bits being 0s. Applying (9) on Hx provides the weight Vddð2 þ K2 þ N2 Þ=ðN þ KÞ on the Y-axis, while Eq.(8) applied on Hy yields the weight Vddð52 þ K2 þ N2 Þ=ðN þ KÞ on the Y-axis, corresponding to two points on the graph. Furthermore, when Hx and Hy have the same number of asserted bits, they provide one points on the graph using either Eq.(8) or (9). By increasing the input bit-width N, all points of graph can be obtained using Eq.(8) and (9). Next, applying the shifting principles using Eq.(6) and (10), two graphs representing the minimum and the maximum shifting regions are produced. These shifting regions, along with the non-shifting region, create maximum and minimum voltage differences at the inputs of comparators. Thus, we have the following equations for the maximum-weight and minimum-weight voltage differences, respectively: Vdiffmax ¼maxfjVHx1  VHy1 j; jVHx2  VHy2 jg ¼ Vddð12 þ K2 Þ NþK ð17Þ Vdiffmin ¼minfjVHx1  VHy1 j; jVHx2  VHy2 jg ¼ Vddð12  K2 Þ NþK ð18Þ It might be noted that Vdiffmax of (17) realizes large margin for comparator resolution, while Vdiffmin of (18) realizes small margin for comparator resolution. To summarize, the graph in Fig. 5(b) shows the linearity of input–output characteristics, where the inputs are bits of Hx and Hy and the output is the accumulated voltage at the comparator input. This voltage difference determines the resolution of comparator operations against Hamming bits difference, which can be degraded due to nonlinear factors such as capacitance and switch mismatches. Units in Units in Vdd/(N+K) Vdd/(N+K) (N/2+K/2+N/2) (N/2+K/2+N/2) Hamming weights without Gain Error applying the shift procedure Hamming weights curve Maximum shifting region (7/2+K/2+N/2) (7/2+K/2+N/2) Minimum shifting region (3+K/2+N/2) (3+K/2+N/2) Hamming Weights (WHi) Hamming Weights (WHi) Offset Error (5/2+K/2+N/2) (2+K/2+N/2) (3/2+K/2+N/2) Maximum weight difference between WH5 and WH4 (5/2+K/2+N/2) Minimum weight difference (2+K/2+N/2) between WH5 and WH4 (3/2+K/2+N/2) Error due to minimum shifting (1+K/2+N/2) (1+K/2+N/2) (1/2+K/2+N/2) (1/2+K/2+N/2) K/2+N/2 K/2+N/2 Hx,Hy 0 1 2 3 4 5 6 7 Hamming Bits (a) 8 9 10 N Hx,Hy 0 1 2 3 4 5 6 7 8 9 10 N Hamming Bits (b) Fig. 5 Graphic illustration of linearity shifting principles .a Impact of linear error,b maximum and minimum resolution regions and impact of non-linear error 123 Author's personal copy Analog Integr Circ Sig Process (2013) 75:417–434 425 For N-bit inputs, the proposed Hamming circuit entails a total capacitance of CT ¼ 4ðN þ K ÞC0 ð19Þ given that we have two branches for each comparator structure as shown in Fig. 3. The outer plate of each capacitor is connected to either Vdd or Vdd/2, while the inner plate is connected to Vdd or the input terminal of the comparator. Such a configuration, is commonly known to reduce the noise injected from the substrate into the circuit [27–29], since the outer plates of the capacitors are never connected to input terminal of the comparator or to substrate voltage. Furthermore, the geometry of the differential Hamming switched-capacitor array reduces the effects of possible errors resulting from mismatches of the capacitances. Assuming the switched-capacitor array to exhibit a linear gradient, that is, a linear variation in changes of Tox from one end to the other, the linear mismatches of C0 becomes C0 ! ðC0 þ DC0 Þ; 2C0 ! ðC0 þ 2DC0 Þ; . . .; NC0 ! ðC0 þ NDC0 Þ ð20Þ where DC0 is the linear gradient error associated with capacitance value in our Hamming switched-capacitor array. Now, using Eq. (18) for the minimum resolution at the input of the comparator with linear gradient of mismatches for our switched-capacitor array, we have: Vdiffminc ¼ K ðC0 þ DC0 ÞVdd=2 C0 ðN þ K Þ þ N ðN  1ÞDC0 þ KDC0 ð21Þ where KDC0 is the error associated with the shift capacitance value. Taking 0.2 % of C0 to be a typical worst-case approximate mismatch for DC0 [27, 28], Fig. 6 shows Vdiffmin of (18) and Vdiffminc of Eq. (21) against the Hamming size N. The result is very promising, because it shows the minimum difference resolution voltage (18) Vdiffmin = 5.8 mV (at N = 64, K = 0.5, C0 = 0.1 pF, and Vdd = 1.5 V), while the minimum difference including mismatches linear capacitance error Eq. (21) is 5.2 mV. Thus, the mismatches degrade the resolution from 5.8 to 5.2 mV. Most recent comparator designs [19–21, 24, 25] can promptly respond to differences as small as a couple of millivolts, so the operation of our circuit should be robust, even with linear gradient error due to capacitance mismatches. See the appendix for simulation-based results. As further evidence of robustness, the error between Eqs. (18) and (21) is depicted in Fig. 7. The near independence with regard to increases in input width is due to a differential uniform cancellation structure. Non-linearity error, resulting from variance in capacitance non-linearities, degrades the minimum resolution margin by 10.34 % at the input of comparators. The capacitance mismatch error can be Fig. 6 Minimum resolution at the input of comparator against the Hamming input width N. The graph using the symbol ‘‘-’’ shows the ideal difference Vdiffmin modeled by (18), while the graph using the symbol ‘‘o’’ represents the non-ideal difference Vdiffminc given by Eq.(21) further analyzed by considering a random difference, in lieu of linear difference, between the two differential arrays. The latter error is usually reduced by using layout techniques, as discussed in [19, 30–33]. On the other hand, the non-linearity error of linear gradient capacitance mismatches shown in Eq. (21) has an insignificant impact on the accuracy of Vdiffmax, since the resolution margin at the comparator input is much greater than Vdiffmin. Hence, we elaborate only on the impact of nonlinearity factors on Vdiffmin margin. The maximum speed of the design is constrained by the four events comprising one system clock cycle, as discussed Fig. 7 The resolution error magnitude between ideal and non-ideal comparator input, where the non-linearity in the non-ideal case is due to linear gradient capacitance mismatches 123 Author's personal copy 426 Analog Integr Circ Sig Process (2013) 75:417–434 in Sect. 3, independent of the size of the input vectors. The worst-case settling time is associated with the first event, when the input branches of the comparators are charged to VHm = Vdd. These input branches are connected to all inner capacitor plates, where the voltage must rise from 0 V to VHm = Vdd within a quarter of system clock cycle, and then, discharge to Vdd/2 during the beginning of second event. We assume that signal rising and falling times are equalized by tweaking the magnitude of RTG. Consequently, using the derived transmission-gate transfer time in [34], which is limited by the time constant, we have: " # N 1 X se1charge ¼ se1discharge  RTG Cp þ KC0 þ C0 j¼0 ð22Þ where RTG is the resistance associated with the CMOS switch SHm, derived in [34], Cp accounts for gate and diffusion capacitances of SHm, KC0 is the shifting capacitance value, and C0 is the unit charge capacitor for each bit position of the N-bit input vectors. The subscripts ‘‘e1charge’’ and ‘‘e1-discharge’’ refer to first event for charging and discharging of total capacitor voltage. Hence, the conversion cycle during this first event is limited by the time constant of this event. We next endeavor to demonstrate that events 2–4 have much shorter settling times, making Te1 the determining factor in setting the clock speed. The settling time during event 2 is related to switching off the inner plates in the capacitor array from VHm and transferring Vdd/2 from the outer plate to inner plate of each capacitor in the array. This settling time is fairly short, since it pertains to individual capacitance values, rather than the whole line. During event 3, the settling time pertains to charging each unit capacitor individually with Vdd instead of Vdd/2, based on whether the associated bit of the corresponding input vector is asserted. This settling time is also fairly short, as it pertains to individual capacitors. The final event 4 is constrained by the comparator’s decision time of the sense amplifier and registering of the output results in D latches which considered short [19–21, 24, 25, 35]. As a result, the circuit’s total transfer time, which is limited by the time constant in (22), is given by:  Tp ¼ 4 se1charge þ se1discharge : ð23Þ Finally, the maximum switching frequency of the proposed design is (pessimistically): fm  1 : Tp ð24Þ The factor 4 in Eq. (23) reflects the fact that Eq. (22) shows only the transfer time of event 1. The pessimistic nature of Eq. (24) is due to our assuming each of the simpler events 2–4 also requiring as much time as event 1. 123 5 Simulation results We next present results for HSPICE [36] simulations of the proposed design, utilizing 0.15 lm TSMC technology [26]. All simulations were carried out for a typical technology corner: power supply at Vdd = 1.5 V and temperature at 25 C. The input vector width is chosen to be N = 16, representing the most common applications. Additionally, we take the unit capacitance value to be 0.1 lF, as is the case for most common 0.15 lm technology, and assume the sub-optimal shifting ratio factor of K = 1/4 as a challenging marginal operating point (half of the optimal value Kopt = 1/2). The HSPICE simulations where conducted to verify the functionality of the design based on our derived mathematical equations and to measure the maximum operating frequency, maximum resolution error, and maximum power dissipation. The excellent agreement between simulation results and those derived from analytic formulas verifies the accuracy of the latter, leading us to the conclusion that the circuit will behave as expected for wider inputs. Figure 8 shows the simulated waveform for our Hamming circuit’s operation at a system clock frequency of 1 GHz, when the weight of the input Hx sweeps from 0 to 16 (bits are sequentially asserted), while the weight of the input Hy is maintained at 16 (all bits are asserted). The first signal in the presented waveform (PHC) is related to symbol of Fig. 4, which illustrates the comparison decision (event 4), while the second and third waveforms show the input line voltages at the first and second comparator: V(Hx1), V(Hy1), V(Hx2), and V(Hy2), which are symbolized as VHx1, VHy1, VHx2, and VHy2, respectively. The fourth and fifth signal waveforms correspond to the comparator decision output for each comparison event (i.e. event 4). The signals evaluates an outcome of V(outp1) = 0 and V(outp2) = 0 (i.e. Voutp1 and Voutp2, respectively), indicating that WHx \ WHy since Hy is kept at 16 while Hx is sequentially increased. Moreover, when the simulation ends, we should have Voutp1 = 0 and Voutp2 = Vdd, indicating that the inputs have equal Hamming weights. Likewise, Fig. 9 shows complementary results to Fig. 8, in that it corresponds to a simulation where the Hamming weight of Hx is kept constant at 16 while the Hamming weight of Hy is swept from 0 to 16. Hence, the comparator outputs of Fig. 9 show a value of V(outp1) = Vdd and V(outp2) = Vdd, indicating that WHy \ WHx. Again, at the end of the simulation, Voutp1 = 0 and Voutp2 = Vdd, indicating the equality of Hamming weights. The simulation results of Fig. 10 were derived to examine the worst-case scenario for charging and discharging the switched-capacitor array represented by the comparator inputs: V(Hx1), V(Hy1), V(Hx2), and V(Hy2). To Author's personal copy Analog Integr Circ Sig Process (2013) 75:417–434 427 Fig. 8 HSPICE waveforms for input Hy kept at weight 16, while the Hamming weight of the input Hx is sequentially increased. The horizontal time scale is in nanoseconds and the vertical output scale is in volts (CLK = 1 GHz, Vdd = 1.5 V), using TSMC 0.15 lm technology induce the worst-case, we alternate between WHx = 0, WHy = 16 and WHx = 16, WHy = 0 during consecutive clock periods. Hence, all capacitors at one end of the comparator are charged, while all capacitors at the other end are discharged simultaneously. We note that for this worst-case, the comparator outputs are safely responsive at 1 GHz operating speed, given that all events within one cycle are correctly evaluated. Figure 11 depicts a useful way of indicating the relationship between the Hamming weight differences and analog voltage differences at the inputs of the comparators, as illustrated in Sect. 3 via Fig. 5b. This measure shows the linearity curve within minimum region difference, commonly known as the input–output transfer characteristics. The results show close-to-ideal transfer characteristics, where the deviation of measured in Fig. 12 is about ± 0.08 mV. This error reduces the minimum resolution of comparator inputs given by Eq. (18) from 22.7 to 22.62 mV for N = 16 bits. Generally, the impact of this error is negligible for small input vector widths. Finally, the power consumption results have been obtained for the proposed circuit running at 1 GHz, assuming unit capacitance value of 0.1 pF and Hamming width N = 16, with the shifting factor K = 1/4. The simulation results report power consumption of 0.136 mW, where all bits are taken to be different in a worst-case scenario. The two comparators are considered major sources of power consumption due to the bias circuit that requires a constant biasing current. Power consumption from this source can be minimized by using the design in [24, 37, 38]. Furthermore, using a bias on–off switch circuit will help in reducing the overall power consumption. Each of two comparators consumes an average of 0.5 mW of power, which is not included in the recorded power consumption. Thus, the total power consumption is 1.136 mW, when the two comparators are included. 6 Comparative evaluations We now endeavor to analyze the area (in number of transistors), operating speed, and power requirements of our proposed Hamming weight comparison circuit, ending the 123 Author's personal copy 428 Analog Integr Circ Sig Process (2013) 75:417–434 Fig. 9 HSPICE waveforms for Hamming weight of Hx kept at 16, while the Hamming weight of Hy is sequentially increased. The horizontal time scale is in nanoseconds and the vertical output scale is in volts (CLK = 1 GHz, Vdd = 1.5 V), using TSMC 0.15 lm technology section by a comparison of our novel design method with a number of recent alternative designs [12–18]. 6.1 Area assessment To derive the area requirements, we first calculate the total number of transistors for the N-bit Hamming weight circuit (Table 3). The transistor counts for CMOS switches are self-explanatory. The transistor count of COMP1 and COMP2, with Mealy phase generators, equals 2*(60) ? 58 = 178, independent of input size. We did not count the registers (D-Type Flip-Flops) that store the input bits, because these registers already exist on the data bus that provides the inputs. These registers are common to all designs, thus not affecting our comparisons. Furthermore, large capacitances are usually made of identical unit capacitors, so as to minimize the ratio mismatches [30–33]. For this reason, common-centroid geometries and inter-digitization are of particular importance in reducing mismatches for high accuracy, as they help reduce the effects of gradients and random fabrication errors [27–29]. This fabrication error is usually due to 123 photolithography, oxide thickness, and dielectric constant, plus expected voltage and temperature variations. Given the importance of achieving highly precise layout and due to our design’s use of uniform capacitances, our differential Hamming switched-capacitor arrays are arranged with common-centroid geometries. Figure 13 shows an example of centroid capacitance layout for an input width of 9 bits, where the structure requires 18 uniform C0 capacitors on each switched-capacitance array in our two-array structure. Each capacitor is split into two equal parts of capacity 0.5 C0, marked by the letters ‘‘a’’ and ‘‘b’’ in Fig. 13, where minus and plus signs denote the capacitors associated with inputs Hy and Hx, respectively. This geometry provides a good cancelation of random variations across the array through an assurance of center gradient in the fabrication process in all binary combinations. Besides, the proposed layout minimizes the fringing capacitances between the two branches of comparator inputs. Based on this geometry, the total area is depicted in Table 4 for some input bitwidths, assuming the use a uniform capacitance value of 0.1 lF for our 0.15 lm technology with 2 poly and 3 metal layers. Author's personal copy Analog Integr Circ Sig Process (2013) 75:417–434 429 Fig. 10 HSPICE waveforms for Hamming inputs Hy and Hx cross-toggling between maximum-weight of 16 (all bits at 1) and minimum-weight of 0 (all bits at 0) on alternating clock cycles. The horizontal time scale is in nanoseconds and the vertical output scale is in volts (CLK = 1 GHz, Vdd = 1.5 V), using TSMC 0.15 lm technology Fig. 11 HSPICE extracted simulation for input–output transfer characteristics with respect to ideal curve presented in Eq. (18), using TSMC 0.15 lm technology at CLK = 1 GHz, Vdd = 1.5 V, K = 1/4, N = 16, and C0 = 0.1 pF Fig. 12 HSPICE extracted simulation for non-linear error and its impact on comparators resolution decision, using TSMC 0.15 lm technology at a clock frequency of 1 GHz, Vdd = 1.5 V, K = 1/4, N = 16, and C0 = 0.1 pF 123 Author's personal copy 430 Analog Integr Circ Sig Process (2013) 75:417–434 Table 3 Number of transistors in the proposed Hamming circuit with various vector widths Input width in bits (N) Transistor counts Four sets of CMOS switches Comparators and phase generator 16 4 * (14 * 16) 178 1,074 32 4 * (14 * 32) 178 1,970 64 4 * (14 * 64) 178 3,762 128 4 * (14 * 128) 178 7,346 256 4 * (14 * 256) 178 14,514 +7b +8b +9b -9b -8b -7b +4b +5b +6b -6b -5b -4b +3b +2b +1b -1b -2a -3b -3a -2a -1a +1a +2a +3a -4a -5a -6a +6a +5a +4a Total transistors -7a -8a -9a +9a +8a +7a Fig. 14 Hamming weight decision delay vs. Hamming input bitwidth under worst-case simulations depicted in Fig. 10 Fig. 13 Example of a centroid capacitance layout Table 4 Total capacitor layout area for various input bit-widths Total (pF)2 Input width in bits (N) Switched-capacitor area for C0 = 0.2 pF (Hx1, Hy1) ? (Hx2, Hy2) 16 2 * (8 * 8)(0.05 * 0.05) 0.32 32 2 * (11.5 * 11.5)(0.05 * 0.05) 0.66 64 2 * (16 * 16)(0.05 * 0.05) 1.28 128 2 * (23 * 23)(0.05 * 0.05) 2.65 256 2 * (32 * 32)(0.05 * 0.05) 5.12 input lines to Vdd (VHm) also goes up. Figure 14 shows a linear relation between delay and input bit width up to 64 bits. The relationship starts exhibiting non-linearity for wider inputs due to large second-order parasitic effects with respect to C0. The second-order effects are mostly due to transmission-gate resistances (RTG) and parasitic capacitances (e.g. diffusion, gate, wire) of CMOS switches, which vary in a non-linear manner with voltage. More detailed analysis on typical switched-capacitor array parasitic effects can be found elsewhere [22]. 6.3 Power characteristics Our proposed Hamming comparator circuit exploits the uniform weighted switched-capacitor array and CMOS 6.2 Speed assessment In this section, we vary the input Hamming weights by taking a worst difference between the two input bit-vectors in each particular Hamming weight class and measure the HSPICE simulated worst-case delay. This worst-case delay is measured from rising edge of first event to evaluated comparison output data (Voutp1, Voutp2) during the fourth event. In addition, every simulation case was toggled based on Fig. 10 in order to ensure that the time constants are accounted for with enough accuracy (3s ? 5s) within each event. This delay is approximated by our derived Eq. (24), where the critical path length was taken to be four times the delay of the first event, which was shown to be a worst-case time delay event. Note that whenever N is increased, the capacitances on the input comparators are also increased, based on Eq. (22), with the result that the first event’s delay for charging these 123 Fig. 15 Power consumption vs. varying Hamming input widths, when running at 0.2 GHz system clock Author's personal copy Analog Integr Circ Sig Process (2013) 75:417–434 431 Table 5 Cost, latency, and power consumption for Hamming weight comparator circuits Circuits (N Bits) Tech/supply voltage Area (Transistors) Power Operating speed Proposed N = 16 0.15 lm/1.5 V 1074 Trans. ? 1.28 pF2 1.136 mW @ 1 GHz 1 ns 4[(2)RTG*(Csdg ? N*C0)] [1] N = 8 0.35 lm/3.5 V 324 Trans. 0.44 mW @ 200 MHz 4.58 ns [4] N = 5 – (41 gates) = 246 Trans. O(N log2N) (LG) – 7-gate levels O(log2N) (gate levels) [5] N = 16 AMI 0.8 lm (256 gates) = 1536 Trans. [N X N] *(LG) – 1.3 ns/level N-gate levels – ½2ðN  log2 N  1ÞFA þ 4 log2 NðLGÞ – ðlog2 N  1ÞðFA gate levelÞ þ 1 [3] Size = N 2 The parameters pF , FA, N, and LG are defined as picofarad area of switched-capacitor array, full-adder area, Hamming input bit-width, and logic gate area, respectively switches, where the input voltage to switched-capacitor array is limited to DC constant voltages (Vdd, Vdd/2). Besides, all switching activities entail single unit capacitors associated with input bits, the only exception being during the first event, when all capacitors are charged to Vdd. Thus, the switched-capacitor array usually limits switching activity to small numbers of devices, leading to low power dissipation [37]. Analyses in [38, 39] show that circuits based on the switched-capacitor array paradigm dissipate small amounts of power, a fact that is readily confirmed by the results of our HSPICE simulation for assessing power consumption, as depicted in Fig. 15. Power consumption results have been obtained for the proposed circuit running at a lower frequency of 200 MHz, in order to accommodate all functional operations for different input widths ranging from 16 to 64 bits, with unit capacitance of 0.1 pF and shifting factor K = 1/2. In each case, all bits are taken to be different as a worst-case scenario. The 0.5 mW average power consumption of each comparator is not included in the recorded power consumption of Fig. 15. We have not discussed the structures associated with the comparator circuitry, since the literature is rich with many such designs. The scope of our paper is to present the architecture of a Hamming comparator circuit, rather than basic comparator design. 6.4 Comparison with previous designs Table 5 contains a comparison of our Hamming comparator circuit against several state-of-the-art Hamming comparator designs, whose structures represent recent design topologies and circuits targeted for high-speed operation and power savings, that is, objectives similar to those of our proposed architecture. The symbols FA and LG are used as a unit for full-adder and logic gate counts, where the FA requires between 3 and 9 logic gates. A gate is usually built from 4–6 transistors, although some recent ones [40] require only one logic gate with several capacitors. In terms of the number of transistors used, our design is intermediate among those compared, because it requires two switching capacitor array circuits for each input bit, with each switching circuit composed of four pass-gate multiplexors and four inverters. Bear in mind, however, that the components in our design all use local interconnects, with no global routing. The latter property, combined with the small geometry of capacitance array, might result in a smaller overall silicon area. Furthermore, the proposed design is extremely power-efficient, since all activities of the operating frequency are introduced through the Hamming bit capacitor value C0, which is known to be less than diffusion and gate capacitances of cascaded logic gates. Nevertheless, designers must be aware that the comparator circuit requires standby currents on the order of 0.5 mA, which potentially necessitates the use a switched on–off circuit to shut off the power during standby mode. Another significant advantage of our design is its high operating speed for a wide range of Hamming input bitwidths (N). The high-speed of our design, which places it among the fastest Hamming weight comparators reported in the literature, results from parallel operation in charging and discharging of the capacitors C0 and the concurrent manifestation of the total weight in each input bit-vector, independent of the order of bits. By contrast, competing designs take the order of bits into account, leading to synchronous realizations. 7 Conclusions We have presented a Hamming weight comparator circuit architecture for input bit-vectors using switched- 123 Author's personal copy 432 capacitor arrays, where all capacitors have a uniform weight independent of the order of bits. The circuit comprises of two switched-capacitor arrays associated with two comparators, collectively realizing all comparison functions, that is, ‘‘[’’, ‘‘\’’, and ‘‘=’’, between the two input bit-vectors. The circuit operation involves a shifting factor applied differently to each input by varying the input voltage between Vdd and Vdd/2, instead of through changing the capacitance value in the two inputs of the comparator. Thus, the two comparator inputs have a uniform weight of capacitances, which reduces layout-induced mismatches. Theoretically, the circuit input/output characteristics are linearly related to the accumulated voltage weight of comparators across the Hamming vector width (N). Consequently, the minimum voltage margin at comparator inputs is reduced as N increases, and power supply voltage requirement is also reduced. Non-linear phenomena, such as capacitance and switch mismatches, can potentially upset this theoretical assessment, but HSPICE simulation shows that the effects of non-linearities are not problematic in practical contexts. For example, assuming 0.15 lm TSMC technology, with K = 1/4, Vdd = 1.5 V, and N = 16 bits, non-linearities reduce the minimum resolution margin from 22.7 to 22.3 mV, thus barely affecting the wide margin which is needed for reliable comparator operation. Consistent with this analysis, the measure of minimum difference region (i.e. resolution margin), assessed for several Hamming input vector bit-widths, shows that for N = 64, K = 1/2, Vdd = 1.5 V, and with the same technology, non-linearities reduce the margin from 5.8 to 5.2 mV. Most recent comparator designs can operate reliably under this typical margin. The operation of the Hamming switched-capacitor circuit is realized in terms of four events, regardless of the input width. The events occur sequentially, with each event proceeding concurrently in the two switched-capacitor arrays. Consequently, the circuit function entails a relatively low latency, not exceeding four times the duration of the first (longest) event. This leads to very competitive latency values and associated clock rates. Extensive HSPICE simulation results indicate that the circuit can run at 1 GHz for N = 16, while consuming a power of about 1.136 mW for the shifting factor K = 1/4. For N = 64 and K = 1/2, it can run at 200 MHz. In terms of area, our results for comparing the Hamming weights of two 16 bit input vectors reveal a cost of 1,074 transistors and switched-capacitor array of 0.32 pF2. Finally, our proposed design is self-contained and does not require any additional (external) decoding to produce the final output. 123 Analog Integr Circ Sig Process (2013) 75:417–434 Appendix In this appendix, we present the results of simulation-based verification of the theoretical analyses of Sect. 4 with regard to the negligible impact of capacitance mismatches on the physical attributes and performance of our Hamming comparator. Assuming 0.15 lm TSMC process with the linear oxide gradient of a = 60 ppm, oxide thickness of 15 nm, capacitor cell width of W = 5 lm, horizontal/ vertical spacing of Sx = Sy = 0.5 lm (see Fig. 16), angle h of 45 and 315 degrees, and unit capacitance of 0.5 pF, variations of capacitance value as a function of bit-width is depicted in Fig. 17. Instantiating capacitances whose variations are according to Fig. 17 into the design of our proposed Hamming comparator circuit and conducting HSPICE simulations on several technology corners, we confirmed that for bit-widths of 16, 32, and 64, the circuit behaves within acceptable tolerances in the worst-case involving Vdd = 1.35 V (15 % drop), temperature of 100 C, and the slow-fast or fast-slow technology corner. Minimum voltage differences of 11.1, 6.3, and 0.98 mV are applicable to 16, 32, and 64 bit comparators, respectively. Waveforms observed in one of the simulation runs, corresponding to the bit-width of 16, are depicted in Fig. 18. Results for the 16 bit case are shown, because they are most legible after reduction to a suitably CX5 CY5 CY4 CX1 CX0 SY CX4 SX CY3 CX3 45° CX2 CY2 W CY1 CY0 Fig. 16 Worst-case linear gradient switched-capacitor arrays for layout model arrangement Fig. 17 Variation in unit capacitance value for different comparator bit-widths based on linear gradient layout arrangement Author's personal copy Analog Integr Circ Sig Process (2013) 75:417–434 433 Fig. 18 HSPICE waveforms for Hy kept at weight 16, while the Hamming weight of the other input Hx is gradually increased up to 16. The horizontal axis shows the time scale in nanoseconds and the vertical output voltage scale is in volts. Simulation was done using TSMC 0.15 lm technology, under 1 GHz clock, Vdd = 1.35 V, Temp = 100 C, for the slowfast technology corner small size for publication. Full conformance to the results expected based on our theoretical analyses was observed in all simulated cases. References 1. Aleksander, I., & Morton, H. (1995). An introduction to neural computing, 2nd (pp. 73–94). Oxford: International Thomson Computer Press. 2. Parhami, B. (1991). Voting networks. IEEE Transactions on Reliability, 40(3), 380–394. 3. Kar, B. K., & Pradhan, D. K. (1993). A new algorithm for order statistic and sorting. IEEE Transactions on Signal Processing, 41(8), 2688–2694. 4. Chen, K. (1989). Bit-serial realizations of a class of non-linear filters based on positive boolean functions. IEEE Transactions on Circuits and Systems, 36(6), 785–794. 5. Karaman, M., Onural, L., & Atalar, A. (1990). Design and implementation of a general purpose median filter unit in CMOS VLSI. IEEE Journal of Solid-State Circuits, 25(2), 505–513. 6. Asada, K., Kumatsu, S., & Ikeda. M. (1999). Associative memory with minimum Hamming distance detector and its application to bus data encoding. Proc IEEE Asia-Pacific Application-Specific Integrated Circuits Conf, p. 16.1. 7. Barral, C., Coron, J. -S., & Naccache, D. (2004). Externalized Fingerprint Matching. Proc. 1st Int’l Conf. Biometric Authentication, 309–315. 8. Abdel-hafeez, S., Harb, S. M., Lee, K. M. On-Chip Jitter Measurement Architecture Using a Delay-Locked Loop with Vernier Delay Line, to the Order of Giga Hertz. Proc. 18th Int’l Conf. Mixed Design on Integrated Circuits and Systems, pp. 502-506, 2011. 9. Piestrak, S. J. Design of Self-Testing Checkers for Unidirectional Error Detecting Codes. Scientific Papers of the Cybernetics Institute of Wroclaw University of Technology, No. 92, Series: Monographs No. 24, 1995. 10. Ullman, S., (1996). High-level vision: object recognition and visual cognition (p. 5). Cambridge: MIT Press. 11. Komatsu, S., Ikeda, M., Asada, K. Bus Data Encoding with Coupling-Driven Adaptive Code-Book Method for Low Power 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. Data Transmission. Proc. 27th European Solid-State Circuits Conf., 2001, pp. 297-300. King, D. B. S., Simpson, R. J., Moore, C., & MacDiarmid, I. P. (1998). Digital n-tuple Hamming comparator for weightless systems. Electronics Letters, 34(22), 2103–2104. King, D. B. S., Simpson, R. J., Moore, R. J., & MacDiarmid, I. P. (1999). Hamming value comparator hierarchies. Electronics Letters, 35(11), 910–911. Ikeda, M., & Asada, K. ‘‘Time-Domain Minimum-Distance Detector and Its Application to Low Power Coding Scheme on Chip Interface,’’ Proc. 24th European Solid-State Circuits Conf., pp. 464-467, 1998. Fujino, M., & Moshnyaga, V. G. An Efficient Hamming Distance Comparator for Low-Power Applications. Proc. 9th Int’l Conf. Electronics, Circuits and Systems, Vol. 2, pp. 641-644, 2002. Pedroni, V. A. (2003). Compact fixed-threshold and two-vector Hamming comparators. Electronics Letters, 39(24), 1705–1706. Piestrak, S. J. (2007). Efficient Hamming weight comparators of binary vectors. Electronics Letters, 43(11), 611–612. Parhami, B. (2009). Efficient Hamming weight comparators for binary vectors based on accumulative and Up/Down parallel counters. IEEE Trans. Circuits and Systems II, 56(2), 167–171. Abdel-hafeez, S. (2010). A New high-speed SAR ADC architecture. Proc. 11th Int’l Workshop Symbolic and Numerical Methods, Modeling and Applications to Circuit Design, pp. 1–5, 2010. Razavi, B. (1995). Principles of data conversion system design. New York: Wiley. Davidovic, M., Zach, G., & Zimmermann, H. (2010). An 11 bit successive approximation analog-to-digital converter based on a combined capacitor-resistor network. E & I Elektrotechnik und Informationstechnik, 127(4), 98–102. Abo, A. M. (1999). Design for reliability of low-voltage switched-capacitor circuits. Berkeley: University of California. Sklar, B. (1988). Digital communications fundamentals and applications. New Jersey: Prentice Hall. Long, S., Wu, J., & Xia, X. (2008). A 1.8 V 3.1 mW successive approximation ADC in system-on-chip. Analog Integrated Circuits and Signal Processing, 56, 205–211. Gregorian, R. (1999). Introduction to CMOS Op-Amps and comparators. New York: Wiley. Taiwan Semiconductor Manufacturing Corp., ‘‘0.15 lm CMOS ASIC Process Digests,’’ 2002. 123 Author's personal copy 434 27. Michaeli, L., Michalko, P., & Saliga, J. (2008). Unified ADC non-linearity error model for SAR ADC. Measurement, 41(2), 198–204. 28. Ohnhaeuser, F., & Huemer, M. Methods to Eliminate Dynamic Errors in High-Performance SAR A/D Converter. Proc. IEEE Int’l Symp. Circuits and Systems, pp. 2398–2401, 2008. 29. Masuzawa, A. (2007). Design challenges of analog-to-digital converters in Nanoscale CMOS. IEICE Transactions on Electronics, E90-C(4), 779–785. 30. Sayed, D., & Dessouky, M. Automatic Generation of CommonCentroid Arrays with Arbitrary Capacitor Ratio. Proc. Design, Automation and Test in Europe Conf., pp. 1530–1591, 2002. 31. McNutt, M. J., LeMarquis, S., & Dunkley, J. L. (1994). Systematic capacitance matching errors and corrective layout procedures. IEEE Journal of Solid-State Circuits, 29(5), 611–616. 32. Quinn, P., & Pribytko, M. (2003). Capacitor matching insensitive algorithmic ADC requiring no calibrations. Integration, the VLSI Journal, 36, 211–228. 33. Hastings, A. (2001). The art of analog layout. New Jersey : Prentice Hall. 34. Uyemura, J. P. (1999). CMOS logic circuit design. Norwell: Kluwer. 35. Razavi, B. (2001). Design of analog CMOS integrated circuits. New York: McGraw-Hill. 36. HSPICE, Synopsys, 2010, http://www.synopsys.com. 37. Belluomini, W., Jamsek, D., Nartin, A. K., McDowell, C., Montoye, R. K., Ngo, H. C., et al. (2006). Limited switch dynamic logic circuits for high-speed low-power circuit design. IBM J. Research & Development, 50(2/3), 277–286. 38. Agens, A., Bonizzoni, E., Malcovati, P., & Maloberti, F. (2010). An ultra-low power successive approximation A/D converter with time-domain comparator. Analog Integrated Circuits and Signal Processing, 64(2), 183–190. 39. Dabbagh-Sadeghipour, K., Hadidi, K., & Khoei, A. (2006). A New successive approximation architecture for high-speed low-power ADCs. Int’l Journal of Electronics and Communications, 60(3), 217–223. 40. Navi, K., Maeen, M., Foroutan, V., Timarchi, S., & Kavehei, O. (2009). A novel low-power full-adder cell for low voltage. Integration, the VLSI Journal, 42(4), 457–467. 123 Analog Integr Circ Sig Process (2013) 75:417–434 Saleh Abdel-hafeez received a PhD in computer engineering from the University of Texas at El Paso and MS from New Mexico State University. He was a senior member of technical staff at S3 Inc. and Viatechnologies.com at the area of mixed-signal IC design. He also was Adjunct Professor of computer engineering at Santa Clara University from 1998 to 2002. He has two US patents, numbered 6,265,509 and 6,356,509 with S3 Inc. His current research interests are in the areas of high-speed ICs, computer arithmetic algorithms, and mixed-signal design. Dr. Abdel-hafeez is currently the Chairman of Computer Engineering Department at Jordan University of Science and Technology. Behrooz Parhami (PhD, 1973, University of California, Los Angeles) is Professor of Electrical and Computer Engineering, and Associate Dean for Academic Personnel, College of Engineering, at University of California, Santa Barbara, where he teaches and does research in computer arithmetic, parallel processing, and dependable computing. A Fellow of IEEE, IET, and British Computer Society, and recipient of several other awards (including a most-cited paper award from J. Parallel & Distributed Computing), he has written six textbooks and more than 270 peer-reviewed technical papers. Professionally, he serves on journal editorial boards and conference program committees and is also active in technical consulting.

References (40)

  1. Aleksander, I., & Morton, H. (1995). An introduction to neural computing, 2nd (pp. 73-94). Oxford: International Thomson Computer Press.
  2. Parhami, B. (1991). Voting networks. IEEE Transactions on Reliability, 40(3), 380-394.
  3. Kar, B. K., & Pradhan, D. K. (1993). A new algorithm for order statistic and sorting. IEEE Transactions on Signal Processing, 41(8), 2688-2694.
  4. Chen, K. (1989). Bit-serial realizations of a class of non-linear filters based on positive boolean functions. IEEE Transactions on Circuits and Systems, 36(6), 785-794.
  5. Karaman, M., Onural, L., & Atalar, A. (1990). Design and implementation of a general purpose median filter unit in CMOS VLSI. IEEE Journal of Solid-State Circuits, 25(2), 505-513.
  6. Asada, K., Kumatsu, S., & Ikeda. M. (1999). Associative memory with minimum Hamming distance detector and its application to bus data encoding. Proc IEEE Asia-Pacific Application-Specific Integrated Circuits Conf, p. 16.1.
  7. Barral, C., Coron, J. -S., & Naccache, D. (2004). Externalized Fingerprint Matching. Proc. 1st Int'l Conf. Biometric Authenti- cation, 309-315.
  8. Abdel-hafeez, S., Harb, S. M., Lee, K. M. On-Chip Jitter Mea- surement Architecture Using a Delay-Locked Loop with Vernier Delay Line, to the Order of Giga Hertz. Proc. 18th Int'l Conf. Mixed Design on Integrated Circuits and Systems, pp. 502-506, 2011.
  9. Piestrak, S. J. Design of Self-Testing Checkers for Unidirectional Error Detecting Codes. Scientific Papers of the Cybernetics Institute of Wroclaw University of Technology, No. 92, Series: Monographs No. 24, 1995.
  10. Ullman, S., (1996). High-level vision: object recognition and visual cognition (p. 5). Cambridge: MIT Press.
  11. Komatsu, S., Ikeda, M., Asada, K. Bus Data Encoding with Coupling-Driven Adaptive Code-Book Method for Low Power Data Transmission. Proc. 27th European Solid-State Circuits Conf., 2001, pp. 297-300.
  12. King, D. B. S., Simpson, R. J., Moore, C., & MacDiarmid, I. P. (1998). Digital n-tuple Hamming comparator for weightless systems. Electronics Letters, 34(22), 2103-2104.
  13. King, D. B. S., Simpson, R. J., Moore, R. J., & MacDiarmid, I. P. (1999). Hamming value comparator hierarchies. Electronics Letters, 35(11), 910-911.
  14. Ikeda, M., & Asada, K. ''Time-Domain Minimum-Distance Detector and Its Application to Low Power Coding Scheme on Chip Interface,'' Proc. 24th European Solid-State Circuits Conf., pp. 464-467, 1998.
  15. Fujino, M., & Moshnyaga, V. G. An Efficient Hamming Distance Comparator for Low-Power Applications. Proc. 9th Int'l Conf. Electronics, Circuits and Systems, Vol. 2, pp. 641-644, 2002.
  16. Pedroni, V. A. (2003). Compact fixed-threshold and two-vector Hamming comparators. Electronics Letters, 39(24), 1705-1706.
  17. Piestrak, S. J. (2007). Efficient Hamming weight comparators of binary vectors. Electronics Letters, 43(11), 611-612.
  18. Parhami, B. (2009). Efficient Hamming weight comparators for binary vectors based on accumulative and Up/Down parallel counters. IEEE Trans. Circuits and Systems II, 56(2), 167-171.
  19. Abdel-hafeez, S. (2010). A New high-speed SAR ADC architec- ture. Proc. 11th Int'l Workshop Symbolic and Numerical Methods, Modeling and Applications to Circuit Design, pp. 1-5, 2010.
  20. Razavi, B. (1995). Principles of data conversion system design. New York: Wiley.
  21. Davidovic, M., Zach, G., & Zimmermann, H. (2010). An 11 bit successive approximation analog-to-digital converter based on a combined capacitor-resistor network. E & I Elektrotechnik und Informationstechnik, 127(4), 98-102.
  22. Abo, A. M. (1999). Design for reliability of low-voltage swit- ched-capacitor circuits. Berkeley: University of California.
  23. Sklar, B. (1988). Digital communications fundamentals and applications. New Jersey: Prentice Hall.
  24. Long, S., Wu, J., & Xia, X. (2008). A 1.8 V 3.1 mW successive approximation ADC in system-on-chip. Analog Integrated Cir- cuits and Signal Processing, 56, 205-211.
  25. Gregorian, R. (1999). Introduction to CMOS Op-Amps and comparators. New York: Wiley.
  26. Taiwan Semiconductor Manufacturing Corp., ''0.15 lm CMOS ASIC Process Digests,'' 2002.
  27. Michaeli, L., Michalko, P., & Saliga, J. (2008). Unified ADC non-linearity error model SAR ADC. Measurement, 41(2), 198-204.
  28. Ohnhaeuser, F., & Huemer, M. Methods to Eliminate Dynamic Errors in High-Performance SAR A/D Converter. Proc. IEEE Int'l Symp. Circuits and Systems, pp. 2398-2401, 2008.
  29. Masuzawa, A. (2007). Design challenges of analog-to-digital converters in Nanoscale CMOS. IEICE Transactions on Elec- tronics, E90-C(4), 779-785.
  30. Sayed, D., & Dessouky, M. Automatic Generation of Common- Centroid Arrays with Arbitrary Capacitor Ratio. Proc. Design, Automation and Test in Europe Conf., pp. 1530-1591, 2002.
  31. McNutt, M. J., LeMarquis, S., & Dunkley, J. L. (1994). Sys- tematic capacitance matching errors and corrective layout pro- cedures. IEEE Journal of Solid-State Circuits, 29(5), 611-616.
  32. Quinn, P., & Pribytko, M. (2003). Capacitor matching insensitive algorithmic ADC requiring no calibrations. Integration, the VLSI Journal, 36, 211-228.
  33. Hastings, A. (2001). The art of analog layout. New Jersey : Prentice Hall.
  34. Uyemura, J. P. (1999). CMOS logic circuit design. Norwell: Kluwer.
  35. Razavi, B. (2001). Design of analog CMOS integrated circuits. New York: McGraw-Hill.
  36. HSPICE, Synopsys, 2010, http://www.synopsys.com.
  37. Belluomini, W., Jamsek, D., Nartin, A. K., McDowell, C., Montoye, R. K., Ngo, H. C., et al. (2006). Limited switch dynamic logic circuits for high-speed low-power circuit design. IBM J. Research & Development, 50(2/3), 277-286.
  38. Agens, A., Bonizzoni, E., Malcovati, P., & Maloberti, F. (2010). An ultra-low power successive approximation A/D converter with time-domain comparator. Analog Integrated Circuits and Signal Processing, 64(2), 183-190.
  39. Dabbagh-Sadeghipour, K., Hadidi, K., & Khoei, A. (2006). A New successive approximation architecture for high-speed low-power ADCs. Int'l Journal of Electronics and Communica- tions, 60(3), 217-223.
  40. Navi, K., Maeen, M., Foroutan, V., Timarchi, S., & Kavehei, O. (2009). A novel low-power full-adder cell for low voltage. Inte- gration, the VLSI Journal, 42(4), 457-467.
About the author
Papers
1
View all papers from Sara Silvaarrow_forward