High-speed and low-power scalable
Hamming weight comparator based on a
non-weighted switched-capacitor array
Saleh Abdel-hafeez & Behrooz Parhami
Analog Integrated Circuits and Signal
Processing
An International Journal
ISSN 0925-1030
Volume 75
Number 3
Analog Integr Circ Sig Process (2013)
75:417-434
DOI 10.1007/s10470-012-9991-8
1 23
Your article is protected by copyright and all
rights are held exclusively by Springer Science
+Business Media New York. This e-offprint is
for personal use only and shall not be selfarchived in electronic repositories. If you wish
to self-archive your article, please use the
accepted manuscript version for posting on
your own website. You may further deposit
the accepted manuscript version in any
repository, provided it is only made publicly
available 12 months after official publication
or later and provided acknowledgement is
given to the original source of publication
and a link is inserted to the published article
on Springer's website. The link must be
accompanied by the following text: "The final
publication is available at link.springer.com”.
1 23
Author's personal copy
Analog Integr Circ Sig Process (2013) 75:417–434
DOI 10.1007/s10470-012-9991-8
High-speed and low-power scalable Hamming weight comparator
based on a non-weighted switched-capacitor array
Saleh Abdel-hafeez • Behrooz Parhami
Received: 8 March 2012 / Revised: 4 August 2012 / Accepted: 16 November 2012 / Published online: 7 December 2012
Springer Science+Business Media New York 2012
Abstract Many practical applications require a comparison of the Hamming weights of two N-bit binary vectors.
This comparison can be performed in a fully digital manner
or by a mix of analog and digital techniques. In this paper, we
propose a design in the latter category that exhibits advantages in speed and power dissipation compared with the best
previous designs. The proposed design comprises of two
switched-capacitor arrays associated with two comparators
placed in parallel, collectively providing a complete comparison outcome of ‘‘[’’, ‘‘\’’, or ‘‘=’’. The switchedcapacitor array circuit is composed of uniform capacitances,
thus associating identical charges with all bits, independent
of their positions in the input bit-vectors. Once charge
accumulation has occurred based on the asserted inputs, the
two comparators release the final decision concurrently.
The structure is shown to support wide input vectors on the
order of 64 bits, while requiring a small silicon area for
capacitor array structures, CMOS switches, and latched
comparators to compute and store the comparison outcome.
HSPICE simulation shows a total power consumption of
1.136 mW, evaluated at the operating frequency of 1 GHz
(based on input-to-decision delay) for 16 bit input vectors
under 0.15 lm TSMC technology.
Keywords Analog summing Comparator GHz-class
circuit Hamming weight Low-power electronic circuit
S. Abdel-hafeez
Jordan University of Science and Technology, Irbid 22110,
Jordan
e-mail:
[email protected]
B. Parhami (&)
Department of Electrical and Computer Engineering, University
of California, Santa Barbara, CA 93106-9560, USA
e-mail:
[email protected]
Mixed analog/digital design Population count
Switched-capacitor array
1 Introduction
Comparing the Hamming weights of two bit-vectors of
common length N finds applications in neural networks [1],
threshold voting circuits [2, 3], digital filtering [4, 5],
pattern matching/recognition [6–8], and encoding for data
compression [9–11]. Accordingly, many researchers and
hardware designers have proposed designs for such comparators [12–18], exploring trade offs between operating
frequency, Hamming vector size, power consumption, and
area requirements to satisfy design specifications.
An early design [12], which required eight cascaded
logic stages, was based on deriving a complete Boolean
expression for comparing weighted vectors of size 2. The
operating frequency resulting from this approach would be
unsuitable for large vector sizes. Subsequent modifications
[13] were aimed at extending the application of this method
to wider vectors. But the extension is accomplished at the
expense of a large number of comparators, which imply
unacceptably high power consumption.
Power can be saved by converting binary inputs to time
delay behavior [14, 15], for example, by assigning a variable
delay element to every bit in input bit-vectors. With this
approach, the circuit achieves its power economy at the
expense of long latency, compared with the design of [13].
A subsequent design by Pedroni [16] reduces the delay
somewhat, but the presence of N cascaded switches makes
the design too slow for wide inputs. A further speed
enhancement method [17] minimizes the number of cascaded switches by using sorting networks [9], but the multistage structure of the design would still be problematic for
123
Author's personal copy
418
Analog Integr Circ Sig Process (2013) 75:417–434
wide inputs. Parhami’s design [18] uses arithmetic operations for counting and comparing the number of 1s through
high-speed parallel up/down counters, with the vector bits
viewed as positive or negative in the process of accumulation. However, sequential clocking precludes high-speed
parallel operation, with speed more of a problem for wider
input bit-vectors.
In this paper, we present a novel architecture that uses a
mix of analog charge accumulation and digital switching of
weighted bits value. Because the charges associated with
binary digits 0 and 1 do not have a positional weighting, the
accumulating switched-capacitor array comprises of uniform capacitance values, in contrast with conventional
switched-capacitor arrays used for A/D and D/A conversion
[19–22]. Once the two input Hamming bit-vectors have been
accumulated through the differential switched-capacitor
array, differential comparators produce the desired result. A
capsule summary of the main attributes and contributions of
our design is captured in the following five points:
1.
2.
3.
4.
5.
All bits in the input bit-vectors are accumulated
concurrently, regardless of the bit-width. Thus, the
circuit is attractive for high-speed and broad-scale
applications.
Small capacitances are used in the circuit to hold
charges based on asserted bit values, in much the same
way as dynamic low-power memory, making it
suitable for low-power design.
The circuit is adaptable to variable and unmatched
vector sizes, or to comparison between an input vector
and a threshold value. A variable size can simply be
added on two sides of differential switched-capacitor
array, where empty bit positions can be replaced by 0s.
Full ternary comparison function (‘‘[’’, ‘‘\’’, and ‘‘=’’)
is made possible by integrating two comparators with
two switched-capacitor arrays.
The circuit is reliable for continued technology scaling
and provides enough resolution at the comparator
inputs for wide vectors and advanced technologies
with low supply voltage.
The rest of this paper is organized as follows. Section 2
illustrates the comparison principle for our Hamming
weight circuit based on a switched-capacitor array. Section
3 provides CMOS circuit design with related equations and
event timings. Section 4 contains sensitivity analysis and
effect on resolutions against the input vector size, speed,
and continued reduction of the supply voltage. Section 5
reports on the results of our HSPICE simulation-based on
0.15 lm and 1.5 V integrated-circuit technology. Section 6
compares the simulation results against previous efforts
having similar objectives to ours (low cost, high-speed, low
power, and variable application scale). Section 7 concludes
the paper.
123
2 Comparison principle
Switched-capacitor arrays constitute common mechanisms
for analog-to-digital or digital-to-analog conversion. In
these applications, the capacitance values C0, C1,…,CN–1
within the array, and thus the charges they hold, are proportional to successive powers of 2 (Fig. 1). Such switchedcapacitor arrays can be modified to assign equal charges to
all 1s in a bit-vector. This modification is appropriate for
Hamming weight comparison [23], where all positions in
the bit-vectors being compared are equally weighted. Thus,
the non-weighted switched-capacitor array is a modified
form of the conventional binary weighted variant, with all
capacitor values chosen to be the same, that is, C0 =
C1= _ = CN-1. Such an array provides a total charge equal
to the non-weighted sum of the capacitor charges. In this
way, comparing the Hamming weights of two bit-vectors is
reduced to comparing the sum of charges they induce on
their corresponding capacitors within the array, or comparing voltages resulting from these charges.
The non-weighted switched-capacitor array of Fig. 1
functions as follows. The Hamming bit-vectors (Hx, Hy)
are fed through a pair of Hamming registers (RHx, RHy) as
inputs to the array. The array accumulates the total charges
associated with the Hamming bit-vectors, furnishing the
Hamming weights (WHx, WHy). These weight-related
charge totals go through a series of evaluation steps based
on system clock (CLK) phases, providing the comparator
input voltages (VHx, VHy). Once the voltages have been
established, the comparator (COMP) releases the decision
of differential outputs (Voutp, Voutn), indicating a greater
than (WHx [ WHy) or less than (WHx \ WHy) outcome.
Another comparator is employed to establish the equality
outcome (WHx = WHy), in a manner that will become
apparent in Sects. 2, 3.
The complete direct realization of the Hamming weight
comparison is illustrated by the examples in Fig. 2, where
the Hamming input values have been chosen for clarity in
drawing and ease of discussing the operational principles.
There are two Hamming switched-capacitor arrays, associated with two comparators, with their labels and parameters distinguished through the use of the indices ‘‘1’’ and
‘‘2’’, respectively. For ease of reference, Table 1 summarizes all notations related to our Hamming switchedcapacitor array circuit and the associated descriptions.
The Hamming bit-vectors Hx and Hy are common inputs
to the two switched-capacitor arrays, meaning that
Hx1 = Hx2 and Hy1 = Hy2 at the primary input. However,
we differentiate the inputs by the indices ‘‘1’’ and ‘‘2’’ due to
the different shifting operations applied to them by each
circuit. More specifically, the first circuit (comparator
structure) applies a shift operation on Hx to form Hx1, while
the second circuit applies shifting on Hy to derive Hy2.
Author's personal copy
Analog Integr Circ Sig Process (2013) 75:417–434
Fig. 1 Schematic diagram of a
modified switched-capacitor
array to compare the Hamming
weights of two bit-vectors Hx
and Hy, presented in Hamming
registers RHx and RHy
419
CLK
Hy
Hamming Register-2 (RHy)
Control Switches
CN-1
Control Switches
VHm
C1
C0
Voutp
VHy
VHx
Voutn
CN-1
C1
C0
Control Switches
Hamming Register-1 (RHx)
Hx
The two Hamming circuits are identical in their design
parameters and operational principles. The first one applies an
extra KC0 weight, which is less than a unit-Hamming weight,
on input Hx1; we refer to this input as the ‘‘right-shifted’’ or
simply ‘‘shifted’’ input. Likewise, the second Hamming
weight circuit applies the same additional weight, but on the
Hamming input Hy2.
We next describe Fig. 2(a), which illustrates the Hamming comparison process when WHx = WHy = 2. The
right-shifting of Hx1 through the first Hamming weightedcapacitor array implies VHx1 [ VHy1. Thus, the first
comparator output value is ‘‘high’’; viz., Voutp1 = Vdd.
Concurrently, the right-shifting of Hy2 using the circuit of
the second Hamming switched-capacitor array produces
VHy2 [ VHx2. Hence, the second comparator output value
is ‘‘low’’; viz., Voutp2 = 0 V. Therefore, when the two
inputs have equal Hamming weights, the two comparators
produce the outputs ‘‘10’’. Similar reasoning can be applied
to the cases in Fig. 2(b), (c), yielding the comparator outputs ‘‘00’’ and ‘‘11,’’ respectively.
We call this novel structure for handling Hamming inputs
(Hx, Hy) through a pair of switched-capacitor arrays with
uniform capacitances and an inherent shift with comparator
decision, a ‘‘Hamming switched-capacitor array’’. For
proper operation, the shift amount must be smaller than a
unit-Hamming weight voltage difference at comparator
inputs (VHx1, VHy1, VHx2, VHy2), that is, when the Hamming weights of two bit-vectors differ by 1.
3 Circuit descriptions
The proposed Hamming circuit, realized in Fig. 3, utilizes
fully differential comparators, Hamming switched-capacitor
arrays, CMOS switches, and basic registers to hold the
Hamming inputs. The comparator is similar to those described
in [24, 25] for a given technology scaling of 0.15 lm CMOS
process [26]. There are two identical branches at the inputs of
the comparator, each is based on a set of uniform C0 capacitors
for a total of N Hamming bit-vector width with an addition of
extra weighted capacitances KC0.
The Hamming decision structure operates in four events
within the system clock periods (TCLK). In this way, the
CMOS switches of the Hamming circuit are toggled
through these events in order to evaluate the voltages, VHx
and VHy, at the inputs of comparators. The four signals
(PHM, PHR, PHE, PHC), shown in Fig. 4, are generated
from the system CLK by a simple finite-state machine (i.e.
Mealy), where each signal presents a particular event
within CLK, but tailored to different position of duty
cycles. Consequently, each phase entails particular event
operations where charges are carried to next phase, until
the final accumulated charges take hold and the comparison
decision is drawn.
During the first event, beginning at time (n-1)T and
shown in Fig. 4 with the dashed circle 1, both branches of
the first and second comparators’ inputs (VHx1, VHy1,
VHx2, VHy2) are set at the power supply voltage VHm =
123
Author's personal copy
Fig. 2 Three examples of
Hamming weight comparisons,
with the relationship between
WHy and WHx being: a equal,
b greater than, and c less than
Analog Integr Circ Sig Process (2013) 75:417–434
VHy1
VHy1,VHx1
420
Voutp1
COMP1
VHx1 (Shift Right)
Voutn1
VHy1
At the input of first comparator
Voutp1 > Voutn1
COMP1 = 1
Hamming Weight
Assume: WHy = WHx = 2
At the input of the Hamming
switched-capacitor array
VHy2 (Shift Right)
VHx2
1
VHy2,VHx2
Voutp1
COMP2
Voutn1
VHx1
2
VHx2
3
4
N
VHy2
At the input of second comparator
Voutp1 < Voutn1
COMP2 = 0
Hamming Weight
1
2
3
4
N
VHy1
VHy1,VHx1
(a) Case 1: WHy = WHx
Voutp1
COMP1
VHx1 (Shift Right)
Voutn1
VHx1
At the input of first comparator
Voutp1 < Voutn1
COMP1 = 0
Hamming Weight
Assume: WHy = 2 and WHx = 1
At the input of the Hamming
switched-capacitor array
VHy2 (Shift Right)
COMP2
VHx2
1
VHy2,VHx2
Voutp1
Voutn1
VHy1
2
VHx2
3
4
N
VHy2
At the input of second comparator
Voutp2 < Voutn2
COMP2 = 0
Hamming Weight
1
2
3
4
N
VHy1
VHy1,VHx1
(b) Case 2: WHy > WHx
Voutp1
COMP1
VHx1 (Shift Right)
Voutn1
VHy1
At the input of first comparator
Voutp1 > Voutn1
COMP1 = 1
Hamming Weight
Assume: WHy= 1 and WHx = 2
At the input of the Hamming
switched-capacitor array
Voutp1
COMP2
VHx2
Voutn1
1
VHy2,VHx2
VHy2 (Shift Right)
VHx1
2
VHy2
3
4
N
VHx2
At the input of second comparator
Voutp2 > Voutn2
COMP2 = 1
Hamming Weight
1
2
3
4
N
(c) Case 3: WHy < WHx
Vdd through the switch SHm (shown in Fig. 3). At the
same time, the outer plates of all capacitors are tied to Vdd
through switches SHxj and SHyj, which are controlled by
the event signal PHM, independent of the values of the
stored bits RHxj and RHyj (0 B j B n-1) in Hamming
registers RHx and RHy, respectively. Accordingly, the total
charge during the first event is evaluated using the charge
balance equation
QTx1 ¼ fVdd ½ðn 1ÞT Vdd g ðR0 j N1 Cj þ KC0 Þ
ð1Þ
123
where QTx1 donates the total charge in the first comparator
structure associated with the Hamming input Hx, and
(n-1)T refers to the first event’s timing as depicted in
Fig. 4. Because both comparators structure are initialized to
the same voltages during this first event, and since all
capacitors have the same capacitance value C0, Eq. (1) yields:
QTx1 ¼ QTy1 ¼ QTx2 ¼ QTy2 ¼ 0:
ð2Þ
During the second event, beginning at time (n-1/2)T,
the SHm switch is opened, while the capacitors’ outer
Author's personal copy
Analog Integr Circ Sig Process (2013) 75:417–434
421
Symbol
Definition
plates are connected to Vdd/2. This is controlled by the
event signal PHR independent of the Hamming input bits
value. Thus, charge conservation implies:
Vdd
Power supply voltage
fVHx1 ½ðn 1=2ÞT Vdd=2g ðR0 j N1 Cj þ KC0 Þ ¼ 0
C0
Unit capacitance
K
Shifting ratio factor
N
Input bit-vector length
Hxi [Hyi]
First [second] Hamming input of the
ith Hamming circuit
RHx [RHy]
Hamming register for storing the first
[second] Hamming input
First [second] Hamming weightrelated to the ith Hamming circuit
Table 1 Terminology for the proposed Hamming switched-capacitor
array (i = 1 or 2 is the index of the Hamming circuit)
WHxi [WHyi]
COMPi
ð3Þ
Differential comparator in the ith
Hamming circuit
VHxi [VHyi]
First [second] input voltage for COMPi
VHm
Common Hamming mode voltage
equal to Vdd
Voutpi [Voutni]
Positive [negative] output voltage for
COMPi
Fig. 3 The proposed Hamming
switched-capacitor array circuit
Using the same assumption of equal capacitance values,
and due to the symmetry of comparator structures, we have
VHx1 ½ðn 1=2ÞT ¼ VHy1 ½ðn 1=2ÞT ¼ Vdd=2
VHx2 ½ðn 1=2ÞT ¼ VHy2 ½ðn 1=2ÞT ¼ Vdd=2
ð4Þ
Note that up to this point, the CMOS switches operate
independently of the input values.
During event 3, beginning at time instant nT, the circuit
switches to the evaluation mode, where the outer capacitor
plates are connected to input voltages, whose values
depend on the corresponding bits of the Hamming registers
RHx and RHy: Vdd for an input bit 1 and Vdd/2 for an input
bit 0, leading to the voltage value (1 ? b)Vdd/2 for an
Vdd/2
Vdd
SHyN-1
SHy0
C0
KC0
Voutp1
SHm
VHy1
COMP1
VHm
VHx1
Voutn1
C0
SHxN-1
C0
KC0
SHx0
First Comparator Structure
C0
Shifting Hx by inserting
Vdd to KC0
Vdd
Vdd/2
Vdd/2
Shifting Hy by inserting
Vdd to KC0
Vdd
SHyN-1
SHy0
C0
KC0
Voutp2
SHm
VHy2
VHm
COMP2
VHx2
Voutn2
C0
SHxN-1
C0
SHx0
KC0
Second Comparator Structure
C0
Vdd
Vdd/2
123
Author's personal copy
422
Fig. 4 Timing events for the
proposed Hamming N-bit vector
circuit
Analog Integr Circ Sig Process (2013) 75:417–434
RESET
(n-1)T
(n-1/2)T
nT
(n+1)T
(n+1/2)T
CLK
PHM
PHR
PHE
PHC
Event
Event
Event
Event
1
2
3
4
New cycle with new start for new Hamming
measurement
Cycle 1
input bit value b. During event 3, the evaluation voltage
VHx1 is ‘‘right-shifted’’ by applying Vdd to the outer plate
of the capacitor labeled KC0 in Fig. 3, with charge preservation implying:
R0 j N1 C0 fVHx1 ½ðn 1=2ÞT Vdd=2g
þ KC0 fVHx1 ½ðn 1=2ÞT Vdd=2g
¼ R0 j N1 C0 VHx1 ½nT 1 þ Hxj Vdd=2
þ KC0 fVHx1 ½nT Vdd g
ð5Þ
Note that the first line of Eq. (5) represents the total
charge at time (n-1/2)T and the second line constitutes the
total charge at time nT, with the latter being a function of
the bit values Hxj or their associated applied voltages
(1 ? Hxj)Vdd/2.
Plugging (4) into (5) and simplifying, we have:
PN1
ðN2 þ KÞ þ j¼0
Hxj =2
VHx1 ½nT ¼ Vdd
ð6Þ
NþK
Concurrently, VHy1 is also evaluated during event 3, but
without the right-shifting, that is, the voltage Vdd/2 to outer
plate of the capacitor labeled KC0 in Fig. 3. Thus, the
counterparts to Eqs. (5) and (6) are:
123
R0 j N1 C0 fVHy1 ½ðn 1=2ÞT Vdd=2g
þ KC0 fVHy1 ½ðn 1=2ÞT Vdd=2g
¼ R0 j N1 C0 VHy1 ½nT 1 þ Hyj Vdd=2
þ KC0 fVHy1 ½nT Vdd g
PN1
N
K
j¼0 Hyj =2
2 þ 2 þ
VHy1 ½nT ¼ Vdd
NþK
ð7Þ
ð8Þ
Notice that the required right-shifting is performed by
simply applying a different voltage value to the outer plates of
capacitances KC0. This strategy renders the Hamming
switched-capacitor structures on both branches identical, thus
ensuring balance in the geometry of capacitance areas on both
inputs to the comparator. This reduces the mismatch error.
Following the same process as above, the voltages
associated with the second comparator structure are readily
seen to be:
PN1
Hxj =2
ðN2 þ K2 Þ þ i¼0
VHx2 ½nT ¼ Vdd
NþK
PN1
Hyj =2
ðN2 þ KÞ þ i¼0
VHy2 ½nT ¼ Vdd
NþK
ð9Þ
ð10Þ
Author's personal copy
Analog Integr Circ Sig Process (2013) 75:417–434
423
We note that if the Hamming inputs Hx and Hy are
both the all 0s vector, then Eqs. (6), (8), (9), and (10)
reduce to:
Vdd
K
1þ
VHx1 ½nT ¼
&VHy1 ½nT ¼ Vdd=2
2
NþK
Vdd
Vdd
K
& VHy2 ½nT ¼
1þ
:
VHx2 ½nT ¼
2
2
NþK
ð11Þ
ð12Þ
Consequently, for equal Hamming weights, Eq. (11)
suggests that the shift amount is K/2 towards Hx, while
in (Eq. 12) the shift amount is K/2 towards Hy. Thus, these
equations confirm the comparison principle derived in
Sect. 2.
The final event 4, beginning at time instant (n ? 1/2)T,
completes the operation of the comparators by providing
output values at Voutp1 and Voutp2, based on the weighted
sum of the comparators’ inputs (VHx1, VHy1, VHx2, VHy2).
The comparator is realized with latched output that holds
the current comparison results during the next new comparison cycle.
Table 2 provides the resulting voltages and comparison
results for Hamming weight comparison of 4 bit input
vectors Hx and Hy in a number of cases involving all
possible distances. The entries in Table 2 are derived from
Eqs. (6), (8), (9), and (10) by assuming K = 1/4. We will
see in the next section that the value of K is limited to the
range [0, 1/2].
4 Accuracy analysis
In this section, characteristics of the proposed circuit are
identified based on the behavior of switched-capacitor
array. These characteristics resemble those of analog/
digital and digital/analog switched-capacitor structures
Table 2 Hamming weight
comparison between two 4 bit
input vectors Hx and Hy, based
on the proposed Hamming
switched-capacitor array of
Fig. 3
WHx, WHy
[19–22, 24, 25], where similar definitions and theory are
applicable. Assume ideal conditions, in the absence of
noise and any imperfections. For N-bit inputs, our structure
requires N equal C0 capacitors in the form of a uniform
charge redistribution array, along with a shifting capacitor
KC0.
The lower bound of the shifting factor is based on the
requirement for proper operation in the case when two
Hamming weights are equal. Using (11) and (12), we have:
C0 ðK þ N=2ÞVdd
C0 ðN þ K ÞVdd=2
[
;
C0 ðN þ K Þ
C 0 ðN þ K Þ
implies K [ 0:
ð13Þ
On the other hand, the upper bound is based on the
requirement for proper operation when the two Hamming
weights differ by 1. Using Eqs. (6–10), we have:
C0 ðN=2 þ K ÞVdd C0 ðN þ K þ 1=2ÞVdd=2
\
;
C0 ðN þ K Þ
C0 ðN þ K Þ
implies K\1=2:
ð14Þ
Furthermore, the minimum resolution between the input
branches of the comparators can be expressed in a worstcase scenario when the two Hamming weights are equal as
derived in (11) and (12), with the only difference between
them resulting from the shift ratio:
Vdd
K
Vdd Vdd
K
1þ
¼
Vdiff ¼
:
2
NþK
2
2 NþK
ð15Þ
Thus, the minimum comparators’ resolution is
proportional to Vdd and inversely proportional to the
input vector width N. This will make the design of
comparators more challenging with gradual increase in bitwidth and continued technology scaling. An optimal value
for K is obtained by striking a balance between the two
constraints Eqs. (14) and (15):
Kopt ¼ 0:5:
COMP1
ð16Þ
Voutp1, Voutp2
COMP2
Result
VHx1/Vdd
VHy1/Vdd
VHx2/Vdd
VHy2/Vdd
0, 0
0.529
0.500
0.500
0.529
1, 0
=
1, 0
0.647
0.500
0.618
0.529
1, 1
[
2, 0
0.765
0.500
0.735
0.529
1, 1
[
3, 0
0.882
0.500
0.853
0.529
1, 1
[
4, 0
0, 1
1.000
0.500
0.500
0.647
0.971
0.529
0.529
0.618
1, 1
0, 0
[
\
0, 2
0.500
0.765
0.529
0.735
0, 0
\
0, 3
0.500
0.882
0.529
0.853
0, 0
\
0, 4
0.500
1.000
0.529
0.971
0, 0
\
123
Author's personal copy
424
Analog Integr Circ Sig Process (2013) 75:417–434
The ideal transfer characteristics from the difference of
two Hamming weights to their accumulated charges at the
input branches of the comparators (|VHx1-VHy1|, |VHx2VHy2|) is illustrated in Fig. 5. The offset error is commonly
known as the constant deviation of the actual output from
ideal output when the ideal output should be zero. Gain
error is characterized by a slope change in the transfer
characteristics. This type of error, depicted in Fig. 5(a), is
usually due to process variability or the power supply
voltage differences at the input branches of the
comparators [20–25].
Another type of deviation is a measure of the non-linearity
error after the offset and gain errors have been removed, as
shown in Fig. 5(b). Assuming the X-axis is the Hamming
input vectors bits and the Y-axis is the accumulated weights
at the given X-axis bits. The accumulated weights are measured based on either Eq. (8) or (9) which applies on nonshifting inputs of first and second comparator structure
(Fig. 3). Consider an example where the 8 bit Hx and Hy
have 4 and 5 bits asserted, respectively, with the remaining
bits being 0s. Applying (9) on Hx provides the weight
Vddð2 þ K2 þ N2 Þ=ðN þ KÞ on the Y-axis, while Eq.(8)
applied on Hy yields the weight Vddð52 þ K2 þ N2 Þ=ðN þ KÞ on
the Y-axis, corresponding to two points on the graph. Furthermore, when Hx and Hy have the same number of asserted
bits, they provide one points on the graph using either Eq.(8)
or (9). By increasing the input bit-width N, all points of graph
can be obtained using Eq.(8) and (9).
Next, applying the shifting principles using Eq.(6) and
(10), two graphs representing the minimum and the maximum shifting regions are produced. These shifting regions,
along with the non-shifting region, create maximum
and minimum voltage differences at the inputs of comparators. Thus, we have the following equations for the
maximum-weight and minimum-weight voltage differences, respectively:
Vdiffmax ¼maxfjVHx1 VHy1 j; jVHx2 VHy2 jg
¼
Vddð12 þ K2 Þ
NþK
ð17Þ
Vdiffmin ¼minfjVHx1 VHy1 j; jVHx2 VHy2 jg
¼
Vddð12 K2 Þ
NþK
ð18Þ
It might be noted that Vdiffmax of (17) realizes large margin
for comparator resolution, while Vdiffmin of (18) realizes
small margin for comparator resolution.
To summarize, the graph in Fig. 5(b) shows the linearity
of input–output characteristics, where the inputs are bits of
Hx and Hy and the output is the accumulated voltage at the
comparator input. This voltage difference determines the
resolution of comparator operations against Hamming bits
difference, which can be degraded due to nonlinear factors
such as capacitance and switch mismatches.
Units in
Units in
Vdd/(N+K)
Vdd/(N+K)
(N/2+K/2+N/2)
(N/2+K/2+N/2)
Hamming weights without
Gain Error
applying the shift procedure
Hamming weights curve
Maximum shifting region
(7/2+K/2+N/2)
(7/2+K/2+N/2)
Minimum shifting region
(3+K/2+N/2)
(3+K/2+N/2)
Hamming Weights (WHi)
Hamming Weights (WHi)
Offset Error
(5/2+K/2+N/2)
(2+K/2+N/2)
(3/2+K/2+N/2)
Maximum weight difference
between WH5 and WH4
(5/2+K/2+N/2)
Minimum weight difference
(2+K/2+N/2)
between WH5 and WH4
(3/2+K/2+N/2)
Error due to minimum shifting
(1+K/2+N/2)
(1+K/2+N/2)
(1/2+K/2+N/2)
(1/2+K/2+N/2)
K/2+N/2
K/2+N/2
Hx,Hy
0
1
2
3
4
5
6
7
Hamming Bits
(a)
8
9
10
N
Hx,Hy
0
1
2
3
4
5
6
7
8
9
10
N
Hamming Bits
(b)
Fig. 5 Graphic illustration of linearity shifting principles .a Impact of linear error,b maximum and minimum resolution regions and impact of
non-linear error
123
Author's personal copy
Analog Integr Circ Sig Process (2013) 75:417–434
425
For N-bit inputs, the proposed Hamming circuit entails a
total capacitance of
CT ¼ 4ðN þ K ÞC0
ð19Þ
given that we have two branches for each comparator
structure as shown in Fig. 3. The outer plate of each
capacitor is connected to either Vdd or Vdd/2, while the
inner plate is connected to Vdd or the input terminal of the
comparator. Such a configuration, is commonly known to
reduce the noise injected from the substrate into the circuit
[27–29], since the outer plates of the capacitors are never
connected to input terminal of the comparator or to substrate voltage.
Furthermore, the geometry of the differential Hamming
switched-capacitor array reduces the effects of possible
errors resulting from mismatches of the capacitances.
Assuming the switched-capacitor array to exhibit a linear
gradient, that is, a linear variation in changes of Tox from
one end to the other, the linear mismatches of C0 becomes
C0 ! ðC0 þ DC0 Þ; 2C0 ! ðC0 þ 2DC0 Þ; . . .;
NC0 ! ðC0 þ NDC0 Þ
ð20Þ
where DC0 is the linear gradient error associated with
capacitance value in our Hamming switched-capacitor
array. Now, using Eq. (18) for the minimum resolution at
the input of the comparator with linear gradient of
mismatches for our switched-capacitor array, we have:
Vdiffminc ¼
K ðC0 þ DC0 ÞVdd=2
C0 ðN þ K Þ þ N ðN 1ÞDC0 þ KDC0
ð21Þ
where KDC0 is the error associated with the shift capacitance value. Taking 0.2 % of C0 to be a typical worst-case
approximate mismatch for DC0 [27, 28], Fig. 6 shows
Vdiffmin of (18) and Vdiffminc of Eq. (21) against the
Hamming size N. The result is very promising, because it
shows the minimum difference resolution voltage (18)
Vdiffmin = 5.8 mV (at N = 64, K = 0.5, C0 = 0.1 pF, and
Vdd = 1.5 V), while the minimum difference including
mismatches linear capacitance error Eq. (21) is 5.2 mV.
Thus, the mismatches degrade the resolution from 5.8 to
5.2 mV. Most recent comparator designs [19–21, 24, 25]
can promptly respond to differences as small as a couple of
millivolts, so the operation of our circuit should be robust,
even with linear gradient error due to capacitance mismatches. See the appendix for simulation-based results.
As further evidence of robustness, the error between
Eqs. (18) and (21) is depicted in Fig. 7. The near independence with regard to increases in input width is due to a
differential uniform cancellation structure. Non-linearity
error, resulting from variance in capacitance non-linearities,
degrades the minimum resolution margin by 10.34 % at the
input of comparators. The capacitance mismatch error can be
Fig. 6 Minimum resolution at the input of comparator against the
Hamming input width N. The graph using the symbol ‘‘-’’ shows the
ideal difference Vdiffmin modeled by (18), while the graph using the
symbol ‘‘o’’ represents the non-ideal difference Vdiffminc given by
Eq.(21)
further analyzed by considering a random difference, in lieu
of linear difference, between the two differential arrays. The
latter error is usually reduced by using layout techniques, as
discussed in [19, 30–33]. On the other hand, the non-linearity
error of linear gradient capacitance mismatches shown in
Eq. (21) has an insignificant impact on the accuracy of
Vdiffmax, since the resolution margin at the comparator input
is much greater than Vdiffmin. Hence, we elaborate only on
the impact of nonlinearity factors on Vdiffmin margin.
The maximum speed of the design is constrained by the
four events comprising one system clock cycle, as discussed
Fig. 7 The resolution error magnitude between ideal and non-ideal
comparator input, where the non-linearity in the non-ideal case is due
to linear gradient capacitance mismatches
123
Author's personal copy
426
Analog Integr Circ Sig Process (2013) 75:417–434
in Sect. 3, independent of the size of the input vectors. The
worst-case settling time is associated with the first event,
when the input branches of the comparators are charged to
VHm = Vdd. These input branches are connected to all inner
capacitor plates, where the voltage must rise from 0 V to
VHm = Vdd within a quarter of system clock cycle, and
then, discharge to Vdd/2 during the beginning of second
event. We assume that signal rising and falling times are
equalized by tweaking the magnitude of RTG. Consequently,
using the derived transmission-gate transfer time in [34],
which is limited by the time constant, we have:
"
#
N 1
X
se1charge ¼ se1discharge RTG Cp þ KC0 þ
C0
j¼0
ð22Þ
where RTG is the resistance associated with the CMOS
switch SHm, derived in [34], Cp accounts for gate and
diffusion capacitances of SHm, KC0 is the shifting capacitance value, and C0 is the unit charge capacitor for each bit
position of the N-bit input vectors. The subscripts ‘‘e1charge’’ and ‘‘e1-discharge’’ refer to first event for charging
and discharging of total capacitor voltage. Hence, the
conversion cycle during this first event is limited by the
time constant of this event.
We next endeavor to demonstrate that events 2–4 have
much shorter settling times, making Te1 the determining factor
in setting the clock speed. The settling time during event 2 is
related to switching off the inner plates in the capacitor array
from VHm and transferring Vdd/2 from the outer plate to inner
plate of each capacitor in the array. This settling time is fairly
short, since it pertains to individual capacitance values, rather
than the whole line. During event 3, the settling time pertains
to charging each unit capacitor individually with Vdd instead
of Vdd/2, based on whether the associated bit of the corresponding input vector is asserted. This settling time is also
fairly short, as it pertains to individual capacitors. The final
event 4 is constrained by the comparator’s decision time of
the sense amplifier and registering of the output results in
D latches which considered short [19–21, 24, 25, 35]. As a
result, the circuit’s total transfer time, which is limited by the
time constant in (22), is given by:
Tp ¼ 4 se1charge þ se1discharge :
ð23Þ
Finally, the maximum switching frequency of the
proposed design is (pessimistically):
fm
1
:
Tp
ð24Þ
The factor 4 in Eq. (23) reflects the fact that Eq. (22)
shows only the transfer time of event 1. The pessimistic
nature of Eq. (24) is due to our assuming each of the
simpler events 2–4 also requiring as much time as event 1.
123
5 Simulation results
We next present results for HSPICE [36] simulations of the
proposed design, utilizing 0.15 lm TSMC technology [26].
All simulations were carried out for a typical technology
corner: power supply at Vdd = 1.5 V and temperature at
25 C. The input vector width is chosen to be N = 16,
representing the most common applications. Additionally,
we take the unit capacitance value to be 0.1 lF, as is the
case for most common 0.15 lm technology, and assume
the sub-optimal shifting ratio factor of K = 1/4 as a challenging marginal operating point (half of the optimal value
Kopt = 1/2).
The HSPICE simulations where conducted to verify the
functionality of the design based on our derived mathematical equations and to measure the maximum operating
frequency, maximum resolution error, and maximum
power dissipation. The excellent agreement between simulation results and those derived from analytic formulas
verifies the accuracy of the latter, leading us to the conclusion that the circuit will behave as expected for wider
inputs.
Figure 8 shows the simulated waveform for our Hamming
circuit’s operation at a system clock frequency of 1 GHz,
when the weight of the input Hx sweeps from 0 to 16 (bits are
sequentially asserted), while the weight of the input Hy is
maintained at 16 (all bits are asserted). The first signal in the
presented waveform (PHC) is related to symbol of Fig. 4,
which illustrates the comparison decision (event 4), while the
second and third waveforms show the input line voltages at
the first and second comparator: V(Hx1), V(Hy1), V(Hx2), and
V(Hy2), which are symbolized as VHx1, VHy1, VHx2, and
VHy2, respectively. The fourth and fifth signal waveforms
correspond to the comparator decision output for each
comparison event (i.e. event 4). The signals evaluates an
outcome of V(outp1) = 0 and V(outp2) = 0 (i.e. Voutp1 and
Voutp2, respectively), indicating that WHx \ WHy since Hy
is kept at 16 while Hx is sequentially increased. Moreover,
when the simulation ends, we should have Voutp1 = 0 and
Voutp2 = Vdd, indicating that the inputs have equal Hamming weights.
Likewise, Fig. 9 shows complementary results to Fig. 8,
in that it corresponds to a simulation where the Hamming
weight of Hx is kept constant at 16 while the Hamming
weight of Hy is swept from 0 to 16. Hence, the comparator
outputs of Fig. 9 show a value of V(outp1) = Vdd and
V(outp2) = Vdd, indicating that WHy \ WHx. Again, at
the end of the simulation, Voutp1 = 0 and Voutp2 = Vdd,
indicating the equality of Hamming weights.
The simulation results of Fig. 10 were derived to
examine the worst-case scenario for charging and discharging the switched-capacitor array represented by the
comparator inputs: V(Hx1), V(Hy1), V(Hx2), and V(Hy2). To
Author's personal copy
Analog Integr Circ Sig Process (2013) 75:417–434
427
Fig. 8 HSPICE waveforms for
input Hy kept at weight 16,
while the Hamming weight of
the input Hx is sequentially
increased. The horizontal time
scale is in nanoseconds and the
vertical output scale is in volts
(CLK = 1 GHz, Vdd = 1.5 V),
using TSMC 0.15 lm
technology
induce the worst-case, we alternate between WHx = 0,
WHy = 16 and WHx = 16, WHy = 0 during consecutive
clock periods. Hence, all capacitors at one end of the
comparator are charged, while all capacitors at the other
end are discharged simultaneously. We note that for this
worst-case, the comparator outputs are safely responsive at
1 GHz operating speed, given that all events within one
cycle are correctly evaluated.
Figure 11 depicts a useful way of indicating the relationship between the Hamming weight differences and
analog voltage differences at the inputs of the comparators,
as illustrated in Sect. 3 via Fig. 5b. This measure shows the
linearity curve within minimum region difference, commonly known as the input–output transfer characteristics.
The results show close-to-ideal transfer characteristics,
where the deviation of measured in Fig. 12 is about ±
0.08 mV. This error reduces the minimum resolution
of comparator inputs given by Eq. (18) from 22.7 to
22.62 mV for N = 16 bits. Generally, the impact of this
error is negligible for small input vector widths.
Finally, the power consumption results have been obtained
for the proposed circuit running at 1 GHz, assuming unit
capacitance value of 0.1 pF and Hamming width N = 16,
with the shifting factor K = 1/4. The simulation results
report power consumption of 0.136 mW, where all bits are
taken to be different in a worst-case scenario. The two
comparators are considered major sources of power consumption due to the bias circuit that requires a constant
biasing current. Power consumption from this source can be
minimized by using the design in [24, 37, 38]. Furthermore,
using a bias on–off switch circuit will help in reducing the
overall power consumption. Each of two comparators consumes an average of 0.5 mW of power, which is not included
in the recorded power consumption. Thus, the total power
consumption is 1.136 mW, when the two comparators are
included.
6 Comparative evaluations
We now endeavor to analyze the area (in number of transistors), operating speed, and power requirements of our
proposed Hamming weight comparison circuit, ending the
123
Author's personal copy
428
Analog Integr Circ Sig Process (2013) 75:417–434
Fig. 9 HSPICE waveforms for
Hamming weight of Hx kept at
16, while the Hamming weight
of Hy is sequentially increased.
The horizontal time scale is in
nanoseconds and the vertical
output scale is in volts
(CLK = 1 GHz, Vdd = 1.5 V),
using TSMC 0.15 lm
technology
section by a comparison of our novel design method with a
number of recent alternative designs [12–18].
6.1 Area assessment
To derive the area requirements, we first calculate the total
number of transistors for the N-bit Hamming weight circuit
(Table 3). The transistor counts for CMOS switches are
self-explanatory. The transistor count of COMP1 and
COMP2, with Mealy phase generators, equals 2*(60) ?
58 = 178, independent of input size. We did not count the
registers (D-Type Flip-Flops) that store the input bits,
because these registers already exist on the data bus that
provides the inputs. These registers are common to all
designs, thus not affecting our comparisons.
Furthermore, large capacitances are usually made of
identical unit capacitors, so as to minimize the ratio mismatches [30–33]. For this reason, common-centroid
geometries and inter-digitization are of particular importance in reducing mismatches for high accuracy, as they
help reduce the effects of gradients and random fabrication
errors [27–29]. This fabrication error is usually due to
123
photolithography, oxide thickness, and dielectric constant,
plus expected voltage and temperature variations. Given
the importance of achieving highly precise layout and due
to our design’s use of uniform capacitances, our differential
Hamming switched-capacitor arrays are arranged with
common-centroid geometries. Figure 13 shows an example
of centroid capacitance layout for an input width of 9 bits,
where the structure requires 18 uniform C0 capacitors on
each switched-capacitance array in our two-array structure.
Each capacitor is split into two equal parts of capacity 0.5
C0, marked by the letters ‘‘a’’ and ‘‘b’’ in Fig. 13, where
minus and plus signs denote the capacitors associated with
inputs Hy and Hx, respectively. This geometry provides a
good cancelation of random variations across the array
through an assurance of center gradient in the fabrication
process in all binary combinations. Besides, the proposed
layout minimizes the fringing capacitances between the
two branches of comparator inputs. Based on this geometry, the total area is depicted in Table 4 for some input bitwidths, assuming the use a uniform capacitance value of
0.1 lF for our 0.15 lm technology with 2 poly and 3 metal
layers.
Author's personal copy
Analog Integr Circ Sig Process (2013) 75:417–434
429
Fig. 10 HSPICE waveforms
for Hamming inputs Hy and Hx
cross-toggling between
maximum-weight of 16 (all bits
at 1) and minimum-weight of 0
(all bits at 0) on alternating
clock cycles. The horizontal
time scale is in nanoseconds and
the vertical output scale is in
volts (CLK = 1 GHz,
Vdd = 1.5 V), using TSMC
0.15 lm technology
Fig. 11 HSPICE extracted simulation for input–output transfer
characteristics with respect to ideal curve presented in Eq. (18),
using TSMC 0.15 lm technology at CLK = 1 GHz, Vdd = 1.5 V,
K = 1/4, N = 16, and C0 = 0.1 pF
Fig. 12 HSPICE extracted simulation for non-linear error and its
impact on comparators resolution decision, using TSMC 0.15 lm
technology at a clock frequency of 1 GHz, Vdd = 1.5 V, K = 1/4,
N = 16, and C0 = 0.1 pF
123
Author's personal copy
430
Analog Integr Circ Sig Process (2013) 75:417–434
Table 3 Number of transistors in the proposed Hamming circuit with
various vector widths
Input width
in bits (N)
Transistor counts
Four sets of
CMOS switches
Comparators and
phase generator
16
4 * (14 * 16)
178
1,074
32
4 * (14 * 32)
178
1,970
64
4 * (14 * 64)
178
3,762
128
4 * (14 * 128)
178
7,346
256
4 * (14 * 256)
178
14,514
+7b
+8b
+9b
-9b
-8b
-7b
+4b
+5b
+6b
-6b
-5b
-4b
+3b
+2b
+1b
-1b
-2a
-3b
-3a
-2a
-1a
+1a
+2a
+3a
-4a
-5a
-6a
+6a
+5a
+4a
Total
transistors
-7a
-8a
-9a
+9a
+8a
+7a
Fig. 14 Hamming weight decision delay vs. Hamming input bitwidth under worst-case simulations depicted in Fig. 10
Fig. 13 Example of a centroid capacitance layout
Table 4 Total capacitor layout area for various input bit-widths
Total (pF)2
Input width
in bits (N)
Switched-capacitor area
for C0 = 0.2 pF
(Hx1, Hy1) ? (Hx2, Hy2)
16
2 * (8 * 8)(0.05 * 0.05)
0.32
32
2 * (11.5 * 11.5)(0.05 * 0.05)
0.66
64
2 * (16 * 16)(0.05 * 0.05)
1.28
128
2 * (23 * 23)(0.05 * 0.05)
2.65
256
2 * (32 * 32)(0.05 * 0.05)
5.12
input lines to Vdd (VHm) also goes up. Figure 14 shows a
linear relation between delay and input bit width up to 64
bits. The relationship starts exhibiting non-linearity for
wider inputs due to large second-order parasitic effects
with respect to C0. The second-order effects are mostly due
to transmission-gate resistances (RTG) and parasitic
capacitances (e.g. diffusion, gate, wire) of CMOS switches,
which vary in a non-linear manner with voltage. More
detailed analysis on typical switched-capacitor array parasitic effects can be found elsewhere [22].
6.3 Power characteristics
Our proposed Hamming comparator circuit exploits the
uniform weighted switched-capacitor array and CMOS
6.2 Speed assessment
In this section, we vary the input Hamming weights by
taking a worst difference between the two input bit-vectors
in each particular Hamming weight class and measure the
HSPICE simulated worst-case delay. This worst-case delay
is measured from rising edge of first event to evaluated
comparison output data (Voutp1, Voutp2) during the fourth
event. In addition, every simulation case was toggled based
on Fig. 10 in order to ensure that the time constants are
accounted for with enough accuracy (3s ? 5s) within each
event. This delay is approximated by our derived Eq. (24),
where the critical path length was taken to be four times the
delay of the first event, which was shown to be a worst-case
time delay event.
Note that whenever N is increased, the capacitances on
the input comparators are also increased, based on Eq. (22),
with the result that the first event’s delay for charging these
123
Fig. 15 Power consumption vs. varying Hamming input widths,
when running at 0.2 GHz system clock
Author's personal copy
Analog Integr Circ Sig Process (2013) 75:417–434
431
Table 5 Cost, latency, and power consumption for Hamming weight comparator circuits
Circuits (N Bits)
Tech/supply
voltage
Area (Transistors)
Power
Operating speed
Proposed
N = 16
0.15 lm/1.5 V
1074 Trans. ? 1.28 pF2
1.136 mW @ 1 GHz
1 ns 4[(2)RTG*(Csdg ? N*C0)]
[1] N = 8
0.35 lm/3.5 V
324 Trans.
0.44 mW @
200 MHz
4.58 ns
[4] N = 5
–
(41 gates) = 246 Trans. O(N log2N)
(LG)
–
7-gate levels O(log2N) (gate
levels)
[5] N = 16
AMI 0.8 lm
(256 gates) = 1536 Trans. [N X N]
*(LG)
–
1.3 ns/level N-gate levels
–
½2ðN log2 N 1ÞFA þ 4 log2 NðLGÞ
–
ðlog2 N 1ÞðFA gate levelÞ þ 1
[3] Size = N
2
The parameters pF , FA, N, and LG are defined as picofarad area of switched-capacitor array, full-adder area, Hamming input bit-width, and
logic gate area, respectively
switches, where the input voltage to switched-capacitor
array is limited to DC constant voltages (Vdd, Vdd/2).
Besides, all switching activities entail single unit capacitors
associated with input bits, the only exception being during
the first event, when all capacitors are charged to Vdd.
Thus, the switched-capacitor array usually limits switching
activity to small numbers of devices, leading to low power
dissipation [37]. Analyses in [38, 39] show that circuits
based on the switched-capacitor array paradigm dissipate
small amounts of power, a fact that is readily confirmed by
the results of our HSPICE simulation for assessing power
consumption, as depicted in Fig. 15.
Power consumption results have been obtained for the
proposed circuit running at a lower frequency of 200 MHz,
in order to accommodate all functional operations for different input widths ranging from 16 to 64 bits, with unit
capacitance of 0.1 pF and shifting factor K = 1/2. In each
case, all bits are taken to be different as a worst-case
scenario. The 0.5 mW average power consumption of each
comparator is not included in the recorded power consumption of Fig. 15. We have not discussed the structures
associated with the comparator circuitry, since the literature is rich with many such designs. The scope of our paper
is to present the architecture of a Hamming comparator
circuit, rather than basic comparator design.
6.4 Comparison with previous designs
Table 5 contains a comparison of our Hamming comparator
circuit against several state-of-the-art Hamming comparator
designs, whose structures represent recent design topologies
and circuits targeted for high-speed operation and power
savings, that is, objectives similar to those of our proposed
architecture. The symbols FA and LG are used as a unit for
full-adder and logic gate counts, where the FA requires
between 3 and 9 logic gates. A gate is usually built from 4–6
transistors, although some recent ones [40] require only one
logic gate with several capacitors.
In terms of the number of transistors used, our design is
intermediate among those compared, because it requires
two switching capacitor array circuits for each input bit,
with each switching circuit composed of four pass-gate
multiplexors and four inverters. Bear in mind, however,
that the components in our design all use local interconnects, with no global routing. The latter property, combined
with the small geometry of capacitance array, might result
in a smaller overall silicon area. Furthermore, the proposed
design is extremely power-efficient, since all activities of
the operating frequency are introduced through the Hamming bit capacitor value C0, which is known to be less than
diffusion and gate capacitances of cascaded logic gates.
Nevertheless, designers must be aware that the comparator
circuit requires standby currents on the order of 0.5 mA,
which potentially necessitates the use a switched on–off
circuit to shut off the power during standby mode.
Another significant advantage of our design is its high
operating speed for a wide range of Hamming input bitwidths (N). The high-speed of our design, which places it
among the fastest Hamming weight comparators reported
in the literature, results from parallel operation in charging
and discharging of the capacitors C0 and the concurrent
manifestation of the total weight in each input bit-vector,
independent of the order of bits. By contrast, competing
designs take the order of bits into account, leading to
synchronous realizations.
7 Conclusions
We have presented a Hamming weight comparator circuit architecture for input bit-vectors using switched-
123
Author's personal copy
432
capacitor arrays, where all capacitors have a uniform
weight independent of the order of bits. The circuit
comprises of two switched-capacitor arrays associated
with two comparators, collectively realizing all comparison functions, that is, ‘‘[’’, ‘‘\’’, and ‘‘=’’, between the
two input bit-vectors. The circuit operation involves a
shifting factor applied differently to each input by
varying the input voltage between Vdd and Vdd/2,
instead of through changing the capacitance value in the
two inputs of the comparator. Thus, the two comparator
inputs have a uniform weight of capacitances, which
reduces layout-induced mismatches.
Theoretically, the circuit input/output characteristics are
linearly related to the accumulated voltage weight of
comparators across the Hamming vector width (N). Consequently, the minimum voltage margin at comparator
inputs is reduced as N increases, and power supply voltage
requirement is also reduced. Non-linear phenomena, such
as capacitance and switch mismatches, can potentially
upset this theoretical assessment, but HSPICE simulation
shows that the effects of non-linearities are not problematic
in practical contexts.
For example, assuming 0.15 lm TSMC technology,
with K = 1/4, Vdd = 1.5 V, and N = 16 bits, non-linearities reduce the minimum resolution margin from 22.7 to
22.3 mV, thus barely affecting the wide margin which is
needed for reliable comparator operation. Consistent with
this analysis, the measure of minimum difference region
(i.e. resolution margin), assessed for several Hamming
input vector bit-widths, shows that for N = 64, K = 1/2,
Vdd = 1.5 V, and with the same technology, non-linearities reduce the margin from 5.8 to 5.2 mV. Most recent
comparator designs can operate reliably under this typical
margin.
The operation of the Hamming switched-capacitor circuit is realized in terms of four events, regardless of the
input width. The events occur sequentially, with each event
proceeding concurrently in the two switched-capacitor
arrays. Consequently, the circuit function entails a relatively low latency, not exceeding four times the duration of
the first (longest) event. This leads to very competitive
latency values and associated clock rates.
Extensive HSPICE simulation results indicate that the
circuit can run at 1 GHz for N = 16, while consuming a
power of about 1.136 mW for the shifting factor K = 1/4.
For N = 64 and K = 1/2, it can run at 200 MHz. In terms
of area, our results for comparing the Hamming weights
of two 16 bit input vectors reveal a cost of 1,074 transistors and switched-capacitor array of 0.32 pF2. Finally,
our proposed design is self-contained and does not require
any additional (external) decoding to produce the final
output.
123
Analog Integr Circ Sig Process (2013) 75:417–434
Appendix
In this appendix, we present the results of simulation-based
verification of the theoretical analyses of Sect. 4 with
regard to the negligible impact of capacitance mismatches
on the physical attributes and performance of our Hamming comparator. Assuming 0.15 lm TSMC process with
the linear oxide gradient of a = 60 ppm, oxide thickness
of 15 nm, capacitor cell width of W = 5 lm, horizontal/
vertical spacing of Sx = Sy = 0.5 lm (see Fig. 16), angle
h of 45 and 315 degrees, and unit capacitance of 0.5 pF,
variations of capacitance value as a function of bit-width is
depicted in Fig. 17.
Instantiating capacitances whose variations are according
to Fig. 17 into the design of our proposed Hamming
comparator circuit and conducting HSPICE simulations on
several technology corners, we confirmed that for bit-widths
of 16, 32, and 64, the circuit behaves within acceptable tolerances in the worst-case involving Vdd = 1.35 V (15 %
drop), temperature of 100 C, and the slow-fast or fast-slow
technology corner. Minimum voltage differences of 11.1,
6.3, and 0.98 mV are applicable to 16, 32, and 64 bit comparators, respectively. Waveforms observed in one of the
simulation runs, corresponding to the bit-width of 16, are
depicted in Fig. 18. Results for the 16 bit case are shown,
because they are most legible after reduction to a suitably
CX5
CY5
CY4
CX1
CX0
SY
CX4
SX
CY3
CX3
45°
CX2
CY2
W
CY1
CY0
Fig. 16 Worst-case linear gradient switched-capacitor arrays for
layout model arrangement
Fig. 17 Variation in unit capacitance value for different comparator
bit-widths based on linear gradient layout arrangement
Author's personal copy
Analog Integr Circ Sig Process (2013) 75:417–434
433
Fig. 18 HSPICE waveforms
for Hy kept at weight 16, while
the Hamming weight of the
other input Hx is gradually
increased up to 16. The
horizontal axis shows the time
scale in nanoseconds and the
vertical output voltage scale is
in volts. Simulation was done
using TSMC 0.15 lm
technology, under 1 GHz clock,
Vdd = 1.35 V,
Temp = 100 C, for the slowfast technology corner
small size for publication. Full conformance to the results
expected based on our theoretical analyses was observed in
all simulated cases.
References
1. Aleksander, I., & Morton, H. (1995). An introduction to neural
computing, 2nd (pp. 73–94). Oxford: International Thomson
Computer Press.
2. Parhami, B. (1991). Voting networks. IEEE Transactions on
Reliability, 40(3), 380–394.
3. Kar, B. K., & Pradhan, D. K. (1993). A new algorithm for order
statistic and sorting. IEEE Transactions on Signal Processing,
41(8), 2688–2694.
4. Chen, K. (1989). Bit-serial realizations of a class of non-linear
filters based on positive boolean functions. IEEE Transactions on
Circuits and Systems, 36(6), 785–794.
5. Karaman, M., Onural, L., & Atalar, A. (1990). Design and
implementation of a general purpose median filter unit in CMOS
VLSI. IEEE Journal of Solid-State Circuits, 25(2), 505–513.
6. Asada, K., Kumatsu, S., & Ikeda. M. (1999). Associative memory
with minimum Hamming distance detector and its application to
bus data encoding. Proc IEEE Asia-Pacific Application-Specific
Integrated Circuits Conf, p. 16.1.
7. Barral, C., Coron, J. -S., & Naccache, D. (2004). Externalized
Fingerprint Matching. Proc. 1st Int’l Conf. Biometric Authentication, 309–315.
8. Abdel-hafeez, S., Harb, S. M., Lee, K. M. On-Chip Jitter Measurement Architecture Using a Delay-Locked Loop with Vernier
Delay Line, to the Order of Giga Hertz. Proc. 18th Int’l Conf. Mixed
Design on Integrated Circuits and Systems, pp. 502-506, 2011.
9. Piestrak, S. J. Design of Self-Testing Checkers for Unidirectional
Error Detecting Codes. Scientific Papers of the Cybernetics
Institute of Wroclaw University of Technology, No. 92, Series:
Monographs No. 24, 1995.
10. Ullman, S., (1996). High-level vision: object recognition and
visual cognition (p. 5). Cambridge: MIT Press.
11. Komatsu, S., Ikeda, M., Asada, K. Bus Data Encoding with
Coupling-Driven Adaptive Code-Book Method for Low Power
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
Data Transmission. Proc. 27th European Solid-State Circuits
Conf., 2001, pp. 297-300.
King, D. B. S., Simpson, R. J., Moore, C., & MacDiarmid, I. P.
(1998). Digital n-tuple Hamming comparator for weightless
systems. Electronics Letters, 34(22), 2103–2104.
King, D. B. S., Simpson, R. J., Moore, R. J., & MacDiarmid, I. P.
(1999). Hamming value comparator hierarchies. Electronics
Letters, 35(11), 910–911.
Ikeda, M., & Asada, K. ‘‘Time-Domain Minimum-Distance Detector
and Its Application to Low Power Coding Scheme on Chip Interface,’’
Proc. 24th European Solid-State Circuits Conf., pp. 464-467, 1998.
Fujino, M., & Moshnyaga, V. G. An Efficient Hamming Distance
Comparator for Low-Power Applications. Proc. 9th Int’l Conf.
Electronics, Circuits and Systems, Vol. 2, pp. 641-644, 2002.
Pedroni, V. A. (2003). Compact fixed-threshold and two-vector
Hamming comparators. Electronics Letters, 39(24), 1705–1706.
Piestrak, S. J. (2007). Efficient Hamming weight comparators of
binary vectors. Electronics Letters, 43(11), 611–612.
Parhami, B. (2009). Efficient Hamming weight comparators for
binary vectors based on accumulative and Up/Down parallel
counters. IEEE Trans. Circuits and Systems II, 56(2), 167–171.
Abdel-hafeez, S. (2010). A New high-speed SAR ADC architecture. Proc. 11th Int’l Workshop Symbolic and Numerical Methods,
Modeling and Applications to Circuit Design, pp. 1–5, 2010.
Razavi, B. (1995). Principles of data conversion system design.
New York: Wiley.
Davidovic, M., Zach, G., & Zimmermann, H. (2010). An 11 bit
successive approximation analog-to-digital converter based on a
combined capacitor-resistor network. E & I Elektrotechnik und
Informationstechnik, 127(4), 98–102.
Abo, A. M. (1999). Design for reliability of low-voltage switched-capacitor circuits. Berkeley: University of California.
Sklar, B. (1988). Digital communications fundamentals and
applications. New Jersey: Prentice Hall.
Long, S., Wu, J., & Xia, X. (2008). A 1.8 V 3.1 mW successive
approximation ADC in system-on-chip. Analog Integrated Circuits and Signal Processing, 56, 205–211.
Gregorian, R. (1999). Introduction to CMOS Op-Amps and
comparators. New York: Wiley.
Taiwan Semiconductor Manufacturing Corp., ‘‘0.15 lm CMOS
ASIC Process Digests,’’ 2002.
123
Author's personal copy
434
27. Michaeli, L., Michalko, P., & Saliga, J. (2008). Unified ADC
non-linearity error model for SAR ADC. Measurement, 41(2),
198–204.
28. Ohnhaeuser, F., & Huemer, M. Methods to Eliminate Dynamic
Errors in High-Performance SAR A/D Converter. Proc. IEEE
Int’l Symp. Circuits and Systems, pp. 2398–2401, 2008.
29. Masuzawa, A. (2007). Design challenges of analog-to-digital
converters in Nanoscale CMOS. IEICE Transactions on Electronics, E90-C(4), 779–785.
30. Sayed, D., & Dessouky, M. Automatic Generation of CommonCentroid Arrays with Arbitrary Capacitor Ratio. Proc. Design,
Automation and Test in Europe Conf., pp. 1530–1591, 2002.
31. McNutt, M. J., LeMarquis, S., & Dunkley, J. L. (1994). Systematic capacitance matching errors and corrective layout procedures. IEEE Journal of Solid-State Circuits, 29(5), 611–616.
32. Quinn, P., & Pribytko, M. (2003). Capacitor matching insensitive
algorithmic ADC requiring no calibrations. Integration, the VLSI
Journal, 36, 211–228.
33. Hastings, A. (2001). The art of analog layout. New Jersey :
Prentice Hall.
34. Uyemura, J. P. (1999). CMOS logic circuit design. Norwell:
Kluwer.
35. Razavi, B. (2001). Design of analog CMOS integrated circuits.
New York: McGraw-Hill.
36. HSPICE, Synopsys, 2010, http://www.synopsys.com.
37. Belluomini, W., Jamsek, D., Nartin, A. K., McDowell, C.,
Montoye, R. K., Ngo, H. C., et al. (2006). Limited switch
dynamic logic circuits for high-speed low-power circuit design.
IBM J. Research & Development, 50(2/3), 277–286.
38. Agens, A., Bonizzoni, E., Malcovati, P., & Maloberti, F. (2010).
An ultra-low power successive approximation A/D converter
with time-domain comparator. Analog Integrated Circuits and
Signal Processing, 64(2), 183–190.
39. Dabbagh-Sadeghipour, K., Hadidi, K., & Khoei, A. (2006).
A New successive approximation architecture for high-speed
low-power ADCs. Int’l Journal of Electronics and Communications, 60(3), 217–223.
40. Navi, K., Maeen, M., Foroutan, V., Timarchi, S., & Kavehei, O.
(2009). A novel low-power full-adder cell for low voltage. Integration, the VLSI Journal, 42(4), 457–467.
123
Analog Integr Circ Sig Process (2013) 75:417–434
Saleh Abdel-hafeez received a
PhD in computer engineering
from the University of Texas at
El Paso and MS from New
Mexico State University. He
was a senior member of technical staff at S3 Inc. and Viatechnologies.com at the area of
mixed-signal IC design. He also
was Adjunct Professor of computer engineering at Santa Clara
University from 1998 to 2002.
He has two US patents, numbered 6,265,509 and 6,356,509
with S3 Inc. His current
research interests are in the areas of high-speed ICs, computer
arithmetic algorithms, and mixed-signal design. Dr. Abdel-hafeez is
currently the Chairman of Computer Engineering Department at
Jordan University of Science and Technology.
Behrooz Parhami (PhD, 1973,
University of California, Los
Angeles) is Professor of Electrical and Computer Engineering, and Associate Dean for
Academic Personnel, College of
Engineering, at University of
California,
Santa
Barbara,
where he teaches and does
research in computer arithmetic,
parallel processing, and dependable computing. A Fellow of
IEEE, IET, and British Computer Society, and recipient of
several other awards (including
a most-cited paper award from J. Parallel & Distributed Computing),
he has written six textbooks and more than 270 peer-reviewed technical papers. Professionally, he serves on journal editorial boards and
conference program committees and is also active in technical
consulting.