adeel@jhu.edu
spanda@bme.jhu.edu
vls@jhu.edu
Layout of the entire chip with padframe included.
We present the design of a charge-mode CMOS imager chip that computes the Discrete Wavelet Transform (DWT) through focal-plane image processing. The architecture uses Haar pyramid decomposition [1], which is the simplest tool for multiresolution image analysis. The Haar computation can be realized as a 2-D matrix-vector multiplication problem [2], with multiplication achieved by presenting bit-serial inputs to a two-stage correlated double sampling (CDS) circuit. The same architecture can also compute other unitary image transforms, such as the Walsh and Hadamard transforms. In addition, it can be used as a random-access imager [4], in which each pixel can be addressed independently. The chip is implemented in a 0.35 µm CMOS process with a 16×24 array of active pixel sensors (APS). The initial prototype is laid out on a 1.5 mm × 1.5 mm padframe.
Keywords: Focal plane image processing, Wavelet transform, Multiresolution image analysis
The DWT extracts information from a signal at different scales. The first level of decomposition captures the highest-frequency components of the signal, while the second and later levels extract progressively coarser information (lower-frequency components). In this sense, the DWT can be regarded as a filter bank with varying levels of resolution. Two filters are needed: one high-pass and one low-pass. A computationally efficient way to implement such a filter bank is to use sums as the low-pass filter and differences as the high-pass filter. The output of the high-pass filter is then the DWT coefficient, while the output of the low-pass filter is fed to the next stage to compute higher-order coefficients, as shown in Figure 1.
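The sum/difference filter bank described above can be sketched in a few lines of Python. This is an illustration only, assuming a 1-D signal whose length is a power of two and unnormalized sums and differences (the usual 1/√2 scaling is omitted); `haar_pyramid` is a hypothetical name, not part of the chip or its test setup.

```python
def haar_pyramid(signal):
    """Multilevel 1-D Haar decomposition with unnormalized sums and differences.

    Returns (details, approx): details[0] holds the finest-scale
    (highest-frequency) coefficients, later entries hold coarser scales,
    and approx is the single remaining low-pass (DC) value.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # High-pass branch: pairwise differences are emitted as DWT coefficients.
        details.append([a - b for a, b in pairs])
        # Low-pass branch: pairwise sums are fed to the next, coarser stage.
        approx = [a + b for a, b in pairs]
    return details, approx


details, dc = haar_pyramid([4, 2, 5, 5, 1, 3, 0, 2])
print(details)  # [[2, 0, -2, -2], [-4, 2], [10]]
print(dc)       # [22]
```

Each stage halves the number of samples, so the final low-pass output is simply the sum of all samples (the DC term).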


Figure 1: Basic pyramid decomposition 
Figure 2: Haar basis functions 
Haar is the simplest wavelet transform for computing the DWT. The Haar algorithm takes sums and differences of pixels in a given image at different resolutions. As shown in Figure 2, a family of Haar basis functions can be obtained by simply translating and dilating the mother wavelet.
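The translate-and-dilate construction can likewise be sketched. The hypothetical helper below builds the unnormalized discrete Haar basis for a power-of-two length, with every entry restricted to +1, 0 or -1 (the values used for the chip's multiplicands); normalization factors are deliberately left out.

```python
def haar_basis(n):
    """Unnormalized discrete Haar basis of length n (n a power of two).

    Row 0 is the all-ones scaling function (DC). Each further row is the
    mother wavelet [+1, -1] dilated to a support of `width` samples and
    translated by `shift`, so all entries lie in {+1, 0, -1}.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    rows = [[1] * n]
    width = n
    while width >= 2:
        half = width // 2
        for shift in range(0, n, width):
            row = [0] * n
            for t in range(shift, shift + half):
                row[t] = 1   # positive half of the dilated wavelet
            for t in range(shift + half, shift + width):
                row[t] = -1  # negative half of the dilated wavelet
            rows.append(row)
        width = half  # next, finer scale
    return rows


for row in haar_basis(4):
    print(row)
# [1, 1, 1, 1]
# [1, 1, -1, -1]
# [1, -1, 0, 0]
# [0, 0, 1, -1]
```

The rows are mutually orthogonal, which is what allows sums and differences alone to form a complete transform.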
For a 2-D image, a Haar wavelet coefficient can be written as a combination of inner-product and outer-product multiplications:
$$ W = \sum_{j} \Big( \sum_{i} A_{(i)} \, P_{(i,j)} \Big) B_{(j)} $$
where the A_{(i)}'s are the inner-product multiplicands (one per row), the B_{(j)}'s are the outer-product multiplicands (one per column), and P_{(i,j)} is the value of the pixel in row i and column j. In Haar, these multiplicands take only three values: +1, 0 or -1. Each pixel value is therefore either added, subtracted, or excluded. For a 4×4 image, the complete set of wavelet transform coefficients is shown in Figure 3. The four coefficients at the bottom left contain vertical edge information, the four at the top right contain horizontal edge information, and those at the bottom right contain diagonal edge information. The figure also shows that the coefficient representing the zero-frequency component (DC value) of the image is computed by simply adding all the pixels together. Successive coefficients involve sums and differences at progressively finer resolution.


Figure 3: Haar decomposition for a 4 x 4 image 
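Stacking the coefficient vectors gives a compact matrix form. As a worked sketch in our own notation (not the original's), let row k of a matrix **A** hold the k-th set of inner-product multiplicands A_{(k,i)} and row l of **B** hold the l-th set of outer-product multiplicands B_{(l,j)}; each coefficient is then one entry of a triple matrix product. Using the unnormalized 4-point Haar matrix H_4 for both is an assumed choice of ordering and scaling:

```latex
W_{(k,l)} = \sum_{j} \Big( \sum_{i} A_{(k,i)} \, P_{(i,j)} \Big) B_{(l,j)}
          = \big( \mathbf{A}\,\mathbf{P}\,\mathbf{B}^{T} \big)_{(k,l)},
\qquad
\mathbf{A} = \mathbf{B} = H_4 =
\begin{pmatrix}
 1 &  1 &  1 &  1 \\
 1 &  1 & -1 & -1 \\
 1 & -1 &  0 &  0 \\
 0 &  0 &  1 & -1
\end{pmatrix},
\qquad
W_{(1,1)} = \sum_{i=1}^{4} \sum_{j=1}^{4} P_{(i,j)} .
```

With k = l = 1 both vectors are all ones, so W_{(1,1)} is the sum of all 16 pixels, the DC coefficient described above.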
The chip consists of three main components: a 16×24 pixel array, two correlated double sampling (CDS) stages, and arrays of inner- and outer-product shift registers at the periphery. Inner-product coefficients are applied along the rows and outer-product coefficients along the columns. The first CDS bank sums A_{(i)}×P_{(i,j)} down each column; each column has its own CDS circuit, so the first stage contains 24 CDS circuits in total. In the second CDS stage, the column results are multiplied by the B_{(j)}'s and added together, so only one CDS circuit is needed in this stage. The output of this CDS is sent off-chip as an analog signal. In this way, the entire wavelet spectrum of one frame is obtained in 16×24 clock cycles, that is, one coefficient per clock cycle. A block diagram of the chip is shown in Figure 4.


Figure 4: Top-level block diagram of the entire chip 
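The dataflow through the two CDS stages can be modeled behaviorally. The sketch below is an idealized model, assuming noiseless charge summation and exact +1/0/-1 weights; the function names are hypothetical, and stage 1 is evaluated once per inner-product vector purely for convenience, which is not a claim about how the chip schedules its phases.

```python
def cds_stage1(pixels, a):
    """First CDS bank: one circuit per column computes sum_i a[i] * P[i][j]."""
    n_rows, n_cols = len(pixels), len(pixels[0])
    return [sum(a[i] * pixels[i][j] for i in range(n_rows)) for j in range(n_cols)]


def cds_stage2(column_results, b):
    """Single second-stage CDS: weight each column result by b[j] and add."""
    return sum(bj * qj for bj, qj in zip(b, column_results))


def wavelet_spectrum(pixels, a_set, b_set):
    """Produce one coefficient per (a, b) pair, counting one clock cycle each."""
    spectrum, cycles = [], 0
    for a in a_set:
        columns = cds_stage1(pixels, a)  # one sum per column (24 on the chip)
        row = []
        for b in b_set:
            row.append(cds_stage2(columns, b))
            cycles += 1
        spectrum.append(row)
    return spectrum, cycles


H4 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 0, 0], [0, 0, 1, -1]]
P = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
W, cycles = wavelet_spectrum(P, H4, H4)
print(W[0][0], cycles)  # 136 16
```

For a 16×24 array, 16 inner-product vectors and 24 outer-product vectors give 16×24 = 384 cycles per frame. In this model, choosing unit vectors (a single +1, all other entries 0) for a and b reads out exactly one pixel, which corresponds to the random-access mode mentioned in the abstract.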
The pixel circuit diagram and layout are shown in Figure 5. The capacitor used is 20 fF. The pixel operates as a charge-mode device, so the voltage discharge of the APS is directly proportional to the light intensity. The capacitor in each cell allows the charges from all cells in a column to be summed on a single shared wire. The inner-product multiplicand A is applied to the gate of M5. If A is kept low during the precharge phase of the CDS and high during the evaluation phase, the APS dumps a charge -P_{(i,j)}, i.e. (-1)×P_{(i,j)}, on the common line. Likewise, it dumps a charge +P_{(i,j)} if A is high during precharge and low during evaluation. No charge is dumped if A is held constant. Since multiple APS cells along a column are active at the same time, a biasing transistor M2 must be placed inside each cell. Each cell measures 89λ×89λ and has a fill factor of approximately 60%.


Figure 5: Pixel circuit diagram and layout
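The phase rule for A can be captured in a small idealized model. This sketch assumes logic levels 0 (low) and 1 (high), charge in arbitrary units, and no charge injection or device mismatch; `pixel_charge` and `column_line_charge` are hypothetical names.

```python
def pixel_charge(p, a_precharge, a_evaluate):
    """Idealized charge a pixel dumps on its column line.

    Encodes the rule above: A low in precharge and high in evaluation
    gives -P, high then low gives +P, and a constant A gives 0.
    """
    if a_precharge not in (0, 1) or a_evaluate not in (0, 1):
        raise ValueError("A levels must be 0 or 1")
    return (a_precharge - a_evaluate) * p


def column_line_charge(column_pixels, a_phases):
    """Charges from all pixels of a column add on the shared wire."""
    return sum(pixel_charge(p, pre, ev) for p, (pre, ev) in zip(column_pixels, a_phases))


# Weights +1, -1, 0, 0 for a column of four pixels.
print(column_line_charge([10, 20, 30, 40], [(1, 0), (0, 1), (0, 0), (1, 1)]))  # -10
```

The column line carries 10 - 20 = -10 units, because the last two pixels hold A constant and contribute nothing.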
The outputs of all the pixels along a column are sent to a correlated double sampling circuit. The CDS circuit is a basic sample-and-hold cell; its circuit diagram and layout are shown in Figure 6. CDS circuits are generally used in imagers to reduce the effect of fixed-pattern noise. In this design, CDS is used to compute the difference between the charges dumped on the common line during the two phases of A. The capacitors used here are 320 fF because they must store the charge dumped by 16 cells (16 × 20 fF). To obtain a high gain, a cascode inverter is used instead of an ordinary inverter. At the output, depending on the phase of the B_{(j)}'s, the output capacitor is connected either to the CDS output or to V_{ref}; this is done during the precharge and evaluation phases of the second CDS stage. Note also that the CDS circuit is 89λ wide, so it abuts exactly with the flip-flops.


Figure 6: CDS circuit diagram and layout
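Two points in the CDS description can be checked numerically: the 320 fF value matches sixteen 20 fF pixel capacitors, and a CDS difference removes any offset common to both samples. The sketch below is an idealized model in integer millivolts; the offset values are hypothetical, and the circuit's gain and charge-sharing behavior are not modeled.

```python
PIXEL_CAP_FF = 20      # per-pixel capacitor from the pixel description
CELLS_PER_COLUMN = 16  # cells sharing one column line

# The stage-1 capacitor stores the charge dumped by all cells of a column.
cds_cap_ff = CELLS_PER_COLUMN * PIXEL_CAP_FF
print(cds_cap_ff)  # 320


def cds(first_sample_mv, second_sample_mv):
    """Correlated double sampling: output the difference of the two samples."""
    return first_sample_mv - second_sample_mv


# A column offset (e.g. fixed-pattern noise) appears in both samples,
# so it cancels and only the signal difference remains.
signal_mv = 250
for offset_mv in (0, 100, -300):
    print(cds(offset_mv + signal_mv, offset_mv))  # 250 each time
```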
The digital circuitry at the periphery of the chip consists mainly of an array of shift registers that feed the different phases of the A_{(i)}'s and B_{(j)}'s in a bit-serial fashion. The shift-register layout had to be made very compact, since it was the main bottleneck to achieving high pixel density. Currently each flip-flop fits within 89λ×154λ. The 89λ dimension is the more important one, because it constrains the width and height of the pixels and hence the resolution of the imager. The clocks for the shift registers are generated by a simple circuit that produces two non-overlapping clocks from a single clock input. The advantage of row and column shift registers is that they take up far less area than decoders. The design is also more scalable, because the circuit complexity of a decoder grows exponentially with the number of address bits. The schematic and layout of the row and column shift registers are shown in Figure 7. The different phases of the product coefficients are selected through multiplexors.
Two phases of the inner-product coefficients and four phases of the outer-product coefficients must be stored (two phases for each of the two components of B). Hence there are two shift-register arrays for the rows and four for the columns.


Figure 7: Row and column shift registers
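Loading the coefficient phases can also be sketched. The code below assumes that A_phi0 is the precharge phase and A_phi1 the evaluation phase, that a coefficient of 0 is encoded as A held low, that A_sel = 0 selects the phi0 register, and that each programming clock (Prg_clk) shifts the chain by one stage toward its far end; the latching behavior controlled by Prg_enb is not modeled, and all function names are hypothetical.

```python
# Assumed (phi0, phi1) = (precharge, evaluation) levels of A per coefficient,
# following the pixel rule: high then low gives +P, low then high gives -P.
A_PHASES = {1: (1, 0), -1: (0, 1), 0: (0, 0)}


def encode_inner(coeffs):
    """Split inner-product coefficients {+1, 0, -1} into two phase bit streams."""
    phi0 = [A_PHASES[c][0] for c in coeffs]
    phi1 = [A_PHASES[c][1] for c in coeffs]
    return phi0, phi1


def shift_in(bits):
    """Load a shift register bit-serially, one bit per programming clock.

    Each clock moves every stored bit one stage down the chain, so the stream
    is fed in reverse order for bits[k] to end up in stage k.
    """
    register = [0] * len(bits)
    for bit in reversed(bits):
        register = [bit] + register[:-1]
    return register


def row_drive(phi0_reg, phi1_reg, a_sel):
    """Multiplexor: A_sel picks which phase register drives the rows."""
    return phi1_reg if a_sel else phi0_reg


phi0, phi1 = encode_inner([1, -1, 0, 1])
reg0, reg1 = shift_in(phi0), shift_in(phi1)
print(row_drive(reg0, reg1, 0), row_drive(reg0, reg1, 1))  # [1, 0, 0, 1] [0, 1, 0, 0]
```

Because the phase patterns are simply shifted in, the array grows by one flip-flop per row or column, which is the scalability argument made above.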
The simulation results for a single pixel of the chip are shown in Figure 8. The signals are:
net34: Output of one pixel; voltage representing the charge dumped by a cell (Figure 5)
net74: Voltage across the capacitor of CDS stage 1 (Figure 6)
S1: Sample of stage 1 (Figure 6)
H1: Hold of stage 1 (Figure 6)
Rst: Reset signal for the APS. To ensure the photodiode undergoes the same discharge in both phases of A, a reset pulse must be applied for each of the two phases (Figure 5)
A: Inner-product coefficient. It goes from 0 to 1, so the pixel value is multiplied by -1 (Figure 5)
A photocurrent of 60 pA is assumed, and all clocks operate at 11 kHz. Clock sequencing is critical: outer products must be computed only while the inner-product (first-stage) CDS circuit is in its hold phase.
Figure 8: Simulation results for a single pixel
The simulation trace for a 4×4 pixel array is shown in Figure 9. The signals are:
net75: Wavelet coefficient; voltage across the capacitor of the second CDS circuit. It is valid only while the hold signal of the second stage is high
H2: Hold of stage 2
S2: Sample of stage 2
B_i: Component of the outer-product coefficient that controls the connection of the CDS stage 1 output to its output capacitor (Figure 6)
B_i_1: Second component of B_i, which controls the connection of Vref to the output capacitor
The outer-product coefficients B_i are presented as two components: one controls the transmission gate that connects the CDS stage 1 buffer to the output capacitor, and the other controls the transmission gate that connects Vref to the output capacitor. The configuration shown in the figure multiplies the inner product by +1. To multiply by -1, the two components would be asserted in the opposite phases.
Figure 9: Simulation results for a 4×4 array.
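The effect of swapping the phases of the two B components can be sketched as follows. This is an idealized model in integer millivolts that treats a column's contribution as the difference between the voltages seen by the output capacitor in the two phases. Which ordering the chip uses for +1, and how a coefficient of 0 is applied, are not stated in the text: here +1 is assumed to connect the stage-1 output first and Vref second, and 0 is assumed to keep Vref connected in both phases. `outer_multiply` is a hypothetical name.

```python
def outer_multiply(v_cds1_mv, v_ref_mv, b):
    """Idealized contribution of one column for outer-product coefficient b.

    B_i connects the stage-1 output and B_i_1 connects Vref to the output
    capacitor. Assumed phase assignments (not given in the text):
      b = +1: B_i in the first phase, B_i_1 in the second
      b = -1: the same components asserted in the opposite phases
      b =  0: Vref in both phases
    The contribution is the difference between the two phase voltages.
    """
    if b == 1:
        first, second = v_cds1_mv, v_ref_mv
    elif b == -1:
        first, second = v_ref_mv, v_cds1_mv
    elif b == 0:
        first, second = v_ref_mv, v_ref_mv
    else:
        raise ValueError("b must be +1, 0 or -1")
    return first - second


for b in (1, -1, 0):
    print(b, outer_multiply(3400, 2960, b))
# 1 440
# -1 -440
# 0 0
```

Swapping the phases only flips the sign of the contribution, which is why the same two components suffice for both +1 and -1.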
From the simulation results, the output voltage ranged from approximately 1.76 V to 4.23 V, with the reference at 2.96 V; coefficients below the reference represent negative values. To further confirm that the chip operates correctly, we simulated several cases of adding and subtracting pixel values. The results were satisfactory, although a small error was observed, which we believe is due to charge injection in the APS cell.
Pin list (pin number, signal name, description):
1. Vref_SH1: CDS stage 1 reference voltage
2. Vref_aps: Reference voltage of the pixel cell
3.
4.
5. gnd: Ground
6.
7.
8.
9.
10. A_phi1: 2nd-phase value of the inner-product coefficient
11. Rst: Reset signal of the APS cell
12. Vbias: Biasing voltage of the APS cell
13. A_sel: Multiplexor input that selects the current phase of the inner-product coefficient
14. A_phi0: 1st-phase value of the inner-product coefficient
15. vdd: Vdd
16. VBP1_SH2: Biasing voltage of the cascode inverter, CDS 2nd stage
17. VBP2_SH2: Biasing voltage of the cascode inverter, CDS 2nd stage
18. VBN1_SH2: Biasing voltage of the cascode inverter, CDS 2nd stage
19. S2: Sample of CDS 2nd stage
20. H2: Hold of CDS 2nd stage
21. Vref_SH2: Reference voltage, CDS 2nd stage
22. Wout: Output (wavelet coefficient)
23. Prg_enb: Controls shift-register latching
24. Prg_clk: Clock for programming the shift registers
25. B_sel: Selects the phase of the outer-product coefficients
26. B_zero: Isolates CDS stage 1 from CDS stage 2 during the precharge and evaluation phases of CDS stage 1
27. Bi_Phi0: 1st phase of Bi
28. Bi_phi1: 2nd phase of Bi
29. Bi_1_phi0: 1st phase of Bi_1
30. Bi_1_phi1: 2nd phase of Bi_1
31. Phi_1_inv: Output clock (for testing)
32. Phi_1: Output clock (for testing)
33. Phi_2_inv: Output clock (for testing)
34.
35.
36. H1: Hold of CDS stage 1
37. S1: Sample of CDS stage 1
38. VBN1_SH1: Biasing voltage of the cascode inverter, CDS 1st stage
39. VBP2_SH1: Biasing voltage of the cascode inverter, CDS 1st stage
40. VBP1_SH1: Biasing voltage of the cascode inverter, CDS 1st stage
[1] R. Gonzalez and R. Woods, “Digital Image Processing,” Addison-Wesley, 1992.
[2] R. Genov and G. Cauwenberghs, “Charge-mode parallel architecture for vector-matrix multiplication,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 48, pp. 930-936, Oct. 2001.
[3] S. Decker, R. McGrath, K. Brehmer and C. Sodini, “A 256×256 CMOS imaging array with wide dynamic range pixels and column-parallel digital output,” IEEE Journal of Solid-State Circuits, vol. 33, pp. 2081-2090, Dec. 1998.
[4] O. Y. Pecht, R. Ginosar and Y. S. Diamand, “A random access photodiode array for intelligent image capture,” IEEE Transactions on Electron Devices, vol. 38, pp. 1772-1780, Aug. 1991.
[5] T. Lule, S. Benthien, H. Keller, F. Mutze, P. Rieve, K. Seibel, M. Sommer and M. Bohm, “Sensitivity of CMOS based imagers and scaling perspectives,” IEEE Transactions on Electron Devices, vol. 47, pp. 2110-2122, Nov. 2000.
[6] M. Cohen and G. Cauwenberghs, “Image sharpness and beam focus VLSI sensors for adaptive optics,” IEEE Sensors Journal, vol. 2, pp. 680-690, Dec. 2002.
[7] A. Grzeszczak, M. K. Mandal and S. Panchanathan, “VLSI implementation of discrete wavelet transform,” IEEE Transactions on VLSI Systems, vol. 4, pp. 421-433, Dec. 1996.
[8] V. Gruev and R. Etienne-Cummings, “Implementation of steerable spatiotemporal image filters on the focal plane,” IEEE Transactions on Circuits and Systems II, vol. 49, no. 4, pp. 233-244, Apr. 2002.
[9] J. E. Franca and Y. Tsividis, “Design of Analog-Digital VLSI Circuits for Telecommunications and Signal Processing,” Prentice-Hall, 2nd ed., 1994.