Ciência e Natura, v. 37 Part 2 2015, p. 312-319

ISSN impressa: 0100-8307 ISSN on-line: 2179-460X

# Novel design of array multiplier

Seyyed Masoud Razavi<sup>1</sup>, Seyyed Reza Talebiyan<sup>1</sup>

<sup>1</sup>Department of Electrical Engineering, Imam Reza International University, Mashhad, Iran

#### Abstract

This paper proposes a new array multiplier that consumes less power than regular array multipliers. The technique, called the return technique, is applied to two structures: the conventional and the leapfrog array multipliers. All designs proposed in this paper were implemented as $8 \times 8$ multipliers in HSPICE using 180 nm TSMC technology at a supply voltage of 1 V. To verify the performance of the proposed structures, they were also simulated in 130 nm and 65 nm PTM technologies. The simulation results show that applying the return technique to the array structures reduces power consumption and consequently the power-delay product (PDP). For the 180 nm technology, this improvement is 13.32 % in the conventional array structure and 23.27 % in the leapfrog array structure. The technique also substantially reduces the number of transistors and, as a result, the area.

Keywords: Array multiplier, return technique, return leapfrog array multiplier, power, delay


### **1** Introduction

Many application systems, such as digital signal processing systems, require the processing of large amounts of digital data (Chong, Bah-Hwee, & Chang, 2004; Mathew, Latha, Ravi, & Logashanmugam, 2013; Ravi, Rao, & Prasad, 2011; Ravi, Subbaiah, Prasad, & Rao, 2011; Srivastava, Vishant, Singh, & Nagaria, 2013). To implement algorithms such as convolution and various filters, a multiplication unit is included in digital signal processing systems. In many algorithms, multiplication lies on the critical path and is consequently the most critical operation (Mathew et al., 2013). In recent years, researchers have emphasized three design criteria: power, speed, and area (Ravi, Rao, et al., 2011). The need for specialized designs increases both the power consumption and the number of transistors on the chip. Therefore, power is the most important of the three criteria. To achieve a high operating speed, the most suitable structure is the array multiplier, whose regular structure leads to a regular layout (Mathew et al., 2013). This paper focuses on power consumption and proposes a new method for array multipliers that reduces both the power and the area. The second section explains the mathematical relationships and the multiplication algorithm for two 8-bit numbers. The third section analyzes the conventional and leapfrog array multiplier structures, and the fourth section derives a new design by applying the return technique to both structures. The fifth section describes the simulation process and presents the results, and the sixth section gives a general conclusion about the work.

### 2 Parallel Multiplier

A serial multiplier consumes less power, but its delay is larger because of rippling. A parallel multiplier has a smaller delay, but its more complex circuitry consumes more power.

Consider the multiplication of two unsigned n-bit numbers, where $X = x_{n-1} x_{n-2} \dots x_0$ is the multiplicand and $Y = y_{n-1} y_{n-2} \dots y_0$ is the multiplier. The product of these two numbers can be written as follows (Mathew et al., 2013; Ravi, Rao, et al., 2011; Ravi, Subbaiah, et al., 2011):

$$X = x_{n-1} x_{n-2} \dots x_0 = \sum_{i=0}^{n-1} x_i 2^i \tag{1}$$

$$Y = y_{n-1} y_{n-2} \dots y_0 = \sum_{j=0}^{n-1} y_j 2^j \tag{2}$$

$$\begin{split} P &= XY = \left(\sum_{i=0}^{n-1} x_i 2^i\right) \left(\sum_{j=0}^{n-1} y_j 2^j\right) \\ &= \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} x_i y_j 2^{i+j} \end{split} \tag{3}$$

The example discussed in this paper uses an 8-bit multiplicand and an 8-bit multiplier. Using equation (3), 8 rows of partial products are produced, as shown in Figure 1.

|          |          |          | $X_7$    | $X_6$    | $X_5$    | $X_4$    | $X_3$    | $X_2$    | $X_1$    | $X_0$    |
|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
|          |          | $\times$ | $Y_7$    | $Y_6$    | $Y_5$    | $Y_4$    | $Y_3$    | $Y_2$    | $Y_1$    | $Y_0$    |
|          |          |          | $P_{07}$ | $P_{06}$ | $P_{05}$ | $P_{04}$ | $P_{03}$ | $P_{02}$ | $P_{01}$ | $P_{00}$ |
|          |          | $P_{17}$ | $P_{16}$ | $P_{15}$ | $P_{14}$ | $P_{13}$ | $P_{12}$ | $P_{11}$ | $P_{10}$ |          |
|          |          |          |          | $\vdots$ |          |          |          |          |          |          |
| $P_{77}$ | $P_{76}$ | $P_{75}$ | $P_{74}$ | $P_{73}$ | $P_{72}$ | $P_{71}$ | $P_{70}$ |          |          |          |
| $S_{15}$ | $S_{14}$ | $S_{13}$ |          |          | $\cdots$ |          |          | $S_2$    | $S_1$    | $S_0$    |

Figure 1: 8×8 array multiplication algorithm.
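Equation (3) and the partial-product rows of Figure 1 can be illustrated with a short Python sketch. It is only a behavioural illustration, not the circuit simulated in this paper; the function names and the test operands are assumptions chosen for the example.

```python
def partial_products(x: int, y: int, n: int = 8) -> list[list[int]]:
    """Return the partial-product bits: rows[j][i] = y_j AND x_i."""
    rows = []
    for j in range(n):
        y_j = (y >> j) & 1
        rows.append([((x >> i) & 1) & y_j for i in range(n)])
    return rows


def sum_partial_products(rows: list[list[int]]) -> int:
    """Evaluate the double sum of equation (3): each bit is weighted by 2^(i+j)."""
    return sum(bit << (i + j)
               for j, row in enumerate(rows)
               for i, bit in enumerate(row))


# Arbitrary 8-bit operands, used only for illustration.
x, y = 0b10110101, 0b01101110
rows = partial_products(x, y)
print(len(rows), sum_partial_products(rows) == x * y)
```

Each row `rows[j]` corresponds to one row of Figure 1, shifted left by `j` positions when it is weighted in the sum.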

### **3 Array Multiplier**

This section briefly reviews the conventional and leapfrog array multipliers, which serve as the starting point for our design in Section 4.

#### 3.1 Conventional Array Multiplier

The block diagram of an 8×8-bit conventional array multiplier is shown in Figure 3. In the conventional array multiplier (Chong et al., 2004; Mahant-Shetti, Balsara, & Lemonds, 1999; Ravi, Rao, et al., 2011), the output signals (Sum and Carry) of each row of carry save adders (CSAs) are directly connected to the next row of CSAs. Finally, an n-bit ripple carry adder (RCA) is used to produce the 8 most significant bits (MSBs) of the product.
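The data flow described above can be sketched as a bit-level functional model in Python. This is a simplified illustration under stated assumptions: every row is modelled with n full adders (the exact cell arrangement of Figure 3 is not reproduced), and the names `full_adder` and `array_multiply` are hypothetical.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def array_multiply(x: int, y: int, n: int = 8) -> int:
    """Rows of carry save adders followed by a ripple carry adder."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]
    sum_in = [0] * n     # sum bits entering the current row
    carry_in = [0] * n   # carry bits entering the current row
    product_bits = []
    for j in range(n):   # one CSA row per multiplier bit
        sum_out, carry_out = [0] * n, [0] * n
        for i in range(n):
            pp = xb[i] & yb[j]
            sum_out[i], carry_out[i] = full_adder(pp, sum_in[i], carry_in[i])
        # The rightmost sum output of each row is a finished product bit.
        product_bits.append(sum_out[0])
        # Sum outputs move one column to the right in the next row,
        # carry outputs stay in the same column.
        sum_in = sum_out[1:] + [0]
        carry_in = carry_out
    # Final ripple carry adder resolves the remaining sum and carry bits
    # into the n most significant bits.
    carry = 0
    for i in range(n):
        s, carry = full_adder(sum_in[i], carry_in[i], carry)
        product_bits.append(s)
    return sum(bit << k for k, bit in enumerate(product_bits))


print(array_multiply(181, 110) == 181 * 110)
```

In this model the 8 LSBs emerge one per CSA row, and the ripple carry adder supplies the 8 MSBs, which matches the role of the RCA described above.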

#### 3.2 Leapfrog Array Multiplier

The block diagram of an 8×8-bit leapfrog array multiplier is shown in Figure 4. In the leapfrog array structure (Chong et al., 2004; Mahant-Shetti et al., 1999), the interconnections of the CSAs are rearranged so that the propagation delays of the CSAs are better synchronized within the intermediate rows. This potentially results in higher speed and lower spurious switching (lower power dissipation), because the carry signal of a full adder is generally generated earlier than the sum signal of the same full adder. To take advantage of this, the sum outputs of the CSAs in row 1 are connected to the CSAs in row 3 instead of the CSAs in row 2 (as in a general array structure). The carry signals of the CSAs in row 1, however, remain connected to the CSAs in row 2. Put simply, in a leapfrog array structure, the arrival times of the carry signals (from row 2) and the sum signals (from row 1) at the CSAs in row 3 are better synchronized. Consequently, this results in higher speed (for data propagation) and lower spurious switching (less power dissipation) (Chong et al., 2004; Mahant-Shetti et al., 1999).

### 4 Return Technique

With the return technique, the addition in these structures is carried out in two cycles. In the first cycle, the first four rows of partial products are added. In the second cycle, the second four rows of partial products are added together with the final result of the first cycle. Figure 2 shows the multiplication algorithm for two 8-bit numbers with the return technique applied.



Figure 2: 8-bit multiplication algorithm by applying a return technique.
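At the arithmetic level, the two-cycle scheme can be sketched in Python as follows. This is a behavioural illustration only: the function name `return_multiply` and the use of ordinary integer addition in place of the adder rows are assumptions made for the example.

```python
def return_multiply(x: int, y: int, n: int = 8) -> int:
    """Multiply two n-bit numbers in two cycles of n/2 partial-product rows."""
    half = n // 2
    mask = (1 << half) - 1

    def add_rows(y_group: int, returned: int) -> int:
        """Add `half` partial-product rows to the value returned from before."""
        total = returned
        for j in range(half):
            if (y_group >> j) & 1:
                total += x << j
        return total

    # Cycle 1: rows for the lower multiplier bits, nothing returned yet.
    first = add_rows(y & mask, 0)
    low_bits = first & mask      # these product bits are already final
    returned = first >> half     # the rest is returned to the array input

    # Cycle 2: rows for the upper multiplier bits plus the returned result.
    second = add_rows((y >> half) & mask, returned)
    return (second << half) | low_bits


print(return_multiply(181, 110) == 181 * 110)
```

The same `add_rows` step is reused in both cycles, which mirrors the idea that one group of four adder rows serves both halves of the partial products.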

#### 4.1 Return Conventional Array Multiplier

The block diagram of the return conventional array multiplier is shown in Figure 5. In this structure, the number of full adder rows is half that of the conventional array multiplier, and a row of registers is added that saves the outputs of the last full adder row in the first cycle and returns them to the input of the first full adder row in the second cycle. T-1...T-4 are 1-bit registers and T0...T7 are 2-bit registers. Over the two cycles, the T-1...T-4 registers deliver the 8 least significant bits (LSBs) of the final product, which can be considered as two groups of 4 bits. In the first cycle, the first 4 bits of the final product are produced and saved in the T-1...T-4 registers, while the sum of the first 4 rows of partial products is saved in the T0...T7 registers and returned to the input of the first full adder row for the second cycle, where it is added to the second 4 rows of partial products. The second 4 LSBs of the final product are produced in the second cycle and saved in the T-1...T-4 registers. The bits saved in the T0...T7 registers are then applied to the final adder stage (CRA), which produces the 8 MSBs of the final product.
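The register behaviour described above can be sketched as a word-level carry-save model in Python. It is an illustrative sketch, not the netlist of Figure 5: it assumes that each 2-bit register T0...T7 holds one sum bit and one carry bit, and the names `csa_row`, `return_cam`, `t_sum`, and `t_carry` are hypothetical.

```python
def csa_row(s: int, c: int, pp: int) -> tuple[int, int, int]:
    """One carry-save adder row: returns (product_bit, next_sum, next_carry)."""
    total = s ^ c ^ pp                       # sum outputs of all full adders
    carry = (s & c) | (s & pp) | (c & pp)    # carry outputs of all full adders
    # The lowest sum output is a finished product bit; the other sum bits
    # move one column to the right, the carries stay in their column.
    return total & 1, total >> 1, carry


def return_cam(x: int, y: int, n: int = 8) -> int:
    """Two-cycle multiplication with n/2 adder rows and a register row."""
    half = n // 2
    t_sum, t_carry = 0, 0   # assumed contents of the 2-bit registers T0...T7
    low_bits = 0            # bits delivered by the 1-bit registers T-1...T-4
    for cycle in range(2):
        s, c = t_sum, t_carry            # values returned to the first row
        for r in range(half):
            j = cycle * half + r
            pp = x if (y >> j) & 1 else 0
            bit, s, c = csa_row(s, c, pp)
            low_bits |= bit << j
        t_sum, t_carry = s, c            # saved at the end of the cycle
    high_bits = t_sum + t_carry          # final adder stage (CRA)
    return (high_bits << n) | low_bits


print(return_cam(181, 110) == 181 * 110)
```

In this sketch each cycle yields 4 finished LSBs, and only the sum and carry words pass through the registers between cycles, so the same four rows handle both groups of partial products.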



#### 4.2 Return Leapfrog Array Multiplier

The main structure presented in this paper, and the one with the lowest power consumption, is the return leapfrog array multiplier. Its block diagram is shown in Figure 6. In this structure, the first adder row is n bits long, equal to the length of the multiplicand, and each of the next three rows is n+1 bits long. The extra full adder in each of these three rows is needed to add the sum output of the previous row and the leapfrog sum of the previous two rows. Because of the leapfrog connections, two register rows are used. The first row contains 3n/2 registers: T1-1...T1-4 are single-bit registers, and T10...T17 are two-bit registers that hold the output carry of the last adder row and the sum output of the penultimate row. The second row is n bits long and consists of T20...T27: the first register holds the carry of the single adder in the fifth row, and the remaining registers hold the sum output of the last (fourth) adder row.



Figure 6: Block diagram of the return leapfrog array multiplier.

Like the previous structure, this structure operates in two cycles. The first 4 LSBs are produced in the first cycle and the second 4 LSBs in the second cycle, both at the output of the T1-1...T1-4 registers. In the first cycle, the sum output of the third adder row and the carry of the fourth adder row are saved in the T10...T17 registers and returned to the first adder row. The sum output of the last adder row is saved in the T21...T27 registers and the fifth-row carry is saved in T20; these are returned to the second adder row so that they are added to the second group of partial product rows in the second cycle. The last stage of this structure, as in the existing leapfrog array multiplier, consists of a row of CSAs and a CRA row. In the second cycle, the outputs of the T10...T17 and T21...T27 registers are applied to the CSA row to reduce the number of rows to be added. Finally, the CRA row produces the final result. The full adder and register architectures used in this paper are shown in Figures 7(a) and 7(b).

In both structures, the reduced number of full adder rows decreases the area and consequently the power consumption.



Figure 7: (a) Full adder architecture and (b) C2CMOS flip flop architecture (Tambat & Lakhotiya, 2014).

### **5** Simulation Results

The simulations in this paper were performed in HSPICE with the 180 nm TSMC and the 130 nm and 65 nm PTM libraries, for the multiplication of two 8-bit numbers. To demonstrate the performance and very low power consumption of the designed return leapfrog array multiplier, this structure and the designed return conventional array multiplier were simulated with the full adder cell and register presented in Figures 7(a) and 7(b). The simulation results for the different libraries are shown in the following tables. For every technology, the results of each structure are given first as real values and then as normalized values. The normalization was performed separately for each structure, that is, each return structure is normalized to its own non-return counterpart. Since this paper focuses on the array structure, the simulations assume that all partial product bits have already been produced, so the reported delay and power consumption relate only to the array structure. In all technologies, the return leapfrog array multiplier has the lowest PDP among the proposed structures.

| Multipliers (180 nm)  | Parameters | Avg. Power (E-6) | Delay (E-9) | PDP (E-15) | No. of Transistors |
|-----------------------|------------|------------------|-------------|------------|--------------------|
| CAM                   | Real       | 3.1260           | 1.4806      | 4.6283     | 2736               |
|                       | Normalized | 1                | 1           | 1          | 1                  |
| CAM<sub>Return</sub>  | Real       | 1.6524           | 2.4279      | 4.0119     | 1760               |
|                       | Normalized | 0.5285           | 1.6398      | 0.8668     | 0.6432             |
| LAM                   | Real       | 2.3131           | 1.7486      | 4.0447     | 3344               |
|                       | Normalized | 1                | 1           | 1          | 1                  |
| LAM<sub>Return</sub>  | Real       | 1.6601           | 1.8696      | 3.1037     | 2350               |
|                       | Normalized | 0.7176           | 1.0691      | 0.7673     | 0.7027             |

Table 1: The results of the simulation of array and return array multiplier by using the 180 nm technology
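The relation between the columns of Table 1 can be cross-checked with a few lines of Python. The sketch below assumes that the PDP is the product of the average power and the delay, and that the quoted improvement is one minus the normalized PDP; the dictionary and function names are hypothetical.

```python
# Real values copied from Table 1 (180 nm): (average power, delay, transistors).
# Power is listed in units of 1e-6 and delay in units of 1e-9, so their
# product is in units of 1e-15, matching the PDP column.
TABLE_1 = {
    "CAM": (3.1260, 1.4806, 2736),
    "CAM Return": (1.6524, 2.4279, 1760),
    "LAM": (2.3131, 1.7486, 3344),
    "LAM Return": (1.6601, 1.8696, 2350),
}


def pdp(name: str) -> float:
    """Power-delay product of one structure, in units of 1e-15."""
    power, delay, _ = TABLE_1[name]
    return power * delay


def pdp_improvement(base: str, proposed: str) -> float:
    """PDP reduction of `proposed` relative to `base`, in percent."""
    return 100.0 * (1.0 - pdp(proposed) / pdp(base))


def transistor_reduction(base: str, proposed: str) -> float:
    """Reduction in transistor count relative to `base`, in percent."""
    return 100.0 * (1.0 - TABLE_1[proposed][2] / TABLE_1[base][2])


for base, proposed in [("CAM", "CAM Return"), ("LAM", "LAM Return")]:
    print(proposed,
          round(pdp_improvement(base, proposed), 2),
          round(transistor_reduction(base, proposed), 2))
```

Because the table entries are rounded to four decimals, the recomputed percentages can differ from the reported ones in the last digit.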

| Multipliers (130 nm)  | Parameters | Avg. Power (E-4) | Delay (E-10) | PDP (E-14) | No. of Transistors |
|-----------------------|------------|------------------|--------------|------------|--------------------|
| CAM                   | Real       | 2.8774           | 2.6334       | 7.5775     | 2736               |
|                       | Normalized | 1                | 1            | 1          | 1                  |
| CAM<sub>Return</sub>  | Real       | 1.6507           | 4.4419       | 7.3324     | 1760               |
|                       | Normalized | 0.5736           | 1.6867       | 0.9676     | 0.6432             |
| LAM                   | Real       | 2.1033           | 4.4923       | 9.4487     | 3344               |
|                       | Normalized | 1                | 1            | 1          | 1                  |
| LAM<sub>Return</sub>  | Real       | 1.5959           | 3.5190       | 5.6161     | 2350               |
|                       | Normalized | 0.7587           | 0.7833       | 0.5943     | 0.7027             |

Table 2: The results of the simulation of array and return array multiplier by using the 130 nm technology

| Multipliers (65 nm)   | Parameters | Avg. Power (E-4) | Delay (E-10) | PDP (E-14) | No. of Transistors |
|-----------------------|------------|------------------|--------------|------------|--------------------|
| CAM                   | Real       | 1.7218           | 1.4746       | 2.5390     | 2736               |
|                       | Normalized | 1                | 1            | 1          | 1                  |
| CAM<sub>Return</sub>  | Real       | 0.9972           | 2.3547       | 2.3482     | 1760               |
|                       | Normalized | 0.5791           | 1.5968       | 0.9248     | 0.6432             |
| LAM                   | Real       | 1.2867           | 2.5073       | 3.2261     | 3344               |
|                       | Normalized | 1                | 1            | 1          | 1                  |
| LAM<sub>Return</sub>  | Real       | 0.97995          | 1.8475       | 1.8104     | 2350               |
|                       | Normalized | 0.7615           | 0.7368       | 0.5611     | 0.7027             |

Table 3: The results of the simulation of array and return array multiplier by using the 65 nm technology

The maximum frequencies of the return structures for the different libraries are as follows:

$F_{max_{180nm}} = 2 \text{ MHz}$, $F_{max_{130nm}} = 266.66 \text{ MHz}$, $F_{max_{65nm}} = 400 \text{ MHz}$

### **6** Conclusion

The simulation results show that applying the return technique to the array structures reduces power consumption and consequently the PDP. For the 180 nm technology, this improvement is 13.32 % in the conventional array structure and 23.27 % in the leapfrog array structure. The technique also substantially reduces the number of transistors and, as a result, the area. This reduction is 29.73 % for the leapfrog array structure and 35.68 % for the conventional array structure.

### References

- Chong, K.-S., Bah-Hwee, G., & Chang, J. S. (2004, 23-26 May). A low power 16-bit Booth Leapfrog array multiplier using Dynamic Adders. Paper presented at the Circuits and Systems, 2004. ISCAS '04. Proceedings of the 2004 International Symposium on.
- Mahant-Shetti, S. S., Balsara, P. T., & Lemonds, C. (1999). High performance low power array multiplier using temporal tiling. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 7(1), 121-124. doi: 10.1109/92.748208
- Mathew, K., Latha, S. A., Ravi, T., & Logashanmugam, E. (2013). Design and Analysis of Array Multiplier using an Area Efficient Full Adder Cell in 32nm CMOS Technology. International Journal of Engineering and Science, 2(3), 8-16.
- Ravi, N., Rao, D. T., & Prasad, D. T. (2011). Performance Evaluation of Bypassing Array Multiplier with Optimized Design. International Journal of Computer Applications (0975–8887) Volume.

- Ravi, N., Subbaiah, Y., Prasad, T. J., & Rao, T. S. (2011). A novel low power, low area array multiplier design for DSP applications. Paper presented at the Signal Processing, Communication, Computing and Networking Technologies (ICSCCN), 2011 International Conference on.
- Srivastava, P., Vishant, V., Singh, R. K., & Nagaria, R. K. (2013, 12-14 April). Design and implementation of high performance array multipliers for digital circuits. Paper presented at the Engineering and Systems (SCES), 2013 Students Conference on.
- Tambat, R. V., & Lakhotiya, S. A. (2014). Design of Flip-Flops for High Performance VLSI Applications using Deep Submicron CMOS Technology.
