



# Design and Implementation of a Multiply-Accumulate (MAC) Unit

Neethu Johny<sup>\*1</sup>, Divya Rajan<sup>2</sup>

<sup>\*1,2</sup>Senior Assistant Professor, ECE Department, New Horizon College of Engineering, Bengaluru, India <sup>1</sup>neethuj@newhorizonindia.edu

# ABSTRACT

This paper studies the data path and VLSI implementation of a multiply-accumulate (MAC) unit. The MAC unit performs multiplication followed by accumulation, an important operation in many digital signal processing (DSP) applications. The multiplier is designed as a Wallace tree multiplier and the adder as a carry look ahead adder. The MAC model is described in Verilog HDL, then simulated and synthesized in Xilinx ISE 14.7 for the Artix-7 family, and its performance is analysed in terms of area and delay.

**Keywords:** Accumulate; High Performance; Carry Look Ahead Adder; Wallace

## I. INTRODUCTION

A MAC unit performs extensive data manipulation and complex mathematical operations in various DSP applications, such as image and video processing. It lies in the critical path of a system and plays a vital role in determining the overall operational speed and power of the hardware. The MAC unit is a fundamental block in computing devices, especially the Digital Signal Processor (DSP). Recently, there has been explosive growth in the development of portable communication devices such as mobile phones, iPads and notebooks. Modern computers usually contain a dedicated MAC unit, which comprises a multiplier followed by an adder, implemented in combinational logic, with a register to store the results. These real-time processing systems perform computationally intensive operations, mainly in the form of multiply-accumulate (MAC) and butterfly operations. However, these systems consume high power and are characterized by a high data throughput rate.

The MAC is a major component in communication systems such as OFDM-based wireless devices, Wireless Code Division Multiple Access (WCDMA), base station receivers, channel estimators and so on. Low-power architecture design is therefore crucial for the MAC block. The choice of architecture for a MAC unit generally depends on the type of application.

Recursive architecture: In embedded microprocessor or microcontroller applications, memory is limited and the operand size is small. A recursive architecture is suitable when power and area are important. The recursive MAC unit is used in image processing applications such as the Fast Fourier Transform (FFT) and digital filtering.

Parallel architecture: High-performance applications such as notepads, laptops and desktops require computation on large sets of data, for which a parallel architecture is suitable.

Shared segmented architecture: This architecture performs multi-mode, logic-dependent operations in which both speed and power constraints are considered. It is mainly used in embedded medical equipment and in communication systems, such as Orthogonal Frequency Division Multiplexing (OFDM) based wireless devices, sub-carrier frequency domain operations, channel estimators and carrier synchronizers.

The paper is organized as follows. Section II describes the MAC unit architecture, Section III discusses the results obtained and Section IV concludes the paper.

## II. MAC ARCHITECTURE

The MAC unit supports multiply-accumulate operations on signed, unsigned and signed fractional operands. The MAC architecture consists of a multiplier, an adder and an accumulator, each chosen to reduce delay and improve the speed of the MAC. The product of the two input numbers is computed first, and the result is forwarded for addition or accumulation. If the multiplication and addition are executed with a single rounding, the unit is referred to as a fused MAC unit. The final results of the MAC unit are stored in the appropriate memory locations.
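The data flow described above can be sketched in a few lines of Python. This is only a behavioural illustration, not the Verilog design of this paper: the function names `mac` and `mac_sequence`, the unsigned operands and the 8-bit wrap-around are assumptions made for the example.

```python
def mac(acc: int, a: int, b: int, width: int = 8) -> int:
    """One multiply-accumulate step: multiply first, then add to the running sum.

    The result is wrapped to `width` bits (an assumed unsigned output width).
    """
    mask = (1 << width) - 1
    product = a * b                # multiplier stage
    return (acc + product) & mask  # adder stage, truncated to the register width


def mac_sequence(pairs, width: int = 8) -> int:
    """Accumulate the products of a sequence of (a, b) operand pairs."""
    acc = 0
    for a, b in pairs:
        acc = mac(acc, a, b, width)
    return acc


# Hypothetical operand pairs: 2*4 + 4*4 = 24
result = mac_sequence([(2, 4), (4, 4)])
```

The same loop is what a DSP kernel such as a sum of products performs, one `mac` step per operand pair.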

Multipliers in a MAC are usually complex circuits and must operate at a high system clock rate. Reducing the delay of the multiplier is essential to satisfy the overall design performance. An adder, or summer, is a digital circuit used to add binary numbers; adders can also be configured to add or subtract signed numbers. A carry look ahead adder is used here to greatly reduce the carry propagation delay and to improve the speed of operation.

The main goal of the MAC design is to increase speed, that is, to decrease delay, while consuming less power. The MAC is a key element in achieving high performance in real-time digital signal processing applications.



Fig 1: MAC unit

The various sub blocks of MAC unit are discussed in the following sections.

## A) MULTIPLIER

Binary multipliers play a very important role in today's digital signal processing field. Addition and multiplication of two binary numbers are the most basic and commonly used arithmetic operations. Multiplication is an important arithmetic operation that consumes considerable power and takes up a large area in the architecture. 70% of instructions in microprocessors and most DSP systems use multiplication as the basic arithmetic operation, which greatly affects the execution time.

|   |   |   |   | 1 | 0 | 1 | 0 | 1 | 0 |   | Multiplicand     |
|---|---|---|---|---|---|---|---|---|---|---|------------------|
| X |   |   |   |   |   | 1 | 0 | 1 | 1 |   | Multiplier       |
|   |   |   |   | 1 | 0 | 1 | 0 | 1 | 0 |   |                  |
|   |   |   | 1 | 0 | 1 | 0 | 1 | 0 |   |   |                  |
|   |   | 0 | 0 | 0 | 0 | 0 | 0 |   |   |   | Partial products |
| + | 1 | 0 | 1 | 0 | 1 | 0 |   |   |   |   |                  |
|   | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |   | Result           |

Fig 2: Operation of multiplier with partial products

Multiplication of binary numbers is usually implemented using repeated shift and add operations, as shown in Fig 2. Some binary adders add only two binary numbers at a time instead of adding all the partial products together, which results in increased delay and power consumption. Multiplication therefore dominates the execution time of most DSP systems and hence determines the overall performance of the system. This paper thus adopts an advanced multiplier design in the MAC unit to enhance the speed of the system.
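The shift-and-add scheme of Fig 2 can be reproduced with a short Python sketch. The function name `shift_add_multiply` is an illustrative assumption; the operands are the ones shown in Fig 2 (multiplicand 101010, multiplier 1011).

```python
def shift_add_multiply(multiplicand: int, multiplier: int):
    """Unsigned shift-and-add multiplication.

    Returns the list of partial products (one per multiplier bit, least
    significant bit first) and their sum.
    """
    partial_products = []
    bit_position = 0
    while multiplier >> bit_position:
        bit = (multiplier >> bit_position) & 1
        # A 1 bit contributes the multiplicand shifted left; a 0 bit contributes zero.
        partial_products.append((multiplicand << bit_position) if bit else 0)
        bit_position += 1
    return partial_products, sum(partial_products)


# Operands from Fig 2: 101010 x 1011
pps, product = shift_add_multiply(0b101010, 0b1011)
print([bin(p) for p in pps])  # the four partial product rows of Fig 2
print(bin(product))           # 0b111001110
```

Adding the rows one after another, as `sum` does here, corresponds to the slow two-operand addition mentioned above; a faster multiplier reduces all rows in parallel.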

#### WALLACE TREE MULTIPLIER

A fast process and hardware-efficient implementation for the multiplication of two numbers was established by the Australian computer scientist Chris Wallace. In the Wallace tree technique, three bits of the same weight are passed to a one-bit full adder, known as a three-input Wallace tree circuit. The sum output of the full adder is supplied to the next-stage full adder at the same bit position, and the carry output is supplied to the next-stage full adder located one bit position higher. Circuit design is not easy in the Wallace tree technique, although the speed of operation is high. The final addition is performed using a carry look ahead adder to increase the speed of addition.
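A minimal Python sketch of this column-wise reduction is given below. It is a simplified, Wallace-style model and not the circuit synthesized in this paper: it uses only full adders on groups of three bits and passes leftover bits through unchanged, and it uses ordinary integer addition for the last step where the design uses a carry look ahead adder. The names `full_adder` and `wallace_multiply` and the default width `n = 4` are assumptions made for the example.

```python
def full_adder(a: int, b: int, c: int):
    """One-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def wallace_multiply(x: int, y: int, n: int = 4) -> int:
    """Multiply two unsigned n-bit numbers with a Wallace-style reduction."""
    # columns[w] holds every partial product bit of weight 2**w.
    columns = [[] for _ in range(2 * n + 1)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    # Reduce until no column holds more than two bits.
    while any(len(col) > 2 for col in columns):
        reduced = [[] for _ in range(len(columns) + 1)]
        for w, col in enumerate(columns):
            k = 0
            while len(col) - k >= 3:
                s, c = full_adder(col[k], col[k + 1], col[k + 2])
                reduced[w].append(s)      # sum stays at the same bit position
                reduced[w + 1].append(c)  # carry moves one bit position higher
                k += 3
            reduced[w].extend(col[k:])    # leftover bits pass through unchanged
        columns = reduced

    # Two rows remain; a fast adder (a CLA in the paper) would sum them.
    row_a = sum(col[0] << w for w, col in enumerate(columns) if len(col) > 0)
    row_b = sum(col[1] << w for w, col in enumerate(columns) if len(col) > 1)
    return row_a + row_b
```

Each full adder preserves the value of its column (three bits in, one sum bit and one carry bit of double weight out), which is why the final two-row sum equals the product.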

## B) CARRY LOOK AHEAD ADDER

The ripple carry adder (RCA) is the simplest implementation, with low power consumption and a compact layout. However, the delay of an RCA is directly proportional to the number of input bits, which limits the performance of the adder. The carry look ahead adder (CLA) was developed by Weinberger and Smith. The carry look ahead logic is based on generating and propagating carries. It allows the circuit to pre-process the input bits being added and predict the carries ahead of time, thereby eliminating the wait time. A CLA is superior to a conventional ripple carry adder in terms of speed, which is an important factor in a digital circuit. During the addition of two binary numbers, the sum is not obtained instantaneously, because the gates inside the adder circuit take some time to produce the output. This is the propagation delay, and it differs for the sum and carry outputs. The delay in producing the final carry is large in a conventional ripple carry adder, as the carry has to pass through a long carry chain. In a CLA, the generate and propagate terms are used to predict the carries in advance, which enables parallel addition and reduces the carry propagation time, thereby increasing the speed of addition.

The propagate $P_i$ and generate $G_i$ signals of a full adder with inputs $A_i$ and $B_i$ are given as:

$P_i = A_i \oplus B_i$ (carry propagate)

$G_i = A_i \cdot B_i$ (carry generate)

Both the propagate and generate signals depend only on the input bits and hence can be computed after one gate delay.

The CLA is one of the fastest schemes used for the addition of two numbers. It uses a modified full adder for each bit position. However, the circuitry becomes complicated as the number of bits increases, and the additional hardware makes it a costlier option.
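The look ahead principle can be illustrated with a bit-level Python model. It relies on the standard carry recurrence $C_{i+1} = G_i + P_i \cdot C_i$, expanded so that every carry is written directly in terms of the propagate and generate signals and the input carry. The function name `cla_add` and the 4-bit default width are assumptions for the example; the loop below evaluates the expanded expressions sequentially, whereas hardware computes them in parallel.

```python
def cla_add(a: int, b: int, carry_in: int = 0, width: int = 4):
    """Carry look ahead addition of two unsigned `width`-bit numbers.

    Returns (sum, carry_out).
    """
    # Propagate and generate depend only on the input bits.
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]

    carries = [carry_in]
    for i in range(width):
        # Expanded form:
        # C[i+1] = G[i] | P[i]G[i-1] | ... | P[i]...P[1]G[0] | P[i]...P[0]C[0]
        c = 0
        prefix = 1  # running product P[i] & P[i-1] & ... of the terms seen so far
        for j in range(i, -1, -1):
            c |= prefix & g[j]
            prefix &= p[j]
        c |= prefix & carry_in
        carries.append(c)

    # Sum bit: S[i] = P[i] xor C[i]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[width]
```

Because no carry expression refers to another carry, none of them has to wait for a ripple through lower bit positions; the price is the growing number of product terms per carry, which is the hardware cost noted above.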



Fig 3: Carry look ahead adder

## C) ACCUMULATOR

The accumulator is a register that stores the accumulated value. It is controlled by the clock and reset signals and stores new data on every clock pulse.
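The register behaviour can be modelled as follows. This is a sketch under stated assumptions: the class name `Accumulator`, the 8-bit width and the synchronous, active-high reset are chosen for illustration, since the text does not specify the reset style.

```python
class Accumulator:
    """Behavioural model of a clocked register with reset."""

    def __init__(self, width: int = 8):
        self.mask = (1 << width) - 1
        self.value = 0

    def clock(self, data_in: int, reset: bool = False) -> int:
        """Model one clock pulse: clear on reset, otherwise store data_in."""
        self.value = 0 if reset else (data_in & self.mask)
        return self.value


# In a MAC, the adder output (stored value + new product) is fed back in.
acc = Accumulator(width=8)
acc.clock(0, reset=True)
history = [acc.clock(acc.value + a * b) for a, b in [(2, 4), (4, 4)]]
```

With these hypothetical operand pairs the register holds 8 after the first clock pulse and 24 after the second.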

## III. RESULTS AND DISCUSSION

The following table shows the synthesis results for the MAC unit on the target device xc7a100t-3-csg324.

Table 1: Synthesis report for the target device xc7a100t-3-csg324

| Parameter                 | Value   |
|---------------------------|---------|
| Number of Slice Registers | 8       |
| Number of Slice LUTs      | 49      |
| Number of IOs             | 34      |
| Delay                     | 4.68 ns |

The following figures show the simulation results of the carry look ahead adder, the Wallace tree multiplier and the MAC unit.



Fig 4: Simulation result of Carry look ahead adder



Fig 5: Simulation result of Wallace multiplier

[Simulation waveform; signals shown: reset, clk, org[3:0], m[3:0], r[7:0], out[7:0], macout[7:0]; cursor at 63.505 ns]

Fig 6: Simulation result of MAC unit

## IV. CONCLUSION

This work investigates the behaviour of a MAC unit built from a Wallace tree multiplier, a carry look ahead adder and an accumulator. The simulation results show the outputs for various sample cases. The synthesis results indicate that the design achieves favourable area and delay compared with a MAC using conventional multipliers and adders. Future work includes enhancing the MAC design using other multiplier circuits that may yield better delay, area and power consumption.
