# **Design and Implementation of D-FlipFlop For Computation in Memory**

DEEPIKA PRABHAKAR, SHYLASHREE NAGARAJA<sup>a</sup>, S PRANAVA KOUNDINYA, MEGHANA R

Department of Electronics and Communication Engineering, R V College of Engineering, Bengaluru, Affiliated to Visvesvaraya Technological University, Belagavi-590018, Karnataka, INDIA

<sup>a</sup>ORCiD: https://orcid.org/0000-0003-4185-6190

*Abstract:* - As data-intensive applications have multiplied, the traditional Von Neumann computer architecture has become a limiting factor. Computation-in-memory was introduced as a technology platform to address this issue. This study proposes a new design of the D Flip Flop implemented in a memory array using 8T static random-access memory (SRAM) cells and a latch-type sense amplifier. The design implements the D Flip Flop with a master-slave multiplexer (MUX) architecture. It has a setup time of 94.887 ps and a hold time of 97.22 ps.

*Key-Words:* - D Flip Flop, Computation in Memory, SRAM, Sense Amplifier, Multiplexer, Memory\_design, latch based sense amplifier, CIM architecture.

Received: May 22, 2024. Revised: November 13, 2024. Accepted: December 15, 2024. Published: February 4, 2025.

### **1** Introduction

The growth of computing systems based on the traditional Von Neumann architecture has been constrained in recent years by the rapid rise of data-intensive applications such as artificial intelligence and machine learning, [1]. In the Von Neumann design, which most computers use today, the CPU and the memory unit are physically separate, as depicted in Figure 1. The Von Neumann bottleneck arises from the frequent data transfer between these two separate units. To address this issue, the ALU is embedded within the memory unit, [2]. This type of design is termed computation-in-memory (CIM) architecture.



Fig. 1: Traditional Von Neumann Architecture

### **2** Associated Works

Computation in memory has been the subject of numerous investigations. Integrating the ALU with the RAM lowers energy consumption, [3]. The six-transistor (6T) SRAM cell is the most commonly used cell in embedded memory due to its short access time and small size. Arranging 6T SRAM cells as content addressable memories (CAMs) makes Boolean logic operations possible, [4]. The 8T SRAM cell can also be used in place of the 6T SRAM cell, [5]. The bit lines serve as the input and output ports of the capacitively loaded cells, and these operations can only be performed through them. The shared read and write paths in 6T SRAM, however, can result in read disturb failure. Bit-wise logic operations have been performed using 8T SRAM, [6], to address this issue, although this approach had a poor sense margin for the NAND operation, [7]. With latch-type sense amplifiers, [8], the low sense margin for the NAND operation is improved, [9]. The 6T configuration has a much greater noise tolerance, [10], which is a significant advantage, especially for scaled technologies with reduced noise margins. This is the main reason the 6T SRAM cell is used in low-power SRAM units instead of the more prevalent 4T designs. According to [11], a 6T SRAM cell consists of two access transistors and two cross-coupled CMOS inverters. The 6T cell design requires intricate trade-offs among several factors: increases in area, soft error resilience, cell on-current, leakage through off transistors, stable performance at low voltage and current, and minimal word line voltage pulses, [12].

This paper implements a new structure for a master-slave MUX architecture-based D Flip Flop in a memory array. The 8T SRAM cell and the latch-based sense amplifier employed in the proposed architecture are shown in Figure 2 and Figure 3, respectively.



Fig. 2: Proposed Transmission Gate Based 8T SRAM Cell



Fig. 3: Latch Based Sense Amplifier

### **3** Proposed D Flip Flop for CIM

The transmission gate-based 8T SRAM, [13], depicted in Figure 2, has separate paths for read and write operations. Compared with 6T SRAM, which shares one path for read and write operations, it offers a better read noise margin. The latch-type sense amplifier is depicted in Figure 3. To achieve a Boolean logic operation, two distinct cells are linked to the input of the sense amplifier over a single channel known as RBL. The outputs of the sense amplifier can function as an OR/NOR gate or an AND/NAND gate by adjusting V\_REF, as indicated in Table 1. Output 1 and Output 2 produce the inverted and non-inverted outputs, respectively. When the two RLs are turned on simultaneously, the voltage drop on RQ varies according to the values stored in the two memory cells.

Table 1. Reference voltage for different Boolean functions

|                                         | Output 1<br>(Inverted) | Output 2<br>(True) |
|-----------------------------------------|------------------------|--------------------|
| Reference Voltage for OR/NOR function   | NOR                    | OR                 |
| Reference Voltage for AND/NAND function | NAND                   | AND                |

After the RLs are turned on, the sense amplifier is enabled through EN, senses the voltage on RQ, and compares it to V\_REF. For the OR/NOR gate, V\_REF should be lower than the RQ voltage when (00) is stored in the cells and greater than the RQ voltage in the (01/10) and (11) cases when EN is turned on. For the AND/NAND gate, V\_REF should be chosen between the RQ voltages of the (01/10) and (11) cases, [14].
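The reference-voltage selection described above can be illustrated with a small behavioural model. This is only a sketch: the RQ voltage levels, the midpoint placement of V\_REF, and the names `RQ_LEVEL`, `V_REF_OR`, `V_REF_AND`, and `sense` are illustrative assumptions and are not taken from the simulated design.

```python
# Behavioural sketch of the sense amplifier used as a logic gate.
# Assumed RQ levels in volts (illustrative only): RQ is pre-charged high
# and drops further as more of the two cells store a 1.
RQ_LEVEL = {0: 1.0, 1: 0.6, 2: 0.2}  # key: number of cells storing 1

# Assumed placement: V_REF midway between the adjacent RQ levels.
V_REF_OR = (RQ_LEVEL[0] + RQ_LEVEL[1]) / 2   # between RQ(00) and RQ(01/10)
V_REF_AND = (RQ_LEVEL[1] + RQ_LEVEL[2]) / 2  # between RQ(01/10) and RQ(11)


def sense(a: int, b: int, v_ref: float) -> tuple[int, int]:
    """Return (Output 1, Output 2) = (inverted, true) for stored bits a, b."""
    rq = RQ_LEVEL[a + b]
    output2 = int(rq < v_ref)  # true output is 1 once RQ falls below V_REF
    output1 = 1 - output2      # inverted output
    return output1, output2


# Print the truth table for both reference voltages.
for a in (0, 1):
    for b in (0, 1):
        nor, or_ = sense(a, b, V_REF_OR)
        nand, and_ = sense(a, b, V_REF_AND)
        print(f"a={a} b={b}  OR={or_} NOR={nor}  AND={and_} NAND={nand}")
```

With these assumed levels, the same comparator yields OR/NOR or AND/NAND purely through the choice of V\_REF, which matches the mapping in Table 1.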

The 2:1 multiplexer used in the proposed D flip flop design for computation in memory is implemented as shown in Figure 4. Two SRAM cells hold the D and CLK inputs of the D flip flop, and Q is the output of the proposed D flip flop. The proposed D flip flop action is used to read and write the data into the SRAM array.



Fig. 4: 2:1 Multiplexer (MUX)

Two transmission gates are incorporated into each column, passing the data onto the RQ line and the sense amplifier's input. As a result, any column that includes a sense amplifier, two transmission gates, and a pre-charge transistor functions as a simple logic gate. When a calculation needs to be done, the RQ line is first pre-charged for a while, and then the two RL lines are enabled, which feed the data onto the RQ line. The true and complemented outputs of the logic gate are produced simultaneously whenever the Enable signal is activated and the proper reference voltage is applied to the sense amplifier.

The 2:1 MUX for computation in memory (CIM), which serves as the fundamental building block for the suggested D flip-flop design, is implemented in Figure 5.



Fig. 5: 2:1 Multiplexer (MUX) schematic

The 2:1 MUX is made up of four gates, as can be seen in Figure 5. The first column is set aside for the select line, the second column provides the output S'·I0, the third column implements S·I1, and the fourth column produces the multiplexer's final output by ORing the outputs of the second and third columns.
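The column-by-column decomposition above can be sketched as plain Boolean logic. The function name `mux2` and the assumption that the first column produces the complement of the select line are illustrative choices, not details confirmed by the schematic.

```python
def mux2(s: int, i0: int, i1: int) -> int:
    """Gate-level sketch of a 2:1 MUX built from four logic columns."""
    col1 = 1 - s        # column 1: select line, assumed to provide S'
    col2 = col1 & i0    # column 2: S'.I0
    col3 = s & i1       # column 3: S.I1
    col4 = col2 | col3  # column 4: OR of columns 2 and 3, the MUX output
    return col4


# Exhaustive check against the usual MUX definition.
for s in (0, 1):
    for i0 in (0, 1):
        for i1 in (0, 1):
            assert mux2(s, i0, i1) == (i1 if s else i0)
```

Each column corresponds to one in-memory gate evaluation, so in this sketch a single MUX output requires the columns to be evaluated in order.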

Figure 6 represents the physical mask-level implementation of the 2:1 MUX that was explained earlier. The arrangement consists of three metal layers, M1 through M3, with an overall area of $0.105 \text{ mm}^2$.



Fig. 6: 2:1 Multiplexer (MUX) layout

An edge-sensitive D flip flop can be generated by cascading two 2:1 MUXs, as shown in Figure 7. Figure 8 and Figure 9 depict the schematic and layout of the proposed D flip flop. The configuration comprises 126 transistors and is made up of three metal layers, M1 to M3, which take up $0.212 \text{ mm}^2$ of area.



Fig. 7: Negative Edge Triggered D Flip Flop



Fig. 8: D Flip Flop schematic



Fig. 9: D Flip Flop layout
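The cascade of two 2:1 MUXs into a negative edge triggered flip flop can be illustrated with a short behavioural model. This is a sketch under stated assumptions: the class name `MuxDFlipFlop`, the feedback connections, and the choice of which latch is transparent in which clock phase are illustrative and were picked to reproduce the negative edge behaviour of Figure 7, not read from the schematic.

```python
def mux2(s: int, a0: int, a1: int) -> int:
    """2:1 multiplexer: returns a1 when s is 1, otherwise a0."""
    return a1 if s else a0


class MuxDFlipFlop:
    """Master-slave D flip flop built from two MUX-based latches."""

    def __init__(self) -> None:
        self.master = 0  # output of the master latch
        self.q = 0       # output of the slave latch (flip flop output Q)

    def step(self, clk: int, d: int) -> int:
        # Master latch: transparent while CLK = 1, holds while CLK = 0.
        self.master = mux2(clk, self.master, d)
        # Slave latch: transparent while CLK = 0, holds while CLK = 1.
        self.q = mux2(clk, self.master, self.q)
        return self.q


ff = MuxDFlipFlop()
# (CLK, D) pairs: D = 1 is captured on the first falling edge,
# D = 0 on the second falling edge.
trace = [ff.step(clk, d) for clk, d in [(1, 1), (0, 0), (1, 0), (0, 1)]]
print(trace)  # prints [0, 1, 1, 0]
```

In this model Q changes only when CLK goes from 1 to 0, taking the value D had while CLK was high, which is the negative edge triggered behaviour the cascade is intended to provide.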

### **4** Experimental Outcomes

The SRAM cell's two word lines, WL1 and WL2, are initially turned on for a duration of 20 ns. Once the word lines have been enabled, the bit line is used to write the CLK input into the cell, with a value of 1 written for a duration of 20 ns. The D line is similarly enabled for 20 ns after CLK is enabled. This constitutes the data writing stage.

To read the data, the pre-charge transistor is first turned ON 20 ns after the data is stored in the SRAM cells, and the read bit lines, RQ, are pre-charged up to VDD, [15], for a duration of 5 ns. With a delay of 30 ns, the read lines, RL1 and RL2, are enabled, putting the data on the RQ channel. Using the EN signal, the sense amplifier is activated after 30 ns. Following the sense amplifier's activation, a suitable reference voltage, V\_REF, is provided to it. The sense amplifier then functions as a comparator, comparing the RQ voltage with the V\_REF voltage and producing both true and complemented outputs.

The above steps must be followed to activate one of the columns that serves as a logic gate, [16], in the complex D Flip Flop design. All of the appropriate columns must be activated identically for the D Flip Flop to function correctly for CIM. The output waveforms of the proposed D Flip Flop design for computation in memory are illustrated in Figure 10. Since the designed flip flop is negative edge triggered, the output Q follows the input D at the falling edge of the CLK.



Fig. 10: D Flip Flop Output Waveform

When the pre-charge is activated, there is a sudden surge in power, as depicted in Figure 11. Similar power spikes can be seen during both the data read phase and the data write phase. Power dissipation ranges from a minimum of 24.9 nW to a maximum of 74.6 µW. The D flip flop demonstrated for CIM dissipates an average power of 22.75 µW.



Fig. 11: D Flip Flop Power Plot

The Spectre simulator and the Cadence Virtuoso environment, both of which utilize gpdk45 technology, are used to model the whole design. The design parameters are summarized in Table 2.

Table 2. Parameters summary

| Parameter              | Values               |
|------------------------|----------------------|
| Set Up Time            | 94.887 ps            |
| Hold Time              | 97.22 ps             |
| Transistor Count       | 126                  |
| Area                   | 0.212 mm<sup>2</sup> |
| Minimum Power consumed | 24.9 nW              |
| Maximum Power consumed | 74.6 µW              |
| Average Power          | 22.75 µW             |

To satisfy the setup time, the data must be stable at least 94.887 ps before the clock's active edge. After the active clock edge arrives, the data must remain steady for 97.22 ps, which is the hold time of the proposed design.
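The setup and hold requirements above define a window around the active clock edge in which the data must not change. A minimal sketch of such a check follows; the function name `timing_ok` and the example transition times are hypothetical, while the two constants come from Table 2.

```python
T_SETUP_PS = 94.887  # setup time from Table 2, in picoseconds
T_HOLD_PS = 97.22    # hold time from Table 2, in picoseconds


def timing_ok(clock_edge_ps: float, data_changes_ps: list[float]) -> bool:
    """Return True if no data transition falls inside the setup/hold window."""
    window_start = clock_edge_ps - T_SETUP_PS
    window_end = clock_edge_ps + T_HOLD_PS
    return not any(window_start < t < window_end for t in data_changes_ps)


# Hypothetical transition times around an active edge at 1000 ps.
print(timing_ok(1000.0, [900.0, 1100.0]))  # prints True
print(timing_ok(1000.0, [950.0]))          # prints False (setup violation)
print(timing_ok(1000.0, [1050.0]))         # prints False (hold violation)
```

In this sketch a transition exactly on a window boundary is treated as acceptable; a stricter check could use inclusive comparisons instead.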

### **5** Conclusion

This paper proposes a new method for implementing a D Flip Flop in a memory array, for widespread application in digital circuits. Computation in memory (CIM) is an emerging technology, and this D Flip Flop is designed to fit within it. The proposed D Flip Flop cascades two multiplexers in a master-slave arrangement, yielding an edge-triggered device with a setup time of 94.887 ps, a hold time of 97.22 ps, and an average power consumption of 22.75 µW. Knowledge gained from this study can form a foundation for further advancement in flip flop technology.

References:

- [1] Han, J., & Kim, Y. (2021, November). A Fast Half Adder using 8T SRAM for Computation-in-Memory. In 2021 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), pp. 1-3.
- [2] B. Chen, F. Cai, J. Zhou, W. Ma, P. Sheridan, and W. D. Lu, "Efficient in-memory computing architecture based on crossbar arrays," IEEE International Electron Devices Meeting (IEDM), 2015, pp. 17.5.1-17.5.4.
- [3] D. G. Elliott, M. Stumm, W. M. Snelgrove, C. Cojocaru, and R. Mckenzie, "Computational RAM: implementing processors in memory," IEEE Design & Test of Computers, vol. 16, no. 1, 1999, pp. 32-41.
- [4] S. Jeloka, N. B. Akesh, D. Sylvester and D. Blaauw, "A 28 nm Configurable Memory (TCAM/BCAM/SRAM) Using Push-Rule 6T Bit Cell Enabling Logic-in-Memory," IEEE Journal of Solid-State Circuits, vol. 51, no. 4, April 2016, pp. 1009-1021.

- [5] Kutila, M., Paasio, A., & Lehtonen, T. (2014, August). Comparison of 130 nm technology 6T and 8T SRAM cell designs for Near-Threshold operation. In 2014 IEEE 57th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 925-928.
- [6] A. Jaiswal, I. Chakraborty, A. Agrawal, and K. Roy, "8T SRAM Cell as a Multibit Dot-Product Engine for Beyond Von Neumann Computing," in *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 27, no. 11, Nov. 2019, pp. 2556-2567.
- [7] A. Agrawal, A. Jaiswal, C. Lee, and K. Roy, "X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 65, no. 12, Dec. 2018, pp. 4219-4232.
- [8] B. Wicht, T. Nirschl and D. Schmitt-Landsiedel, "Yield and speed optimization of a latch-type voltage sense amplifier," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 7, July 2004, pp. 1148-1158.
- [9] A. K. Rajput and M. Pattanaik, "Implementation of Boolean and Arithmetic Functions with 8T SRAM Cell for In-Memory Computation," 2020 International Conference for Emerging Technology (INCET), 2020, pp. 1-5.
- [10] Mu, J., & Kim, B. (2020, October). A 65nm logic-compatible embedded and flash memory for in-memory computation of artificial neural networks. In 2020 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-4.
- [11] Manna, A., & Bhaaskaran, V. K. (2017, March). Improved read noise margin characteristics for single-bit line SRAM cell using adiabatically operated word line. In 2017 International Conference on Nextgen Electronic Technologies: Silicon to Software (ICNETS2), pp. 385-393.
- [12] Ravichandiran, P. P., & Franzon, P. D. (2021, October). A Review of 3D-Dynamic Random-Access Memory based Near-Memory Computation. In 2021 IEEE International 3D Systems Integration Conference (3DIC), pp. 1-6.
- [13] Aswini, V., Musala, S., & Srinivasulu, A. (2021, March). Transmission Gate-Based 8T SRAM Cell for Biomedical Applications. In 2021 12th International Symposium on Advanced Topics in Electrical Engineering (ATEE), pp. 1-7.

- [14] Han, J., & Kim, Y. (2021, November). A Fast Half Adder using 8T SRAM for Computation-in-Memory. In 2021 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), pp. 1-3.
- [15] Bhaskar, A. (2017, April). Design and analysis of low-power SRAM cells. In 2017 Innovations in Power and Advanced Computing Technologies (i-PACT), pp. 1-5.
- [16] S. Jain, A. Ranjan, K. Roy, and A. Raghunathan, "Computing in memory with spin-transfer torque magnetic ram," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 26, no. 3, pp. 470–483, March 2018.

### Contribution of Individual Authors to the Creation of a Scientific Article (Ghostwriting Policy)

- Deepika P and Pranava Koundinya carried out the design and simulation of the transmission gate 6T SRAM cell, the D flip flop schematic, and the layout of the D flip flop using Cadence, as shown in Section 3.
- Shylashree N and Meghana R worked on the design and implementation of the latch-based sense amplifier, simulated it using Cadence, and were responsible for the simulation results in the Table 2 parameter summary.

### **Sources of Funding for Research Presented in a Scientific Article or Scientific Article Itself**

No funding was received for conducting this study.

### **Conflict of Interest**

The authors have no conflicts of interest to declare.

### **Creative Commons Attribution License 4.0 (Attribution 4.0 International, CC BY 4.0)**

This article is published under the terms of the Creative Commons Attribution License 4.0

https://creativecommons.org/licenses/by/4.0/deed.en_US