

# Realization of Low Power and High Speed S-TCAM with a Check Bit

S.Dhiliban<sup>1</sup>,(M.E AE), S.Saraswathi<sup>2</sup>, M.E.,

<sup>1</sup>Student/Dept of ECE, Anna University Chennai, <sup>2</sup>Assistant Professor/Dept of ECE, Anna University Chennai Jayam College of Engineering and Technology, Dharmapuri DT, India

## ABSTRACT

Ternary content addressable memory (TCAM) provides high-speed search operations. However, TCAM has several disadvantages: lower bit density, longer access time than static random access memory (SRAM), a cost per bit about 30 times higher than SRAM, low scalability, and an architecture more complex than that of random access memory (RAM). This paper introduces a check (parity) bit to boost the search speed of hybrid partitioned (HP) SRAM-based TCAM while reducing power and delay. A new memory architecture, called S-TCAM, performs TCAM operations using SRAM. The architecture of HP SRAM-based TCAM with a parity bit was verified in Verilog using ModelSim, and its delay and power were analyzed in Xilinx ISE. Power is reduced by up to 30% and delay by up to 36% compared with previous methods.

**KEY WORDS**: Static random access memory based TCAM (S-TCAM), ternary content addressable memory, hybrid partition (HP), ModelSim, Xilinx ISE.

## I. INTRODUCTION

Ternary content addressable memory (TCAM) allows its memory to be searched by content rather than by address, and the memory location of a match is sent to the output in constant time. A typical TCAM cell has two static random access memory (SRAM) cells and comparison circuitry, and can store three states: 0, 1, and x, where x is a don't-care state. The x state is always regarded as matched irrespective of the input bit. The constant-time search of TCAM makes it a suitable candidate for applications such as network routers, data compression, real-time pattern matching in virus detection, and image processing. TCAM provides single-clock lookup; however, it has several disadvantages compared with SRAM. TCAM is not subject to the intense commercial competition found in the RAM market, and it is less dense than SRAM.

The comparator circuitry in each TCAM cell adds complexity to the TCAM architecture. The extra logic and the capacitive loading due to massive parallelism lengthen the access time of TCAM, which is 3.3 times longer than the SRAM access time. Inherent architectural barriers also limit the total chip capacity of TCAM. The complex integration of memory and logic also makes TCAM testing very time consuming. Furthermore, the cost of TCAM is about 30 times more per bit of storage than SRAM.

RAM is available in a wider variety of sizes and flavors, is more generic and widely available, and makes it possible to avoid the heavy licensing and royalty costs charged by some CAM vendors. CAM devices have very limited pattern capacity, and CAM technology does not evolve as fast as RAM technology.

During the desktop PC design era, VLSI design efforts focused primarily on optimizing speed to realize computationally intensive real-time functions such as video compression, gaming, and graphics. While these solutions addressed the real-time problem, they did not address the increasing demand for portable operation, where a mobile phone needs to pack in all of these functions without consuming much power. The VLSI chip designer must meet the strict limits on power dissipation in portable electronics such as smartphones and tablet computers while still meeting the computational requirements.

While wireless devices are rapidly making their way into the consumer electronics market, a key design constraint for portable operation, namely the total power consumption of the device, must be addressed.

*National Conference On Intelligence In Electronics And Communication Engineering (NCIECE'15)* - 117| Page RVS COLLEGE OF ENGINEERING AND TECHNOLOGY

## II. PREVIOUS METHOD

Most RAM-based solutions for CAM use hashing to build CAM from RAM, but these methods suffer from collisions and bucket overflow. If many records have been placed in an overflow area, a lookup may not finish until many buckets have been searched. When stored keys contain don't-care bits in the bit positions used for hashing, such keys must be duplicated in multiple buckets, which requires increased capacity. Conversely, if the search key contains don't-care bits in the positions used by the hash function, multiple buckets must be accessed, which degrades performance. The performance of such a method degrades gradually as the number of stored elements increases. Furthermore, it emulates binary CAM, not TCAM. Thus, hashing cannot provide deterministic performance owing to potential collisions and is inefficient in handling wildcards. Traditional algorithmic search solutions take multiple clock cycles and also result in inefficient memory utilization. Another approach combines RAM and CAM to provide the CAM functionality; it partitions the conventional TCAM table using some distinguishing bits in the CAM entries.

## III. PROPOSED METHOD

The new architecture has the same interface as the conventional HP SRAM-based TCAM, with one extra bit. In the proposed TCAM with a parity bit, each stored word consists of the original data segment and an extra one-bit segment derived from the actual data bits. The parity bit is either odd or even.

An extractor is used to compute the parity bit value. During the search operation, the words whose parity bit matches that of the search word are found first. Only those words are then compared with the search word, which avoids comparisons with words whose parity already mismatches.
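
The parity pre-filter described above can be sketched in a few lines of Python. This is only an illustrative model, not the Verilog design evaluated in this paper: it assumes even parity and fully binary stored words (how don't-care bits enter the parity is not specified here), and the names `parity_bit`, `build_table`, and `search` are hypothetical.

```python
def parity_bit(word):
    """Even-parity check bit: 1 if the word contains an odd number of 1s."""
    return word.count("1") % 2


def build_table(words):
    """Store each word together with its precomputed parity bit."""
    return [(parity_bit(word), word) for word in words]


def search(table, key):
    """Return (match address or None, number of full-word comparisons)."""
    key_parity = parity_bit(key)
    compared = 0
    match = None
    for address, (stored_parity, word) in enumerate(table):
        if stored_parity != key_parity:
            continue  # parity mismatch: skip the full-word comparison
        compared += 1
        if word == key and match is None:
            match = address
    return match, compared


table = build_table(["0011", "0101", "0111", "1110"])
print(search(table, "1110"))
```

In this toy table only two of the four stored words share the parity of the key, so only those two are compared in full.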

### A. Overall Architecture

The overall architecture of S-TCAM is depicted in Fig. 1, where each layer has the architecture shown in Fig. 2. It has L layers and a CAM priority encoder (CPE). Each layer outputs a potential match address (PMA). The PMAs are fed to the CPE, which selects the match address (MA) from among the PMAs.



Fig. 1. Architecture of S-TCAM (sw: subword, *C*: number of bits in the input word, PMA: potential match address, and MA: match address).


### B. Layer Architecture

The layer architecture is shown in Fig. 2. It contains *N* validation memories (VMs), a 1-bit AND operation, *N* original address table address memories (OATAMs), *N* original address tables (OATs), a *K*-bit AND operation, and a layer priority encoder (LPE).

#### 1. Validation Memory (VM)

The size of each VM is $2^{w} \times 1$ bits, where *w* is the number of bits in each subword and $2^{w}$ is the number of rows. A subword of *w* bits has $2^{w}$ possible combinations, each representing a subword value. For example, if *w* is 4 bits, there are $2^{4} = 16$ combinations. The same explanation applies to the OATAM and OAT. Each subword acts as an address to the VM. If the memory location addressed by a subword is high, the input subword is present; otherwise it is absent. Thus, the VM validates whether the input subword is present.

#### 2. 1-Bit AND Operation

This operation ANDs the outputs of all VMs, and its result decides whether a search operation continues. If the result of the 1-bit AND operation is high, the search operation continues; otherwise a mismatch occurs in the corresponding layer.

#### 3. Original Address Table Address Memory (OATAM)

Each OATAM is of $2^{w} \times w$ bits, where $2^{w}$ is the number of rows and each row has *w* bits. In the OATAM, an address is stored at the memory location indexed by a subword, and that address is then used to select a row from the corresponding OAT. If a subword is mapped in the VM, a corresponding address is also stored in the OATAM at the memory location accessed by that subword.

#### 4. Original Address Table (OAT)

The dimensions of the OAT are $2^{w} \times K$, where *w* is the number of bits in a subword, $2^{w}$ is the number of rows, and *K* is the number of bits in each row, each bit representing an original address. Here the *K* bits cover a subset of the original addresses of the conventional TCAM table. It is the OAT that stores the original addresses.
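
To make the three memory dimensions concrete, the short Python sketch below adds up the SRAM bits they imply for L layers, each holding N VMs, N OATAMs, and N OATs. It is only a back-of-the-envelope count under the sizes stated above: it ignores the AND logic and the priority encoders, the function name `s_tcam_bits` is hypothetical, and the parameter values in the example are purely illustrative.

```python
def s_tcam_bits(layers, n, w, k):
    """Total SRAM bits implied by the VM, OATAM, and OAT dimensions."""
    rows = 2 ** w                 # every memory has 2^w rows
    vm_bits = rows * 1            # VM:    2^w x 1
    oatam_bits = rows * w         # OATAM: 2^w x w
    oat_bits = rows * k           # OAT:   2^w x K
    per_layer = n * (vm_bits + oatam_bits + oat_bits)
    return layers * per_layer


# Illustrative values only: L = 2 layers, N = 2 subwords, w = 2, K = 2.
print(s_tcam_bits(layers=2, n=2, w=2, k=2))
```

Because every memory has $2^{w}$ rows, the total grows exponentially with the subword width *w*, which is why the input word is split into N short subwords rather than used as a single address.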

#### 5. K-Bit AND Operation

This operation ANDs, bit by bit, the K-bit rows read out from all OATs and forwards the result to the LPE.

#### 6. Layer Priority Encoder (LPE)

Because S-TCAM emulates TCAM and multiple matches may occur in TCAM, the LPE selects the PMA from among the outputs of the K-bit AND operation.



Fig. 2. Layer Architecture of S-TCAM. (sw: subword, VM: validation memory, OATAM: original address table address memory, OAT: original address table, and LPE: layer priority encoder).


### C. S-TCAM Operations

S-TCAM performs two kinds of operation:

- 1) Data Mapping operation
- 2) Searching operation.

#### 1. Data Mapping Operation

The classical TCAM table is logically partitioned into hybrid partitions. Each hybrid partition is then expanded into a binary version: every x is expanded into the states 0 and 1 so that it can be stored in SRAM. For example, the TCAM word 010x is expanded into 0100 and 0101. Each subword, acting as an address, is applied to its corresponding VM, and a logic "1" is written at that memory location. The same subword is also applied to its respective OATAM, and w bits of data are written at that memory location. During search, these w bits act as an address to the OAT. The K bits of data are also written at the memory location in the OAT determined by its corresponding OATAM. In this way, all hybrid partitions are mapped. A subword in a hybrid partition can be present at multiple locations, so it is mapped in its corresponding VM, and each of its original addresses is mapped to the corresponding bit in its respective OAT. Since a single bit in an OAT represents an original address, only the mapped memory locations in the VMs and the mapped address positions in the OATs are high; the remaining memory locations and address positions are set low.
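
The expansion of don't-care bits can be illustrated with a small Python helper. It is a sketch of the expansion step only, with the hypothetical name `expand_ternary`; ternary words are assumed to be written as strings over the characters 0, 1, and x.

```python
from itertools import product


def expand_ternary(word):
    """Expand a ternary word such as '010x' into every binary word it matches."""
    # Each position offers one choice (0 or 1) or two choices (x).
    choices = ["01" if symbol == "x" else symbol for symbol in word]
    return ["".join(bits) for bits in product(*choices)]


print(expand_ternary("010x"))
```

A word with m don't-care bits expands into $2^{m}$ binary words, so the cost of this step grows quickly with the number of x symbols per subword.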

| Address | Ternary data, subword 1 | Ternary data, subword 2 | Layer |
|---------|-------------------------|-------------------------|-------|
| 0       | 00 (HP11)               | 11 (HP12)               | 1     |
| 1       | 01 (HP11)               | 01 (HP12)               | 1     |
| 2       | 0x (HP21)               | 11 (HP22)               | 2     |
| 3       | 11 (HP21)               | 1x (HP22)               | 2     |

Table 1: Traditional TCAM table and its hybrid partitions (HP)

An example of data mapping is shown in Table 2, in which Table 1 is mapped to S-TCAM. We take N = 2, L = 2, K = 2, and w = 2. After the necessary processing, HP11, HP12, HP21, and HP22 are mapped to their corresponding memory units. In the example, we map the hybrid partitions of layer 2 to their corresponding memory units. The hybrid partitions of layer 1 can be mapped in a similar way.

| Address | VM21 | VM22 | OATAM21 | OATAM22 | OAT21 (original address 2) | OAT21 (original address 3) | OAT22 (original address 2) | OAT22 (original address 3) |
|---------|------|------|---------|---------|----------------------------|----------------------------|----------------------------|----------------------------|
| 0       | 1    | 0    | 0       | -       | 1                          | 0                          | 0                          | 1                          |
| 1       | 1    | 0    | 1       | -       | 1                          | 0                          | 1                          | 1                          |
| 2       | 0    | 1    | -       | 0       | 0                          | 1                          | 0                          | 0                          |
| 3       | 1    | 1    | 2       | 1       | 0                          | 0                          | 0                          | 0                          |

Table 2: Data mapping example for the hybrid partitions of layer 2
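
The mapping of the layer-2 partitions in this example (HP21 holds 0x and 11, HP22 holds 11 and 1x, at original addresses 2 and 3) can be reproduced with the Python sketch below. It is a software model under stated assumptions rather than the hardware mapping procedure itself: OAT rows are assumed to be allocated in ascending order of the binary subword value, unmapped OATAM entries are represented by `None`, and the names `expand_ternary` and `map_partition` are hypothetical.

```python
from itertools import product


def expand_ternary(subword):
    """Expand a ternary subword such as '0x' into the binary subwords it matches."""
    choices = ["01" if symbol == "x" else symbol for symbol in subword]
    return ["".join(bits) for bits in product(*choices)]


def map_partition(subwords, w):
    """Map one hybrid partition (K ternary subwords of w bits) to VM, OATAM, OAT."""
    rows = 2 ** w
    k = len(subwords)
    vm = [0] * rows                       # 2^w x 1 validation memory
    oatam = [None] * rows                 # None marks an unmapped subword
    oat = [[0] * k for _ in range(rows)]  # 2^w x K original address table
    next_row = 0                          # assumption: rows allocated in order
    for value in range(rows):
        key = format(value, f"0{w}b")
        # One bit per original address: does the stored subword match this key?
        hits = [int(key in expand_ternary(sw)) for sw in subwords]
        if any(hits):
            vm[value] = 1
            oatam[value] = next_row
            oat[next_row] = hits
            next_row += 1
    return vm, oatam, oat


vm21, oatam21, oat21 = map_partition(["0x", "11"], w=2)
vm22, oatam22, oat22 = map_partition(["11", "1x"], w=2)
print(vm21, oatam21, oat21)
print(vm22, oatam22, oat22)
```

Each OAT row holds one bit per original address of the partition, so the first bit of a row refers to original address 2 and the second to original address 3.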

#### 2. Search Operation

##### a) Searching in a Layer of S-TCAM

N subwords are concurrently applied to a layer. The subwords then read out their corresponding memory locations from their respective VMs. If all VMs validate their corresponding subwords (equivalent to the 1-bit AND operation in the layer architecture of S-TCAM), searching continues; otherwise a mismatch occurs in the layer.
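
A layer search can be modeled in Python as follows, chaining the VM check, the OATAM and OAT lookups, the K-bit AND, and the LPE. This is a behavioral sketch, not the hardware: the memory contents are those obtained by mapping the layer-2 partitions of the example (HP21 holds 0x and 11, HP22 holds 11 and 1x, at original addresses 2 and 3), the LPE is assumed to give priority to the lowest original address (the priority rule is not stated in this paper), and the name `search_layer` is hypothetical.

```python
def search_layer(subwords, vms, oatams, oats, base_address):
    """Return the potential match address (PMA) of one layer, or None on mismatch."""
    indices = [int(sw, 2) for sw in subwords]
    # 1-bit AND: every VM must validate its subword, else the layer mismatches.
    if not all(vm[i] for vm, i in zip(vms, indices)):
        return None
    # Each OATAM entry selects one K-bit row of its OAT.
    rows = [oat[oatam[i]] for oat, oatam, i in zip(oats, oatams, indices)]
    # K-bit AND: an original address matches only if every OAT row flags it.
    match_bits = [all(bits) for bits in zip(*rows)]
    # LPE (assumed priority: lowest original address first).
    for position, hit in enumerate(match_bits):
        if hit:
            return base_address + position
    return None


# Layer-2 memories of the example (N = 2, K = 2, w = 2).
vms = [[1, 1, 0, 1], [0, 0, 1, 1]]
oatams = [[0, 1, None, 2], [None, None, 0, 1]]
oats = [
    [[1, 0], [1, 0], [0, 1], [0, 0]],
    [[0, 1], [1, 1], [0, 0], [0, 0]],
]

for key in (["01", "11"], ["11", "10"], ["00", "10"], ["10", "11"]):
    print(key, search_layer(key, vms, oatams, oats, base_address=2))
```

The third key passes the VM check but fails the K-bit AND, which shows why validation alone is not sufficient to declare a match.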

##### b) Searching in S-TCAM

The search operation in the proposed TCAM occurs concurrently in all layers. The search key applied to S-TCAM is divided into N subwords. After searching, the PMAs are available from all layers. The CPE selects the MA from among the PMAs; if no layer produces a PMA, a mismatch of the input word occurs.
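
The two top-level steps, dividing the search key into N subwords and selecting the MA from the per-layer PMAs, can be sketched in Python as below. The sketch assumes that the CPE favors the lowest address, which this paper does not specify, and it represents a layer without a match by `None`; the names `split_key` and `cam_priority_encoder` are hypothetical.

```python
def split_key(key, n):
    """Divide a search key into n subwords of equal width."""
    if len(key) % n:
        raise ValueError("key width must be a multiple of n")
    w = len(key) // n
    return [key[i * w:(i + 1) * w] for i in range(n)]


def cam_priority_encoder(pmas):
    """Select the match address (MA) among the PMAs; None means a mismatch."""
    valid = [pma for pma in pmas if pma is not None]
    # Assumed priority rule: the lowest address wins.
    return min(valid) if valid else None


print(split_key("0111", 2))
print(cam_priority_encoder([None, 2]))
```

In hardware all layers are searched concurrently, so the list of PMAs passed to the encoder corresponds to the outputs of the L layers in the same clock cycle.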



Fig. 3. Flowchart for the search operation in a layer


## IV. EXPERIMENTAL RESULT



Fig. 4. ModelSim output for hybrid partitioned S-TCAM without a check bit



Fig. 5. ModelSim output for hybrid partitioned S-TCAM with a check bit

| Method                    | Power (nW) | Delay (ns) |
|---------------------------|------------|------------|
| Existing (Without parity) | 0.087      | 7.219      |
| Proposed (with parity)    | 0.058      | 3.593      |

Table 3: Power and Delay Comparison
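
The relative reductions implied by the two rows of Table 3 can be computed with the short Python sketch below. It only applies the usual relative-difference formula to the tabulated values; the function name `percent_reduction` is hypothetical.

```python
def percent_reduction(before, after):
    """Relative reduction of `after` with respect to `before`, in percent."""
    return (before - after) / before * 100.0


# Values copied from Table 3 (existing design first, proposed design second).
power_reduction = percent_reduction(0.087, 0.058)
delay_reduction = percent_reduction(7.219, 3.593)

print(f"power reduction: {power_reduction:.1f}%")
print(f"delay reduction: {delay_reduction:.1f}%")
```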

## V. CONCLUSION

The architecture of hybrid partitioned SRAM-based TCAM (S-TCAM) was implemented, and its performance was analyzed using ModelSim-Altera 6.4a (Quartus II 9.0) Starter Edition. The same architecture with a check (parity) bit was analyzed in the same way, and the two architectures were compared in terms of speed of operation. Both architectures were synthesized using Xilinx ISE.

The synthesis report gives performance factors such as power dissipation and delay. The delay was taken from the Xilinx synthesis report, and the power from the Xilinx Power Estimator. With this architecture, bit density is expected to increase and the cost of S-TCAM with a check (parity) bit is expected to decrease.


## ACKNOWLEDGMENT

I would like to take this opportunity to express my sincere gratitude to all my professors who have guided, inspired and motivated me for my project work. It gives me immense pleasure to acknowledge their cooperation.
