# Non-Volatile Nano-Electro-Mechanical Memory for Energy-Efficient Data Searching

Kimihiko Kato, Vladimir Stojanović, and Tsu-Jae King Liu


Abstract—A compact non-volatile nano-electro-mechanical memory (NV-NEMory) cell design together with a novel memory array architecture and operating scheme is proposed for real-time data searching applications. Performance characteristics of a vertically oriented NV-NEMory cell with a small layout area of $8F^2$, where $F$ is the minimum half-pitch, are investigated by static and transient device simulations. Data searching can be achieved directly in the memory by a two-step read operation, dramatically improving the latency and energy cost of each search query.

*Index Terms*—Database computing, NEM switch, nonvolatile memory, data searching.

## I. INTRODUCTION

DATA searching is a fundamental operation used in computer networks and in the processing of large and complex data sets ("big data"). To facilitate the continued proliferation of information technology, innovations to dramatically improve the speed and energy efficiency of data searching are essential. Conventionally, a data search operation utilizes a combination of a processor (CPU) chip and memory chips (non-volatile flash or dynamic random access memory, DRAM). Recently, a memory-based super-parallel computing scheme has been proposed for high-speed data searching [1], [2], wherein simple arithmetic functionality is added to a high-density non-volatile memory array in which the searchable data is stored separately from a CPU; the number of operations required to match a data string is equal to its width (the number of bits) [1].

In this letter, a novel memory circuit architecture and operating scheme for faster and more energy-efficient data searching is proposed, based on an array of nonvolatile memory (NVM) cells each comprising one access transistor and one compact nano-electro-mechanical (NEM) non-volatile switch implemented using an advanced back-end-of-line (BEOL) process with air-gapped interconnects [3], [4]. The NV-NEMory cell is relatively compact, with a layout

Manuscript received November 12, 2015; revised November 24, 2015; accepted November 27, 2015. Date of publication December 2, 2015; date of current version December 24, 2015. This work was supported in part by the National Science Foundation within the Directorate for Engineering through the Center for Energy Efficient Electronics Science under Award 0939514. The work of K. Kato was supported by a post-doctoral fellowship from the Japan Society for the Promotion of Science. The review of this letter was arranged by Editor C. V. Mouli.

The authors are with the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA 94720 USA (e-mail: k.kato@eecs.berkeley.edu).

Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/LED.2015.2504955

| Operation   | AL0        | AL1        | DL0        | DL1        | BL | WL       | Output current, "0" state | Output current, "1" state |
|-------------|------------|------------|------------|------------|----|----------|---------------------------|---------------------------|
| Program "0" | $V_{prog}$ | Float      | Float      | Float      | 0  | $V_{dd}$ | N/A                       | N/A                       |
| Program "1" | Float      | $V_{prog}$ | Float      | Float      | 0  | $V_{dd}$ | N/A                       | N/A                       |
| Store       | X          | X          | X          | X          | X  | X        | N/A                       | N/A                       |
| Match "0"   | X          | X          | Float      | $V_{read}$ | SA | $V_{dd}$ | zero                      | > 0                       |
| Match "1"   | X          | X          | $V_{read}$ | Float      | SA | $V_{dd}$ | > 0                       | zero                      |

SA: Sense amplifier, X: Any state acceptable

Fig. 1. (a) Circuit diagram of the proposed cell array for memory-based super-parallel data searching; (b)-(d) schematic illustrations of the NEM switch of a cell in the "neutral (N)", "0" and "1" states, respectively; (e) cell operating voltages and output current for programming, storage, and data-matching operations.

area equal to  $8F^2$  (where *F* is the minimum half-pitch), suitable for high-density storage; from circuit simulations it is projected to have a read access time well below 0.5 ns. (This is in contrast to the NV static memory cell design proposed in [3].) To find a matching data string stored within the NVM cell array, only two read operations are performed directly on the array, in parallel across all of the programmed cells.

## II. NV-NEMory ARRAY DESIGN AND OPERATION

The proposed memory array architecture, memory cell states and operating scheme are illustrated in Fig. 1. As fabricated, each NEM switch is in the neutral (N) state, with its movable electrode (a vertically oriented beam anchored at the bottom end) not touching either adjacent data line (DL0 or DL1). To program a cell (located in the

0741-3106 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.




Fig. 2. NV-NEMory cell comprising one access transistor and one nonvolatile NEM switch implemented in multiple layers of metal interconnect: (a) 3-D view, (b) layout view; (c) side view showing how contact is made only in the top metal layer (the color scale indicates displacement); (d) projected minimum half-pitch (F) reduction over time, with continued advancement in manufacturing technology [6].

*i*-th column and *j*-th row) into the "1" state, the actuation line AL1 is biased at a high voltage  $(V_{prog})$ , the bit line  $(BL_i)$  is driven to 0 V, and the word line  $(WL_j)$  is pulsed to a voltage equal to  $V_{dd}$  to turn on the access transistor, causing the beam to be electrostatically actuated into contact with the DL1 line. To program a cell into the "0" state, the actuation line AL0 is activated instead so that the beam is actuated into contact with the DL0 line. Note that during a program operation, the data lines are floating so that no direct current flows in the NEM switch; therefore, the energy consumed is very small (less than 50 aJ per cell).

For non-volatile storage of information, the adhesive force ( $F_{adh}$ ) between the beam and the data line which it touches must be larger than the mechanical spring restoring force ( $F_{spring}$ ) of the deformed beam.  $F_{spring}$  can be engineered to be smaller than  $F_{adh}$  – which is projected to be greater than 1 nN for a 10 nm metal-metal contact [5] – by using multiple metal interconnect layers to form a relatively long (compliant) beam with minimum footprint, as shown in Fig. 2.

To find the column which contains data matching an input string, two read operations are performed across all of the programmed columns in the memory array: 1) Match "0" to determine which columns have "0" stored in the same bits as the input string; and 2) Match "1" to determine which columns have "1" stored in the same bits as the input string. In a Match "0" operation, all of the DL1 lines are driven to a high voltage ($V_{read}$) and only the word lines corresponding to the locations of 0 bits in the input string are pulsed to a voltage equal to $V_{dd}$; any accessed cell in the "1" state will drive current through its bitline (detected by a sense amplifier), signaling that its column does not store data matching the input string. In other words, a column has 0s stored in the locations of the 0 bits in the input string only if zero current flows into its bitline. A Match "1" operation works analogously: all of the DL0 lines are driven to $V_{read}$ and only the word lines corresponding to the locations of 1 bits in the input string are pulsed to $V_{dd}$; a column has 1s stored in the locations of the 1 bits in the input string only if zero current flows into its bitline. A column which passes both steps, i.e. has 0s stored at every 0 bit of the input string and 1s stored at every 1 bit, contains a perfect match to the input string. In this manner, only two read operations are needed, regardless of how many data strings are stored or how long each data string is. The number of data strings cannot exceed the number of columns, while the length of each data string cannot exceed the number of rows in the array.

It should be noted that the control circuitry needs to keep track of the number of data strings stored in the array to avoid selecting unprogrammed columns, since these always contribute zero current to the bit line (and hence would never signal data mismatch). Also, to avoid a read error due to sneak leakage current, the sense-amplifier comparison threshold should be set appropriately; alternatively, a nonlinear selector can be integrated in series with each NEM switch, *e.g.* by leaving an insulating layer on the sidewalls of the contacting electrodes so that a metal-insulator-metal diode is formed upon contact.
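The two-step matching scheme and the need to track programmed columns can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions: each column is represented as a string of cell states ("0", "1", or "N" for neutral), sneak leakage is ignored, and the function names `bitline_current` and `search` are hypothetical. The stored strings 1100, 1101 and 1000 are illustrative values (the same ones labeled Data1 to Data3 in Fig. 1(a)).

```python
from typing import List


def bitline_current(column: str, query: str, searched_bit: str) -> bool:
    """Return True if a column's bit line carries current in one Match step.

    In Match "0" the word lines at the query's 0 bits are pulsed and any
    accessed cell in the "1" state conducts; Match "1" is the mirror image.
    A neutral cell ("N") touches neither data line and never conducts.
    """
    opposite = "1" if searched_bit == "0" else "0"
    return any(q == searched_bit and cell == opposite
               for cell, q in zip(column, query))


def search(columns: List[str], query: str, num_programmed: int) -> List[int]:
    """Indices of the selected columns whose bit line stays at zero current
    in both the Match "0" and the Match "1" step."""
    matches = []
    # The hardware evaluates all columns in parallel; this loop is only a model.
    for i, column in enumerate(columns[:num_programmed]):
        mismatch0 = bitline_current(column, query, "0")
        mismatch1 = bitline_current(column, query, "1")
        if not (mismatch0 or mismatch1):
            matches.append(i)
    return matches


# Assumed array contents: three programmed columns and one neutral column.
COLUMNS = ["1100", "1101", "1000", "NNNN"]

print(search(COLUMNS, "1101", num_programmed=3))  # [1]
print(search(COLUMNS, "1101", num_programmed=4))  # [1, 3]: neutral column falsely matches
```

The second call selects the unprogrammed column as well. Because its cells never conduct, it passes both steps and is reported as a match, which is why the control circuitry has to exclude such columns.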

## III. NEM SWITCH DESIGN & OPERATING CHARACTERISTICS

The Coventor MEMS+ software tool [7] was used to simulate the operation of the NEM switch, to find the minimum actuation voltage ($V_{prog}$) required to program a memory cell and the minimum value of contact adhesive force ($F_{adh}$) required for non-volatile storage, as a function of the minimum half-pitch ($F$). The aspect ratio (height:width) of the metal interconnects and vias is kept constant at 2, and the width of the self-aligned contacts/vias between adjacent wordlines and bitlines is kept constant at $F/2$ [8], [9]. The minimum cell layout area is $8F^2$ for a vertically oriented NEM switch with minimum feature/gap size $F$. The horizontal arrows in Fig. 2(a) indicate various cases for the beam anchor location, which is the lowest deformable point of the beam: for case A (lowest anchor point), the beam is anchored at the base of the fin-shaped self-aligned metal contact to the diffusion region of the access transistor; for case B, the beam is anchored at the base of the fin-shaped self-aligned contact between bitlines; for case C (highest anchor point), the beam is anchored at the base of the lowermost via. The beam height from the lowest anchor point to the top is $22F$.

Fig. 3(a) plots the pull-in voltage  $(V_{pi})$  as a function of F for the various anchor options, obtained by static device simulation. It is clear that a lower anchor point (*i.e.* a longer movable beam) is beneficial for a lower set voltage, and is more effective than F scaling in this regard. The minimum  $F_{adh}$  required for non-volatile storage (corresponding to  $F_{spring}$  when the portion of the beam formed in the topmost layer of metal is displaced by a distance F such that it contacts a data line) is plotted as a function of F in Fig. 3(b), and can be seen to be less than  $\sim 1$  nN. In theory, a perfectly flat metal-metal contact has adhesive force greater than 50 nN/nm<sup>2</sup> [10].



Fig. 3. (a)  $V_{pi}$  and (b) minimum  $F_{adh}$  for NV application as a function of *F*, obtained by static device simulation.



Fig. 4. (a) Waveforms of  $V_{\text{prog}}$  for transient simulation; (b) and (c) time evolution of displacement of top of the beam with and without  $F_{\text{adh}}$  of 1 nN; (d)  $V_{\text{prog}}$  dependence of mechanical delay of the vertical NEM switch with various F. × symbols indicate catastrophic pull-in of the NEM switch.

In practice, metal-metal contact is only made at one or more asperities and  $F_{adh}$  has been estimated from experimental data to be 0.05 nN/nm<sup>2</sup> for sub-micron contacting regions [5]. Thus,  $F_{adh}$  can be expected to be greater than 1 nN for a metal-metal contact with an apparent area as small as 2 nm × 10 nm. It should be noted that the NEM switch is robust against mechanical shock, since an acceleration rate in excess of  $10^9$  m/s<sup>2</sup> would be needed to cause the beam to come out of contact with a data line, due to its very small mass.
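As a quick check of the 1 nN figure, the experimentally estimated adhesive force per unit apparent contact area can be multiplied by the smallest contact area quoted above. This assumes, as the estimate in the text does, that the force scales linearly with apparent area:

$$
F_{adh} \approx 0.05\ \mathrm{nN/nm^2} \times (2\ \mathrm{nm} \times 10\ \mathrm{nm}) = 1\ \mathrm{nN}
$$

For comparison, the theoretical value for a perfectly flat contact (greater than 50 nN/nm<sup>2</sup>) would give more than 1000 nN over the same 20 nm<sup>2</sup> area, so the asperity-limited estimate is the conservative one.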

The time required for the beam to be actuated into contact with a data line DL (i.e. the mechanical delay of the NEM switch) was obtained by transient device simulation. Figs. 4(a)-4(c) show the actuation voltage waveform and the simulated displacement of the top portion of the beam for anchor case A, with and without $F_{adh}$ of 1 nN. (It should be noted that the electrical charging delay is much smaller than the mechanical delay; Table II lists the matching delay for a typical sub-array size of 256×256 cells.) The results indicate that without adhesive force the beam recoils upon initial contact, and that 1 nN contact adhesive force is sufficient to eliminate this bounce. Fig. 4(d) shows how the mechanical delay decreases with increasing $V_{prog}$, for various values of $F$. Note that the dynamic pull-in voltage (accounting for the non-zero momentum of the beam) is ~80% of the static $V_{pi}$. The NEM switch is projected to be programmed in less than 10 ns with $V_{prog} = 2$ V. Additionally, the voltage operating margin (to avoid catastrophic pull-in of the beam into contact with an actuation line AL) is more than 2 V, even though the as-fabricated air gap thickness between the beam and AL is as small as that between the beam and DL. (Process-induced variations resulting in an actuation gap that is 19% smaller than the contact gap can result in catastrophic

TABLE I CELL-LEVEL COMPARISON OF NVM TECHNOLOGIES (16 nm TECHNOLOGY)

|                 | PCM    | Redox RRAM | STT-MRAM   | NV-NEMory (this work) |
|-----------------|--------|------------|------------|-----------------------|
| Cell area       | $6F^2$ | 5-8$F^2$   | 20-40$F^2$ | $8F^2$                |
| Program voltage | 3 V    | 0.5 V      | 1.8 V      | ~ 2 V                 |
| Program time    | 50 ns  | 5 ns       | 100 ns     | < 10 ns               |
| Program current | 100 µA | 0.4 µA     | 100 µA     | zero                  |
| Program energy  | 2 pJ   | 1 fJ       | 4 pJ       | ~ 50 aJ               |
| Read voltage    | 3 V    | 0.2 V      | 0.5 V      | < 0.1 V               |
| Read time       | 60 ns  | 10 ns      | 10-20 ns   | < 0.1 ns              |

TABLE II $256 \times 256$ NV-NEMory ARRAY ENERGY AND DELAY FOR DATA SEARCH

| Cells involved               | 1 column × 1 row | 1 column × 256 rows | 256 columns × 256 rows | Delay    |
|------------------------------|------------------|---------------------|------------------------|----------|
| Program ($V_{prog}$ = 2.5 V) | 15 fJ            | 2.0 pJ              | N/A                    | < 10 ns  |
| Match "0" or Match "1"       | N/A              | N/A                 | 1.2 pJ                 | < 0.2 ns |

Transistor gate, source/drain, and wire capacitance values of 0.8, 1, and 0.2 fF/µm, respectively, are assumed; 1 k $\Omega$  beam-BL contact resistance, 3 k $\Omega$  access transistor on-resistance, and 0.1 V voltage swing required for sense-amplifiers are also assumed.

pull-in; however, adjustments in the fabrication process to ensure smaller contact gap can guard against this. Also, the actuation gaps can comprise dielectric material, *i.e.* the beam and/or AL can be coated by thin insulating material to prevent electrical conduction between the beam and the AL.)

The performance characteristics of the NV-NEMory cell are compared with those of other NVM cell designs in Table I [11]. Notably, both the programming energy per bit and the read access time are much smaller than for other NVM technologies. Performance metrics for a 256×256 NV-NEMory sub-array are tabulated in Table II. Less than 20 ns and 2 pJ are required to write a data string into one column in the sub-array, i.e. to program each cell within a column into the "0" state or the "1" state. The location of a data string can be found (with two matching steps as described above) in less than 0.5 ns with less than 2.5 pJ. To put these numbers into perspective, for a die size of 76 mm<sup>2</sup> with $F$ = 20 nm and 35% cell density (similar to that of DDR4 DRAM), the NV-NEMory chip would have a storage capacity of 8 Gb and would consume only 300 nJ to find a match on the whole chip. In comparison, a combination of CPU and DRAM with storage density similar to that of the proposed NV-NEMory array would require approximately 90 mJ and 80 ms for the same task [12]. The relatively fast read speed and low energy consumption make the proposed NV-NEMory technology well-suited for real-time data searching applications.
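The chip-level figures quoted above can be roughly reproduced from the cell area and the sub-array numbers in Table II. The sketch below is a back-of-the-envelope estimate, not the authors' calculation: it assumes that every 256×256 sub-array on the chip is searched, that each of the two matching steps costs the 1.2 pJ listed in Table II, and that peripheral circuitry adds no energy. The function names are hypothetical.

```python
def chip_capacity_bits(die_area_mm2: float, cell_density: float,
                       f_nm: float, cell_area_f2: float = 8.0) -> float:
    """Number of cells that fit on the die at the given array efficiency."""
    cell_area_nm2 = cell_area_f2 * f_nm ** 2   # 8F^2 layout area per cell
    die_area_nm2 = die_area_mm2 * 1e12         # 1 mm^2 = 1e12 nm^2
    return die_area_nm2 * cell_density / cell_area_nm2


def chip_search_energy_nj(capacity_bits: float,
                          subarray_cells: int = 256 * 256,
                          match_energy_pj: float = 1.2,
                          match_steps: int = 2) -> float:
    """Energy to run both matching steps on every sub-array (assumed model)."""
    num_subarrays = capacity_bits / subarray_cells
    return num_subarrays * match_steps * match_energy_pj * 1e-3  # pJ -> nJ


capacity = chip_capacity_bits(die_area_mm2=76.0, cell_density=0.35, f_nm=20.0)
energy = chip_search_energy_nj(capacity)

print(f"capacity: {capacity / 1e9:.2f} Gb")   # about 8.3 Gb
print(f"search energy: {energy:.0f} nJ")      # about 304 nJ
```

Under these assumptions the estimate lands close to the 8 Gb capacity and 300 nJ search energy stated in the text.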

### IV. SUMMARY

A compact non-volatile nano-electro-mechanical memory cell design, together with a novel memory array architecture and operating scheme, is proposed for ultra-fast, energy-efficient data searching applications.

#### REFERENCES

- [1] X.-T. Nguyen, H.-T. Nguyen, T.-T. Hoang, K. Inoue, O. Shimojo, T. Murayama, K. Tominaga, and C.-K. Pham, "DataBase processor (DBP)—A new search engine for the big data era," in *Proc. Int. Conf. Integr. Circuits, Design, Verification (ICDV)*, Aug. 2015, pp. 9–14.
- [2] M. Sharad, D. Fan, K. Aitken, and K. Roy, "Energy-efficient non-Boolean computing with spin neurons and resistive memory," *IEEE Trans. Nanotechnol.*, vol. 13, no. 1, pp. 23–34, Jan. 2014. DOI: 10.1109/TNANO.2013.2286424
- [3] N. Xu, J. Sun, I.-R. Chen, L. Hutin, Y. Chen, J. Fujiki, C. Qian, and T.-J. K. Liu, "Hybrid CMOS/BEOL-NEMS technology for ultralow-power IC applications," in *IEDM Tech. Dig.*, Dec. 2014, pp. 28.8.1–28.8.4. DOI: 10.1109/IEDM.2014.7047130
- [4] S. Natarajan, M. Agostinelli, S. Akbar, M. Bost, A. Bowonder, V. Chikarmane, S. Chouksey, A. Dasgupta, K. Fischer, Q. Fu, T. Ghani, M. Giles, S. Govindaraju, R. Grover, W. Han, D. Hanken, E. Haralson, M. Haran, M. Heckscher, R. Heussner, P. Jain, R. James, R. Jhaveri, I. Jin, H. Kam, E. Karl, C. Kenyon, M. Liu, Y. Luo, R. Mehandru, S. Morarka, L. Neiberg, P. Packan, A. Paliwal, C. Parker, P. Patel, R. Patel, C. Pelto, L. Pipes, P. Plekhanov, M. Prince, S. Rajamani, J. Sandford, B. Sell, S. Sivakumar, P. Smith, B. Song, K. Tone, T. Troeger, J. Wiedemer, M. Yang, and K. Zhang, "A 14 nm logic technology featuring 2nd-generation FinFET, airgapped interconnects, self-aligned double patterning and a 0.0588 μm<sup>2</sup> SRAM cell size," in *IEDM Tech. Dig.*, Dec. 2014, pp. 3.7.1–3.7.3. DOI: 10.1109/IEDM.2014.7046976

- [5] J. Yaung, L. Hutin, J. Jeon, and T.-J. K. Liu, "Adhesive force characterization for MEM logic relays with sub-micron contacting regions," *IEEE/ASME J. Microelectromech. Syst.*, vol. 23, no. 1, pp. 198–203, Feb. 2014. DOI: 10.1109/JMEMS.2013.2269995
- [6] (2012). International Technology Roadmap for Semiconductors (ITRS). [Online]. Available: http://www.itrs.net/
- [7] Coventor MEMS+ User Guide, Coventor, Inc., 2013.
- [8] J. Park and C. Hu, "Gate last MOSFET with air spacer and self-aligned contacts for dense memories," in *Proc. VLSI Symp. VLSI Technol., Syst.*, *Appl.*, Apr. 2009, pp. 105–106. DOI: 10.1109/VTSA.2009.5159312
- [9] K. Oyama, S. Yamauchi, K. Yabe, A. Hara, S. Natori, and H. Yaegashi, "The enhanced photoresist shrink process technique toward 22 nm node," *Proc. SPIE*, vol. 7972, p. 79722Q, Feb. 2011. DOI: 10.1117/12.878947
- [10] C. Pawashe, K. Lin, and K. J. Kuhn, "Scaling limits of electrostatic nanorelays," *IEEE Trans. Electron Devices*, vol. 60, no. 9, pp. 2936–2942, Sep. 2013. DOI: 10.1109/TED.2013.2273217
- [11] J. Hutchby and M. Garner. (Apr. 6–7, 2010). Assessment of the Potential and Maturity of Selected Emerging Research Memory Technologies Workshop and ERD/ERM Working Group Meeting. ITRS. [Online]. Available: http://www.itrs.net/
- [12] K. Sohn, T. Na, I. Song, Y. Shim, W. Bae, S. Kang, D. Lee, H. Jung, S. Hyun, H. Jeoung, K.-W. Lee, J.-S. Park, J. Lee, B. Lee, I. Jun, J. Park, J. Park, H. Choi, S. Kim, H. Chung, Y. Choi, D.-H. Jung, B. Kim, J.-H. Choi, S.-J. Jang, C.-W. Kim, J.-B. Lee, and J. S. Choi, "A 1.2 V 30 nm 3.2 Gb/s/pin 4 Gb DDR4 SDRAM with dual-error detection and PVT-tolerant data-fetch scheme," *IEEE J. Solid-State Circuits*, vol. 48, no. 1, pp. 168–177, Jan. 2013. DOI: 10.1109/JSSC.2012.2213512