# Advancing Nonvolatile Computing With Nonvolatile NCFET Latches and Flip-Flops

Xueqing Li, *Member, IEEE*, Sumitha George, Kaisheng Ma, Wei-Yu Tsai, Ahmedullah Aziz, *Student Member, IEEE*, John Sampson, *Member, IEEE*, Sumeet Kumar Gupta, *Member, IEEE*, Meng-Fan Chang, *Senior Member, IEEE*, Yongpan Liu, *Senior Member, IEEE*, Suman Datta, *Fellow, IEEE*, and Vijaykrishnan Narayanan, *Fellow, IEEE* 

Abstract-Nonvolatile computing has been proven effective in dealing with power supply outages through on-chip check-pointing in emerging energy-harvesting Internet-of-Things applications. It also plays an important role in power-gating to cut off leakage power for higher energy efficiency. However, existing on-chip state backup solutions for the D flip-flop (DFF) suffer from significant energy and/or latency penalties, which limit the overall energy efficiency and computing progress. Moreover, these solutions rely on external control, which limits compatibility and increases system complexity. This paper proposes an approach to fundamentally advance the nonvolatile computing paradigm through intrinsically nonvolatile, area-efficient latch and flip-flop designs using the negative capacitance FET (NCFET). These designs consume fJ-level energy and ns-level intrinsic latency for a backup plus restore operation, e.g., 2.4 fJ in energy and 1.1 ns in time for one proposed nonvolatile DFF with a supply voltage of 0.80 V.

*Index Terms*—Internet-of-Things, power-gating, ferroelectric FET, NCFET, DFF, NV-DFF, nonvolatile processing.

# I. INTRODUCTION

Scheduled power-gating of VLSI computing systems has been widely adopted in both low-power portable devices and high-performance cloud server centers to cut off static leakage power. States of the registers and flip-flops in the pipelining logic need to be backed up to prevent loss of computation status if the supply is removed. Similarly, mandatory state backup and restore operations are required for battery-less portable devices powered by energy-harvesting techniques. This is because the ambient energy sources, such as vibration, photovoltaics, and radio, are essentially intermittent even with sophisticated design methods [1]–[8].

Manuscript received December 7, 2016; revised April 19, 2017; accepted May 2, 2017. Date of publication June 2, 2017; date of current version October 24, 2017. This work was supported in part by LEAST, a funded center of STARnet, a Semiconductor Research Corporation (SRC) program sponsored by MARCO and DARPA, in part by GRC under Grant 2657.001, and in part by NSFC under Grant 61674094. The work of K. Ma was supported by NSF ASSIST. This paper was recommended by Associate Editor M. Alioto. (*Corresponding authors: Xueqing Li; John Sampson.*)

X. Li, S. George, K. Ma, W.-Y. Tsai, A. Aziz, J. Sampson, S. K. Gupta, and V. Narayanan are with the School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA 16802 USA (e-mail: lixueq@cse.psu.edu; sug241@cse.psu.edu; kxm505@cse.psu.edu; wzt114@cse.psu.edu; afa5191@psu.edu; sampson@cse.psu.edu; skg157@engr.psu.edu; vijay@cse.psu.edu).

M.-F. Chang is with the Department of Electrical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan (e-mail: mfchang@mx.nthu.edu.tw).

Y. Liu is with the Department of Electronic Engineering, Tsinghua University, Beijing 100084, China (e-mail: ypliu@tsinghua.edu.cn).

S. Datta is with the Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556 USA (e-mail: sdatta@nd.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSI.2017.2702741

## A. Impact of Power Supply Outages

Fig. 1(a) shows a sample of power income versus time. Power outages have a strong impact on the computation progress and the overall energy efficiency of IoT computing [1], [2], [4]–[8]. On the one hand, the computing system will lose its computation states if these states are not successfully backed up before the power failure. On the other hand, even if check-pointing techniques are applied to reduce the chance of progress loss, significant energy and latency overheads still exist for each backup and restore operation, as illustrated in Fig. 1(b). If the backup data are stored in off-chip nonvolatile memory, the energy and latency overheads are large, mainly because of the long-distance data transmission and constrained parallelism.

It is also noted that, although a larger energy storage capacitor can be used as a buffer to smooth the power trace, it cannot reduce the amount of energy needed for each backup or restore operation. A larger capacitor also has a larger leakage current and takes longer to charge to a voltage level appropriate for supply regulation.
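The charging-time penalty of a larger buffer capacitor can be estimated with a back-of-the-envelope calculation. The sketch below assumes an ideal capacitor charged from 0 V by a constant harvested power, with no leakage or converter loss; the function name and all numerical values are illustrative assumptions and are not taken from this paper.

```python
def charge_time_s(capacitance_f, target_voltage_v, harvested_power_w):
    """Time for a constant power source to charge an ideal capacitor from 0 V.

    The stored energy is E = C * V^2 / 2, so t = E / P when leakage is ignored.
    """
    if harvested_power_w <= 0:
        raise ValueError("harvested power must be positive")
    stored_energy_j = 0.5 * capacitance_f * target_voltage_v ** 2
    return stored_energy_j / harvested_power_w


# Hypothetical operating point: 10 uW harvested power, 1.0 V regulation threshold.
for cap_f in (1e-6, 10e-6, 100e-6):
    t = charge_time_s(cap_f, 1.0, 10e-6)
    print(f"{cap_f * 1e6:6.0f} uF -> {t:.2f} s to reach 1.0 V")
```

Under these assumptions the waiting time grows linearly with the capacitance, while the energy of each backup or restore operation is unchanged.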

## B. Opportunities and Challenges for NVP

The recent success of embedding nonvolatile memory (NVM) into the same chip has incubated the concept of nonvolatile processing (NVP) with the nonvolatile D flip-flop (NV-DFF), which backs up the computation state of each D flip-flop (DFF) into a local on-chip NVM [4]–[11]. Table I summarizes some existing NV-DFF designs based on the ferroelectric capacitor [10]–[14], MTJ [15]–[17], ReRAM [18], etc., with their pros and cons. With a shorter data transmission distance and a wider bus than the off-chip backup, the energy and latency overhead could be reduced, as illustrated in Fig. 1. However, existing NVP solutions still face many challenges.

1549-8328 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

|                      | [10] (measured) | [11] (measured) | [19] (simulated) | [18] (simulated) | [20] (simulated) | This Work (simulated) |
|----------------------|-----------------|-----------------|------------------|------------------|------------------|-----------------------|
| Year                 | 2014 | 2014 | 2014 | 2013 | 2016 | |
| Feature size         | 130nm | 130nm | 45nm | 180nm | 10nm | 10nm |
| Nonvolatile material | PZT Capacitor | PZT Capacitor | MTJ | Al/TiO<sub>2</sub>/Al ReRAM | NCFET with HfO<sub>2</sub> and PZT | |
| Retention Time       | 10 hours to 10 years; varying by design | | ~10 years | ~10 years | Same as PZT capacitor in theory | |
| Endurance            | Varying by material; possibly >10<sup>15</sup> [21] | | >10<sup>15</sup> [22] | 10<sup>5</sup>-10<sup>10</sup> [22] | Same as PZT capacitor in theory | |
| Voltage              | 1.5V | 1.5V | 1.1V | 1.8V | 0.4V-0.8V | 0.4V-1.0V |
| Area overhead        | 64% | 49% | -2% | - | 35% | 25% more transistors |
| Backup time          | 1.64µs | 2.22µs | 909ps | 10ns @2.4V | 1.4ns @0.5V | 1.0ns @0.8V |
| Restore time         | 1.25µs | 2.2µs | 177ps | 1.3µs @0.4V | 75ps @0.5V | 56ps @0.8V |
| Backup energy        | 2.4pJ | | 82.2fJ | 735fJ | 7.0fJ @0.5V | 1.3fJ @0.8V |
| Restore energy       | 2.34pJ | 3.44pJ in total | - | 735fJ | 9.0fJ @0.5V | 1.1fJ @0.8V |
| Additional Control   | Needed | Needed | Needed | Needed | Needed | Not Needed |

TABLE I Performance Comparisons of Recently Reported NV-DFF Designs

The first challenge is how to handle the energy, delay, and possibly high peak power of a backup or restore operation. When a clustered NVM array is used for backup, across-the-chip interconnections and the serial memory access pattern still result in a long backup/restore latency and high energy consumption. Instead, if distributed NVM cells and interface circuitry are placed close to each DFF to build a nonvolatile DFF (NV-DFF) for local parallel backup, as illustrated in Fig. 1(c), the backup/restore operation will be much faster but will still consume high energy because of the duplicated interface circuit between the DFF and NVM cells. The high current of each NV-DFF backup/restore operation also limits the achievable parallelism.
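To give a sense of scale for this energy overhead, the per-flip-flop figures in Table I can be multiplied out over a whole register state. The sketch below copies the backup and restore energies from Table I for the entries that report both (or a total); the flip-flop count `NUM_FLIP_FLOPS` is a hypothetical value chosen only for illustration, and the calculation ignores control and wiring energy.

```python
# Per-flip-flop (backup, restore) energy in femtojoules, copied from Table I.
# Entry [11] reports only a 3.44 pJ total, so it is stored as (total, 0.0).
ENERGY_FJ = {
    "[10]": (2400.0, 2340.0),
    "[11]": (3440.0, 0.0),
    "[18]": (735.0, 735.0),
    "[20] @0.5V": (7.0, 9.0),
    "This work @0.8V": (1.3, 1.1),
}


def checkpoint_energy_pj(design, num_flip_flops):
    """Energy in pJ for one backup plus one restore of every flip-flop."""
    backup_fj, restore_fj = ENERGY_FJ[design]
    return (backup_fj + restore_fj) * num_flip_flops / 1000.0


NUM_FLIP_FLOPS = 1000  # hypothetical processor state size, not from the paper

for design in ENERGY_FJ:
    total_pj = checkpoint_energy_pj(design, NUM_FLIP_FLOPS)
    print(f"{design:>16}: {total_pj:10.1f} pJ per checkpoint")
```

For a single flip-flop, the last entry sums to the 2.4 fJ backup-plus-restore figure quoted in the abstract.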

The second challenge is how to provide backup and restore control for distributed NV-DFFs, and how to design the backup policy, especially when to back up [1]. Providing this control requires additional wiring, with more area and energy consumption. Worse, the processor architecture and the software need to adapt to the control, which limits compatibility with existing software and complicates the operating system or processor design. Furthermore, a backup triggered too early wastes energy that could otherwise be used, while a backup triggered too late results in a backup failure and a rollback of progress. Although architectures and even task scheduling could be optimized [1], [7], this challenge still exists because of the inherent intermittency and unpredictability of ambient power sources.

## C. Proposed Solution of NCFET Nonvolatile Computing

Fig. 1. Opportunities and challenges in energy harvesting systems: (a) Example of intermittent power income and computation progress; (b) harvested energy utilization; (c) conventional NV-DFF; (d) proposed intrinsically NV-DFF.

This paper proposes an NVP solution through circuit innovations with the negative capacitance FET (NCFET). The NCFET is a promising emerging beyond-CMOS transistor [9], [23]–[29]. With the capability of steep switching, tunable hysteresis, and good scalability, the NCFET has attracted interest in both logic and memory applications [32]–[34]. The innovation in this paper originates from the fact that an NCFET device could be treated as an NVM cell while serving as a logic transistor. The proposed nonvolatile NCFET latches and DFFs enable a new paradigm of low-power intrinsically nonvolatile operations:

- The proposed nonvolatile latches and flip-flops have only a few extra transistors added to the conventional CMOS designs, leading to a compact cell design. Unlike most existing NV-DFF designs based on the ferroelectric capacitor in [4] and [11] and the spin-orbit torque in [14], no additional circuitry or a different supply voltage is needed for sensing or driving functions.
- These designs are free from external backup and restore controls because all backup and restore operations are carried out autonomously. This improves compatibility with existing logic designs and reduces their complexity, since the latches and DFFs are drop-in replacements, as shown in Fig. 1(d). Getting rid of the need for external controls could significantly reduce the control overhead for check-pointing and power-gating.
- These designs are also fast and energy-efficient, with similar delay and energy consumption to conventional volatile designs under a stable supply. Typical energy and latency for a backup plus restore operation are only at the fJ and ns levels, respectively. Note that the restore time also depends on the supply voltage recovery time. The operations could be even faster with kinetic coefficient improvement in the NCFET device. With such a low energy and delay overhead, the utilization of harvested energy for computing will be significantly improved. For the same reason, check-pointing and power-gating could be carried out at a fine grain as needed, with significantly reduced energy and delay penalties.

Fig. 2. NCFET device structures: externally-connected ferroelectric capacitor in (a), integrated ferroelectric layer on the planar MOSFET gate in (b) and around the gate in a FinFET structure in (c), and atop a fin structure in (d).

In the rest of this paper, Section II introduces the NCFET background, and Sections III and IV describe the proposed NCFET nonvolatile latches and flip-flops, along with simulation results and performance comparisons. Section V discusses related applications, and Section VI concludes this paper.

# II. NCFET DEVICE BACKGROUND

This section introduces the NCFET device basics, including device structures, operating mechanisms, recent advances in device fabrication, and finally device modeling.

## A. Device Structure and Operating Theory

NCFET, also known as ferroelectric FET (FeFET), is essentially a metal-oxide-semiconductor FET with a negative capacitance material layer placed at the gate [23]. A few NCFET structures have been reported, including a MOSFET with an externally connected negative capacitor [24], [25] and a MOSFET with an integrated negative capacitor at the gate [26]. Fig. 2(a-c) illustrates some design structures.

Fig. 3. NCFET: (a) N-type NCFET device concept; (b) Capacitance network of an NCFET; (c) I-V curves: steep-switching and hysteresis with the baseline MOSFET curve in black.

Fig. 4. NCFET I-V hysteresis in (a) and P-E loop of ferroelectric capacitor in (b). The curves in (a) are for the fin-structure NCFET in Fig. 2(c).

Many ferroelectric materials, such as PbTiO<sub>3</sub>, BaTiO<sub>3</sub>, Pb(ZrTi)O<sub>3</sub>, HfZrO, etc., have exhibited negative capacitance [35]. Strictly speaking, negative capacitance means "negative differential capacitance", in which the charge decreases as the applied voltage increases within a certain voltage range. Fig. 3(a-b) illustrates a simplified concept for the NCFET, with the negative capacitor connected to the internal MOSFET gate capacitor. One interesting phenomenon that fits well with Boolean logic computation is the boosted voltage change $\Delta V_{MOSFET}$ at the internal MOSFET gate induced by an external NCFET gate voltage change $\Delta V_{NCFET}$ with $|C_{FE}| > C_{MOS}$, in which $C_{FE}$ and $C_{MOS}$ represent the ferroelectric negative capacitance and the MOSFET gate capacitance, respectively. The voltage gain at the internal MOSFET gate over the overall applied NCFET gate voltage could be approximated by $G = \Delta V_{MOSFET} / \Delta V_{NCFET} = C_{FE} / (C_{FE} + C_{MOS}) = |C_{FE}|/(|C_{FE}| - C_{MOS}) > 1$. This indicates a steep switching that enables a lower supply to achieve the same ON-OFF ratio, as shown in Fig. 3(c).

More interestingly, recent reports have also theoretically predicted and experimentally confirmed that, by tuning $C_{FE}$ so that $|C_{FE}| < C_{MOS}$ within some voltage range, a hysteresis loop appears in the $I_{DS}$-$V_{GS}$ curve and even expands to the negative $V_{GS}$ range [23]–[25], as shown in Fig. 4(a).
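The two regimes described above can be illustrated numerically with the capacitive-divider approximation $G = C_{FE}/(C_{FE} + C_{MOS})$. The sketch below is a small-signal illustration only: it treats $C_{FE}$ as a constant, uses normalized capacitance values, and the function names are hypothetical helpers introduced here rather than part of any device model used in this paper.

```python
def internal_gate_gain(c_fe, c_mos):
    """Divider gain G = C_FE / (C_FE + C_MOS); c_fe is negative in the NC region."""
    if c_mos <= 0:
        raise ValueError("c_mos must be positive")
    if c_fe + c_mos == 0:
        raise ValueError("gain diverges when C_FE = -C_MOS")
    return c_fe / (c_fe + c_mos)


def regime(c_fe, c_mos):
    """Classify the operating regime from the sign and magnitude of C_FE."""
    if c_fe >= 0:
        return "positive C_FE: ordinary divider, gain below 1"
    if abs(c_fe) > c_mos:
        return "steep switching without hysteresis"
    # |C_FE| <= C_MOS: the divider formula no longer describes a stable
    # operating point, which corresponds to the hysteretic case in the text.
    return "hysteresis"


# Normalized, illustrative values with C_MOS = 1.0.
for c_fe in (-4.0, -2.0, -1.25, -0.5, 0.1):
    print(f"C_FE = {c_fe:5.2f}: {regime(c_fe, 1.0)}")
print("gain at C_FE = -2.0:", internal_gate_gain(-2.0, 1.0))
```

As $|C_{FE}|$ approaches $C_{MOS}$ from above, the gain grows without bound, which is one way to see why a small change in ferroelectric thickness can move the device between the steep-switching and hysteretic regimes.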

|                        | EDL'15 [30] | EDL'16 [28] | EDL'16 [27] | IEDM'15 [29] |
|------------------------|-------------|-------------|-------------|--------------|
| Ferroelectric material | HfZrO | P(VDF-TrFE) | BiFeO<sub>3</sub> | HfZrO |
| Structure              | Planar | Planar | Fin | Fin |
| SS (Forward) (mV/dec)  | 50 | 45-52 w/o and 2-4 w/ hysteresis | 8.5-11 | 55 |
| SS (Reverse) (mV/dec)  | 50 | | 16-50 | |
| Hysteresis             | Negligible | Depends | Yes | Depends |

TABLE II RECENT NCFET DEVICES

The mechanism behind this is the polarization *P* versus electric-field *E* hysteresis of the ferroelectric capacitor shown in Fig. 4(b). The charge $Q_{FE}$ of the ferroelectric capacitor is a function of *E* and *P*, given by $Q_{FE} = \varepsilon_0 E + P$, where $\varepsilon_0$ is the vacuum permittivity and typically $P \gg \varepsilon_0 E$. The switching of *P* is determined by *E*, and switching occurs when the applied voltage exceeds the coercive voltage. Before the polarization switching occurs, $C_{FE}$ is positive. As $V_{NCFET}$ increases from 0V to close to the $I_{DS}$-$V_{GS}$ rising edge, the voltage across the ferroelectric material $V_{FE}$ increases from the initial value of 0 to close to the coercive voltage. As $V_{NCFET}$ further increases, $V_{FE}$ steps beyond the coercive voltage, and $C_{FE}$ becomes negative, leading to an unstable negative total series capacitance of $C_{FE}$ and $C_{MOS}$ that triggers a very steep switching of $I_{DS}$ until $C_{FE}$ settles to be positive again. The reverse sweep is similar. In both the forward and reverse curves, away from the switching edges, a relatively flat $I_{DS}$-$V_{GS}$ may be observed if $C_{FE}$ (being positive in this region) is much smaller than $C_{MOS}$, because $\Delta V_{MOSFET} = \Delta V_{NCFET} \times C_{FE}/(C_{FE} + C_{MOS}) \approx 0$.

Such a hysteresis feature covering  $V_{GS} = 0V$  indicates a stable remnant polarization when power is removed [35]. Actually, a zero-polarization of the ferroelectric material is not stable as the free energy is not the lowest [35].

It is also noted that the hysteresis width is a function of the dynamics of both  $C_{FE}$  and  $C_{MOS}$ , and could be typically tuned by varying the ferroelectric layer thickness, as observed in Fig. 3(c) and Fig. 4 [23], [24].
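The instability invoked in this subsection can be checked from the series combination of the two capacitors. The following is a small-signal sketch that treats $C_{FE}$ as a constant negative value inside the negative-capacitance region, which simplifies the dynamic behavior described above; $C_{tot}$ is a symbol introduced here for illustration.

```latex
\begin{align}
C_{tot} &= \frac{C_{FE}\,C_{MOS}}{C_{FE}+C_{MOS}}
         = \frac{|C_{FE}|\,C_{MOS}}{|C_{FE}|-C_{MOS}},
         \qquad C_{FE} = -|C_{FE}| < 0, \\
|C_{FE}| > C_{MOS} &\;\Rightarrow\; C_{tot} > 0, \quad
         G = \frac{|C_{FE}|}{|C_{FE}|-C_{MOS}} > 1
         \quad \text{(stable, steep switching)}, \\
|C_{FE}| < C_{MOS} &\;\Rightarrow\; C_{tot} < 0
         \quad \text{(unstable, hysteretic switching)}.
\end{align}
```

Since a thinner or thicker ferroelectric layer changes $|C_{FE}|$ relative to $C_{MOS}$, this is consistent with the thickness dependence of the hysteresis width noted above.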

## B. Recent NCFET Fabrication and Modeling Progress

As a beyond-CMOS steep-slope device, the NCFET has recently gained momentum for low-voltage Boolean operations, like tunnel FETs [30], [31]. To make the NCFET compatible with conventional CMOS logic, some works try to prevent the I-V hysteresis [25]–[28], while others make use of it to build a new device different from CMOS [24]. Table II summarizes some recent NCFET devices, showing the advantage of a subthreshold swing (SS) that is lower than the fundamental limit of 60mV/decade in CMOS at room temperature. While a demonstration of integrated circuits is not yet available, these reports have shown the scalability of NCFETs below 100 nm.
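The 60mV/decade figure quoted above is the thermal limit $\ln(10)\,kT/q$ of a conventional MOSFET. The sketch below evaluates it from physical constants and then applies an idealized scaling by the internal voltage gain $G$ of Section II-A, under the assumption that the internal MOSFET itself sits at the thermal limit; the function names are illustrative, and the scaling ignores the non-idealities of real devices.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19


def thermal_ss_limit_mv_per_dec(temperature_k=300.0):
    """Thermal subthreshold-swing limit ln(10) * k * T / q, in mV/decade."""
    thermal_voltage_v = BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    return math.log(10) * thermal_voltage_v * 1e3


def external_ss_mv_per_dec(internal_ss_mv_per_dec, gain):
    """Idealized SS seen at the NCFET gate when the internal gate sees gain G."""
    if gain <= 0:
        raise ValueError("gain must be positive")
    return internal_ss_mv_per_dec / gain


limit = thermal_ss_limit_mv_per_dec(300.0)
print(f"thermal limit at 300 K: {limit:.1f} mV/dec")
for g in (1.0, 1.2, 2.0):  # hypothetical gain values
    print(f"G = {g:.1f}: {external_ss_mv_per_dec(limit, g):.1f} mV/dec")
```

Under this idealization, even a modest gain slightly above 1 is enough to bring the swing below the thermal limit, which matches the order of magnitude of the forward-sweep values in Table II.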

NCFET modeling needs to consider the dynamic mechanism discussed above. There have been some works on time-domain modeling supporting SPICE transient simulations [36], [37], although device variations, temperature dependence, etc. are not yet supported. This paper adopts the experimentally calibrated ferroelectric capacitor model in [20], [32], and [36], and attaches it to a PTM CMOS FinFET to build NCFET models. The ferroelectric layer thickness is set to 8 nm for proper hysteresis, as shown in Fig. 4. The optimization of the ferroelectric layer thickness will be further discussed in Section V. The switching time has been calibrated based on experimental results by setting the kinetic coefficient $\rho$ of the ferroelectric material to 0.25 [36].

| Scenarios            | Operations                            |
|----------------------|---------------------------------------|
| VDD='1'; CLK='1'     | Q follows D                           |
| VDD='1'; CLK='0'     | Q holds                               |
| VDD='0'              | Q=0                                   |
| VDD='0'→'1'; CLK='0' | Q restores to stored state (NV-Latch) |

Fig. 5. Proposed NCFET latches: NV-Latch-N in (a) and NV-Latch-NC in (c) in comparison with CMOS volatile latches in (b) and (d); function table in (e).

# III. PROPOSED NCFET NONVOLATILE LATCHES

This section presents the two proposed categories of NCFET nonvolatile latch designs in Fig. 5: NV-Latch-N in Fig. 5(a) with a differential-driving input pair D/DN, and NV-Latch-NC in Fig. 5(c) with only D as the data input. In Fig. 5(c), the two bottom NMOS transistors have a lower $V_{TH}$ for higher yield (see Section V). Their baseline designs are shown in Fig. 5(b) and Fig. 5(d), respectively. As shown later in this section, they differ in area, energy per switching, delay, etc. When the power supply is steady, their logical transfer function is the same, as shown in Fig. 5(e).
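The function table in Fig. 5(e) can be captured in a small behavioral model. The sketch below is a logic-level abstraction only, not a circuit model: it assumes that the NCFET polarization tracks Q instantly whenever the latch is powered and transparent, and the class and attribute names are hypothetical.

```python
class NVLatchModel:
    """Logic-level sketch of the Fig. 5(e) function table."""

    def __init__(self):
        self.q = 0             # visible latch output
        self.stored = 0        # state retained in the NCFET polarization
        self._powered = False  # whether VDD was high on the previous step

    def step(self, vdd, clk, d):
        if not vdd:
            # Supply removed: Q collapses to 0, the polarization is retained.
            self.q = 0
            self._powered = False
            return self.q
        if clk:
            # Transparent phase: Q follows D and the polarization tracks Q.
            self.q = d
            self.stored = d
        elif not self._powered:
            # VDD rises with CLK low: Q restores from the polarization.
            self.q = self.stored
        # Otherwise (powered, CLK low) the latch simply holds Q.
        self._powered = True
        return self.q


latch = NVLatchModel()
print(latch.step(vdd=1, clk=1, d=1))  # sample
print(latch.step(vdd=1, clk=0, d=0))  # hold
print(latch.step(vdd=0, clk=0, d=0))  # power outage
print(latch.step(vdd=1, clk=0, d=0))  # restore
```

The model makes explicit that no backup or restore command appears in the interface: the only inputs are the supply, the clock, and the data.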

## A. NCFET NV-Latch-N: Functionality and Operation Theory

NV-Latch-N has two extra N-type NCFETs over its CMOS counterpart. Both latches are designed with sufficient input driving strength so that the input pair D/DN can directly overwrite the state when the clock signal CLK is high. When CLK is high, the input switches connect the differential input signals D and DN to the output ports Q and QN, respectively.

Fig. 6. NV-Latch-N operation: steady state in (a) and state transition in (b). In the figure, '0' and '1' are used to represent *GND* and *VDD*, respectively.

Fig. 7. Transient waveforms for sample and hold operations with a steady supply. The last P plot indicates "polarization" of the two NCFETs in the NV-Latch-N scheme (Blue: M5 polarization; Purple: M6 polarization).

Fig. 6 illustrates a steady state of NV-Latch-N with CLK being high. Let us assume Q is logically '1' (representing VDD in voltage) and QN is logically '0' (representing GND in voltage). Controlled by Q or QN, M3 and M8 are turned off, and M4 and M7 are turned on. For NCFET M5, the gate voltage is high, the drain connecting to QN is low, and the source is discharged to GND by M7. As a result, M5 is turned on and stays at the positive polarization state with low drain-source resistance. NCFET M6 stays OFF at the negative polarization state with high drain-source resistance. Note that a low voltage of GND at QN will turn off M8 and set M6 as negatively polarized (high drain-source resistance): if M6 was previously at a positive polarization (low drain-source resistance) state, the M6 source will be charged through M6 to the voltage of Q, which is VDD. As a result, the drain-source voltage of M6 becomes -VDD. After that, M6 will remain negative in its polarization even if its source node is slowly discharged by M8.

Fig. 7 shows the NV-Latch-N transient waveforms of the sample and hold operations with a steady supply. At time t1 around 61ns, when the input D is steadily '1', the rising CLK turns on the switches and the latch output Q switches from the previous state of '0' to follow the input D with '1', along with the observable polarization switching of the NCFETs. At time t2 around 62ns, the clock CLK turns to '0' and the latch holds the state even if D changes. At time t3 around 62.5ns, the polarization switching is accomplished and stays stable until the latch samples the next different input state.

Fig. 8. NCFET polarization switching progress in NV-Latch-N with a higher data-rate. The last P plot indicates "polarization" of M5 in blue and M6 in purple in the NV-Latch-N scheme.

In Fig. 7, all polarization switching finishes successfully because there is a sufficient time window before the next switching event. In Fig. 8, with the given input pattern of D and CLK, the time that Q stays at the new state, i.e., '1' between 60ns and 67ns, is too short for the polarization switching to finish. As a result, neither NCFET succeeds in switching its polarization state during this 7ns period, until after 67ns when Q stays at '1' for more than 0.5ns.

Such a phenomenon of the polarization switching speed being lower than the input data rate reveals the possibility of having a pseudo-floating '0' of Q at the latch output, with high resistance to VDD (with a turned-off PMOS) and also medium-to-high resistance to GND (due to a partially turned-off NCFET in series with a turned-on NMOS). Note that a pseudo-floating '1' does not exist because a Q output of '1' is always securely connected to VDD by the PMOS.

The pseudo-floating '0' has three major effects. First, functionality will stay correct with proper noise shielding. This is because a pseudo-floating period on the order of ns will not corrupt the state if external coupling noise is properly isolated. For comparison, most standard embedded DRAM cells have a retention time of a few microseconds [39] at the internal floating MOSFET gate. After the polarization switching finishes, on the order of ns or sub-ns, this pseudo-floating '0' becomes a steady '0' connected to *GND*.

Second, endurance will improve. Although existing research has not yet found a fundamental bottleneck in improving the endurance or aging effects of ferroelectric materials [38], existing ferroelectric materials degrade faster than CMOS transistors in terms of the number of full-swing switching cycles. Because the polarization switches more slowly than the data at high data rates, the number of full-swing polarization switching events is reduced, which improves the endurance of the NCFETs.

Third, delay and energy performance may also improve. In the conventional volatile CMOS latch in Fig. 5(b), in order to overwrite the previous '0' state of Q to be '1', the new input datum of '1' has to "fight" with the pull-down transistor in the latch. Such contention causes significant power consumption until the switching completes. In contrast, in NV-Latch-N, because of the higher resistance from Q to *GND* through the not-fully-ON NCFET in the pseudo-floating '0' state, overwriting this '0' to be '1' involves less short-circuit current and contention, which improves the speed and energy performance. It is also noted that using a high-threshold NMOS transistor could also mitigate the contention but could not provide an equally low pull-down resistance after the polarization switching finishes.

Fig. 9. Transient waveforms of backup and restore operations in NV-Latch-N.

Fig. 10. Proposed NV-Latch-P in (a) and NV-Latch-NP in (b).

Fig. 9 shows the transient waveforms with backup and restore operations due to power failures. At around 180ns and 2,175ns, backup of '0' and '1' is carried out, respectively; at around 2,100ns and 4,080ns, restore of '0' and '1' is observed, respectively. The simulation includes a power-down window of close to 2µs to mimic real field scenarios.

The idea of using the N-type NCFET to build NV-Latch-N could be extended straightforwardly to two other nonvolatile latch designs, namely NV-Latch-P with P-type NCFETs and NV-Latch-NP with both N-type and P-type NCFETs. Their circuit topologies are shown in Fig. 10. The operation theory is similar to that of NV-Latch-N.

In terms of the physical layout design, the overhead of the extra NCFETs could be mitigated by properly sharing the drain and source active regions, without the need for additional contacts for the drain and source terminals of the NCFETs.

## B. NV-Latch-NC: Operation Theory and Analysis

The second type of external-control-free nonvolatile NCFET latch, namely NV-Latch-NC, is shown in Fig. 5(c). Compared with the baseline 10-transistor CMOS volatile latch in Fig. 5(d), the proposed NV-Latch-NC has 6 more transistors, including the two NCFETs, two extra NMOS transistors in parallel with the NCFETs, and a CMOS switch that is always turned on by *GND* and *VDD*. Both NV-Latch-NC and the CMOS latch in Fig. 5(d) could cut off the inverter latching loop during the sampling phase.

Fig. 11. Operating mechanism of NV-Latch-NC during sampling phase in (a) and hold phase in (b) with highlighted charging and discharging routes.

Fig. 12. Transient waveform snapshots of NV-Latch-NC.

Fig. 11 explains the operation theory. During the sampling phase (high CLK), M3 and M5b form an inverter that provides a fast settling output of QN and no pseudo-floating status. The always-on CMOS switch consisting of M9 and M10 delivers QN to drive the second inverter to provide Q. The feedback loop of the switch is turned off to prevent racing short current during this sampling phase, and will be turned on to form a stable closed loop during the hold phase (low CLK), as shown in Fig. 11(b). The restore operation is similar to NV-Latch-N, by sensing a different Q or QN to ground resistance and continuing to settle down to VDD or GND in the feedback loop.

Fig. 12 shows the transient simulation waveforms of the proposed NV-Latch-NC. Along with the normal sampling and hold operations with a stable *VDD*, it also shows successful backup and restore operations of Q = '0' and Q = '1' during the power outages around $5\,\mu$s and $10\,\mu$s, respectively.

Fig. 13. Energy versus delay comparison between CMOS and NCFET latches: (a) x1 fan-out driving strength; (b) x4 fan-out driving strength (only CMOS transistors are increased to x4). *VDD* ranges from 1.0V to 0.5V from left to right.

## C. Latch Performance Evaluation and Discussions

Fig. 13 summarizes the simulated energy versus CLK-to-Q delay performance of NV-Latch-N and NV-Latch-NC with x1 and x4 driving capabilities at different supply voltages, in comparison with the conventional CMOS volatile latches. Simulations are carried out with a 1.0fF capacitor load and an input rise and fall time of 20ps for the input clock, D, and DN. The energy consumption includes clocking and D/DN driving. Although the NCFET latches could operate at higher frequencies, in this evaluation all polarization switching is fully completed, and the *Energy* in Fig. 13 includes the energy for Q/QN setup and also for polarization switching as a backup operation. As the supply voltage reduces, the simulation results summarized in Fig. 13 show that the energy per switching reduces and the delay increases. The proposed NCFET NV-Latch-NC has about 64% and 39% extra energy-delay product (EDP) over the CMOS latch. If there is no need for a complete polarization switching (in cases with a high data rate), it consumes less energy for each switching event.
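For reference, the EDP overhead quoted above is a ratio of energy-delay products. The sketch below shows the arithmetic only; the energy and delay values in the example are hypothetical placeholders, not data read from Fig. 13, and the function names are introduced here for illustration.

```python
def edp(energy_fj, delay_ps):
    """Energy-delay product in fJ*ps."""
    return energy_fj * delay_ps


def edp_overhead_percent(nv_latch, cmos_latch):
    """Extra EDP of a nonvolatile latch over a CMOS baseline, in percent.

    Each argument is an (energy_fj, delay_ps) pair measured at the same
    supply voltage, load, and driving strength.
    """
    return (edp(*nv_latch) / edp(*cmos_latch) - 1.0) * 100.0


# Hypothetical operating point, for illustration only.
cmos = (1.0, 20.0)      # 1.0 fJ, 20 ps
nv_latch = (1.2, 23.0)  # 20% more energy, 15% more delay
print(f"EDP overhead: {edp_overhead_percent(nv_latch, cmos):.1f}%")
```

Because EDP multiplies the two penalties, moderate increases in both energy and delay compound into a noticeably larger EDP overhead.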

Fig. 14. Time and energy of the backup and restore operations of NCFET nonvolatile latches (x1 fan-out driving strength).

Fig. 14 shows the energy and time for backup and restore operations. As the supply voltage increases, the backup and restore time decreases, and the backup and restore energy increases. The simulation results in Fig. 14 also show that the proposed NCFET latches, being intrinsically nonvolatile, are very fast and energy-efficient for backup and restore operations. For example, under 0.8V VDD, the backup time is lower than 1.0ns, and the intrinsic restore time is lower than 75ps for both NV-Latch-N and NV-Latch-NC. It is noted that, in Fig. 14, the intrinsic restore time is obtained with an assumption of very fast supply voltage recovery in a few ps. In real scenarios, the supply voltage recovers more slowly (usually taking much longer than ns), and the capability of fast intrinsic restore operation guarantees that the overall restore operation closely follows the supply recovery. In Fig. 14, the restore energy is obtained based on a $0.8\,\mu$s supply voltage ramp-up time to mimic real scenarios. Meanwhile, the backup and restore energy is lower than 3.0fJ for the entire operating range of 0.5V to 1.0V. This enables extremely low-energy backup and restore operations of the entire processor to be carried out autonomously and in parallel without worrying about the peak current.

# IV. PROPOSED NCFET NONVOLATILE DFFS

A nonvolatile DFF can be built directly by replacing the master latch or the slave latch with the proposed NCFET nonvolatile latches. For a positive-edge-triggered DFF, making the slave latch nonvolatile simplifies clocking during restore, as the slave latch is isolated from the input data when *CLK* is low (i.e., in the hold/restore phase). In a pipelined design, the output Q/QN is sent to the following stages, where it sets the master latch of the subsequent-stage DFFs.

Fig. 15(a) shows an example with the slave latch implemented with an NV-Latch-N. This structure is simple and has the minimum overhead in terms of the number of additional transistors. However, the driving strength of the first latch should be sufficient to make sure a new state can be set up in time for the slave latch. Also, the possible bleeding current from *D* through the input switch affects the timing behavior. Fig. 15(b) gives another example that uses clocked inverters as a buffer to drive the slave latch and to isolate the possible bleeding current between the master and slave stages. The main shortcoming of the designs in Fig. 15 is the conflict between the differential input and an existing, different latched state, which costs extra energy and time to settle.

Fig. 15. NCFET NV-DFF with the slave latch replaced by an NCFET NV-Latch. (a) Master and slave latches connected with switches; (b) Master and slave latches connected with clocked inverters for isolation.

Fig. 16. Proposed NV-DFF-NC in (a), in comparison with a conventional volatile CMOS DFF in (b).

In contrast, based on NV-Latch-NC, Fig. 16(a) shows another NV-DFF design, namely NV-DFF-NC. Fig. 16(b) shows the corresponding widely used CMOS volatile DFF scheme. In both designs, clocked inverters instead of CMOS switches are used to drive master and slave latches to provide better kick-back isolation between the two stages and the driver of D. It is noted that the proposed NCFET NV-DFF has only 6 more transistors than the CMOS DFF, with a total of 30 transistors including a local clock driver.

Fig. 17 shows the transient simulation waveforms of the proposed NV-DFF-NC. With a steady *VDD*, the DFF carries out the positive-edge-triggered sample-and-hold operations. The intrinsic non-volatility of NV-Latch-NC is inherited. As shown in Fig. 17, during power failures, the polarization of the NCFETs remains stable. When the power supply recovers, the restore operations automatically return the outputs Q and QN to their states before the power failure. In Fig. 17, the backup and restore operations for bit '0' and bit '1' are shown around 4-6 $\mu$s and 9-11 $\mu$s, respectively.
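The externally visible behavior described above (sampling on the rising clock edge, state retention through a power failure, and restore without any control signal) can be summarized in a small behavioral model. The Python class below is only an illustrative sketch: the class and method names are hypothetical, and it does not model polarization dynamics, timing, or energy.

```python
from typing import Optional


class BehavioralNVDFF:
    """Behavioral stand-in for an intrinsically nonvolatile positive-edge DFF."""

    def __init__(self) -> None:
        self.powered = True
        self.clk = 0
        self._stored = 0                 # stands in for the NCFET polarization state
        self.q: Optional[int] = 0        # visible output; None while unpowered

    def tick(self, clk: int, d: int) -> None:
        """Apply new CLK and D values; sample D on a rising CLK edge."""
        if not self.powered:
            return                       # inputs are ignored without a supply
        if self.clk == 0 and clk == 1:
            self._stored = d             # backup is part of the normal write
            self.q = d
        self.clk = clk

    def power_fail(self) -> None:
        """Supply collapses: the output is lost, the stored state is kept."""
        self.powered = False
        self.q = None
        self.clk = 0

    def power_recover(self) -> None:
        """Supply returns: Q is restored with no external control signal."""
        self.powered = True
        self.q = self._stored


if __name__ == "__main__":
    ff = BehavioralNVDFF()
    ff.tick(clk=1, d=1)                  # rising edge samples a '1'
    ff.power_fail()
    ff.power_recover()
    print("Q after recovery:", ff.q)
```

Note that the model has no `backup()` or `restore()` entry points: both happen as side effects of normal writes and of supply recovery, which is the sense in which the design is external-control-free.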

Fig. 18 compares the backup and restore performance metrics with existing NV-DFF designs based on a ferroelectric capacitor and on NCFETs [20]. The prior NCFET NV-DFF in [20] uses a more complex backup and restore circuit scheme and needs multiple steps to carry out the operations.



Fig. 17. Transient waveform snapshot of NV-DFF-NC.



Fig. 18. Time and energy of the backup and restore operations of the proposed NCFET NV-DFF-NC in comparison with existing NV-DFF designs based on ferroelectric capacitor and NCFET in [20].

Meanwhile, the NCFET NV-DFF restore operation in [20] may cause static leakage current if the NCFET polarization state is positive (low-resistance state). The circuit design in [20] therefore incurs energy overheads that are tens of times higher. Compared with the NV-DFF based on a stand-alone ferroelectric capacitor, an even larger amount of energy is saved for backup and restore operations. This is mainly because of the removal of the complex driving and sensing schemes for the ferroelectric capacitor. Thanks to the deeply embedded logic-in-memory operation in the proposed simple and effective circuit structure, the proposed design enables a new low-energy operation paradigm.

It is also noted that the proposed design has lower backup and restore speed than [20]. This is because of the adoption of low-power (high $V_{TH}$) CMOS transistors to achieve low leakage current for low-power IoT applications. For example, the prior NCFET NV-DFF has around $0.12\mu$W static leakage power when operating at a 0.8V supply as reported in [20], while the proposed design has less than 0.2nW at 0.8V. Fortunately, the backup and restore speed of the proposed NV-DFF is sufficiently fast for most scenarios, as charging and discharging the on-chip supply network following a power failure and recovery usually takes much longer than a few nanoseconds. As will be discussed in Section V, the adoption of high-$V_{TH}$ MOSFETs in the main signal routes also improves the reliability during backup and restore operations.

Fig. 19. Energy-delay performance comparison during normal operations. *VDD* ranges from 1.0V to 0.5V from left to right.
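The two static leakage figures quoted above for a 0.8V supply can be put side by side with a short calculation. The Python sketch below simply forms their ratio; because the figure for the proposed design is an upper bound, the result is a lower bound on the improvement. The constant and function names are illustrative.

```python
# Static leakage comparison at a 0.8 V supply, using the figures quoted in the text.
PRIOR_LEAKAGE_W = 0.12e-6     # about 0.12 uW, prior NCFET NV-DFF as reported in [20]
PROPOSED_LEAKAGE_W = 0.2e-9   # upper bound (< 0.2 nW) for the proposed design


def leakage_reduction_factor(prior_w: float, proposed_w: float) -> float:
    """How many times lower the proposed leakage is than the prior one."""
    if proposed_w <= 0:
        raise ValueError("proposed leakage must be positive")
    return prior_w / proposed_w


if __name__ == "__main__":
    factor = leakage_reduction_factor(PRIOR_LEAKAGE_W, PROPOSED_LEAKAGE_W)
    print(f"leakage is at least about {factor:.0f}x lower")
```

A gap of this size is what makes the slower backup and restore of the high-$V_{TH}$ design an attractive trade for normally idle IoT workloads.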

Fig. 19 compares the energy versus delay between NV-DFF-NC and the CMOS volatile DFF with x1 and x4 driving strength. The NCFET NV-DFF-NCx4 has around 35% energy-delay-product overhead. Note that the energy here includes the backup energy consumed after Q/QN settles down. For applications with a higher bit rate or with dynamic frequency scaling (DFS), in which the clock cycle is shorter than the polarization switching time, the energy overhead could be much smaller. More importantly, for many IoT applications in which the processor is normally idle, most energy is consumed by the stand-by leakage. In these scenarios, a moderate energy overhead is acceptable.

Table I also compares the proposed NV-DFF-NC with other reported NV-DFF designs at different technology nodes. Because the designs do not share the same baseline of technology node, operating voltage, etc., Table I is not an apples-to-apples comparison. Nevertheless, there are fundamental advantages in the proposed NCFET NV-DFF design over existing designs based on MTJ and ReRAM. This is because MTJ and ReRAM are two-terminal devices, and changing their memory state requires a static current or voltage across them for a certain period of time. Because of this static current, and especially given the widened write time window needed to cover device variations, MTJ and ReRAM consume a significant amount of energy. Furthermore, the great scalability, low-voltage operation, high ON-OFF ratio, and, more importantly, the unique external-control-free feature highlight the advantage of the proposed NCFET NV-DFF designs.

For the proposed NV-DFF-NC in Fig. 16(a), the setup and hold times will be almost identical to those of the CMOS volatile DFF in Fig. 16(b). This property stems from both designs using a master latch of the same structure and size, driven by a clocked inverter that provides isolation.

# V. FURTHER DISCUSSIONS ON RELATED APPLICATIONS

This section continues to discuss a few important perspectives relating to the design and applications.

## A. Device Optimization

Given an NCFET structure and the ferroelectric material, the tunable NCFET design parameter is the ferroelectric layer thickness $T_{FE}$. During device optimization, the first concern is the retention time. For NCFET memory devices, it depends on the energy barrier between the two polarization states. Increasing $T_{FE}$ helps increase the coercive voltage that is required to change the polarization (see Fig. 4). However, it results in a larger minimum required supply voltage, which means more energy is consumed each time the polarization is switched. Another device optimization, which considers yield, will be discussed in the next sub-section.

There are also other design considerations, such as circuit area and fabrication cost, operating voltage range, endurance, etc., which will be of significance when the model becomes available. Regarding endurance, because the polarization is switched as part of normal state updates, the proposed NCFET NV-DFF may not be suitable for high-frequency applications unless device endurance is improved.

## B. Design Variation and Yield Analysis

It is noted that the restore functionality of the NCFET latches and DFFs depends on the difference of the sensed resistance from Q/QN to GND or VDD, which indicates that the sensitivity of the sensed resistance may be critical for yield. For simplicity, only the sensed resistance to GND, i.e., $R_{Q2GND}$, is analyzed here.

Taking NV-Latch-N as an example, the sensed resistance is the sum of the series NCFET drain-source resistance $R_{NCFET}$ and the NMOS drain-source resistance $R_{CMOS}$. When the latch stores different bit values, the key difference in the initial sensed resistance, neglecting process non-idealities, lies in $R_{NCFET}$. As a result, the ratio of $R_{Q2GND}$ between the two branches in the latch is

$$\begin{aligned}
\Gamma &= R_{Q2GND,'1'}/R_{Q2GND,'0'} \\
&= (R_{NCFET,'1'} + R_{CMOS,'1'})/(R_{NCFET,'0'} + R_{CMOS,'0'}) \\
&\approx R_{CMOS,'1'}/(R_{NCFET,'0'} + R_{CMOS,'0'}).
\end{aligned} \tag{1}$$

A smaller $\Gamma$ leads to a more stable and more noise-resistant restore operation. A small $R_{CMOS}$ or a large $R_{NCFET,'0'}$ is thus helpful. Considering the orders-of-magnitude difference between the ON and OFF resistance of $R_{NCFET}$, the approximation in (1) is rather safe. For this purpose, $T_{FE}$ is set to 8 nm so as to provide a large ON-OFF resistance ratio while enabling low-voltage operation. Since $\Gamma$ can easily be degraded by variation of the NMOS initial resistances, $R_{CMOS,'0'}$ and $R_{CMOS,'1'}$, a variation analysis should be carried out. This is especially important for the NV-Latch-NC and NV-DFF-NC designs, because another parallel branch (see M5b and M6b in Fig. 11) affects $\Gamma$, too.
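To make the role of each term in (1) concrete, the following Python sketch evaluates the exact ratio and its approximation side by side. The resistance values are hypothetical placeholders chosen only to reflect a large ON-OFF ratio; they are not extracted from the device model used in this paper, and the function names are illustrative.

```python
# Exact and approximate forms of the branch resistance ratio Gamma in (1).
# All resistance values below are hypothetical, in ohms.


def gamma_exact(r_ncfet_1: float, r_cmos_1: float,
                r_ncfet_0: float, r_cmos_0: float) -> float:
    """Gamma = (R_NCFET,'1' + R_CMOS,'1') / (R_NCFET,'0' + R_CMOS,'0')."""
    return (r_ncfet_1 + r_cmos_1) / (r_ncfet_0 + r_cmos_0)


def gamma_approx(r_cmos_1: float, r_ncfet_0: float, r_cmos_0: float) -> float:
    """Approximation that neglects the ON-state NCFET resistance R_NCFET,'1'."""
    return r_cmos_1 / (r_ncfet_0 + r_cmos_0)


if __name__ == "__main__":
    r_on, r_off = 1e3, 1e9        # hypothetical NCFET ON and OFF resistances
    r_cmos = 1e4                  # hypothetical matched NMOS resistance
    print("exact :", gamma_exact(r_on, r_cmos, r_off, r_cmos))
    print("approx:", gamma_approx(r_cmos, r_off, r_cmos))
    # A mismatch that raises R_CMOS,'1' pushes Gamma up, i.e. degrades it.
    print("with mismatch:", gamma_exact(r_on, 2 * r_cmos, r_off, r_cmos))
```

With these placeholder values the approximation differs from the exact ratio only through the small ON-state term, while the mismatch case shows why variation in $R_{CMOS}$ is the quantity examined in the yield analysis.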



Fig. 20. Simulation setup for the analysis on the impact of variation.

| $V_{TH}$ variation \ Supply voltage | 0.5V | 0.6V | 0.7V | 0.8V | 0.9V | 1.0V |
|-------------------------------------|------|------|------|------|------|------|
| 30mV                                | Pass | Pass | Pass | Pass | Pass | Pass |
| 40mV                                | Fail | Fail | Fail | Fail | Fail | Fail |

Fig. 21. Yield simulation results considering  $V_{TH}$  variation at different supply voltages for both NV-Latch-NC and NV-DFF-NC.

The difference between $R_{CMOS,'0'}$ and $R_{CMOS,'1'}$ mainly comes from device size mismatch and threshold voltage ($V_{TH}$) variation $\Delta V_{TH}$. By manually adding an opposing voltage source in series with the gate, as shown in Fig. 20, the impact of $\Delta V_{TH}$ can be quantified through a series of simulations. The results are summarized in Fig. 21, with the *VDD* ramp-up time equal to $0.8\mu$s to mimic typical real scenarios. The results show that, within the 0.5V to 1.0V supply voltage range, the design is reliable with $\Delta V_{TH}$ no more than 30mV. Note that this result provides a fairly large margin for design, as all the MOSFETs in Fig. 20 have an unfavorable direction of $V_{TH}$ variation if the NCFETs on the left and right branches store a negative and a positive polarization state, respectively.

It is also interesting that the impact of variation is independent of the supply voltage within the given range of 0.5V to 1.0V. This is because of the relatively long rise time for the supply voltage to recover, and the fact that the initial restore trend is almost the same for scenarios with different *VDD*.
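The pass/fail grid of Fig. 21 is small enough to encode directly, which makes the supply-voltage independence noted above easy to check programmatically. The Python sketch below stores the simulated points as reported; the data structure and function names are illustrative, and no points other than those in Fig. 21 are assumed.

```python
# Pass/fail results from Fig. 21, keyed by V_TH variation (mV) and supply (mV).
SUPPLY_MV = (500, 600, 700, 800, 900, 1000)
FIG21_RESULTS = {
    30: {vdd: True for vdd in SUPPLY_MV},    # all supplies pass at 30 mV
    40: {vdd: False for vdd in SUPPLY_MV},   # all supplies fail at 40 mV
}


def restore_passes(delta_vth_mv: int, vdd_mv: int) -> bool:
    """Look up a simulated point; unsimulated points raise KeyError."""
    try:
        return FIG21_RESULTS[delta_vth_mv][vdd_mv]
    except KeyError:
        raise KeyError(
            f"no simulated point for dVTH={delta_vth_mv} mV, VDD={vdd_mv} mV"
        ) from None


def max_passing_variation_mv(vdd_mv: int) -> int:
    """Largest simulated V_TH variation that still restores correctly."""
    passing = [dv for dv, row in FIG21_RESULTS.items() if row.get(vdd_mv)]
    if not passing:
        raise ValueError(f"no passing point recorded for VDD={vdd_mv} mV")
    return max(passing)


if __name__ == "__main__":
    limits = {vdd: max_passing_variation_mv(vdd) for vdd in SUPPLY_MV}
    print(limits)
    print("independent of supply:", len(set(limits.values())) == 1)
```

Only two variation levels were simulated, so the true tolerance limit lies somewhere between 30mV and 40mV; the lookup deliberately refuses to interpolate.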

To provide a small  $\Gamma$ , the latch is designed in a way that all CMOS transistors have a higher  $V_{TH}$  than the bottom two transistors connecting to NCFETs, i.e. M7 and M8 in Fig. 11 and Fig. 20. By doing this, the following goals could be achieved:

(i) The resistance of M7 and M5b, or that of M8 and M6b, plays a less significant role than that of the NCFETs, as M7 is in series with the NCFET and M5b is in parallel with the NCFET. This helps build the correct rising trend of Q and QN when *VDD* starts to recover.

(ii) Static leakage current of the latch will not increase. This can be guaranteed by a proper NCFET design with a high OFF-state resistance. Given a certain ferroelectric material and transistor structure, this OFF-state resistance could be tuned by varying  $T_{FE}$  and the width of NCFET.

In this paper, the device variation of the NCFET is not considered due to the lack of a model. Fortunately, the large inherent ON-OFF resistance ratio will help reduce the impact of NCFET variation.

## C. Related Applications

The proposed NCFET NV-DFF could be strongly complementary to existing power-gating approaches in both low- and high-performance systems. In aggressive, high-speed systems using fine-grained, low-latency power-gating techniques [40], the ability to power-gate stateful units, up to and including entire processor cores, within a handful of cycles would both expand the scope of what can be power-gated and simplify design constraints. Although the impact of eliminating backup and restore control for power-gating has not yet been fully explored, it promises to open up new possibilities for further energy savings and architecture optimizations due to the reduced control complexity.

On the energy-harvesting end of the spectrum, one apparent benefit is the reduced backup and restore energy consumption and latency, which improves the utilization of harvested energy for computation. The intrinsic non-volatility ensures that no backup is missed, without the need for backup control, which prevents roll-back operations in the computation progress. While there are already works on nonvolatile processor optimizations [41]-[43], further architecture-level optimizations would be useful to exploit the intrinsic non-volatility of NCFET flip-flops. Meanwhile, it has been shown that the recovery time in NVPs after a power emergency is sometimes dominated by the recovery of analog components, such as ensuring PLL stability [44], [45]. While this limits the impact of the rapid recovery time that NCFET NV-DFFs have in such systems, their rapid, completely distributed, and low-energy backup properties may allow power gating of the data path and other digital components fast enough to divert energy during shorter or less severe power emergencies in order to preserve analog functionality. In this way, the cycle-latency (at NVP frequencies) power-gating potential of the NCFET NV-DFF could shave microseconds off recovery times.

# VI. CONCLUSION

This paper has proposed a set of intrinsically nonvolatile latches and DFFs by harnessing the built-in non-volatility of negative capacitance field-effect transistors (NCFETs) with novel circuitry. Simulation results have shown fJ-level energy and ns-level intrinsic latency for a backup plus restore operation. With such low-energy and low-latency backup and restore operations, these fully external-control-free latches and DFFs enable a new nonvolatile computing paradigm for future IoT applications and power-gating applications.

# ACKNOWLEDGMENT

The authors would like to thank Prof. Sayeef Salahuddin and Prof. Asif Khan from University of California, Berkeley, Dr. Francky Catthoor from IMEC, Prof. Sharon Hu from University of Notre Dame, and Prof. Peter Asbeck from University of California, San Diego, for useful discussions and suggestions.

# REFERENCES

- [1] K. Ma *et al.*, "Architecture exploration for ambient energy harvesting nonvolatile processors," in *Proc. IEEE 21st Int. Symp. High Perform. Comput. Archit. (HPCA)*, Burlingame, CA, USA, Feb. 2015, pp. 526–537.
- [2] X. Li, U. D. Heo, K. Ma, V. Narayanan, H. Liu, and S. Datta, "RF-powered systems using steep-slope devices," in *Proc. IEEE 12th Int. New Circuits Syst. Conf. (NEWCAS)*, Trois-Rivières, QC, Canada, Jun. 2014, pp. 73–76.
- [3] S. Kim *et al.*, "Ambient RF energy-harvesting technologies for self-sustainable standalone wireless sensor platforms," *Proc. IEEE*, vol. 102, no. 11, pp. 1649–1666, Nov. 2014.
- [4] F. Su, Y. Liu, Y. Wang, and H. Yang, "A ferroelectric nonvolatile processor with 46 μs system-level wake-up time and 14 μs sleep time for energy harvesting applications," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 3, pp. 596–607, Mar. 2017.
- [5] Y. Liu *et al.*, "A 65 nm ReRAM-enabled nonvolatile processor with 6× reduction in restore time and 4× higher clock frequency using adaptive data retention and self-write-termination nonvolatile logic," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Jan./Feb. 2016, pp. 84–86.
- [6] Y. Liu *et al.*, "Ambient energy harvesting nonvolatile processors: From circuit to system," in *Proc. 52nd ACM/EDAC/IEEE Design Autom. Conf. (DAC)*, San Francisco, CA, USA, Jun. 2015, pp. 1–6.
- [7] D. Zhang *et al.*, "Solar power prediction assisted intra-task scheduling for nonvolatile sensor nodes," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 35, no. 5, pp. 724–737, May 2016.
- [8] Y. Wang *et al.*, "Storage-less and converter-less photovoltaic energy harvesting with maximum power point tracking for Internet of Things," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 35, no. 2, pp. 173–186, Feb. 2016.
- [9] International Technology Roadmap for Semiconductors—ITRS 2.0 Home Page, accessed on May 26, 2017. [Online]. Available: http://www.itrs2.net
- [10] H. Kimura *et al.*, "A 2.4 pJ ferroelectric-based non-volatile flip-flop with 10-year data retention capability," in *Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC)*, KaoHsiung, Taiwan, Nov. 2014, pp. 21–24.
- [11] M. Qazi, A. Amerasekera, and A. P. Chandrakasan, "A 3.4-pJ FeRAM-enabled D flip-flop in 0.13-μm CMOS for nonvolatile processing in digital systems," *IEEE J. Solid-State Circuits*, vol. 49, no. 1, pp. 202–211, Jan. 2014.
- [12] H. Kimura, Z. Zhong, Y. Mizuochi, N. Kinouchi, Y. Ichida, and Y. Fujimori, "Highly reliable non-volatile logic circuit technology and its application," in *Proc. IEEE 43rd Int. Symp. Multiple-Valued Logic*, May 2013, pp. 212–218.
- [13] C. Mitchell, M. R. Hunt, C. L. McCartney, and F. D. Ho, "Implementation of low-power, non-volatile latch utilising ferroelectric transistor," *Electron. Lett.*, vol. 51, no. 23, pp. 1884–1886, Nov. 2015.
- [14] S. Izumi *et al.*, "A ferroelectric-based non-volatile flip-flop for wearable healthcare systems," in *Proc. 15th Non-Volatile Memory Technol. Symp. (NVMTS)*, Beijing, China, 2015, pp. 1–4.
- [15] K.-W. Kwon, S. H. Choday, Y. Kim, X. Fong, S. P. Park, and K. Roy, "SHE-NVFF: Spin Hall effect-based nonvolatile flip-flop for power gating architecture," *IEEE Electron Device Lett.*, vol. 35, no. 4, pp. 488–490, Apr. 2014.
- [16] R. Bishnoi, F. Oboril, and M. B. Tahoori, "Non-volatile non-shadow flip-flop using spin orbit torque for efficient normally-off computing," in *Proc. 21st Asia South Pacific Design Autom. Conf. (ASP-DAC)*, Macau, China, 2016, pp. 769–774.
- [17] S. Yamamoto and S. Sugahara, "Nonvolatile delay flip-flop based on spin-transistor architecture and its power-gating applications," *Jpn. J. Appl. Phys.*, vol. 49, no. 9R, p. 090204, Sep. 2010.
- [18] I. Kazi *et al.*, "Energy/reliability trade-offs in low-voltage ReRAM-based non-volatile flip-flop design," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 11, pp. 3155–3164, Nov. 2014.
- [19] T. Na, K. Ryu, J. Kim, S.-O. Jung, J. P. Kim, and S. H. Kang, "High-performance low-power magnetic tunnel junction based non-volatile flip-flop," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, Jun. 2014, pp. 1953–1956.
- [20] D. Wang, S. George, A. Aziz, S. Datta, V. Narayanan, and S. K. Gupta, "Ferroelectric transistor based non-volatile flip-flop," in *Proc. ACM Int. Symp. Low Power Electron. Design*, 2016, pp. 10–15.
- [21] Enhanced Endurance Performance of 0.13 um Nonvolatile F-RAM Products. [Online]. Available: http://www.cypress.com/enhanced-enduranceperformance-013-um-nonvolatile-f-ram-products
- [22] Y. Xie, Ed., *Emerging Memory Technologies: Design, Architecture, and Applications*. New York, NY, USA: Springer, 2013, p. 21.
- [23] A. I. Khan, C. W. Yeung, C. Hu, and S. Salahuddin, "Ferroelectric negative capacitance MOSFET: Capacitance tuning & antiferroelectric operation," in *IEDM Tech. Dig.*, Washington, DC, USA, 2011, pp. 11.3.1–11.3.4.
- [24] A. I. Khan *et al.*, "Negative capacitance in short-channel FinFETs externally connected to an epitaxial ferroelectric capacitor," *IEEE Electron Device Lett.*, vol. 37, no. 1, pp. 111–114, Jan. 2016.
- [25] J. Jo and C. Shin, "Negative capacitance field effect transistor with hysteresis-free sub-60-mV/decade switching," *IEEE Electron Device Lett.*, vol. 37, no. 3, pp. 245–248, Mar. 2016.
- [26] K.-S. Li *et al.*, "Sub-60 mV-swing negative-capacitance FinFET without hysteresis," in *IEDM Tech. Dig.*, Dec. 2015, pp. 22.6.1–22.6.4.
- [27] M. H. Lee *et al.*, "Steep slope and near non-hysteresis of FETs with antiferroelectric-like HfZrO for low-power electronics," *IEEE Electron Device Lett.*, vol. 36, no. 4, pp. 294–296, Apr. 2015.
- [28] M. H. Lee *et al.*, "Prospects for ferroelectric HfZrO<sub>x</sub> FETs with experimentally CET = 0.98 nm, SS<sub>for</sub> = 42 mV/dec, SS<sub>rev</sub> = 28 mV/dec, switch-off <0.2 V, and hysteresis-free strategies," in *IEDM Tech. Dig.*, Dec. 2015, pp. 22.5.1–22.5.4.
- [29] S. Dasgupta *et al.*, "Sub-kT/q switching in strong inversion in PbZr<sub>0.52</sub>Ti<sub>0.48</sub>O<sub>3</sub> gated negative capacitance FETs," *IEEE J. Explor. Solid-State Computat. Devices Circuits*, vol. 1, pp. 43–48, Dec. 2015.
- [30] A. C. Seabaugh and Q. Zhang, "Low-voltage tunnel transistors for beyond CMOS logic," *Proc. IEEE*, vol. 98, no. 12, pp. 2095–2110, Dec. 2010.
- [31] K. Swaminathan, H. Liu, X. Li, M. S. Kim, J. Sampson, and V. Narayanan, "Steep slope devices: Enabling new architectural paradigms," in *Proc. 51st ACM/EDAC/IEEE Design Autom. Conf. (DAC)*, San Francisco, CA, USA, Jun. 2014, pp. 1–6.
- [32] S. George *et al.*, "Nonvolatile memory design based on ferroelectric FETs," in *Proc. 53rd ACM/EDAC/IEEE Design Autom. Conf. (DAC)*, Austin, TX, USA, Jun. 2016, pp. 1–6.
- [33] S. George *et al.*, "Device circuit co design of FEFET based logic for low voltage processors," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI)*, Pittsburgh, PA, USA, Jul. 2016, pp. 649–654.
- [34] D. E. Nikonov and I. A. Young, "Overview of beyond-CMOS devices and a uniform methodology for their benchmarking," *Proc. IEEE*, vol. 101, no. 12, pp. 2498–2533, Dec. 2013.
- [35] A. I. Khan *et al.*, "Negative capacitance in a ferroelectric capacitor," *Nature Mater.*, vol. 14, no. 2, pp. 182–186, 2015.
- [36] A. Aziz, S. Ghosh, S. Datta, and S. K. Gupta, "Physics-based circuit-compatible SPICE model for ferroelectric transistors," *IEEE Electron Device Lett.*, vol. 37, no. 6, pp. 805–808, Jun. 2016.
- [37] C. Hu, S. Salahuddin, C.-I. Lin, and A. Khan, "0.2 V adiabatic NC-FinFET with 0.6 mA/µm I<sub>ON</sub> and 0.1 nA/µm I<sub>OFF</sub>," in *Proc. 73rd Annu. Device Res. Conf.*, 2015, pp. 39–40, doi: 10.1109/DRC.2015.7175542.
- [38] S. Sakai and R. Ilangovan, "Metal-ferroelectric-insulator-semiconductor memory FET with long retention and high endurance," *IEEE Electron Device Lett.*, vol. 25, no. 6, pp. 369–371, Jun. 2004.
- [39] K. C. Chun, P. Jain, T.-H. Kim, and C. H. Kim, "A 667 MHz logic-compatible embedded DRAM featuring an asymmetric 2T gain cell for high speed on-die caches," *IEEE J. Solid-State Circuits*, vol. 47, no. 2, pp. 547–559, Feb. 2012.
- [40] A. B. Kahng, S. Kang, T. S. Rosing, and R. Strong, "Many-core token-based adaptive power gating," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 32, no. 8, pp. 1288–1292, Aug. 2013.
- [41] K. Ma *et al.*, "Nonvolatile processor architecture exploration for energy-harvesting applications," *IEEE Micro*, vol. 35, no. 5, pp. 32–40, Sep./Oct. 2015.
- [42] K. Ma, X. Li, Y. Liu, J. Sampson, Y. Xie, and V. Narayanan, "Dynamic machine learning based matching of nonvolatile processor microarchitecture to harvested energy profile," in *Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD)*, Austin, TX, USA, Nov. 2015, pp. 670–675.
- [43] K. Ma *et al.*, "Spendthrift: Machine learning based resource and frequency scaling for ambient energy harvesting nonvolatile processors," in *Proc. 22nd Asia South Pacific Design Autom. Conf. (ASP-DAC)*, Chiba, Japan, Jan. 2017, pp. 678–683.
- [44] C.-H. Hsiao, T.-S. Horng, K.-C. Peng, and C.-H. Lee, "A low-noise CMOS frequency synthesizer with an ultra-short settling time," in *Proc. IEEE Int. Symp. Radio-Freq. Integr. Technol. (RFIT)*, Taipei, Taiwan, Aug. 2016, pp. 1–3.
- [45] V. K. Chillara *et al.*, "An 860 μW 2.1-to-2.7 GHz all-digital PLL-based frequency modulator with a DTC-assisted snapshot TDC for WPAN (Bluetooth Smart and ZigBee) applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, USA, Feb. 2014, pp. 172–173.



Xueqing Li (M'13) received the B.S. degree and Ph.D. degree in electronics engineering from Tsinghua University, Beijing, China. Since 2013, he has been a Post-Doctoral Research Associate with the Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA. He has authored or co-authored over 50 journal and conference papers and is a reviewer of over ten IEEE/ACM journals. His interests include high-performance CMOS data converters and high-efficiency circuits and systems, including energy harvesting, low-voltage digital circuits, nonvolatile memory and circuits, and nonvolatile processing architectures using emerging devices, such as tunnel FETs, ferroelectric negative-capacitance FETs, and phase-transition FETs. He was a recipient of Best Paper Awards at the HPCA'15 and the ASP-DAC'17.



Sumitha George was born in India. She received the B.Tech. degree in electronics and communication from Kerala, India, and the M.Tech. degree from IIT Delhi. She joined the Systems and Technology Group, IBM, in 2007. Her main area of focus at IBM was microprocessor physical design. She is currently a Graduate Student of Computer Science and Engineering with The Pennsylvania State University. Her current research interests include computer architecture and circuit design.



Kaisheng Ma received the bachelor's degree (Hons.) in electrical engineering from Hangzhou Electronic University, Zhejiang, and the master's degree from the Institute of Microelectronic, Peking University, Peking, China, under the guidance of Dr. D. Yu and Dr. X. Cao. He is currently pursuing the Ph.D. degree with the Department of Computer Science and Engineering, The Pennsylvania State University, under the guidance of Dr. V. Narayanan and Dr. J. Sampson at PSU and Dr. Y. Xie at UCSB. His current research interests include nonvolatile processor architecture, machine learning, big data, neural networks, and neuromorphic computing.



Wei-Yu Tsai received the B.S. degree in electrical and control engineering from National Chiao Tung University, Taiwan, in 2009, and the M.S. degree in electrical engineering from National Tsing Hua University, Taiwan, in 2011. He is currently pursuing the Ph.D. degree with the Department of Computer Science and Engineering, The Pennsylvania State University, USA. His current research interests include non-Boolean computing, such as neural networks and coupled oscillators, and full-custom integrated circuit designs in CMOS and emerging devices such as III-V Heterojunction Tunnel FET.



Ahmedullah Aziz (S'10) received the B.Sc. degree from the Bangladesh University of Engineering & Technology in 2013, and the M.S. degree from The Pennsylvania State University in 2016, where he is currently pursuing the Ph.D. degree with the Department of Electrical Engineering. He is with the Integrated Circuits and Devices Lab as a Graduate Researcher. Previously, he was an Engineer with the Samsung Research and Development Institute, where he was affiliated with the Tizen project in Solution Lab. His current research focus is on spin-based memory and emerging steep-slope device technology.



John Sampson (M'04) received the Ph.D. degree from the University of California at San Diego, San Diego, CA, USA. He is currently an Assistant Professor with the Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA. His current research interests include energy-efficient computing, architectural adaptations to exploit emerging technologies, and mitigating the impact of dark silicon.



Sumeet Kumar Gupta (M'12) received the B.Tech. degree in electrical engineering from IIT Delhi, New Delhi, India, in 2006, and the M.S. and Ph.D. degrees in electrical and computer engineering from Purdue University, West Lafayette, IN, USA, in 2008 and 2012, respectively. He is currently a Monkowski Assistant Professor of Electrical Engineering with The Pennsylvania State University, University Park, PA, USA. His current research interests include low-power variation-aware very-large-scale integration circuit design, nanoelectronics and spintronics, device-circuit co-design, and nanoscale device modeling and simulations.



**Meng-Fan Chang** (M'05–SM'14) received the M.S. degree from The Pennsylvania State University, University Park, PA, USA, and the Ph.D. degree from National Chiao Tung University, Hsinchu, Taiwan. He is currently a Full Professor with National Tsing Hua University, Taiwan. He was in industry for over 10 years.

From 1996 to 1997, he designed memory compilers in Mentor Graphics, NJ, USA. From 1997 to 2001, he designed embedded SRAMs and Flash in the Design Service Division with Taiwan Semiconductor Manufacturing Company Ltd., Hsinchu. From 2001 to 2006, he was a co-founder and a Director of IPLib Company, Taiwan, where he developed embedded SRAM and ROM compilers, flash macros, and flat-cell ROM products. His current research interests include circuit designs for volatile and nonvolatile memory, ultra-low-voltage systems, 3-D memory, circuit-device interactions, memristor logics for neuromorphic computing, and computing-in-memory.

He is the corresponding author of numerous ISSCC, Symposium on VLSI Circuits, IEDM, and DAC papers. He is an Associate Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS, the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, and *IEICE Electronics*. He has been serving on the technical program committees for ISSCC, IEDM, A-SSCC, ISCAS, VLSI-DAT, and numerous international conferences. He has been a Distinguished Lecture Speaker of the IEEE Circuits and Systems Society. He served as an Associate Executive Director of Taiwan's National Program of Intelligent Electronics from 2011 to 2016. He received the Academia Sinica (Taiwan) Junior Research Investigators Award in 2012 and the Ta-You Wu Memorial Award of the National Science Council (NSC-Taiwan) in 2011. He also received numerous awards from Taiwan's National Chip Implementation Center, NTHU, MXIC Golden Silicon Awards, and ITRI.



Yongpan Liu (M'07–SM'15) is currently an Associate Professor with Tsinghua University. His current research interests include power design, emerging circuits and systems, and design automation. He has authored or co-authored over 100 papers and led over ten chip design projects for sensing applications, including the first nonvolatile processor (THU1010N). He has received Design Contest Awards from ISLPED 2012 and ISLPED 2013 and a Best Paper Award from HPCA 2015, and has served on the TPCs of DAC, ASP-DAC, and A-SSCC.



**Vijaykrishnan Narayanan** (F'11) received the B.S. degree in computer science and engineering from the University of Madras, Chennai, India, in 1993, and the Ph.D. degree in computer science and engineering from the University of South Florida, Tampa, FL, USA, in 1998.

He is currently a Professor of Computer Science and Engineering and Electrical Engineering with The Pennsylvania State University (Penn State), University Park, PA, USA. His current research interests include power-aware and reliable systems, embedded systems, nanoscale devices, and interactions with system architectures, reconfigurable systems, computer architectures, network-on-chips, and domain-specific computing.

Prof. Narayanan has received several awards, including the Penn State Engineering Society Outstanding Research Award in 2006, the IEEE CAS VLSI Transactions Best Paper Award in 2002, the Penn State CSE Faculty Teaching Award in 2002, the Association for Computing Machinery (ACM) Special Interest Group on Design Automation Outstanding New Faculty Award in 2000, the Upsilon Pi Epsilon Award for Academic Excellence in 1997, the IEEE Computer Society Richard E. Merwin Award in 1996, and the University of Madras First Rank in Computer Science and Engineering in 1993. He has received several certificates of appreciation for outstanding service from ACM and the IEEE Computer Society. He is the Editor-in-Chief of the IEEE TRANSACTIONS ON COMPUTER AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS.



Suman Datta (F'13) was a Professor of Electrical Engineering with The Pennsylvania State University, University Park, PA, USA, from 2007 to 2011. He is currently the Chang Family Chair Professor of Engineering Innovation with the University of Notre Dame, Notre Dame, IN, USA. His current research interests include super-steep slope transistors, phase transition-based transistors, ferroelectric gated transistors, 3-D cross-point memories, coupled dynamical systems, and unsupervised learning systems.