# **DesignCon 2019**

# Enabling 6.4Gbps/pin LPDDR5 Interface using Bandwidth Improvement Techniques

Billy Koo, SAMSUNG Electronics.

Jinho Choi, SAMSUNG Electronics.

Kwanyeob Chae, SAMSUNG Electronics.

Juyoung Kim, SAMSUNG Electronics.

## Abstract

New applications such as artificial intelligence, autonomous cars, high-performance computing, and embedded vision are driving stricter requirements for memory performance and power efficiency. These demands have driven mobile DRAM technology to evolve to its fifth generation (LPDDR5). LPDDR5 delivers a significant reduction in power and much higher bandwidth than LPDDR4. In this paper, we present various bandwidth improvement techniques that enable the world's first 6.4 Gbps/pin LPDDR5 interface. The achieved READ and WRITE valid window margins (VWM) at 6.4 Gbps are 0.36 UI and 0.4 UI, respectively. The measured WCK clock duty was within 43~57% at 3.2 GHz including process variation, and peak-to-peak periodic jitter was less than 20 ps. Various measured results are also shown as enabling references for other LPDDR5 interface designers.

# **Authors' Biographies**

**Billy**(**Kyounghoi**) **Koo** received the B.S. and M.S. degrees in electrical engineering from Chungbuk National University, Korea, in 1996 and 1998, respectively, with a focus on high-speed I/O transceivers. In 1998, he joined SAMSUNG Electronics, where he designed and developed high-speed peripheral interfaces such as PCI-X, AGP, HSTL, SSTL, LVDS and USB2.0. Since 2004, he has been responsible for developing high-speed memory interface circuits for native DDR2/DDR3/DDR4 and mobile LPDDR2/LPDDR3/LPDDR4. He holds over 10 U.S. and foreign patents and has published 6 papers and conference contributions in analog/digital mixed-signal design and high-speed interfaces.

**Jinho Choi** received the B.S. and M.S. degrees in electronics engineering from Chung-Ang University, Seoul, Korea, in 2003 and 2005, respectively. He is a senior engineer at Samsung Electronics, where he works on memory interface circuit design, specializing in high-speed I/O interfaces, low-power CMOS circuit design and analog circuit design.

**Kwanyeob Chae** received the B.S. and M.S. degrees in electronics engineering from Korea University, Seoul, Korea, in 1998 and 2000, respectively, and the Ph.D. degree in electrical and computer engineering from Georgia Institute of Technology, Atlanta in 2013. Kwanyeob Chae is a principal engineer in Samsung Electronics, where he is leading the architecture and implementation part. His research interests include high-speed/low-power digital interface circuits, self-adaptive circuits and systems, variation-tolerant design, and 3-D ICs.

**Juyoung Kim** received the B.S. degree in computer science from Pusan National University, Busan, Korea, in 2006. He is a senior engineer at Samsung Electronics, where he works on pre-silicon verification and post-silicon validation, specializing in memory interfaces.

## Acknowledgements

The authors would like to give special thanks to Chanmin Jo for his support in LPDDR5 memory channel modeling and simulation, and for contributing to the writing contained here. We also thank the following individuals, whose dedication was invaluable in the enabling work: Joyoung Kim for carrying out system test and debug with measurement, Sukhyun Jung for FD characterization with VNA measurement, Chan-Min Jo for performing memory off-chip simulation for PI/SI analysis, and Gyoungbum Kim for leading the improvement of electrical performance in the package. Lastly, we would like to give special appreciation to Sanghune Park, the technical leader and advisor of the electrical council task force, for directing the enabling activities.

#### LPDDR5 WCK Clocking Scheme

LPDDR5 DRAM was developed to provide higher data bandwidth at lower power than LPDDR4X DRAM. To address this challenge, LPDDR5 DRAM operates at a reduced supply voltage and uses WCK clocking, in which WCK is the clock source for both write and read operations, as in GDDR5/6. In LPDDR4X, CK was used as the read clock source, which incurs a long clock latency; WCK clocking therefore reduces both the read clock network latency and the clock power. Fig. 1 shows the block diagram of LPDDR5 with WCK clocking. In LPDDR4X, as the data rate increases and the voltage decreases, power-noise-induced jitter caused by the long CLK-to-DQS delay has become the dominant factor limiting high-speed operation. To minimize the clock-to-DQ delay, WCK signals are transmitted to each byte. Because WCK signals can be transmitted only during DQ operation to reduce power consumption, WCK2CK synchronization must be executed for the domain crossing between CK and WCK whenever a read or write command is issued.



Figure 1: Top-level block diagram of LPDDR5 WCK clocking

#### LPDDR4/4X/5 Interface Schemes

LPDDR4/4X [1,2] adopted the LVSTL (Low Voltage Swing Terminated Logic) interface, in which the signal swing level is about VDDQ/3. In LPDDR4X, VDDQ is 0.6V and the signal swing level is about 300mV. To reduce power, LPDDR5 uses a 0.5V VDDQ, and the swing level is 250mV under the VSSQ-TERM condition. In the un-terminated condition, the maximum signal swing voltage is limited to VDDQ-Vth. The proposed LPDDR4 driver in Fig. 1 supports multiple VOH levels with termination and can also handle VOH drift control in the un-terminated case. Pre-emphasis and slew-rate control schemes are implemented to support high-speed signaling and to reduce EMI in mobile devices.

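Under the VSSQ-TERM condition, the signal swing voltage $V_x$ is determined by the following equations, where $V_{GS} = V_{DD} - V_x$ is the gate-source voltage of the pull-up NMOS.

$$\frac{V_x}{R_{term}} = \frac{V_{ds}}{R_{ds}} = \frac{V_{ds}}{\dfrac{V_{ds} \cdot 2 R_{on\_up} (V_{DD} - V_{th})}{(V_{GS} - V_{th})^{2}}} = \frac{(V_{GS} - V_{th})^{2}}{2 R_{on\_up} (V_{DD} - V_{th})} = \frac{(V_x - V_{DD} + V_{th})^{2}}{2 R_{on\_up} (V_{DD} - V_{th})}$$

Rearranging the first and last terms gives a quadratic in $V_x$:

$$R_{term} V_x^{2} - 2\left(R_{term} + R_{on\_up}\right)(V_{DD} - V_{th}) V_x + R_{term} (V_{DD} - V_{th})^{2} = 0$$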
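As a worked illustration, the quadratic above can be solved numerically. The sketch below is not taken from the measured design: the resistor values and the VDD - Vth headroom are hypothetical placeholders, and the square-law device model is the same simplification used in the derivation. Of the two roots, only the smaller one keeps Vx below VDD - Vth, which the pull-up NMOS needs in order to stay on.

```python
import math

def vssq_term_swing(r_term, r_on_up, headroom):
    """Solve Rterm*Vx^2 - 2*(Rterm + Ron_up)*A*Vx + Rterm*A^2 = 0 for Vx.

    headroom is A = VDD - Vth in volts; resistances are in ohms.
    Returns the smaller (physical) root, which satisfies 0 < Vx < A.
    """
    a = r_term
    b = -2.0 * (r_term + r_on_up) * headroom
    c = r_term * headroom ** 2
    disc = b * b - 4.0 * a * c  # always positive for positive resistances
    return (-b - math.sqrt(disc)) / (2.0 * a)

# Hypothetical example values, chosen only to exercise the formula.
vx = vssq_term_swing(r_term=40.0, r_on_up=40.0, headroom=0.5)
print(f"Vx = {vx * 1e3:.1f} mV")
```

A larger Ron_up (a smaller pull-up device) lowers Vx in this model, which matches the later remark that a small pull-up driver limits the swing level.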

The NMOS pull-up driver operates in the saturation region like a source follower, so no additional current source is required. This characteristic makes a non-linear driver configuration possible and provides fast driving current. The small size of the NMOS pull-up driver also reduces the junction capacitance. For the pull-down driver, the output level converges to VSSQ through the termination resistor (RTERM) and the pull-down driver operates in the linear region, so its contribution to driving the small-swing signal is minor. Fig. 1 shows the various LPDDR4/4X/5 interface configurations. Fig. 2 shows various LPDDR4/4X/5 signaling.



Figure 2: LPDDR4/4X/5 interface signaling

#### **ODT Features in LPDDR5**

To support high-density memory, LPDDR5 supports a 2-rank configuration. However, signal integrity (SI) in a 2-rank configuration is degraded by reflection noise. A large reflection arises in the memory package because one DQ pad in the package is shared by the DQ pads of two dies. To suppress this reflection noise, LPDDR5 memory supports a non-target ODT feature, which improves SI at high-frequency operation. Fig. 4 shows various ODT schemes. Fig. 3 shows the equivalent circuit of a general 2-rank configuration. L in equation (1) is the package inductance, and C is the sum of the package capacitance and the memory CIO. R1 and R2 are the termination resistor values of each memory rank.



Figure 3: Equivalent Circuit of 2-rank configuration



Figure 4: Termination topologies

#### **Bandwidth Improvement Techniques**

To increase LPDDR5 interface speed beyond 6.4 Gbps/pin, this section discusses various bandwidth improvement techniques and PI/SI studies based on channel analysis. On the driver side, an ISI-minimizing tunable equalizer that supports either de-emphasis or pre-emphasis is used. On the receiver side, a power-efficient CTLE is adopted. An area-optimized per-bit offset calibration scheme improves each bit's READ valid window margin.

To improve signal integrity, channel analysis results covering various off-chip conditions, including a discrete package, are also presented. Based on this channel analysis, we suggest optimal ODT schemes for both the DRAM side and the controller side. Compared with the conventional ODT scheme, the proposed ODT scheme increased VWM by 17%. The measured WCK clock duty was within 43~57% at 3.2 GHz including process variation, and peak-to-peak periodic jitter was less than 20 ps.

Another topic is low-power circuit features. To save interface power and increase power efficiency, we adopted a single-supply driver scheme instead of the conventional LVSTL driver scheme, which requires dual power supplies for the pre-driver and driver. In an LVSTL-type driver, the high-voltage-domain pre-driver consumes a large amount of interface power, whereas the single-supply driver scheme needs only the VDDQ supply, so substantial pre-driver power can be saved. Based on our system test results, about 50% power reduction can be achieved on the controller side, which leads to a 20-minute increase in battery time in a smartphone Day of Use (DoU) scenario. More detailed interface power analysis will be described in the full paper. Using a lower-threshold device in the 10nm FinFET process, a VDDQ-domain Dynamic Voltage Frequency Scaling (DVFS) scheme is implemented.
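To put the quoted duty figures in time units, the short calculation below converts the 43~57% duty range at 3.2 GHz into picoseconds. It is plain arithmetic on the numbers stated above and assumes that each data bit occupies one WCK half period, so that a 3.2 GHz WCK corresponds to 6.4 Gbps; the variable names are illustrative only.

```python
WCK_FREQ_HZ = 3.2e9               # WCK frequency quoted in the text
DUTY_MIN, DUTY_MAX = 0.43, 0.57   # measured duty-cycle bounds quoted in the text

period_ps = 1e12 / WCK_FREQ_HZ    # one full WCK period
ui_ps = period_ps / 2             # assumed: one bit per half period (6.4 Gbps)
short_phase_ps = DUTY_MIN * period_ps
long_phase_ps = DUTY_MAX * period_ps

print(f"UI                   = {ui_ps:.2f} ps")
print(f"shortest clock phase = {short_phase_ps:.2f} ps ({short_phase_ps / ui_ps:.2f} UI)")
print(f"longest clock phase  = {long_phase_ps:.2f} ps ({long_phase_ps / ui_ps:.2f} UI)")
```

Under that assumption the narrower clock phase is roughly 22 ps shorter than an ideal UI, which is of the same order as the quoted 20 ps jitter bound and illustrates why duty adjustment matters at this data rate.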

## Hybrid pre/de-emphasis Driver Control

To achieve LPDDR5 interface speeds over 6.4Gbps, various bandwidth improvement techniques are implemented. A hybrid pre-driver control scheme generates both pre-emphasis and de-emphasis control signals. Fig. 5 shows the pre/de-emphasis driver control scheme, that is, the equalizer adopted on the driver side of the LPDDR5 controller.



Figure 5: Driver bandwidth improvement techniques
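The difference between the two equalization modes can be illustrated with a behavioral model. The sketch below is an assumption-laden abstraction, not the circuit of Fig. 5: it treats the driver as an ideal two-level source with normalized amplitude, and `equalize`, `boost` and `mode` are hypothetical names. Pre-emphasis drives transition bits harder, while de-emphasis attenuates repeated bits; both raise the relative high-frequency content.

```python
def equalize(bits, boost, mode):
    """Return normalized output levels for a sequence of 0/1 bits.

    mode="pre": transition bits are driven to 1 + boost, repeated bits stay at 1.
    mode="de":  transition bits stay at 1, repeated bits drop to 1 - boost.
    """
    if mode not in ("pre", "de"):
        raise ValueError("mode must be 'pre' or 'de'")
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    prev = symbols[0]  # assume the line already sits at the first symbol
    for cur in symbols:
        is_transition = cur != prev
        if mode == "pre":
            scale = 1.0 + boost if is_transition else 1.0
        else:
            scale = 1.0 if is_transition else 1.0 - boost
        out.append(cur * scale)
        prev = cur
    return out

pattern = [0, 1, 1, 0, 0, 1]
pre = equalize(pattern, boost=0.25, mode="pre")
de = equalize(pattern, boost=0.25, mode="de")
print("pre-emphasis:", pre)
print("de-emphasis: ", de)
```

In this model de-emphasis keeps the peak amplitude at the nominal level, whereas pre-emphasis exceeds it, which is one reason a tunable scheme that offers both can be useful under a limited supply.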

## **Equalizer in Receiver**

To extend receiver bandwidth, we adopt a CTLE on the receiver side and compensate the 1<sup>st</sup> pole with the zero of the CTLE [6]. Increasing the degeneration capacitor (Cs) moves the zero and the 1<sup>st</sup> pole to lower frequency without affecting the peaking. Increasing the degeneration resistor (Rs) moves the zero to lower frequency and increases the peaking but reduces the DC gain. Active CTLE tuning is key to improving bandwidth, and its current consumption should also be considered. Fig. 6 shows the receiver-side equalizer and various simulation results.



Figure 6: Receiver bandwidth improvement technique
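The tuning trends described above follow from the textbook first-order model of a source-degenerated CTLE, sketched below. This is one common convention for that model, not the authors' circuit: the zero is at 1/(Rs·Cs), the first pole at (1 + gm·Rs/2)/(Rs·Cs), and the DC gain is gm·Rd/(1 + gm·Rs/2). The output pole is ignored, and all component values are hypothetical.

```python
import math

def ctle_params(gm, rd, rs, cs):
    """First-order source-degenerated CTLE model (output pole ignored).

    gm: transconductance (S), rd: load resistor (ohm),
    rs: degeneration resistor (ohm), cs: degeneration capacitor (F).
    """
    degeneration = 1.0 + gm * rs / 2.0
    zero_hz = 1.0 / (2.0 * math.pi * rs * cs)
    pole1_hz = degeneration * zero_hz
    return {
        "zero_hz": zero_hz,
        "pole1_hz": pole1_hz,
        "dc_gain": gm * rd / degeneration,
        "peaking": degeneration,  # ideal boost, equal to pole1 / zero
    }

# Hypothetical component values for illustration only.
base = ctle_params(gm=5e-3, rd=400.0, rs=400.0, cs=200e-15)
more_cs = ctle_params(gm=5e-3, rd=400.0, rs=400.0, cs=400e-15)
more_rs = ctle_params(gm=5e-3, rd=400.0, rs=800.0, cs=200e-15)

for name, p in (("base", base), ("2x Cs", more_cs), ("2x Rs", more_rs)):
    print(f"{name:6s} zero={p['zero_hz'] / 1e9:.2f} GHz  "
          f"pole1={p['pole1_hz'] / 1e9:.2f} GHz  "
          f"dc_gain={p['dc_gain']:.2f}  peaking={p['peaking']:.2f}")
```

Doubling Cs halves both the zero and the first pole while leaving the peaking unchanged; doubling Rs lowers the zero, raises the peaking and lowers the DC gain, which is the behavior stated in the text.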

#### **Receiver Duty Adjustment Scheme**

Because of the inherent characteristics of a DDR (Double Data Rate) interface, the duty ratio of the write and read paths is very important: the VWM is directly affected by duty performance [5]. The duty of the write path needs particular care, because LPDDR4 LVSTL signaling with VSSQ-TERM ODT does not produce a full-swing output signal, and the low-to-high level shifter on the write path can be a major source of duty distortion under various DVFS conditions. The duty control block is located at the first stage of the I/O, as shown in Fig. 7, and consists of two CMOS inverter groups, each composed of pull-up and pull-down transistors of 3 different strengths. According to the 3-bit pull-up and 3-bit pull-down control codes, the slope of the internal signal node changes, which appears as a change in the output signal duty.



Figure 7: Duty Adjustment scheme

The duty control function should run within the PHY initial training sequence so that no blackout time degrades performance during normal operation. It is therefore necessary to find and save the optimal duty control setting for each specific DVFS condition before normal memory write and read operations.
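One simple way to find such a setting during training is an exhaustive sweep over the 3-bit pull-up and 3-bit pull-down codes. The sketch below assumes a hypothetical `measure_duty(pu_code, pd_code)` callback that returns the observed duty cycle for one setting; the source does not describe the actual search procedure, and `toy_duty` is only a stand-in for hardware.

```python
from itertools import product

def find_best_duty_code(measure_duty, target=0.5, bits=3):
    """Sweep all pull-up/pull-down code pairs and return the best pair.

    measure_duty(pu_code, pd_code) is a hypothetical callback returning the
    observed output duty cycle (0..1) for one control setting.
    """
    codes = range(2 ** bits)
    return min(product(codes, codes),
               key=lambda code: abs(measure_duty(code[0], code[1]) - target))

# Toy stand-in for the hardware: stronger pull-up widens the high phase,
# stronger pull-down narrows it. The coefficients are arbitrary.
def toy_duty(pu_code, pd_code):
    return 0.44 + 0.02 * pu_code - 0.01 * pd_code

best = find_best_duty_code(toy_duty)
print("best (pull-up, pull-down) code:", best, "duty:", toy_duty(*best))
```

With only 64 combinations the sweep is cheap, and the result can be stored per DVFS condition as the text suggests.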

#### **CI/O Reduction**

Fig. 8 shows the driver/receiver block diagram of the proposed LPDDR5 interface on the controller side. To achieve signaling over 6.4Gbps, low CI/O is a key design factor, especially in LPDDR5 interfaces. As signal data rates increase, designers must take signal reflection into account, while impedance matching becomes more difficult at higher frequencies because of parasitics at the receiver end. At the channel end of a mobile system, the receiver is placed with a termination resistor to improve signal integrity. Because of the receiver input capacitance at the pad, the actual channel is also terminated by this capacitance (CI/O). Capacitive termination affects the signal in two ways: the time constant and signal attenuation. To reduce CI/O to the sub-pF range, a termination-merged output driver with small junction capacitance should be considered. As described in the previous section, the NMOS stack scheme can be realized with a small pull-up driver that acts like a source follower and provides fast current. However, this small pull-up driver limits the signal swing level, so the NMOS pull-up driver size should be chosen carefully considering both the VOH level and the CI/O value. To reduce CI/O in the receiver, we changed the receiver power domain from VDD to VDDQ. This change allows low-parasitic devices to be used in the receiver, which reduces CI/O and minimizes receiver power consumption. To further reduce device parasitic capacitance, we adopted super-low-Vth devices in the driver and receiver. Compared with the previous LPDDR4X I/O, CI/O is reduced from 0.7pF to 0.5pF.



Figure 8: Low CI/O Driver/Receiver scheme
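The effect of the CI/O reduction on the pad time constant can be estimated with a first-order RC model. The sketch below assumes a lumped pad capacitance seen by the channel impedance in parallel with the termination resistor; the 40 ohm values for both are hypothetical, and only the 0.7pF and 0.5pF capacitances come from the text.

```python
import math

def pad_rc(z0_ohm, r_term_ohm, c_io_farad):
    """Time constant and -3 dB bandwidth of a terminated pad loaded by C_IO."""
    r_eq = z0_ohm * r_term_ohm / (z0_ohm + r_term_ohm)  # line and ODT in parallel
    tau = r_eq * c_io_farad
    return tau, 1.0 / (2.0 * math.pi * tau)

UI_S = 1.0 / 6.4e9  # unit interval at 6.4 Gbps

for c_io in (0.7e-12, 0.5e-12):
    tau, f3db = pad_rc(z0_ohm=40.0, r_term_ohm=40.0, c_io_farad=c_io)
    print(f"C_IO = {c_io * 1e12:.1f} pF: tau = {tau * 1e12:.1f} ps "
          f"({tau / UI_S:.3f} UI), f_3dB = {f3db / 1e9:.1f} GHz")
```

Because the time constant scales linearly with capacitance, the 0.7pF to 0.5pF reduction shortens it by about 29% regardless of the resistance values assumed.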

#### **Optimal On-die De-cap. Estimation**

It is critical to achieve a cost-effective PDN design with an optimized on-die power delivery network, including de-cap and power/ground grids. On-die de-cap insertion is considered to reduce SSO noise under weak Power Delivery Network (PDN) conditions such as a POP package environment, but adding on-die de-cap has the drawback of increasing chip size. An accurate and fast method for optimizing on-die de-cap is therefore needed. To maintain low PDN impedance, on-die de-cap optimization was studied using the worst-case SSO patterns. Test patterns that degrade the signal more effectively are necessary to shorten system test and SI simulation time. Fig. 9 shows the simulated eye opening under various on-die de-cap values in combination with Target-ODT (T-ODT) and Non-target ODT (NT-ODT). In our studies, an optimal value of on-die de-cap was determined based on combined SI/PI analysis.



Figure 9: On-die de-cap effect

## **Test Platform Configuration**

The LPDDR5 platform, fabricated in a 10nm low-power CMOS FinFET process, achieves the 6.4Gbps operating frequency bin. The verification system is shown in Fig. 10. The implemented system includes the LPDDR45PHY, memory controller, 128-bit system bus, DMA block, and system control circuits. The test platform used a 12x12 529-FCFBGA package to interface with a single-chip-package LPDDR5 component. To maximize the valid window margin under process/voltage/temperature variations and thereby provide reliable memory access, various READ/WRITE valid window margin (VWM) tests were performed.



Figure 10: LPDDR5 Test Platform

## **Overall Test Results**

Fig. 11 shows the READ/WRITE VWM test results as a function of VDDQ on the controller and memory. The test results show that the memory operated at up to 6.4Gbps/pin with sufficient valid window margin even at a VDDQ supply voltage of 410mV.



#### **Per-bit VWM Characteristics**

As shown in Fig. 12, the per-bit VWM shapes and optimal Vref conditions differ slightly. To ensure sufficient VWM, we adopt per-bit calibration such as receiver offset/duty control and impedance calibration. Testing and analyzing per-bit characteristics is very important to ensure sufficient VWM.



Figure 12: Per-bit VWM test results @6.4Gbps

## **Duty Adjustment**

In a double-data-rate (DDR) memory interface, signal duty is one of the most important factors for system performance. Duty must be managed along the entire signal path from controller to memory, and a duty adjustment function is needed to recover duty distortion caused by the system environment. Fig. 13 shows the WRITE-mode eye-diagram improvement after enabling the duty adjustment technique.



Figure 13: Duty adjustment results @6.4Gbps

## **Driver Impedance vs. ODT values**

An asymmetric pull-up and pull-down driving strength control scheme [7] was adopted for the LPDDR5 driver on the controller side. Table 1 shows the READ/WRITE VWM test results under various combinations of driver impedance and ODT values. On the controller side, a 30ohm driver impedance can be provided, and this strong driver results in good WRITE VWM.

Table 1: VWM test results (DS/ODT split)

READ VWM @6.4Gbps

| DS (ohm) | NT-ODT (ohm) | SOC-ODT (ohm) | VWM (% UI) |
|----------|--------------|---------------|------------|
| 34       | Disable      | 40            | 48.15      |
| 34       | 240          | 48            | 48.15      |
| 34       | 120          | 60            | 43.21      |
| 34       | 80           | 80            | 39.51      |
| 34       | 60           | 120           | 33.33      |
| 34       | 48           | 240           | 20.99      |
| 34       | Disable      | 48            | 49.38      |
| 34       | 240          | 60            | 42.94      |
| 34       | 120          | 80            | 39.51      |
| 34       | 80           | 120           | 38.27      |
| 34       | 60           | 240           | 23.46      |
| 34       | Disable      | 60            | 43.21      |
| 34       | 240          | 80            | 39.51      |
| 34       | 120          | 120           | 38.27      |
| 40       | Disable      | 40            | 45.68      |
| 40       | 240          | 48            | 44.44      |
| 40       | 120          | 60            | 40.49      |
| 40       | 80           | 80            | 39.26      |
| 40       | 60           | 120           | 33.13      |
| 40       | 48           | 240           | 18.4       |
| 40       | 40           | Disable       | -          |
| 48       | Disable      | 60            | 43.21      |
| 48       | 240          | 80            | 39.51      |
| 48       | 120          | 120           | 38.27      |
| 48       | 80           | 240           | 29.63      |
| 48       | 60           | Disable       | -          |
| 60       | Disable      | 60            | -          |
| 60       | 240          | 80            | 33.33      |
| 60       | 120          | 120           | 29.63      |
| 60       | 80           | 240           | 22.22      |
| 60       | 60           | Disable       | -          |

WRITE VWM @6.4Gbps

| DS (ohm) | NT-ODT (ohm) | T-ODT (ohm) | VWM (% UI) |
|----------|--------------|-------------|------------|
| 30       | Disable      | 40          | 52.76      |
| 30       | 240          | 48          | 53.09      |
| 30       | 120          | 60          | 48.15      |
| 30       | 80           | 80          | 54.32      |
| 30       | 60           | 120         | 55.21      |
| 30       | 48           | 240         | 56.79      |
| 30       | 40           | Disable     | 58.02      |
| 34       | Disable      | 40          | 49.38      |
| 34       | 240          | 48          | 50.62      |
| 34       | 120          | 60          | 50.62      |
| 34       | 80           | 80          | 51.85      |
| 34       | 60           | 120         | 49.08      |
| 34       | 48           | 240         | 53.09      |
| 34       | 40           | Disable     | 55.56      |
| 40       | Disable      | 40          | 50.31      |
| 40       | 240          | 48          | 48.15      |
| 40       | 120          | 60          | 48.15      |
| 40       | 80           | 80          | 50.62      |
| 40       | 60           | 120         | 49.38      |
| 40       | 48           | 240         | 53.09      |
| 40       | 40           | Disable     | 54.32      |
| 48       | Disable      | 60          | 42.94      |
| 48       | 240          | 80          | 42.94      |
| 48       | 120          | 120         | 40.74      |
| 48       | 80           | 240         | 44.44      |
| 48       | 60           | Disable     | 41.72      |
| 60       | Disable      | 60          | 35.8       |
| 60       | 240          | 80          | 37.04      |
| 60       | 120          | 120         | 39.51      |
| 60       | 80           | 240         | 39.51      |
| 60       | 60           | Disable     | 39.26      |
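A pattern in the DS/ODT split data above is worth making explicit: within each group of rows, the two termination values appear to be chosen so that their parallel combination stays constant at 40, 48 or 60 ohm. The check below is only arithmetic on the tabulated values, not a statement about the circuit topology, and `parallel` is a helper introduced here for illustration.

```python
def parallel(r_a, r_b):
    """Parallel resistance in ohms; None stands for a disabled (open) termination."""
    if r_a is None and r_b is None:
        return float("inf")
    if r_a is None:
        return float(r_b)
    if r_b is None:
        return float(r_a)
    return r_a * r_b / (r_a + r_b)

# (NT-ODT, T-ODT or SOC-ODT) pairs as they appear in the tables.
groups = {
    40: [(None, 40), (240, 48), (120, 60), (80, 80), (60, 120), (48, 240), (40, None)],
    48: [(None, 48), (240, 60), (120, 80), (80, 120), (60, 240)],
    60: [(None, 60), (240, 80), (120, 120), (80, 240), (60, None)],
}

for nominal, pairs in groups.items():
    effective = [parallel(a, b) for a, b in pairs]
    print(f"nominal {nominal} ohm -> effective {effective}")
```

Read this way, each group sweeps how a fixed total termination is divided between the two sides, so the VWM differences within a group reflect the split rather than the overall termination strength.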

#### DBI on/off

Efforts to reduce high-speed memory interface power have led to the adoption of data bus inversion (DBI), or bus-invert coding. LPDDR4 uses the VSSQ-TERM termination method to increase interface speed. VSSQ-TERM consumes power when 'high' level data is transferred, while no static power is consumed when 'low' level data is transferred. When DBI mode is used, the number of 'high' bits transferred in each byte is limited. Fig. 14 presents measured VWM data taken from an LPDDR5 DRAM operating at 6.4Gbps. As the VWM reported in the figure demonstrates, applying DBI mode can increase the timing margin by as much as +10 steps.
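The way bus inversion limits the number of 'high' bits can be sketched as follows. The example assumes DC-type inversion with a more-than-four-ones threshold per byte, as commonly used for LPDDR4-style DBI; the source does not state the exact variant or threshold, and the function names are illustrative.

```python
def dbi_dc_encode(byte):
    """Return (data, dbi_flag); invert the byte when it holds more than 4 ones."""
    byte &= 0xFF
    if bin(byte).count("1") > 4:
        return (~byte) & 0xFF, 1
    return byte, 0

def dbi_dc_decode(data, dbi_flag):
    """Undo the inversion on the receiving side."""
    return (~data) & 0xFF if dbi_flag else data & 0xFF

# Every possible byte round-trips, and no encoded byte carries more than 4 ones.
worst_ones = max(bin(dbi_dc_encode(b)[0]).count("1") for b in range(256))
print("max 'high' data bits per byte after encoding:", worst_ones)
print("0xFF ->", dbi_dc_encode(0xFF), " 0x0F ->", dbi_dc_encode(0x0F))
```

With VSSQ termination, fewer 'high' bits mean less static termination current and less simultaneous switching toward the high level, which is consistent with the margin improvement reported in the text.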



## **6.4Gbps Measurement Results**






Figure 15: READ eye-diagram @6.4Gbps





Figure 16: WRITE eye-diagram @6.4Gbps

#### **Measurement vs. Simulation Correlation**

Correlation between simulation and measurement is one of the keys to enabling high-speed memory interface system design. Building on our LPDDR2/3/4 enabling history [4,5,7], an LPDDR5 interface running at over 6.4Gbps was also successfully enabled. Fig. 17 shows the measured eye-diagram with and without de-embedding. The de-embedded eye-diagram matches our simulation results well.



Figure 17: Eye-diagram measurement vs. simulation @6.4Gbps

#### **Corner sample Measurement**

Our 6.4Gbps LPDDR5 interface enabling results include memory and controller corner-sample variations. Fig. 18 shows the VWM size for different memory and controller corner samples. Even though the immature LPDDR5 memory samples have large variations, the measured READ/WRITE VWM size is over 0.6UI.



Figure 18: READ/WRITE VWM size

# **Over 6.4Gbps Measurement Results**

To check the maximum operating speed using an immature LPDDR5 memory sample, we increased the memory's VDDQ voltage from 0.5V to 0.6V and measured the READ/WRITE eye opening. Fig. 19 shows the 7.2Gbps READ/WRITE eye-diagram.



Figure 19: Corner sample VWM size comparison

## Conclusions

The proposed LPDDR5 interface is backward compatible with LPDDR4/4X. The achieved READ and WRITE valid window margins at 6.4Gbps are 0.36UI and 0.4UI, respectively. The measured WCK clock duty was within 43~57% at 3.2 GHz, and peak-to-peak periodic jitter was less than 20ps. The use of a 10nm low-power CMOS FinFET process technology and a single-supply driver scheme resulted in 20% lower energy consumption than LPDDR4X.

### References

[1] Y.C. Cho, et al., "A sub-1.0V 20nm 5Gb/s/pin Post-LPDDR3 I/O interface with Low Voltage-Swing Termination Logic and Adaptive Calibration Scheme For Mobile Application," VLSI, 2013.

[2] T.Y. Oh, "A 3.2Gbps/pin 8Gbit 1.0V LPDDR4 SDRAM with Integrated ECC Engine for sub-1V DRAM Core Operation," JSSC, 2014.

[3] K.Y. Chae, "A 690mV 4.4Gbps/pin all-digital LPDDR4 PHY in 10nm FinFET technology," ESSCIRC, 2016.

[4] K.H. Koo, et al., "Robust I/O circuit schemes for world's first over 1.6Gbps LPDDR3," DesignCon 2013.

[5] K.H. Koo, et al., "Versatile IO Circuit Schemes for LPDDR4 with 1.8mW/Gbps/pin Power Efficiency," DesignCon 2014.

[6] Srikanth Gondi, et al., "Equalization and Clock and Data Recovery Techniques for 10-Gb/s CMOS Serial-Link Receivers," JSSC, 2007.

[7] K.H. Koo, et al., "Enabling World's first over 4.4Gbps/pin at sub-1V LPDDR4 Interfaces," DesignCon 2017.