## DesignCon 2019

# A Comparative Study of Equalization Schemes for 112G PAM4 Links

Yu-Chun Lu, Huawei Technologies, yuchun.lu@huawei.com

Peng-Chao Zhao, Huawei Technologies, zhaopengchao@huawei.com

Wei-Yu Wang, Huawei Technologies, wangweiyu2@huawei.com

Zhi-Lei Huang, Huawei Technologies, huangzhilei2@huawei.com

Henry Wong, Huawei Technologies, Henry.Wong@huawei.com

Davide Tonietto, Huawei Technologies, Davide.Tonietto@huawei.com

## Abstract

Trade-offs among continuous time linear equalizer (CTLE) and analog-to-digital converter (ADC) designs, digital signal processing (DSP) equalization schemes, and stronger forward error correction (FEC) algorithms should be considered in high-speed electrical design to overcome the technical challenges at rates of 112Gbps and beyond. This work presents a comparative study of advanced DSP equalization schemes using IEEE 802.3ck and 100GEL channels. The performance of four advanced DSP equalization schemes is explored: (1) feed forward equalizer (FFE) and decision feedback equalizer (DFE); (2) FFE, DFE, and maximum likelihood sequence estimation (MLSE); (3) FFE and partial response (PR); (4) FFE, PR, and MLSE. Advanced DSP equalization schemes and stronger FEC can ease the requirements on CTLE, ADC, and package design and extend the design space of 112G links, either to enhance performance or to reduce chip area and power.

## Author(s) Biography

**Yu-Chun Lu** joined Huawei Technologies in 2010 as a research engineer. He has been working on high-speed link architecture and modeling and on FEC algorithms. He received his B.Eng. degree in communication engineering in 2005 and his Ph.D. degree in communication and information systems in 2010, both from Beijing Jiaotong University, China. He was a visiting researcher at McMaster University, Canada, from January 2007 to August 2008. His current interests include equalization, CDR, FEC, and simulation methodology. Currently, he researches 112Gbps technologies for optical and electrical channels and next-generation Ethernet.

**Peng-Chao Zhao** joined Huawei Technologies in 2017 as a research engineer. He has been working on high-speed link architecture and modeling. He received his B.S. degree in physics in 2008 from Nanjing University and his Ph.D. degree in microelectronics in 2017 from the Institute of Semiconductors, China. He is currently working on 112G technologies for optical and electrical channels.

**Wei-Yu Wang** joined Huawei Technologies in 2017 as a research engineer. She received her B.S. degree in electronic science and technology in 2008 from Xi'an Jiaotong University and her Ph.D. degree in physical electronics in 2017 from the Institute of Semiconductors, China. She was a visiting researcher at Universidad Politecnica de Valencia, Spain, from October 2016 to May 2017. Currently, she researches technologies for optical and electrical links at 112Gbps and beyond.

**Zhi-Lei Huang** joined Huawei Technologies in 2018 as a research engineer. He received his B.S. degree in applied physics in 2013 and his Ph.D. degree in physical electronics in 2017 from Tsinghua University, China. He was a visiting student at the University of California, Berkeley, USA, from July 2016 to January 2017. He is currently working on 112G technologies for optical and electrical channels.

**Henry Wong** received B.A.Sc. and Ph.D. degrees, both in Electrical Engineering, and is currently a SerDes Manager at Huawei for system architecture. His past experience includes cable, ADSL modems, and high-speed modem design for Nortel and Lucent. He was with Gennum (Semtech) for 10 years in CDR and SerDes design. Currently, he works on 56Gbps SerDes for optical and electrical channels.

**Davide Tonietto** joined Huawei Technologies Canada in 2011, where he is currently Director of SerDes IP development. He has over twenty years of experience in analog and mixed-signal design as well as high-speed SerDes and signal conditioning IC products and IP. He has held several technical management and design positions in various companies such as Gennum, STMicroelectronics, and Broadcom.

#### 1. Introduction

High-speed electrical design faces technical challenges at data rates of 112Gbps and beyond. Even with the most advanced printed circuit board or cable technology, the insertion loss becomes very difficult to overcome. High data rates also present challenges for the design of packages, connectors, passive channels, and transceivers. Higher-order modulation, such as Pulse Amplitude Modulation (PAM), can squeeze the signal bandwidth to match the harsh frequency response of the channel, and PAM-4 is a promising candidate for links at 112Gbps and beyond. Advanced digital signal processing (DSP) schemes and advanced forward error correction (FEC) schemes can also help. Advanced DSP and FEC will be a combined technology that needs to be co-designed with the continuous time linear equalizer (CTLE) and the analog-to-digital converter (ADC) to achieve the most cost-effective solutions. The impact of DFE error propagation on FEC performance has been discussed in [1]. Advanced low-power, low-complexity, and low-latency FEC schemes have been discussed in [2]. In this paper, the performance of advanced DSP equalization schemes is investigated and compared. The joint design of DSP, ADC, CTLE, package, and FEC is discussed. It is shown that advanced DSP algorithms can reduce the requirements on the ADC, CTLE, and package.

In an ADC-based receiver design, the CTLE and the ADC account for the majority of the area and power. DSP and FEC, on the other hand, are digital logic, so their area and power will continue to shrink as the process node advances. It is therefore reasonable to shift some of the burden from the CTLE and ADC to the DSP and FEC in order to achieve a cost-effective design. Advanced equalization schemes, i.e. maximum likelihood sequence estimation (MLSE) and its variants, may be used to improve the equalization capability of the receiver without increasing the baud rate. The performance of advanced DSP schemes under different ADC and CTLE designs as well as different packages is investigated to show that the performance enhancement due to the advanced DSP algorithms can reduce the requirements on the ADC, CTLE, and package.

The joint design of DSP and FEC is also a very important technical approach to achieving a cost-effective solution. Stronger FEC provides higher coding gain. However, its cost-effectiveness is usually questionable because FEC complexity and latency increase, and if a larger-overhead FEC is applied, the baud rate and the corresponding channel insertion loss also increase. The DSP performance at two different baud rates, i.e. 53GBd and 56GBd, is also investigated to show the performance improvement when applying a 12% stronger FEC with respect to the regular 6.8% KP4 FEC, i.e. RS(544, 514) FEC with a raw BER requirement of 1e-5. The 12% stronger FEC can be RS(576, 514) or another FEC scheme, for which the raw BER requirement is assumed to be 3e-4. Some margin is included for both the 6.8% and 12% FEC to accommodate the impact of burst errors.

Joint analysis and design of DSP and FEC will be critical. There is an upper bound on the performance of the equalizer (i.e. CTLE and DSP) and FEC, the so-called 'Shannon limit'. A reasonable allocation of the burden between DSP and FEC is very important for achieving the most cost-effective solution, because the two differ in power and chip area. The ultimate solution comes from the joint design and optimization of DSP and FEC. This means that the DSP not only provides the decided bit stream but also provides 'soft information' for the FEC, and the FEC is designed to use this 'soft information' to improve overall performance and get closer to the 'Shannon limit'. An FEC that only uses the decided bit stream from the equalizer is usually called hard-decision (HD) FEC; an FEC that additionally uses the 'soft information' to further improve performance is called soft-decision (SD) FEC. Although SD-FEC is outside the scope of this paper, exploring the promising DSP schemes is a helpful first step toward DSP and FEC joint design.

This work investigates the performance of four DSP equalization schemes for 112Gbps PAM4 links, with the aim of giving system designers a thorough guideline. The schemes are: (1) FFE+DFE, (2) FFE+DFE+MLSE, (3) FFE+PR (partial response), (4) FFE+PR+MLSE. The FFE on the transmitter side is switched off to simplify the analysis; the CTLE is chosen to make the frequency response as flat as possible with no overshoot in the pulse response; and a linear phase stepping technique is applied to make sure the sampling phase is locked to the pulse response peak after the CTLE. These assumptions move the simulation results slightly away from the optimum but do not have a significant impact, while greatly reducing the simulation complexity. The joint optimization of DSP with TX FFE and CTLE is beyond the scope of this paper. The DSP equalization schemes are compared by simulating measured IEEE 802.3 100GEL and 802.3ck channels [5][6] with RTL-compliant DSP models, which take into account quantization, adaptation, clock data recovery, etc. Variations in DSP configuration, CTLE bandwidth, ADC resolution, and package under different baud rates are explored.

The remainder of this paper is organized as follows: Section 2 illustrates the four equalization schemes; Section 3 provides the simulation parameters, the channels, and the packages used in the simulation; Sections 4, 5, and 6 discuss the joint analysis of DSP with the CTLE, the ADC, and the package, respectively; Section 7 gives the joint analysis of DSP and FEC; Section 8 gives the conclusions.

#### 2. Equalization Schemes

Figure 1 and Figure 2 illustrate the four DSP equalization schemes explored in this paper: (1) FFE+DFE, (2) FFE+DFE+MLSE, (3) FFE+PR, (4) FFE+PR+MLSE. In the DFE-based schemes (1) and (2), the received signal first goes through the CTLE and is then sampled by the ADC. The sampled signal is then equalized by the FFE to remove the pre-cursor ISI and all the post-cursor ISI except for post 1.

In the  $(1+\alpha \cdot D)$  schemes, the post-1 FFE tap is turned off (its tap weight is set to zero) and the post-1 DFE tap is used to cancel the post-1 cursor. In scheme (1) FFE+DFE, the post-1 ISI is removed by the DFE, which gives the DFE BER results. In scheme (2), the partially equalized signal at the output of the FFE is fed into the MLSE equalizer to obtain the  $(1+\alpha \cdot D)$  MLSE BER results.
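As a minimal sketch of the DFE path in scheme (1), assume an ideal FFE that leaves only the main cursor and a post-1 cursor of relative weight α, so that the FFE output follows a (1+α·D) response. The 1-tap DFE then subtracts α times the previous decision before slicing. The function names, the PAM-4 level set {-3, -1, +1, +3}, the value α = 0.25, and the known starting symbol are illustrative assumptions, not the RTL-compliant model used in the simulations.

```python
import random

PAM4_LEVELS = (-3, -1, 1, 3)  # assumed normalized PAM-4 levels


def slice_pam4(value):
    """Return the PAM-4 level closest to value."""
    return min(PAM4_LEVELS, key=lambda level: abs(value - level))


def channel_1_plus_alpha_d(symbols, alpha, noise_rms=0.0, initial_symbol=-3, seed=0):
    """Idealized FFE output: main cursor plus a post-1 cursor of weight alpha."""
    rng = random.Random(seed)
    out, previous = [], initial_symbol
    for symbol in symbols:
        out.append(symbol + alpha * previous + rng.gauss(0.0, noise_rms))
        previous = symbol
    return out


def one_tap_dfe(samples, alpha, initial_decision=-3):
    """Cancel the post-1 cursor with the previous decision, then slice."""
    decisions, previous = [], initial_decision
    for sample in samples:
        equalized = sample - alpha * previous  # remove the estimated post-1 ISI
        previous = slice_pam4(equalized)
        decisions.append(previous)
    return decisions


rng = random.Random(1)
symbols = [rng.choice(PAM4_LEVELS) for _ in range(1000)]
received = channel_1_plus_alpha_d(symbols, alpha=0.25)
decisions = one_tap_dfe(received, alpha=0.25)
```

Because each decision feeds the next cancellation, a single wrong decision can corrupt the following symbol as well, which is the error propagation effect referred to in [1].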

In the (1+D) schemes, the FFE filters the channel to the (1+D) format. PAM-4 signals turn into PAM-7 signals after going through the equivalent (1+D) channel. The PAM-7 symbols are then sliced and decoded by the 1/(1+D) decoder into the original PAM-4 symbols, giving the (1+D) PR BER result. This PR receiver corresponds to scheme (3) FFE+PR. In scheme (4), the PAM-7 signal is fed into the MLSE engine to give the FFE+PR+MLSE BER.
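The PAM-4 to PAM-7 mapping and the 1/(1+D) decoding step can be illustrated with a short sketch. It assumes an ideal (1+D) response, normalized PAM-4 levels {-3, -1, +1, +3}, and a starting symbol known to both sides; all names are hypothetical and noise is left out.

```python
import random

PAM4_LEVELS = (-3, -1, 1, 3)
PAM7_LEVELS = (-6, -4, -2, 0, 2, 4, 6)  # all possible sums of two PAM-4 levels


def one_plus_d(symbols, initial_symbol=-3):
    """Ideal (1+D) response: current symbol plus the previous symbol."""
    out, previous = [], initial_symbol
    for symbol in symbols:
        out.append(symbol + previous)
        previous = symbol
    return out


def slice_pam7(value):
    """Return the PAM-7 level closest to value."""
    return min(PAM7_LEVELS, key=lambda level: abs(value - level))


def decode_one_over_one_plus_d(samples, initial_symbol=-3):
    """1/(1+D) decoder: sliced PAM-7 level minus the previously decoded symbol."""
    decoded, previous = [], initial_symbol
    for sample in samples:
        previous = slice_pam7(sample) - previous
        decoded.append(previous)
    return decoded


rng = random.Random(2)
symbols = [rng.choice(PAM4_LEVELS) for _ in range(1000)]
pr_samples = one_plus_d(symbols)
recovered = decode_one_over_one_plus_d(pr_samples)
```

The decoder needs only one subtraction on slicer outputs per symbol, which is why it can be viewed as a very small DFE operating on decided levels.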

Note that in both the  $(1+\alpha \cdot D)$  and (1+D) schemes, the MLSE engine can take only the FFE output signal and work in parallel with the DFE or the PR receiver, respectively. The MLSE engine can also receive signals from the DFE or PR receiver, for example their decisions, their input or output signals, assertion information, etc. These signals can be used to improve on the DFE and PR receiver performance, so that an on-demand MLSE can be implemented to reduce the complexity of the MLSE engine. For example, DFE error propagation always terminates with a signal out-of-range condition. By monitoring the equalized signal of the DFE, it is easy to locate the burst errors that the DFE generates [3]. This signal can be used to trigger the MLSE engine only when a burst error happens, so that the burst errors can be corrected by the MLSE engine.
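For readers unfamiliar with MLSE, the sketch below shows a textbook Viterbi search over the 4-state trellis of a (1+α·D) response, where the state is the previous PAM-4 symbol. It is a generic illustration under assumed normalized levels, an assumed α = 0.25, and a known starting symbol; it is not the MLSE engine or the on-demand triggering logic evaluated in this paper.

```python
import random

PAM4_LEVELS = (-3, -1, 1, 3)


def mlse_1_plus_alpha_d(samples, alpha, initial_symbol=-3):
    """Viterbi search for the sequence minimizing sum (y - (x[n] + alpha*x[n-1]))^2."""
    inf = float("inf")
    # One path metric per state; the state is the previous symbol.
    metrics = {s: (0.0 if s == initial_symbol else inf) for s in PAM4_LEVELS}
    history = []  # history[n][state] is the best predecessor of state at step n
    for y in samples:
        new_metrics, back = {}, {}
        for current in PAM4_LEVELS:
            best_prev, best_metric = initial_symbol, inf
            for prev in PAM4_LEVELS:
                metric = metrics[prev] + (y - (current + alpha * prev)) ** 2
                if metric < best_metric:
                    best_prev, best_metric = prev, metric
            new_metrics[current], back[current] = best_metric, best_prev
        metrics = new_metrics
        history.append(back)
    # Trace back from the final state with the smallest path metric.
    state = min(metrics, key=metrics.get)
    path = []
    for back in reversed(history):
        path.append(state)
        state = back[state]
    path.reverse()
    return path


rng = random.Random(3)
symbols = [rng.choice(PAM4_LEVELS) for _ in range(2000)]
alpha = 0.25  # assumed post-1 cursor ratio
received, previous = [], -3
for symbol in symbols:
    received.append(symbol + alpha * previous + rng.gauss(0.0, 0.1))
    previous = symbol
detected = mlse_1_plus_alpha_d(received, alpha)
```

The search evaluates 16 branch metrics per symbol, which gives a feel for why triggering the engine only on suspected burst errors reduces complexity.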



Figure 1 Illustration of DFE and  $(1+\alpha \cdot D)$  MLSE equalization schemes.



Figure 2 Illustration of PR and (1+ D) MLSE equalization schemes.

The sampling phase is dynamically controlled by a Mueller-Muller CDR, and a linear phase stepping technique is applied to make sure the sampling phase is locked to the pulse response peak after the CTLE. The FFE and DFE are adapted with the least mean square (LMS) algorithm. RTL-compliant DSP models are used in the simulation.
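The LMS adaptation mentioned above can be sketched as follows. This is a generic training-mode example (the desired symbols are assumed known) with a causal 5-tap FFE, a step size of 0.002, and a hypothetical test channel with a single post-cursor of 0.3. None of these values come from the paper, and the tap arrangement differs from the 16-tap FFE with pre-cursor taps used in the simulations.

```python
import random

PAM4_LEVELS = (-3, -1, 1, 3)


def lms_ffe(received, desired, num_taps=5, mu=0.002):
    """Adapt FFE tap weights with the LMS update w <- w + mu * error * input."""
    weights = [1.0] + [0.0] * (num_taps - 1)  # start as a pass-through filter
    delay_line = [0.0] * num_taps
    errors = []
    for sample, target in zip(received, desired):
        delay_line = [sample] + delay_line[:-1]
        output = sum(w * x for w, x in zip(weights, delay_line))
        error = target - output
        weights = [w + mu * error * x for w, x in zip(weights, delay_line)]
        errors.append(error)
    return weights, errors


def mean_square(values):
    """Mean of the squared values."""
    return sum(v * v for v in values) / len(values)


rng = random.Random(0)
symbols = [rng.choice(PAM4_LEVELS) for _ in range(5000)]
# Hypothetical channel: main cursor 1.0 and one post-cursor of 0.3.
received = [s + 0.3 * p for s, p in zip(symbols, [0] + symbols[:-1])]
weights, errors = lms_ffe(received, symbols)
early_mse = mean_square(errors[:50])
late_mse = mean_square(errors[-500:])
```

In this example the second tap should settle near -0.3, which is the first term of the inverse of the assumed (1 + 0.3·D) channel.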

The 1/(1+D) decoder can be viewed as a simplified 4-bit DFE for PAM-4 receivers, and its complexity and power are reduced tremendously compared to a traditional DFE with 12 or more bits in the calculation. The scheme (3) (1+D) PR receiver replaces the DFE in scheme (1) with the 1/(1+D) decoder. The limitation is that the channel has to be filtered to the (1+D) format. The simulation results in this paper indicate that even with the (1+D) channel filtering constraint, the scheme (3) FFE+PR receiver has performance similar to that of the scheme (1) FFE+DFE receiver. Hence scheme (3) FFE+PR is a low-cost alternative to scheme (1) FFE+DFE.

As for performance, since MLSE is introduced, scheme (2) FFE+DFE+MLSE will be better than scheme (1) FFE+DFE, and scheme (4) FFE+PR+MLSE will be better than scheme (3) FFE+PR. As for complexity, scheme (2) FFE+DFE+MLSE has the largest complexity because it includes both DFE and MLSE; scheme (3) FFE+PR has the smallest complexity because it replaces the DFE with PR. Scheme (4) FFE+PR+MLSE generally has larger complexity than scheme (1) FFE+DFE, but it has much higher performance.

If the coding scheme and DSP are properly designed, the MLSE can be triggered on demand, and the complexity of the MLSE engine in schemes (2) and (4) is reduced tremendously. One way to achieve on-demand MLSE is to use the signal out-of-range indication of the DFE [3]. The MLSE engine can be triggered by this signal to correct the burst errors generated by the DFE. There are also other ways to achieve low-complexity MLSE, e.g. block MLSE by introducing termination bits as shown in the IEEE 802.3bj KP4 standard [4]. Considering the balance between complexity and performance, this paper focuses on schemes (1), (3), and (4). Scheme (1) FFE+DFE is the reference DSP design and performance benchmark; scheme (3) FFE+PR is the low-complexity solution; and scheme (4) FFE+PR+MLSE is the high-performance solution.

#### 3. Simulation Parameters, Channels, and Packages

Table 1 gives the parameters used in the simulations. Some effects, for example transmitter noise, transmitter rise/fall time, nonlinearity, duty cycle distortion (DCD), and skew, are not considered in the simulation. A 400mV ADC input swing with 7-bit and 5-bit ADC resolution is considered for ADC quantization noise; 1mV thermal noise is considered as background noise. A 100 parts per million (PPM) frequency difference between the transmitter and receiver clocks is applied; random jitter scaled from the power spectral density (PSD) of a measured 13.07GHz clock is used in the simulation; the CDR loop bandwidth is ~5MHz. Pre-coding is turned off. Two packages are chosen in this investigation: #1 is 25mm long with Cd=13fF, Cp=110fF; #2 is 30mm long with Cd=130fF, Cp=110fF. They represent the good and bad packages, respectively. The crosstalk is aggregated by choosing the sampling phase that minimizes the crosstalk noise RMS. The crosstalk noise is then scaled up and down to obtain different integrated crosstalk noise (ICN) settings, in order to simulate the BER versus ICN curves and investigate the ICN tolerance. The CTLE is chosen to make the frequency response as flat as possible with no overshoot in the pulse response. Ten million PAM-4 symbols are simulated in order to get reliable Monte Carlo BER results larger than 1E-6. A 16-tap FFE is used in the simulation: 3 pre-taps are allocated for schemes (1) and (2), and 5 pre-taps are allocated for schemes (3) and (4). Only a 1-tap DFE is considered in the simulation. We found that 16 FFE taps are sufficient, because further increasing the number of FFE taps does not notably change the simulation results, and the FFE tap weights beyond 16 taps are small and negligible.
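To put the ADC resolution and the simulation length in perspective, the following back-of-the-envelope sketch assumes an ideal uniform quantizer whose noise RMS is LSB/√12 with no clipping, and 2 bits per PAM-4 symbol. This textbook model and the function names are assumptions for illustration; the RTL-compliant models used in the simulations may treat quantization differently.

```python
import math


def quantization_noise_rms_mv(full_scale_mv, bits):
    """RMS noise of an ideal uniform quantizer: LSB / sqrt(12)."""
    lsb_mv = full_scale_mv / (2 ** bits)
    return lsb_mv / math.sqrt(12)


def expected_bit_errors(num_symbols, bits_per_symbol, ber):
    """Average number of bit errors observed in a Monte Carlo run."""
    return num_symbols * bits_per_symbol * ber


noise_7bit_mv = quantization_noise_rms_mv(400.0, 7)  # about 0.90 mV
noise_5bit_mv = quantization_noise_rms_mv(400.0, 5)  # about 3.61 mV
errors_at_1e6 = expected_bit_errors(10_000_000, 2, 1e-6)  # about 20 errors
```

Under these assumptions, a 7-bit ADC contributes quantization noise comparable to the 1mV thermal noise, while a 5-bit ADC contributes several times more. Ten million symbols yield on the order of 20 bit errors at a BER of 1E-6.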

| Parameters                           | Values                                                                              |
|--------------------------------------|-------------------------------------------------------------------------------------|
| Transmitter Noise                    | Not considered                                                                      |
| Transmitter Rise/Fall Time           | Not considered                                                                      |
| ADC resolution                       | 7 bits/5 bits                                                                       |
| ADC input swing                      | 400mV                                                                               |
| Thermal Noise RMS                    | 1mV                                                                                 |
| Nonlinearity                         | Not considered.                                                                     |
| Random Jitter                        | Scaled from measured random jitter PSD of 13.07GHz clock                            |
| Clock Frequency Difference           | 100 PPM                                                                             |
| Duty cycle distortion (DCD) and Skew | 0ps, not considered                                                                 |
| CDR loop bandwidth                   | ~4.58MHz @53.125GBd; ~4.87MHz @56.500GBd                                            |
| Limit Cycle Jitter                   | Loop filter latency is 640UI                                                        |
| Package Model                        | #1: L=25mm, Zc=90Ohms, Cd=13fF, Cp=110fF; #2: L=30mm, Zc=90Ohms, Cd=130fF, Cp=110fF |
| Cross-talk Aggregation               | Sampling phase with minimum noise RMS is chosen.                                    |
| Pre Coding                           | Off                                                                                 |
| CTLE optimization                    | Optimized for DFE receivers.                                                        |
| FFE tap number                       | 16                                                                                  |
| FFE pre tap number                   | 3 for DFE & $(1+\alpha \cdot D)$ MLSE; 5 for PR & $(1+D)$ MLSE                      |
| DFE tap number                       | 1 for DFE & $(1+\alpha \cdot D)$ MLSE; 0 for PR & $(1+D)$ MLSE                      |
| Simulation Time                      | 10 million PAM-4 symbols                                                            |
| Pattern                              | PRBS31                                                                              |

Table 1 Simulation Parameters.

The configuration in Table 1 is simply a reasonable one; it does not represent the worst case, best case, or typical case. Some of the impairments are ignored to simplify the analysis. However, all the equalization algorithms are compared under the same configuration, so the relative comparisons are meaningful and provide useful insight.

Seven selected channels from IEEE 802.3 100GEL and 802.3ck [5][6] are investigated in this work to cover the main scenarios. Channels #1, #2, #3, and #4 are a cable backplane, a direct attached cable (DAC), an orthogonal mid-plane, and a host backplane, respectively. Channels #5, #6, and #7 are backplane channels with different trace lengths. They are used to investigate the crosstalk tolerance and the impact of stronger FEC with 12% overhead. Table 2 gives the channel insertion loss and crosstalk noise RMS of the seven channels. Figure 3 shows the frequency response curves of the channels. The frequency response of channel #3 is more linear and smooth than that of the other three channels: it has no ripples or notches between 30 and 40GHz. The maximum frequency in this measured s-parameter is 40GHz. The original s-parameter files can be downloaded from the IEEE 802.3 100GEL and 802.3ck webpages [5][6].

| # | Channel                         | Insertion Loss @26.5625GHz (dB) | Insertion Loss @28.25GHz (dB) | Crosstalk RMS* Min (mV) | Crosstalk RMS* Max (mV) | (dB) |
|---|---------------------------------|---------------------------------|-------------------------------|-------------------------|-------------------------|------|
| 1 | tracy_100GEL_05_0118            | -23.71                          | -26.98                        | 0.5                     | 0.66                    | 1.5  |
| 2 | mellitz_100GEL_adhoc_02_021218  | -27.05                          | -28.89                        | 1.22                    | 1.49                    | 0.57 |
| 3 | zambell_100GEL_02_0318          | -28.63                          | -30.18                        | 0.23                    | 0.39                    | 0.27 |
| 4 | heck_100GEL_85ohm_nom_01_011718 | -30.2                           | -33.35                        | 1.28                    | 1.58                    | 2.29 |
| 5 | mellitz_100GEL_adhoc_02_010318  | -16.34                          | -18.27                        | 2.24                    | 2.83                    | 1.24 |
| 6 | mellitz_100GEL_adhoc_03_010318  | -26.03                          | -27.61                        | 1.64                    | 2.14                    | 1.48 |
| 7 | mellitz_100GEL_adhoc_04_010318  | -31.01                          | -32.7                         | 1.62                    | 2.11                    | 1.61 |

\* 1 Vpp, PAM4, time domain simulation @ 53.125GBd. "Min" and "Max" values indicate the phase of aggregation for each crosstalk aggregator.

Table 2 Insertion loss and crosstalk RMS of the seven channels under investigation.
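The two frequency columns in Table 2 are the Nyquist frequencies (half the baud rate) of the two baud rates studied. The short sketch below recomputes them and the additional insertion loss each channel sees when moving from 53.125GBd to 56.5GBd, using only the values in Table 2. The variable and function names are illustrative.

```python
# Insertion loss in dB at (26.5625 GHz, 28.25 GHz), copied from Table 2.
CHANNEL_IL_DB = {
    1: (-23.71, -26.98),
    2: (-27.05, -28.89),
    3: (-28.63, -30.18),
    4: (-30.2, -33.35),
    5: (-16.34, -18.27),
    6: (-26.03, -27.61),
    7: (-31.01, -32.7),
}


def nyquist_ghz(baud_rate_gbd):
    """Nyquist frequency of a PAM signal: half the baud rate."""
    return baud_rate_gbd / 2


# Extra loss (dB, negative means more loss) at the higher baud rate.
extra_loss_db = {
    channel: round(il_high - il_low, 2)
    for channel, (il_low, il_high) in CHANNEL_IL_DB.items()
}
```

For these channels the penalty ranges from roughly 1.5dB to a little over 3dB, which is the insertion-loss cost that a larger-overhead FEC has to recover through its higher coding gain.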



Figure 3 Frequency response of the seven channels.



Figure 4 Frequency response of package #1 and #2.

The impact of packages is discussed in Section 6. Two packages modified from the 802.3cd COM package model are investigated [7][8]. The parameters of the two package channels are listed in Table 1. The frequency responses of the two package channels are shown in Figure 4. The frequency points of 26.5625GHz and 28.25GHz are marked with circles. The insertion loss of package #1 at 26.5625GHz and 28.25GHz is -3.68dB and -3.67dB, and that of package #2 is -5.72dB and -4.66dB, respectively. Table 3 gives the insertion loss of the four channels with package #1 and #2 at 26.5625GHz. Package #1 is chosen as the baseline package model in this investigation. Package #1 is considered a good package and package #2 is considered a bad package.

| Channel | Ball-to-Ball IL (dB) | Bump-to-Bump IL with Pkg #1 (dB) | Bump-to-Bump IL with Pkg #2 (dB) |
|---------|----------------------|----------------------------------|----------------------------------|
| CH #1   | -23.71               | -31.07                           | -35.16                           |
| CH #2   | -27.05               | -34.41                           | -38.5                            |
| CH #3   | -28.63               | -35.99                           | -40.08                           |
| CH #4   | -30.2                | -37.6                            | -41.65                           |

Table 3 Insertion loss of the four channels with two different packages at 26.5625GHz.
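As a quick consistency check, the bump-to-bump values in Table 3 agree with the ball-to-ball loss plus twice the package loss at 26.5625GHz (one package at each end of the link). Adding dB values in this way ignores reflections between package and channel, so it is only an approximation; the sketch below simply confirms that the tabulated numbers match it to within a few hundredths of a dB. All names are illustrative.

```python
PACKAGE_IL_DB = {1: -3.68, 2: -5.72}  # package loss at 26.5625 GHz, from the text
BALL_TO_BALL_IL_DB = {1: -23.71, 2: -27.05, 3: -28.63, 4: -30.2}
TABLE_3_BUMP_TO_BUMP_DB = {  # (channel, package) -> tabulated bump-to-bump loss
    (1, 1): -31.07, (1, 2): -35.16,
    (2, 1): -34.41, (2, 2): -38.5,
    (3, 1): -35.99, (3, 2): -40.08,
    (4, 1): -37.6, (4, 2): -41.65,
}


def bump_to_bump_estimate_db(channel, package):
    """Approximate end-to-end loss: channel plus one package at each end."""
    return BALL_TO_BALL_IL_DB[channel] + 2 * PACKAGE_IL_DB[package]


# Largest gap between the simple estimate and the tabulated value, in dB.
max_deviation_db = max(
    abs(bump_to_bump_estimate_db(ch, pkg) - tabulated)
    for (ch, pkg), tabulated in TABLE_3_BUMP_TO_BUMP_DB.items()
)
```

In other words, moving from package #1 to package #2 costs about 4dB of additional bump-to-bump loss on every channel.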

The CTLE is chosen according to a rule of thumb: make the frequency response as flat as possible with no overshoot in the pulse response. Figure 5 gives the frequency and pulse responses of channels #1 to #4 as well as the corresponding CTLE chosen by this rule of thumb. The CTLE scale factor a=1.0 and package #1 are used in Figure 5; other settings generate similar curves. Table 5 gives the CTLE settings of the channels investigated in this paper. The CTLE model is defined in IEEE 802.3cd [7][8].



Figure 5 Frequency/pulse response and corresponding CTLE.

In the  $(1+\alpha \cdot D)$  schemes, the post-1 FFE tap is turned off (its tap weight is set to zero) and the post-1 DFE tap is used to cancel the post-1 cursor. The sampling phase is locked to the pulse response peak after the CTLE by a linear phase stepping algorithm. Hence the ratio  $\alpha$  of the post-1 cursor to the main cursor depends on the frequency response of the channel and the CTLE. Table 4 lists the ratio  $\alpha$  for the channels investigated in this paper.

| Scale factor a | Pkg #1 CH1 | Pkg #1 CH2 | Pkg #1 CH3     | Pkg #1 CH4 | Pkg #1 CH5 | Pkg #1 CH6 | Pkg #1 CH7 | Pkg #2 CH1 | Pkg #2 CH2 | Pkg #2 CH3 | Pkg #2 CH4 |
|----------------|------------|------------|----------------|------------|------------|------------|------------|------------|------------|------------|------------|
| 0.6            | 0.21       | 0.25       | 0.26           | 0.26       | 0.13       | 0.21       | 0.25       | 0.24       | 0.27       | 0.29       | 0.28       |
| 0.8            | 0.17       | 0.22       | 0.22 0.19 0.25 |            |            |            |            |            |            |            |            |
| 1              | 0.15       | 0.2        | 0.21           | 0.23       | N.A.       | N.A.       | N.A.       | N.A.       | N.A.       | N.A.       | N.A.       |

Table 4  $\alpha$  obtained from the adaptation algorithm in (1+ $\alpha$ ·D) MLSE simulations

#### 4. Joint Analysis with CTLE Bandwidth

The crosstalk tolerance of the four receivers under different CTLE bandwidths is investigated in this section. All the simulations are based on 53.125GBd, a 7-bit ADC, and package #1. A scaling factor 'a' is introduced in the IEEE 802.3cd CTLE formula to adjust the CTLE bandwidth. The CTLE parameters are chosen as follows, and the detailed CTLE settings are given in Table 5:

- 1. High-frequency CTLE:  $f_{p1}=f_z=a \cdot f_b/2.5$ ;  $f_{p2}=a \cdot f_b$ ;  $g_{DC}$  from -20 to 0 dB;
- 2. Low-frequency CTLE:  $f_{LF}=a \cdot f_b/80$ ;  $g_{DC2}$  from -6 to 0 dB;
- 3. a = 0.6, 0.8, and 1.0.
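The effect of the scaling factor can be explored with a small sketch of the CTLE response. It assumes the two-stage form commonly used for the COM reference receiver, with a high-frequency stage $(10^{g_{DC}/20} + jf/f_z)/((1 + jf/f_{p1})(1 + jf/f_{p2}))$ and a low-frequency stage $(10^{g_{DC2}/20} + jf/f_{LF})/(1 + jf/f_{LF})$, together with the pole and zero choices listed above. The exact transfer function should be checked against IEEE 802.3cd. The function names and the grid search are illustrative only.

```python
import math


def ctle_gain_db(f_ghz, fb_gbd, a, g_dc_db, g_dc2_db):
    """Magnitude in dB of the assumed two-stage CTLE at frequency f_ghz."""
    fz = fp1 = a * fb_gbd / 2.5
    fp2 = a * fb_gbd
    f_lf = a * fb_gbd / 80
    high = (10 ** (g_dc_db / 20) + 1j * f_ghz / fz) / (
        (1 + 1j * f_ghz / fp1) * (1 + 1j * f_ghz / fp2)
    )
    low = (10 ** (g_dc2_db / 20) + 1j * f_ghz / f_lf) / (1 + 1j * f_ghz / f_lf)
    return 20 * math.log10(abs(high * low))


def peak_frequency_ghz(fb_gbd, a, g_dc_db, g_dc2_db, step_ghz=0.01):
    """Grid search for the frequency of maximum gain between 0 and fb."""
    freqs = [i * step_ghz for i in range(1, int(fb_gbd / step_ghz))]
    return max(freqs, key=lambda f: ctle_gain_db(f, fb_gbd, a, g_dc_db, g_dc2_db))


# Example settings in dB, chosen from within the ranges listed above.
peak_a06 = peak_frequency_ghz(53.125, 0.6, -12, -3)
peak_a10 = peak_frequency_ghz(53.125, 1.0, -12, -3)
dc_gain_db = ctle_gain_db(0.0, 53.125, 1.0, -15, -3)
```

With this assumed model the peak sits at roughly 0.6·a·fb and scales linearly with 'a', and the DC gain is the sum of the two DC gain settings in dB.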

| CH # | Scaling factor (a) | Pkg #1, 53.125GBd $g_{DC1}$ | Pkg #1, 53.125GBd $g_{DC2}$ | Pkg #1, 56.5GBd $g_{DC1}$ | Pkg #1, 56.5GBd $g_{DC2}$ | Pkg #2, 53.125GBd $g_{DC1}$ | Pkg #2, 53.125GBd $g_{DC2}$ | Pkg #2, 56.5GBd $g_{DC1}$ | Pkg #2, 56.5GBd $g_{DC2}$ |
|------|--------------------|------|------|------|------|------|------|------|------|
| 1    | 0.6                | -12  | -3   | -13  | -2   | -12  | -3   | -13  | -2   |
| 1    | 0.8                | -14  | -3   | -14  | -3   | N.A. | N.A. | N.A. | N.A. |
| 1    | 1                  | -15  | -3   | -16  | -3   | N.A. | N.A. | N.A. | N.A. |
| 2    | 0.6                | -14  | -2   | -14  | -3   | -14  | -2   | -14  | -3   |
| 2    | 0.8                | -15  | -3   | -16  | -3   | N.A. | N.A. | N.A. | N.A. |
| 2    | 1                  | -16  | -4   | -17  | -3   | N.A. | N.A. | N.A. | N.A. |
| 3    | 0.6                | -12  | -2   | -12  | -3   | -12  | -2   | -12  | -3   |
| 3    | 0.8                | -13  | -3   | -13  | -3   | N.A. | N.A. | N.A. | N.A. |
| 3    | 1                  | -14  | -4   | -15  | -3   | N.A. | N.A. | N.A. | N.A. |
| 4    | 0.6                | -11  | -2   | -12  | -2   | -12  | -2   | -12  | -2   |
| 4    | 0.8                | -13  | -2   | -12  | -2   | N.A. | N.A. | N.A. | N.A. |
| 4    | 1                  | -14  | -3   | -12  | -2   | N.A. | N.A. | N.A. | N.A. |
| 5    | 0.6                | -9   | -2   | -9   | -2   | N.A. | N.A. | N.A. | N.A. |
| 6    | 0.6                | -12  | -2   | -12  | -2   | N.A. | N.A. | N.A. | N.A. |
| 7    | 0.6                | -13  | -3   | -13  | -3   | N.A. | N.A. | N.A. | N.A. |

Table 5 IEEE 802.3cd CTLE settings of the channels under investigation.

Figure 6 shows the CTLE frequency response with scaling factor 'a' equal to 0.6, 0.8, and 1.0. The CTLE peak frequency is about 0.36fb, 0.49fb, and 0.61fb for 'a' equal to 0.6, 0.8, and 1.0, respectively, where fb is the baud rate. The CTLE peak frequencies at 53.125GBd and 56.5GBd are shown in Table 6. There is a trade-off between the CTLE bandwidth and the boost gain. Higher boost gain needs more CTLE stages, which limit the bandwidth due to parasitic poles. A higher CTLE bandwidth requirement therefore makes the implementation more difficult. It is thus worthwhile to use advanced DSP to ease the bandwidth requirement of the CTLE.



Figure 6 Frequency response of CTLE at a=0.6, 0.8 and 1.0.

| Scaling Factor | CTLE Peak Frequency @53.125GBd | CTLE Peak Frequency @56.500GBd | Implementation difficulty |
|----------------|--------------------------------|--------------------------------|---------------------------|
| a=0.6          | 19.34GHz                       | 20.57GHz                       | Normal                    |
| a=0.8          | 25.78GHz                       | 27.42GHz                       | Hard                      |
| a=1.0          | 32.23GHz                       | 34.28GHz                       | Very Hard                 |

Table 6 CTLE peak frequency at a=0.6, 0.8, 1.0.

Table 7 gives the crosstalk tolerance results of the four channels at a BER of 1E-4 for scaling factor 'a' equal to 0.6, 0.8, and 1.0. The crosstalk tolerance is the tolerable ICN value at a specific BER on the "BER versus ICN" curves, which are obtained by scaling the crosstalk responses up and down. Figure 7 visualizes the data in Table 7. The impact of CTLE bandwidth can be summarized as follows:

- For channels with ball-to-ball insertion loss <28dB (CH #1), increased CTLE bandwidth will impair performance for all the four receivers;
- For channels with ball-to-ball insertion loss around 28~30dB (CH #2 and CH #3), the increase of CTLE bandwidth does not benefit performance for all the four receivers;
- For channels with ball-to-ball insertion loss >30dB (CH #4), increase of CTLE bandwidth can benefit performance for PR and DFE receivers;
- The performance of (1+D) MLSE is always better than (1+ $\alpha$ ·D) MLSE in all the CTLE bandwidth.

| Crosstalk Tolerance (mV), BER@1E-4 | a   | DFE  | PR  | (1+ $\alpha$ ·D) MLSE | (1+D) MLSE |
|------------------------------------|-----|------|-----|-----------------------|------------|
| CH #1                              | 0.6 | 6.47 | 6.8 | 7.63                  | 9.12       |
| CH #1                              | 0.8 | 5.86 | 6.1 | 6.8                   | 7.98       |
| CH #1                              | 1.0 | 5.33 | 6.2 | 6.15                  | 6.84       |
| CH #2                              | 0.6 | 3.78 | 3.7 | 4.45                  | 5.01       |
| CH #2                              | 0.8 | 3.61 | 3.6 | 4.41                  | 4.86       |
| CH #2                              | 1.0 | 3.62 | 3.5 | 4.13                  | 4.64       |
| CH #3                              | 0.6 | 5.14 | 4.6 | 6.28                  | 6.38       |
| CH #3                              | 0.8 | 5.21 | 4.8 | 6.27                  | 6.48       |
| CH #3                              | 1.0 | 5.26 | 4.8 | 6.25                  | 6.56       |
| CH #4                              | 0.6 | 1.54 | 2.6 | 3.02                  | 4.26       |
| CH #4                              | 0.8 | 2.52 | 2.9 | 3.68                  | 4.46       |
| CH #4                              | 1.0 | 2.65 | 2.9 | 3.64                  | 4.38       |

Table 7 Crosstalk tolerance performance at different CTLE bandwidth.
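The trends summarized in the bullets above can be checked against the Table 7 values with a short script. This is a minimal sketch: the dictionary layout and function names are illustrative, and expressing the change as 20·log10 of the mV ratio is an assumption about the dB convention.

```python
import math

# Crosstalk tolerance in mV at BER = 1E-4, transcribed from Table 7.
# Receiver order in each tuple: DFE, PR, (1+alpha*D) MLSE, (1+D) MLSE.
RECEIVERS = ("DFE", "PR", "(1+aD) MLSE", "(1+D) MLSE")
TABLE7 = {
    "CH#1": {0.6: (6.47, 6.8, 7.63, 9.12), 0.8: (5.86, 6.1, 6.8, 7.98), 1.0: (5.33, 6.2, 6.15, 6.84)},
    "CH#2": {0.6: (3.78, 3.7, 4.45, 5.01), 0.8: (3.61, 3.6, 4.41, 4.86), 1.0: (3.62, 3.5, 4.13, 4.64)},
    "CH#3": {0.6: (5.14, 4.6, 6.28, 6.38), 0.8: (5.21, 4.8, 6.27, 6.48), 1.0: (5.26, 4.8, 6.25, 6.56)},
    "CH#4": {0.6: (1.54, 2.6, 3.02, 4.26), 0.8: (2.52, 2.9, 3.68, 4.46), 1.0: (2.65, 2.9, 3.64, 4.38)},
}


def change_db(channel: str, rx_index: int) -> float:
    """Tolerance change in dB when 'a' goes from 0.6 to 1.0."""
    low = TABLE7[channel][0.6][rx_index]
    high = TABLE7[channel][1.0][rx_index]
    return 20.0 * math.log10(high / low)


def full_mlse_always_better() -> bool:
    """True if (1+D) MLSE beats (1+alpha*D) MLSE in every table entry."""
    return all(row[3] > row[2] for ch in TABLE7.values() for row in ch.values())


for ch in TABLE7:
    cells = ", ".join(f"{rx}: {change_db(ch, i):+.2f} dB" for i, rx in enumerate(RECEIVERS))
    print(f"{ch}: {cells}")
print("(1+D) MLSE better everywhere:", full_mlse_always_better())
```

With these values, every receiver loses tolerance on CH #1 and gains tolerance on CH #4 when 'a' increases from 0.6 to 1.0, in line with the first and third bullets.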



Figure 7 CTLE scale factor a vs. crosstalk tolerance of 53.125Gbaud PAM4 channels.

Note that the boundary values of insertion loss, i.e. 28dB $\sim$ 30dB, may vary slightly under different simulation conditions; however, we believe the trend holds. A low-bandwidth CTLE (e.g. a=0.6) may be a better choice because it gives better performance for <28dB channels, has only a small impact for 28dB $\sim$ 30dB channels, and is easier to implement with lower power and area. In the remainder of this paper, a=0.6 is chosen as the baseline for the simulations.

Figure 8 compares the ICN vs. BER results of the four receivers for a=0.6 at 53.125Gbaud. The performance ranking follows the order DFE $\leq$ PR < (1+ $\alpha$ ·D) MLSE < (1+D) MLSE, except for channel #3. The reason for the CH #3 exception is that the channel insertion loss curve is so linear, and the maximum frequency of the s-parameter is only 40GHz, that the CTLE, FFE, and DFE are optimal for this channel; the (1+D) MLSE receiver architecture does not give more crosstalk tolerance margin in this case. However, ripples and notches are inevitable in real channels. These ripples and notches translate into residual ISI that can be viewed as background noise. The results of CH #1, #2, and #4 show that this noise is better handled by the (1+D) MLSE receiver. The trends for a=0.8 and 1.0 are the same as for a=0.6. The performance of (1+D) PR is equal to or better than that of DFE, while (1+D) MLSE is always better than (1+ $\alpha$ ·D) MLSE with the same CTLE bandwidth. Since the (1+ $\alpha$ ·D) MLSE receiver always performs worse than the (1+D) MLSE receiver while having the same implementation difficulty, the following discussion focuses on the DFE, (1+D) PR, and (1+D) MLSE receivers (corresponding to schemes 1, 3, and 4).

In summary, the simulation results show that a lower-bandwidth CTLE with a peaking frequency of ~20GHz may be a better choice for 112G PAM4 links. Higher CTLE bandwidth does not improve the performance for noise-dominant channels. For ISI-dominant channels, higher CTLE bandwidth may slightly improve the performance, but needs more power and area. On the other hand, the same or even better performance can be achieved with a low-bandwidth CTLE by using advanced DSP algorithms, which may save overall power and area.



Figure 8 Integrated crosstalk noise vs. BER of 53.125Gbaud PAM4 channels at a=0.6.

#### 5. Joint Analysis with ADC Resolution

ADC-based receivers are currently being proposed for high-speed serial link applications to enable flexible, complex, and robust digital equalization that supports high insertion loss channels. However, the power dissipation of the ADC is a major concern [9]. The ADC takes a large proportion of the power and area of DSP-based receivers: it consumes more than 80% of the power at 10Gbps NRZ [10] and more than 50% of the power at 56Gbps PAM4 in 16nm FinFET technology [11]. Power and area can be greatly reduced for every bit of ADC resolution removed. It is reported that a 6-bit ADC consumes only ~60% of the power and 21% of the area of an 8-bit ADC [12]. However, the ADC resolution determines the quantization error added to the signal: the quantization error doubles for every bit of resolution removed. There is thus a trade-off between area, power, and ADC resolution. The MLSE receiver is more resistant to noise, so it can help reduce the ADC resolution without sacrificing performance.
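The statement that quantization error doubles per bit removed can be illustrated with the textbook model of an ideal uniform quantizer, whose RMS error is LSB/√12. The sketch below assumes that model and a normalized full-scale range of 1.0; both are illustrative assumptions, not parameters of the simulated ADC.

```python
import math


def quantization_rms(bits: int, full_scale: float = 1.0) -> float:
    """RMS error of an ideal uniform quantizer: LSB / sqrt(12)."""
    lsb = full_scale / (2 ** bits)
    return lsb / math.sqrt(12.0)


def noise_increase_db(bits_from: int, bits_to: int) -> float:
    """Increase of RMS quantization noise in dB when changing resolution."""
    return 20.0 * math.log10(quantization_rms(bits_to) / quantization_rms(bits_from))


for bits in (8, 7, 6, 5):
    print(f"{bits}-bit: RMS quantization error = {quantization_rms(bits):.6f} x full scale")

print(f"7-bit -> 5-bit: {noise_increase_db(7, 5):.2f} dB more quantization noise")
```

Under this model, going from 7 bits to 5 bits quadruples the RMS quantization error, i.e. about 12 dB more quantization noise for the equalizer to tolerate.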

| Crosstalk Tolerance (mV), BER@1E-4 | ADC    | DFE  | PR   | (1+D) MLSE |
|------------------------------------|--------|------|------|------------|
| CH #1                              | 7 bits | 6.47 | 6.84 | 9.19       |
| CH #1                              | 5 bits | 4.9  | 4.62 | 7.61       |
| CH #2                              | 7 bits | 3.76 | 3.66 | 5.01       |
| CH #2                              | 5 bits | 2.52 | 1.93 | 3.94       |
| CH #3                              | 7 bits | 5.14 | 4.63 | 6.38       |
| CH #3                              | 5 bits | 3.13 | 1.75 | 4.52       |
| CH #4                              | 7 bits | 1.54 | 2.6  | 4.26       |
| CH #4                              | 5 bits | -    | -    | -          |

Table 8 Crosstalk tolerance performance at 7-bit and 5-bit ADC.

In this section, we investigate the impact of ADC resolution on the three receivers, i.e. the DFE, PR, and PR+MLSE receivers, and compare their performance with 7-bit and 5-bit ADCs. All the simulations use 53.125GBd, CTLE bandwidth scaling factor 'a' equal to 0.6, and package #1. Table 8 gives the crosstalk tolerance of the three receivers with 7-bit and 5-bit ADCs. The performance with the 5-bit ADC is worse than with the 7-bit ADC in all cases, because the 5-bit resolution introduces more quantization noise. The (1+D) MLSE can compensate for the performance degradation caused by the reduced ADC resolution.
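To quantify the degradation, the penalty of moving from a 7-bit to a 5-bit ADC can be computed per receiver from the Table 8 values. The following is a minimal sketch; the data layout and names are illustrative, CH #4 is omitted because Table 8 reports no 5-bit values for it, and the 20·log10 convention for mV ratios is an assumption.

```python
import math

# Crosstalk tolerance in mV at BER = 1E-4, transcribed from Table 8.
# Receiver order in each tuple: DFE, PR, (1+D) MLSE.
RECEIVERS = ("DFE", "PR", "(1+D) MLSE")
TABLE8 = {
    "CH#1": {7: (6.47, 6.84, 9.19), 5: (4.9, 4.62, 7.61)},
    "CH#2": {7: (3.76, 3.66, 5.01), 5: (2.52, 1.93, 3.94)},
    "CH#3": {7: (5.14, 4.63, 6.38), 5: (3.13, 1.75, 4.52)},
}


def penalty_db(channel: str, rx_index: int) -> float:
    """Loss of crosstalk tolerance in dB when going from 7-bit to 5-bit."""
    tol_7bit = TABLE8[channel][7][rx_index]
    tol_5bit = TABLE8[channel][5][rx_index]
    return 20.0 * math.log10(tol_7bit / tol_5bit)


for ch in TABLE8:
    cells = ", ".join(f"{rx}: {penalty_db(ch, i):.2f} dB" for i, rx in enumerate(RECEIVERS))
    print(f"{ch}: {cells}")
```

With the Table 8 values, the (1+D) MLSE receiver shows the smallest penalty on each of the three channels.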



Figure 9 Crosstalk tolerance of 53.125Gbaud PAM4 channels with 7-bit ADC.

Figure 9 shows the BER versus ICN of 53.125GBd PAM4 channels with a 7-bit ADC. The performance ranking follows DFE $\leq$ (1+D) PR < (1+D) MLSE, except for CH #3, for the same reason as discussed in Section 4. The performance of DFE and (1+D) PR is almost the same in channels #1 and #2. The performance of (1+D) PR is much better than that of DFE in the high-loss channel CH #4. The performance of (1+D) MLSE is much better than that of DFE and PR, and the improvement becomes more pronounced in the high-loss channel CH #4.



Figure 10 Crosstalk tolerance of 53.125Gbaud PAM4 channels with 5-bit ADC

Figure 10 shows the crosstalk tolerance performance of 53.125GBd PAM4 channels with a 5-bit ADC. The performance ranking follows (1+D) PR < DFE < (1+D) MLSE, except for CH #4. CH #4 fails to meet the BER of 1E-4: because its insertion loss is too high, the quantization noise introduced when the ADC resolution is reduced from 7 to 5 bits raises the BER error floor above 1E-4. For CH #4 with a 5-bit ADC, stronger FEC may further improve the performance by suppressing the error floor.



Figure 11 Crosstalk tolerance of 7-bit and 5-bit ADC comparison.

Figure 11 compares the performance of the DFE and (1+D) PR receivers with a 7-bit ADC against the (1+D) MLSE receiver with a 5-bit ADC. The (1+D) PR receiver performs similarly to the DFE receiver with a 7-bit ADC. The (1+D) MLSE can help reduce the ADC resolution from 7 to 5 bits without degrading the crosstalk tolerance. This works for CH #1, #2, and #3, but not for CH #4, because the insertion loss is too high and beyond the capability of the MLSE equalizer. The error floor problem of CH #4 may be solved by introducing stronger FEC. By reducing the ADC resolution from 7 to 5 bits, a (1+D) MLSE receiver can save significant ADC power compared with a traditional DFE receiver.

In summary, with the same ADC resolution, the PR receiver can replace the DFE receiver to achieve the same performance with lower power and complexity. With the help of advanced MLSE DSP, the ADC resolution can be reduced while preserving the crosstalk tolerance performance, as long as the insertion loss is within the equalization capability of MLSE. The ADC area and power can be reduced significantly with advanced (1+D) MLSE DSP.

### 6. Joint Analysis with Package

Table 9 gives the crosstalk tolerance performance of the three receivers with two package channels at 53.125Gbaud, CTLE scaling factor a=0.6, and a 7-bit ADC. The parameters of packages #1 and #2 are given in Table 1: the package trace lengths are 25 mm and 30 mm, and the pad capacitances ( $C_d$ ) are 13fF and 130fF, respectively. The frequency responses are given in Figure 4.

Figure 12 gives the crosstalk tolerance comparison between the three receivers with the two package channels. The results can be summarized as follows:

- 1) The performance of the receivers with package #2 is always worse than with package #1 in all four channels, because the larger Cd introduces more insertion loss and reflections. Only the DFE receiver crosstalk tolerance curve for package #1 is shown in Figure 12.
- 2) The performance of the PR receiver is similar to that of the DFE receiver with the same package in channels #1, #2, and #3, and is even better in channel #4. Only the crosstalk tolerance curves for package #2 are shown in Figure 12; the trend is the same with package #1.
- 3) The performance of (1+D) MLSE with package #2 (bad package) is better than that of DFE with package #1 (good package) in channels #1, #2, and #4. The reason for the channel #3 exception is the same as described in Section 4: the frequency response of channel #3 has no ripples or notches and the maximum frequency is 40GHz, which minimizes the difference between the DFE receiver and the (1+D) MLSE receiver.
- 4) With the same package, advanced DSP like MLSE can increase the channel operating margin. In some stringent channels like channel #4, MLSE is essential to reach a target performance.
- 5) With the same performance target, advanced DSP like MLSE can ease the packaging challenges: trace length, Cd, Cp, impedance continuity, etc.

| Crosstalk Tolerance (mV), BER@1e-4 | Pkg | DFE  | PR   | (1+D) MLSE |
|------------------------------------|-----|------|------|------------|
| CH #1                              | #1  | 6.47 | 6.84 | 9.19       |
| CH #1                              | #2  | 4.94 | 5.47 | 8.17       |
| CH #2                              | #1  | 3.76 | 3.66 | 5.01       |
| CH #2                              | #2  | 2.52 | 2.45 | 3.93       |
| CH #3                              | #1  | 5.14 | 4.63 | 6.38       |
| CH #3                              | #2  | 2.92 | 2.81 | 4.74       |
| CH #4                              | #1  | 1.54 | 2.6  | 4.26       |
| CH #4                              | #2  | -    | -    | 2.71       |

Table 9 Crosstalk tolerance performance of channels with two packages.
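Point 3) above, that (1+D) MLSE with the worse package outperforms DFE with the better package except on channel #3, can be read directly off Table 9. The sketch below transcribes the table; the data layout and function name are illustrative, and `None` marks entries for which the table reports no value.

```python
# Crosstalk tolerance in mV at BER = 1e-4, transcribed from Table 9.
# Receiver order in each tuple: DFE, PR, (1+D) MLSE.
DFE, PR, MLSE = 0, 1, 2
TABLE9 = {
    "CH#1": {"pkg1": (6.47, 6.84, 9.19), "pkg2": (4.94, 5.47, 8.17)},
    "CH#2": {"pkg1": (3.76, 3.66, 5.01), "pkg2": (2.52, 2.45, 3.93)},
    "CH#3": {"pkg1": (5.14, 4.63, 6.38), "pkg2": (2.92, 2.81, 4.74)},
    "CH#4": {"pkg1": (1.54, 2.6, 4.26), "pkg2": (None, None, 2.71)},
}


def mlse_pkg2_beats_dfe_pkg1(channel: str) -> bool:
    """Compare (1+D) MLSE with package #2 against DFE with package #1."""
    return TABLE9[channel]["pkg2"][MLSE] > TABLE9[channel]["pkg1"][DFE]


winners = [ch for ch in TABLE9 if mlse_pkg2_beats_dfe_pkg1(ch)]
print("MLSE + package #2 beats DFE + package #1 on:", winners)
```

The comparison holds on channels #1, #2, and #4 and fails on channel #3, matching the exception noted in the text.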



Figure 12 Crosstalk tolerance performance comparison with two package channels.

## 7. Joint Analysis with FEC

| N   | K   | T  | M  | OH     | Coding Gain (dB) @1e-13 | Coding Gain (dB) @1e-15 | Net Coding Gain (dB) @1e-13 | Net Coding Gain (dB) @1e-15 | BERin Requirement @1e-13 | BERin Requirement @1e-15 |
|-----|-----|----|----|--------|-------------------------|-------------------------|-----------------------------|-----------------------------|--------------------------|--------------------------|
| 528 | 514 | 7  | 10 | 2.72%  | 5.5                     | 5.87                    | 5.39                        | 5.76                        | 3.92E-05                 | 2.18E-05                 |
| 544 | 514 | 15 | 10 | 5.84%  | 6.79                    | 7.25                    | 6.55                        | 7.01                        | 3.10E-04                 | 2.26E-04                 |
| 560 | 514 | 23 | 10 | 8.95%  | 7.49                    | 7.99                    | 7.12                        | 7.61                        | 7.61E-04                 | 6.07E-04                 |
| 576 | 514 | 31 | 10 | 12.06% | 7.97                    | 8.49                    | 7.48                        | 7.99                        | 1.31E-03                 | 1.10E-03                 |

Table 10 Ethernet RS FEC code family.

In this section, the DSP performance at the two baud rates of 53.125GBd and 56.5GBd is investigated with channels #5, #6, and #7. The only difference among channels #5, #6, and #7 is the insertion loss, as shown in Table 2 and Figure 3. In this analysis, the 53.125GBd data rate uses RS(544, 514) FEC with a required raw BER of 1e-5, while the 56.5GBd data rate uses a higher-gain FEC with 12% overhead, which is assumed to require a raw BER of 3e-4. Table 10 gives the Ethernet RS FEC code family: RS(544, 514) FEC requires 2.26e-4 input BER to achieve an output BER of 1e-15; the stronger FEC with 12% overhead can be RS(576, 514) or a concatenated FEC, which requires 1.3e-3 input BER to achieve an output BER of 1e-15.
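The T and OH columns of Table 10 follow from the RS(N, K) parameters, and the net coding gain columns appear consistent with subtracting the usual rate penalty 10·log10(N/K) from the coding gain. The sketch below assumes that relation; the function name and the choice of the @1e-15 column are illustrative.

```python
import math

K = 514  # message symbols, common to all codes in Table 10

# Coding gain (dB) at output BER 1e-15, transcribed from Table 10.
CODING_GAIN_DB_1E15 = {528: 5.87, 544: 7.25, 560: 7.99, 576: 8.49}


def rs_parameters(n: int, k: int):
    """Return (t, overhead, rate_penalty_db) for an RS(n, k) code."""
    parity = n - k
    t = parity // 2                          # correctable symbol errors
    overhead = parity / k                    # parity symbols relative to payload
    rate_penalty_db = 10.0 * math.log10(n / k)
    return t, overhead, rate_penalty_db


for n, gain_db in CODING_GAIN_DB_1E15.items():
    t, overhead, penalty_db = rs_parameters(n, K)
    print(f"RS({n},{K}): t={t}, OH={overhead:.2%}, "
          f"net coding gain ~ {gain_db - penalty_db:.2f} dB")
```

Because the coding gains are taken from the table at two-decimal precision, the computed net coding gains agree with the tabulated values only to within about 0.01 dB.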

A raw BER of 1e-5 for the 6.8% FEC and 3e-4 for the 12% FEC is assumed when deriving the crosstalk tolerance; hence some margin is included for both the 6.8% and 12% FEC to accommodate the impact of burst errors.

Table 11, Table 12, and Table 13 give the crosstalk tolerance at the raw BER requirements of these two FECs with package #1, CTLE factor 'a' equal to 0.6, and a 7-bit ADC, with thermal noise RMS equal to 1mV, 2mV, and 3mV, respectively.

| Crosstalk Tol. (mV) | CH #5 DFE | CH #5 PR | CH #5 (1+D) MLSE | CH #6 DFE | CH #6 PR | CH #6 (1+D) MLSE | CH #7 DFE | CH #7 PR | CH #7 (1+D) MLSE |
|---------------------|-----------|----------|------------------|-----------|----------|------------------|-----------|----------|------------------|
| 53.125G@1E-5        | 8.56      | 8.28     | 13.57            | 3.73      | 3.81     | 5.99             | 1.93      | 2.51     | 3.74             |
| 56.5G@3E-4          | 9.04      | 9.86     | 13.57            | 4.36      | 4.42     | 5.95             | 2.42      | 3.02     | 3.9              |

Table 11 Crosstalk tolerance with two FEC implementation at thermal noise=1mV.

| Crosstalk Tol. (mV) | CH #5 DFE | CH #5 PR | CH #5 (1+D) MLSE | CH #6 DFE | CH #6 PR | CH #6 (1+D) MLSE | CH #7 DFE | CH #7 PR | CH #7 (1+D) MLSE |
|---------------------|-----------|----------|------------------|-----------|----------|------------------|-----------|----------|------------------|
| 53.125G@1E-5        | 8.13      | 7.53     | 12.39            | 3.03      | 3.08     | 5.55             | 0.87      | 1.14     | 3.36             |
| 56.5G@3E-4          | 8.59      | 9.07     | 13.34            | 3.88      | 3.72     | 5.47             | 1.7       | 2.13     | 3.37             |

Table 12 Crosstalk tolerance with two FEC implementation at thermal noise=2mV.

| Crosstalk Tol. (mV) | CH #5 DFE | CH #5 PR | CH #5 (1+D) MLSE | CH #6 DFE | CH #6 PR | CH #6 (1+D) MLSE | CH #7 DFE | CH #7 PR | CH #7 (1+D) MLSE |
|---------------------|-----------|----------|------------------|-----------|----------|------------------|-----------|----------|------------------|
| 53.125G@1E-5        | 7.05      | 5.51     | 11.26            | 1.42      | NaN      | 4.43             | NaN       | NaN      | 1.98             |
| 56.5G@3E-4          | 7.61      | 7.81     | 12.23            | 2.7       | 2.38     | 4.75             | NaN       | NaN      | 2.44             |

Table 13 Crosstalk tolerance with two FEC implementation at thermal noise=3mV.





Figure 13 Crosstalk tolerance comparison of three receivers with the same FEC at thermal noise=1mV. (Channel #5, #6 and #7)



Figure 14 Crosstalk tolerance comparison between two FECs at thermal noise=1mV. (Channel #5, #6 and #7)

We first investigate the performance of different DSP schemes with the same FEC. Figure 13 gives the crosstalk tolerance comparison of the three receivers with the KP4 FEC (53.125GBd) and a higher-gain FEC with 12% overhead (56.5GBd) at thermal noise=1mV. The crosstalk tolerance curves are shown in both mV and dBmV. The DSP performance is summarized as follows:

- 1) For 53.125GBd at BER=1E-5, the crosstalk tolerance of the (1+D) PR receiver is similar to that of the DFE receiver, while the crosstalk tolerance of the (1+D) MLSE receiver is at least 4.00 dB better than that of the DFE receiver.
- 2) For 56.5GBd at BER=3E-4, the crosstalk tolerance of the (1+D) PR receiver is slightly better than that of the DFE receiver, while the crosstalk tolerance of the (1+D) MLSE receiver is at least 2.70 dB better than that of the DFE receiver.
- 3) The performance of the DFE and (1+D) PR receivers is similar. Advanced DSP like the (1+D) MLSE receiver gives more crosstalk tolerance margin.
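The "at least 4.00 dB" and "at least 2.70 dB" figures in points 1) and 2) can be reproduced from the Table 11 values (thermal noise=1mV) by taking 20·log10 of the ratio of the MLSE and DFE tolerances in mV. The sketch below transcribes the table; the data layout and function names are illustrative.

```python
import math

# Crosstalk tolerance in mV at thermal noise = 1 mV, transcribed from Table 11.
# Receiver order in each tuple: DFE, PR, (1+D) MLSE.
TABLE11 = {
    "53.125G@1E-5": {"CH#5": (8.56, 8.28, 13.57), "CH#6": (3.73, 3.81, 5.99), "CH#7": (1.93, 2.51, 3.74)},
    "56.5G@3E-4": {"CH#5": (9.04, 9.86, 13.57), "CH#6": (4.36, 4.42, 5.95), "CH#7": (2.42, 3.02, 3.9)},
}


def mlse_gain_over_dfe_db(case: str, channel: str) -> float:
    """Crosstalk tolerance gain of (1+D) MLSE over DFE, in dB."""
    dfe, _pr, mlse = TABLE11[case][channel]
    return 20.0 * math.log10(mlse / dfe)


def min_gain_db(case: str) -> float:
    """Smallest MLSE-over-DFE gain across channels #5, #6, and #7."""
    return min(mlse_gain_over_dfe_db(case, ch) for ch in TABLE11[case])


for case in TABLE11:
    gains = ", ".join(f"{ch}: {mlse_gain_over_dfe_db(case, ch):.2f} dB" for ch in TABLE11[case])
    print(f"{case}: {gains} (minimum {min_gain_db(case):.2f} dB)")
```

The minimum occurs on CH #5 for 53.125GBd and on CH #6 for 56.5GBd.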

Figure 14 gives the crosstalk tolerance comparison between the two FECs at thermal noise=1mV. Compared with the 6% KP4 FEC with 1E-5 raw BER requirement, the 12% FEC with 3E-4 raw BER requirement gives 3.16dB, 4.45dB, and 4.64dB of insertion loss tolerance extension for the DFE, PR, and MLSE receivers under 3mV crosstalk, respectively. If both (1+D) MLSE and 12% FEC are used, the insertion loss tolerance extension is ~8dB. The insertion loss extension of (1+D) MLSE is obtained by extrapolation.



Figure 15 Crosstalk tolerance comparison between two FECs at thermal noise=2mV. (Channel #5, #6 and #7)

Figure 15 gives the crosstalk tolerance comparison between the two FECs at thermal noise=2mV. Compared with the 6% KP4 FEC with 1E-5 raw BER requirement, the 12% FEC with 3E-4 raw BER requirement gives 3.72dB, 7.22dB, and 1.99dB of insertion loss tolerance extension for the DFE, PR, and MLSE receivers under 1.5mV crosstalk, respectively. If both (1+D) MLSE and 12% FEC are used, the insertion loss tolerance extension is ~7.75dB.

Figure 16 gives the crosstalk tolerance comparison between the two FECs at thermal noise=3mV. Compared with the 6% KP4 FEC with 1E-5 raw BER requirement, the 12% FEC with 3E-4 raw BER requirement gives 4.09dB and 4.87dB of insertion loss tolerance extension for the DFE and MLSE receivers under 1mV crosstalk, respectively. If both (1+D) MLSE and 12% FEC are used, the insertion loss tolerance extension is ~11.69dB.

In summary, the 56.5GBd links with 12% FEC provide positive net coding gain over the 53.125GBd links with KP4. Besides advanced DSP, stronger FEC with larger overhead is another dimension of system design to enhance the performance or reduce the overall area and power.



Figure 16 Crosstalk tolerance comparison between two FECs at thermal noise=3mV. (Channel #5, #6 and #7)

#### 8. Conclusions

In this paper, the performance of advanced equalization schemes is investigated and compared. The joint design of DSP, CTLE, ADC, package, and FEC is discussed. The impact of CTLE bandwidth, ADC resolution, package, and FEC is studied with different equalization schemes. Advanced equalization schemes can ease the pressure on CTLE, ADC, and package designs. The FFE+IIR scheme has similar performance to the FFE+DFE scheme, and its complexity in power and area can be reduced drastically. The FFE+IIR+MLSE scheme provides a large improvement over the FFE+DFE scheme, and the complexity of MLSE may be reduced if MLSE can be triggered on demand. Stronger FEC is another dimension to enhance the performance or reduce the area and power. The simulation results show that stronger FEC with 12% overhead always provides positive net coding gain under different DSP equalization schemes. Advanced equalization schemes and stronger FEC can extend the design space of 112G links to enhance the performance or reduce the chip area and power. Joint design of DSP and FEC will potentially push the performance towards the 'Shannon limit'.

#### Reference

- [1] Yu-Chun Lu, Henry Wong, Davide Tonietto, Da-Jun Zang, "DFE Error Propagation Characteristics in Real 56Gbps PAM4 High-Speed Links with Pre-Coding and Impact on the FEC Performance", DesignCon 2017.
- [2] Yu-Chun Lu, Lin Ma, Daochun Mo, Liang Li, "High Gain Low Complexity Low Latency FEC Codes for Ethernet and Backplane Applications", DesignCon 2018.
- [3] <u>http://www.ieee802.org/3/ck/public/adhoc/jan02\_19/lu\_3ck\_adhoc\_01\_010219.pdf</u>
- [4] "IEEE Standard for Ethernet Amendment 2: Physical Layer Specifications and Management Parameters for 100 Gb/s Operation Over Backplanes and Copper Cables", Clause 94, 2014.
- [5] <u>http://www.ieee802.org/3/ck/public/tools/index.html</u>
- [6] <u>http://www.ieee802.org/3/100GEL/public/tools/index.html</u>
- [7] P802.3cd IEEE Draft Standard for Ethernet Amendment: Media Access Control Parameters for 50 Gb/s and Physical Layers and Management Parameters for 50 Gb/s, 100 Gb/s, and 200 Gb/s Operation.
- [8] <u>http://www.ieee802.org/3/ck/public/18\_05/ran\_3ck\_02\_0518.pdf</u>
- [9] B. Zhang, et al., "A 195mW / 55mW dual-path receiver AFE for multistandard 8.5-to-11.5 Gb/s serial links in 40nm CMOS," ISSCC Dig. Tech. Papers, pp. 34-35, Feb. 2013.
- [10] A. Shafik, E. Z. Tabasy, S. Cai, K. Lee, S. Hoyos and S. Palermo, "3.6 A 10Gb/s hybrid ADC-based receiver with embedded 3-tap analog FFE and dynamically-enabled digital equalization in 65nm CMOS," 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, San Francisco, CA, 2015, pp. 1-3.
- [11] Y. Frans et al., "A 56Gb/s PAM4 wireline transceiver using a 32-way time-interleaved SAR ADC in 16nm FinFET," 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits), Honolulu, HI, 2016, pp. 1-2.
- [12] L. Kull, J. Pliva, T. Toifl, et al., "Implementation of low-power 6–8 b 30–90 GS/s time-interleaved ADCs with optimized input bandwidth in 32 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 51, no. 3, pp. 636-648, 2016.