

# A 65 nm Data Concentration ASIC for the CMS Outer Tracker Detector Upgrade at HL-LHC

## Benedetta Nodari<sup>1</sup>

Institut de Physique Nucléaire de Lyon (IPNL), IN2P3-CNRS, Lyon, France *E-mail:* b.nodari@ipnl.in2p3.fr

## Luigi Caponetto

Institut de Physique Nucléaire de Lyon (IPNL), IN2P3-CNRS, Lyon, France *E-mail:* l.caponetto@ipnl.in2p3.fr

## Geoffrey Galbit

Institut de Physique Nucléaire de Lyon (IPNL), IN2P3-CNRS, Lyon, France *E-mail:* g.galbit@ipnl.in2p3.fr

## Sebastien Viret

Institut de Physique Nucléaire de Lyon (IPNL), IN2P3-CNRS, Lyon, France *E-mail:* s.viret@ipnl.in2p3.fr

## Simone Scarfì

Microelectronics System Laboratory (LSM), Ecole Polytechnique Federale Lausanne (EPFL), CERN, Geneva, Switzerland *E-mail:* simone.scarfi@cern.ch

The Concentrator Integrated Circuit (CIC) ASIC is a front-end chip for both Pixel-Strip (PS) and Strip-Strip (2S) modules of the future Phase-II CMS Outer Tracker upgrade at the High-Luminosity LHC (HL-LHC). It collects the digital data coming from eight upstream front-end chips (either MPAs or CBCs, depending on the module type), formats the signal in data packets containing the trigger information from eight bunch crossings and the raw data from events passing the first trigger level, and finally transmits them to the LpGBT unit. The design and its implementation in a 65 nm CMOS technology of the first prototype that integrates all functionalities for system level operation are presented in this contribution.

Topical Workshop on Electronics for Particle Physics (TWEPP2018) 17-21 September 2018 Antwerp, Belgium

<sup>1</sup>Speaker

© Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0)

## 1. Introduction

The next generation of ASICs at the HL-LHC will be exposed to challenging conditions, with extremely high radiation levels and particle rates [1]. For the so-called Phase-II upgrade, the CMS experiment needs a completely new tracker detector with a higher granularity, able to cope with the increased occupancy. One of the main goals of the upgrade is to maintain the performance of the Phase-I CMS detector at the pileup rates expected in the HL environment. To reach this goal, the new CMS Outer Tracker (OT) employs a novel technique for detecting high transverse momentum particles and provides this information to the L1-trigger system at the 40 MHz Bunch Crossing (BX) rate. In parallel, triggered events are transmitted to the Data Acquisition (DAQ) system at a nominal average trigger rate of 750 kHz.

The detection area of the future tracker modules ($p_T$-modules) consists of two superposed silicon sensors. A charged particle crossing this area creates two correlated clusters, which are combined to form a *stub*. Stubs are selected by the front-end electronics against a configurable threshold based on the transverse momentum ($p_T$) of the particle: this technique measures the radial bend of the particles and thus reduces the required bandwidth by at least one order of magnitude. For the track trigger, only low-bend stubs are needed to identify the high-$p_T$ tracks (>2 GeV/c), which is possible because of the strong 3.8 T magnetic field of CMS.

The future OT will be populated by two types of $p_T$-modules: the PS (Pixel-Strip, located in the innermost region of the tracker and providing higher granularity and higher precision hits) and the 2S (Strip-Strip, located in the outer parts of the OT). The *Concentrator Integrated Circuit* (CIC) ASIC is the only component shared between the two types of *Front-End Hybrids* (FEH) used in these modules: its role is to buffer, aggregate and format the data coming from the 8 *Front-End* (FE) ASICs within each FEH. Two CICs will be used per module, for a total of approximately 26600 in the whole OT.

## 2. ASIC architecture

The CIC collects the digital data coming from 8 upstream FE chips (*MPAs* [2] in the case of PS, *CBCs* [3] in the case of 2S), formats those data in packets containing the trigger information from 8 BXs and the raw data from events passing the first trigger level, and transmits them to the LpGBT chip [4]. The CIC inputs are 6 bitlines for each of the 8 FE chips (1 bitline for *L1* data and 5 bitlines for *Trigger* data) at 320 Mbps, with a different data format for PS and 2S modules. The outputs are 7 bitlines (1 for *L1* and 6 for *Trigger* data) at 320 Mbps (640 Mbps in the case of inner-layer PS modules). All the bitlines between the FE chips and the CIC, and between the CIC and the LpGBT, are differential (sLVS). Due to the different balancing of the power distribution network between PS and 2S modules, the digital core of the CIC is powered at 1 V (for PS) or 1.25 V (for 2S), while the custom sLVS drivers and receivers [5] are powered at 1.25 V.
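The link figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the numbers given in the text (line counts, line rates and the 40 MHz BX rate); the function and constant names are illustrative and do not come from the CIC design.

```python
# Figures quoted in the text; names are illustrative only.
BX_RATE_HZ = 40_000_000       # bunch-crossing rate
LINE_RATE_BPS = 320_000_000   # nominal sLVS line rate
N_FE_CHIPS = 8
LINES_PER_FE = 6              # 1 L1 line + 5 trigger lines per FE chip
N_OUT_LINES = 7               # 1 L1 line + 6 trigger lines towards the LpGBT


def bits_per_bx(line_rate_bps, bx_rate_hz=BX_RATE_HZ):
    """Number of bits one line carries during a single bunch crossing."""
    return line_rate_bps // bx_rate_hz


def aggregate_gbps(n_lines, line_rate_bps):
    """Raw aggregate bandwidth of n_lines parallel lines, in Gbps."""
    return n_lines * line_rate_bps / 1e9


input_gbps = aggregate_gbps(N_FE_CHIPS * LINES_PER_FE, LINE_RATE_BPS)
output_gbps_320 = aggregate_gbps(N_OUT_LINES, LINE_RATE_BPS)
output_gbps_640 = aggregate_gbps(N_OUT_LINES, 2 * LINE_RATE_BPS)

print(f"bits per BX per line at 320 Mbps: {bits_per_bx(LINE_RATE_BPS)}")
print(f"aggregate input:  {input_gbps:.2f} Gbps")
print(f"aggregate output: {output_gbps_320:.2f} Gbps (320 Mbps lines)")
print(f"aggregate output: {output_gbps_640:.2f} Gbps (640 Mbps lines)")
```

These are raw line capacities only; they say nothing about how much of each frame is payload.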

Two different data streams are generated by the CIC for the *Back-End* (BE) and handled independently: the *L1* and the *Trigger* data streams. The L1 data stream consists of the frames sent in response to the L1-accept trigger signal. Each frame aggregates the hit cluster data received from all the FE chips. The frame size is variable and depends on the number of hits in the module. The sustainable capacity (per CIC chip and per L1-accept event) is 254 clusters for PS (127 Pixel-clusters plus 127 Strip-clusters) and 127 clusters for 2S. The CIC internal processing of the L1 data starts after reception of the L1-accept signal: the arrival of a new L1 data frame is detected by the corresponding L1-FE block, which stores it into a FIFO (see Figure 1). Each FIFO stores data from up to 16 L1 events, with a fixed size per event, for a maximum latency of 12.6 µs, and each L1-FE block handles frame reception independently from the others. Data stored in the eight FIFOs that correspond to the same L1-accept event are then merged by the L1 'output formatter' block.
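The buffering and merging step can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the `L1Fifo` class, the `merge_next_event` function and the use of an explicit `l1_id` tag to match events across FIFOs are hypothetical and are not taken from the CIC implementation; only the number of FIFOs (8) and their depth (16 events) come from the text.

```python
from collections import deque

N_FE_CHIPS = 8   # one FIFO per FE chip, as in the text
FIFO_DEPTH = 16  # L1 events per FIFO, as in the text


class L1Fifo:
    """Hypothetical per-FE-chip buffer of (l1_id, clusters), oldest first."""

    def __init__(self, depth=FIFO_DEPTH):
        self.depth = depth
        self.entries = deque()

    def push(self, l1_id, clusters):
        """Store one received L1 frame; refuse it if the FIFO is full."""
        if len(self.entries) >= self.depth:
            raise OverflowError("L1 FIFO full")
        self.entries.append((l1_id, list(clusters)))


def merge_next_event(fifos):
    """Merge the oldest event of every FIFO into one output frame.

    Returns None while at least one FIFO has not received its frame yet.
    Raises ValueError if the oldest entries belong to different events.
    """
    if any(not fifo.entries for fifo in fifos):
        return None
    head_ids = {fifo.entries[0][0] for fifo in fifos}
    if len(head_ids) != 1:
        raise ValueError(f"FIFO heads out of sync: {sorted(head_ids)}")
    merged = []
    for chip, fifo in enumerate(fifos):
        _, clusters = fifo.entries.popleft()
        # Tag each cluster with the index of the chip it came from.
        merged.extend((chip, cluster) for cluster in clusters)
    return head_ids.pop(), merged
```

Because each FIFO is filled independently, the merge simply waits until all eight heads are present, which mirrors the independent frame reception described above.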

The Trigger data stream contains the information needed for the L1-accept generation, which is sent in block-synchronous frames 8 BX long. Each data block includes the data aggregated from the 8 FE input chips. By grouping data over time (8 BX blocks) and space (8 input chips), the CIC provides the readout chain with an additional data reduction factor of 10 at 320 MHz (5 at 640 MHz). Furthermore, the CIC can select and forward up to 40 stubs among the 192 potential input stubs.

The CIC stores the trigger data incoming from the FE chips during 8 BXs, sorts them according to the stub direction information (bend info) and finally sends them out in 8 packets synchronous with the BX clock. A word alignment procedure is required for the data received from the trigger input stream in order to identify the beginning and the end of the event payload and to align them with the 40 MHz clock. The blocks 'StubSelection' and 'StubOutputFormatter', shown in Figure 1, collect the data produced by the 8 FE reconstruction blocks at a frequency of 40 MHz and store them into a stub register. When the number of stubs exceeds the size of this register, a selection based on the stub bend is performed, giving priority to the smaller bends. The output stage is a serializer block, which takes the packet register and serializes the data on the 6 trigger output bitlines. Whereas the processing of the trigger information is the same for both FE chips (CBC and MPA), a different treatment is required for L1 data, depending on the FE chip. Data coming from CBCs are unsparsified, thus a data sparsification process is performed within the CIC.
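The bend-based selection can be sketched as a sort followed by a truncation. The code below is an illustration only: the `Stub` fields, the `select_stubs` name and the tie-breaking rule (arrival order) are assumptions, while the default capacity of 40 stubs per 8-BX block is the figure quoted in the text.

```python
from typing import NamedTuple


class Stub(NamedTuple):
    """Hypothetical stub record; field names and widths are assumptions."""
    chip: int     # FE chip index, 0..7
    bx: int       # bunch-crossing offset inside the 8-BX block, 0..7
    address: int  # position field
    bend: int     # signed bend code; a small |bend| means a high-pT candidate


def select_stubs(stubs, capacity=40):
    """Keep at most `capacity` stubs, giving priority to the smaller bends.

    sorted() is stable, so stubs with equal |bend| keep their arrival order.
    """
    ranked = sorted(stubs, key=lambda stub: abs(stub.bend))
    return ranked[:capacity]
```

When fewer than `capacity` stubs are present, the function returns all of them, ordered by increasing absolute bend.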

The bandwidth allocated to trigger and L1 frames is the same for the MPA and CBC configurations. However, to handle the much higher occupancy in the tracker inner layers, trigger data from the MPA are sent in synchronous blocks of 2 clock cycles so as to average the stub rates. For 2S modules, trigger data transmission can be configured to use 5 or 6 output lines. In the PS modules, all 6 trigger lines are always used. In the 2S configuration, it is also possible to transmit the trigger block without any bend information.

The input data for the CIC are produced by eight different FE ASICs each one clocked by an external 320 MHz clock. A bitline phase alignment feature is required in order to resynchronize the CIC input bitlines for both datapaths with respect to the internal CIC system clock. A simplified version of the LpGBT phase aligner block (developed by SMU Univ. and CERN) is implemented to achieve this task.

The CIC slow control communication is based on the I<sup>2</sup>C protocol, managed by an I<sup>2</sup>C slave block (developed by CERN) integrated within the chip. There is one differential sLVS system clock input at 320 MHz (or 640 MHz) and a serial control input bitline operating at 320 MHz for fast commands that is able to transmit eight possible commands within one single BX period. The 40 MHz internal reference clock is derived from the fast command input by detecting a specific sync code pattern, while other internal clocks, such as the 20 MHz and the 640 MHz, are generated within the 'System manager' block from the 320 MHz reference clock.
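Recovering the 40 MHz phase from the fast command line amounts to finding the bit offset at which a known pattern recurs once per BX (8 bits at 320 Mbps). The sketch below illustrates that idea only: the 8-bit `SYNC_WORD`, the `find_bx_phase` name and the requirement of four consecutive matches are all assumptions, since the actual sync code and lock criterion are not given in the text.

```python
BITS_PER_BX = 8  # 320 Mbps line sampled over one 40 MHz period

# Hypothetical sync pattern; the real CIC sync code is not specified here.
SYNC_WORD = (1, 1, 0, 0, 0, 0, 0, 1)


def find_bx_phase(bits, sync_word=SYNC_WORD, n_required=4):
    """Return the bit offset (0..7) at which the sync word repeats.

    `bits` is a sequence of 0/1 samples from the fast command line.
    A phase is accepted only if `n_required` consecutive words starting
    at that phase all equal `sync_word`. Returns None if no phase locks.
    """
    n = len(sync_word)
    for phase in range(n):
        words = [
            tuple(bits[i:i + n])
            for i in range(phase, len(bits) - n + 1, n)
        ]
        if len(words) >= n_required and all(
            word == sync_word for word in words[:n_required]
        ):
            return phase
    return None
```

For example, a stream made of three arbitrary bits followed by repeated sync words locks at phase 3, and that offset then marks the BX boundary from which a divided-by-8 clock can be aligned.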



Figure 1: CIC block diagram.

## 3. Physical design

The CIC ASIC has been implemented using a fully scripted digital-on-top methodology. The die dimensions of 2.8 mm by 6.5 mm are imposed by the FEH constraints and by the number of I/O bumps. It is a flipchip design that also includes bondable pads all along the periphery of the die that will be used only for tests. Due to the strict power requirement, a 65 nm CMOS technology has been chosen, with 7 metal layers plus an aluminum re-distribution layer (RDL). This technology provides 6 thin metal layers that are used for signal and power routing and an ultra-thick (3.4  $\mu$ m) metal exclusively used for power routing. The core power distribution is performed via 8 vertical stripes in the RDL layer plus 312 horizontal stripes in metal 7 (ultra-thick). The power routing of the periphery supply is separated from the core and the radiation tolerant ESD protections (provided by SOFICS) are placed in the periphery ring.

The digital core is implemented using standard cell libraries with standard and low threshold voltage devices, the latter being used to locally improve speed. The power distribution network has been validated through an activity-based power verification that has provided the IR drop maps shown in Figures 2b and 2c. The power consumption simulations in the worst corner (at 1.1 V for PS; at 1.32 V for 2S) have estimated average power values of 288 mW (PS) and 415 mW (2S) during a typical running phase. These estimates are above the power requirements (200 mW for PS; 300 mW for 2S), but no power optimization technique has been applied in the design of this prototype.





Figure 2: (a) Final layout; (b) VDD IR drop; (c) VSS IR drop.

A radiation tolerance technique (triple modular redundancy) is foreseen for the next version of the CIC in order to mitigate the effects of Single Event Upsets (SEU) in the digital control circuitry. The resulting increase in power dissipation due to this triplication will be addressed by a dedicated power optimization study.

## 4. Simulation results

A standalone script-based verification environment has been implemented to extensively simulate the CIC with back-annotated delays extracted from the final layout (Figure 2a). Simulations have been performed in 3 different corners (maximum at 0.9 V, -40°C; typical at 1 V, 25°C; minimum at 1.32 V, 0°C) and for both test cases (PS and 2S), showing that all functional requirements are met. The testbench compares the data streams from the CMS simulation environment with the CIC model outputs after phase alignment and data treatment. In addition to this verification environment, a system level testbench for validating the full acquisition chain, composed of eight FE ASICs, has been developed by CERN [6].

## 5. Conclusion and future developments

The CIC is a front-end chip common to both PS and 2S modules of the future Phase-II CMS Outer Tracker upgrade. The first prototype integrating all required functionalities has been submitted for fabrication. Timing analysis and post-layout simulations show that the prototype reaches the required performance for both FE configurations. For the second iteration of the CIC, the triplication of the control paths is the first feature that will be introduced in the design, along with the optimization of its power consumption.

## Acknowledgements

The authors would like to thank Alessandro Caratelli, Davide Ceresa and Kostas Kloukinas for the help received in the definition of the digital flow of the CIC.

## References

- [1] CMS collaboration, CMS Technical Design Report for the Phase-2 Tracker Upgrade, CERN-LHCC-2017-009, CMS-TDR-014.
- [2] *Design and simulation of a 65 nm Macro-Pixel Readout ASIC (MPA) for the Pixel-Strip (PS) module of the CMS Outer Tracker detector at the HL-LHC*, D. Ceresa et al., PoS TWEPP 2017: 032.
- [3] *CBC3: a CMS microstrip readout ASIC with logic for track-trigger modules at HL-LHC*, M.Prydderch et al., PoS TWEPP 2017: 001.
- [4] https://indico.in2p3.fr/event/14305/contributions/17786/attachments/14696/18009/moreira\_benodet 201705191708.pdf.
- [5] *Design of low-power, low-voltage, differential I/O links for High Energy Physics applications*, G. Traversi et al., JINST Vol. 10, Jan'15.
- [6] *System Level simulation framework for the ASICs development of a novel particle physics detector*, A. Caratelli et al., 2018 PRIME 10.1109/PRIME.2018.8430367.