

# Short-Strip ASIC (SSA): A 65nm Silicon-Strip Readout ASIC for the Pixel-Strip (PS) Module of the CMS Outer Tracker Detector Upgrade at HL-LHC

Alessandro Caratelli<sup>\*ab†</sup>, Davide Ceresa<sup>a†</sup>, Jan Kaplon<sup>a</sup>, Kostas Kloukinas<sup>a</sup>, Yusuf Leblebici<sup>b</sup>, Jan Murdzek<sup>a</sup>, Simone Scarfi<sup>ab</sup>

<sup>a</sup> CERN, Geneva, Switzerland.

<sup>b</sup> Microelectronic System Laboratory (LSM), École polytechnique fédérale de Lausanne (EPFL), Lausanne, Switzerland.

*E-mail:* Alessandro.Caratelli@epfl.ch, Davide.Ceresa@cern.ch, Jan.Kaplon@cern.ch, Kostas.Kloukinas@cern.ch, Yusuf.Leblebici@epfl.ch, Jan.Murdzek@cern.ch, Simone.Scarfi@epfl.ch

The Compact Muon Solenoid (CMS) experiment at CERN is foreseen to receive a substantial upgrade of the outer tracker detector and its front-end readout electronics, requiring higher granularity and readout bandwidth to handle the large number of pileup events in the High-Luminosity LHC. For this reason, the entire tracking system will be replaced with new detectors featuring higher radiation tolerance and the ability to handle higher data rates and readout bandwidths [1]. The ability to identify particles with high transverse momentum (> 2 GeV/c) and to provide primitives for the L1 trigger decision is achieved by adopting double-layer sensor modules that combine a pixel sensor with a strip sensor. Two different front-end ASICs were developed, the Short-Strip ASIC (SSA) and the Macro-Pixel ASIC (MPA) [2], to read out the sensor hits and to locally process and reduce the total output data flow with a compression factor of around 20 [3]. The SSA is the front-end ASIC responsible for reading out the short-strip silicon sensor and providing encoded information for the particle momentum discrimination. It is a 120-channel ASIC with a double-threshold binary readout architecture. A quick hit-cluster-finding logic provides encoded hit information for particle momentum discrimination to the MPA at the bunch crossing rate of 40 MHz, while the full sensor readout is supported at a nominal average trigger rate of 1 MHz. To meet the strict power requirement of 50 mW and the radiation tolerance up to a total ionizing dose of 100 Mrad, low-power and radiation-hardening techniques have been employed. This paper presents the design and the implementation in a 65 nm CMOS technology of the first prototype ASIC that integrates all functionalities for system-level operation.

Topical Workshop on Electronics for Particle Physics, 11-14 September 2017, Santa Cruz, California

\*Speaker. †Main authors

<sup>©</sup> Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).

## **1. ASIC architecture**

The upgrade of the CMS Outer Tracker Detector for the High-Luminosity LHC will adopt a double-layer sensor architecture in order to provide fast on-detector identification of high transverse momentum (high-$p_T$) tracks (> 2 GeV/c) for the Level-1 trigger decision [1]. The Short-Strip ASIC (SSA) is the chip responsible for reading out the p-type silicon strip sensor [4] and providing encoded data for the particle discrimination (stub generation) [3].

Each sensor signal is read by an analog front-end channel. A single-ended architecture of the input stage is optimal in terms of noise performance and the demanding power requirement of at most 50 mW per chip. After the amplification and shaper stages, the hit signals are discriminated with a double-threshold binary system. The lower threshold, called the detection threshold, detects hits with an energy above 1/4 of the minimum ionizing particle (MIP) energy. The second threshold is configurable and is set around 1.5 MIP in order to distinguish hits from highly ionizing particles (HIPs). The discriminator pulses are digitized by edge-sensitive circuitry that guarantees no dead cycles for consecutive particle hits, and are sampled at the 40 MHz bunch crossing (BX) rate (the particle collision frequency in the LHC). To maximize the hit detection efficiency, the sampling clock phase is adjustable with a precision of 200 ps across the full 25 ns period. Coarse de-skewing is achieved by selecting the clock generation phase from the 320 MHz input clock. Finer de-skewing is accomplished by the internal Delay Locked Loop (DLL).
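The two-level de-skewing can be illustrated with a short sketch that splits a requested sampling delay into a coarse and a fine setting. It assumes, purely for illustration, that one coarse step equals one period of the 320 MHz clock (3.125 ns) and that the DLL provides 200 ps fine steps; the function names and the rounding policy are hypothetical and do not describe the actual SSA register map.

```python
# Sketch under stated assumptions: coarse step = one 320 MHz period,
# fine step = 200 ps (the quoted precision). Names are illustrative only.

BX_PERIOD_PS = 25_000    # 40 MHz bunch-crossing period
COARSE_STEP_PS = 3_125   # one period of the 320 MHz input clock (assumed coarse step)
FINE_STEP_PS = 200       # quoted phase adjustment precision (assumed DLL step)


def split_phase(delay_ps: int) -> tuple[int, int]:
    """Return (coarse_steps, fine_steps) approximating delay_ps within one BX."""
    delay_ps %= BX_PERIOD_PS
    coarse = delay_ps // COARSE_STEP_PS
    residual = delay_ps - coarse * COARSE_STEP_PS
    # Round the residual to the nearest fine step using integer arithmetic.
    fine = (residual + FINE_STEP_PS // 2) // FINE_STEP_PS
    return coarse, fine


def realized_delay_ps(coarse: int, fine: int) -> int:
    """Delay actually obtained for a given pair of settings."""
    return coarse * COARSE_STEP_PS + fine * FINE_STEP_PS


coarse, fine = split_phase(10_000)
print(coarse, fine, realized_delay_ps(coarse, fine))
```

With these assumed step sizes, any requested delay inside the 25 ns period is reproduced to within half a fine step (100 ps).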

The digitized hits are split into two distinct data paths: the stub data path (Figure 1.a) and the L1 data path (Figure 1.b). The first generates the primitives for the discrimination of high-momentum particles, which are transmitted for every event. The encoded coordinates of the hit clusters are sent at the 40 MHz BX rate, with a bandwidth of 2.56 Gb/s. In the L1 data path, the complete raw sensor image is stored internally and transmitted only when requested by the Level-1 trigger system. The internal memory and FIFOs are designed to support trigger rates up to 1 MHz. For testing and calibration purposes, a 15-bit ripple counter is connected to the discriminator of each channel and can be read out serially when the ASIC is set to calibration mode (Figure 1.c).
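As a cross-check of the quoted stub bandwidth, the arithmetic can be written out. The sketch assumes that the 2.56 Gb/s figure corresponds to the eight 320 Mb/s SLVS links described in Section 1.2; the variable names are illustrative only.

```python
# Stub-path bandwidth budget, assuming eight links at 320 Mb/s each.
LINKS = 8
LINK_RATE_BPS = 320_000_000
BX_RATE_HZ = 40_000_000

stub_bandwidth_bps = LINKS * LINK_RATE_BPS
bits_per_bx = stub_bandwidth_bps // BX_RATE_HZ        # payload bits per bunch crossing
bits_per_link_per_bx = LINK_RATE_BPS // BX_RATE_HZ    # bits each link carries per BX

print(stub_bandwidth_bps / 1e9)  # 2.56
print(bits_per_bx)               # 64
print(bits_per_link_per_bx)      # 8
```

Under this assumption the stub path has 64 bits available per bunch crossing to encode the cluster coordinates.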



**Figure 1:** a) block diagram of the stub data path; b) block diagram of the L1 data path; c) scheme of the calibration and asynchronous readout



Figure 2: Schematic of the single channel analog front-end

Configuration registers, accessible via a slow-control serial interface, define the operating modes and hold the calibration of front-end parameters such as the threshold equalization values.

### 1.1 Analog Front-End

The analog front-end channel (Figure 2) consists of a trans-impedance preamplifier AC-coupled to a booster amplifier, which is enclosed in an active feedback loop that attenuates the overshoot. The amplified signal is applied through a passive low-pass filter to the inputs of two discriminators (high and low threshold). The first stage of each discriminator is a folded-cascode differential amplifier loaded with resistors and a swing limiter. The common threshold voltage is applied to one of the inputs through a low-pass filter. The trimming voltage for the discriminator offset compensation is generated by a 5-bit current DAC, which supplies current to the output of the booster (exploiting the low output impedance of this stage) and develops a DC voltage drop across the resistors of the low-pass filter. The peaking time and the gain can be adjusted by switching the capacitor matrix (4-bit resolution) in the RC filter. For nominal process parameters, the peaking time is 17 ns and the gain is on the order of 60 mV/fC. The simulated ENC for a 5 pF sensor capacitance is below $800\,e^{-}$. The overall current consumption of a single analog front-end channel is lower than $220\,\mu$A.
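To put the quoted figures in perspective, the sketch below converts the ENC bound into an equivalent noise voltage at the discriminator input using the nominal gain, and derives an upper bound for the analog supply current of the whole chip. It assumes 120 identical channels and the 1.25 V analog supply mentioned in Section 2, and treats the quoted values as ceilings; the function names are illustrative.

```python
# Back-of-the-envelope figures from the quoted front-end parameters.
E_CHARGE_C = 1.602176634e-19   # elementary charge in coulombs

GAIN_MV_PER_FC = 60.0          # nominal gain
ENC_ELECTRONS = 800            # simulated ENC upper bound at 5 pF
CHANNELS = 120
CHANNEL_CURRENT_UA = 220       # per-channel current upper bound
ANALOG_SUPPLY_V = 1.25         # analog supply voltage (from Section 2)


def electrons_to_fc(n_electrons: float) -> float:
    """Convert a charge expressed in electrons to femtocoulombs."""
    return n_electrons * E_CHARGE_C * 1e15


def output_noise_mv(enc_electrons: float, gain_mv_per_fc: float) -> float:
    """Equivalent RMS noise voltage at the shaper output."""
    return electrons_to_fc(enc_electrons) * gain_mv_per_fc


analog_current_ma = CHANNELS * CHANNEL_CURRENT_UA / 1000
analog_power_mw = CHANNELS * CHANNEL_CURRENT_UA * ANALOG_SUPPLY_V / 1000

print(round(output_noise_mv(ENC_ELECTRONS, GAIN_MV_PER_FC), 2))  # about 7.69 mV
print(analog_current_ma, analog_power_mw)                        # 26.4 mA, 33.0 mW
```

Both results are upper bounds, since the ENC and the channel current are quoted as "below" and "lower than" the values used here.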

### 1.2 Stub data processing

A quick combinatorial clustering of the hits is performed at the 40 MHz event rate. Particles with high transverse momentum and energy in the range of interest have a low probability of generating clusters wider than 400 $\mu$m. Wide clusters can therefore be eliminated at this stage, which saves bandwidth and data processing. Since the PS-module strip sensor is composed of 1920 strips arranged in two columns, 16 SSAs are necessary for the readout. To handle hit clusters located at the edge between two ASICs, the SSA implements lateral communication, transmitting the information from the eight most peripheral channels to the neighboring chips via a dedicated SLVS link operating at 320 Mb/s. A programmable offset is applied to the centroids according to the module location. This offset corrects the parallax error caused by approximating a cylindrical geometry with planar pixel-strip sensors. The cluster coordinates are transmitted for every event. The SSA continuously streams the trigger data to the MPA over eight differential SLVS links [5] operating at 320 Mb/s.
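The clustering, width cut and offset correction can be sketched behaviourally as follows. This is a software model, not the combinatorial logic of the chip. It assumes a 100 $\mu$m strip pitch (so that the 400 $\mu$m limit corresponds to 4 strips) and centroids expressed in half-strip units; both are assumptions made for illustration, and clusters extended by neighboring chips are not modelled.

```python
from typing import List, Tuple

STRIP_PITCH_UM = 100                          # assumed strip pitch
MAX_CLUSTER_UM = 400                          # width limit quoted in the text
MAX_WIDTH = MAX_CLUSTER_UM // STRIP_PITCH_UM  # 4 strips under the assumed pitch


def find_clusters(hits: List[int]) -> List[Tuple[int, int]]:
    """Return (first_strip, width) for every run of adjacent hit strips."""
    clusters = []
    start = None
    for i, hit in enumerate(hits):
        if hit and start is None:
            start = i
        elif not hit and start is not None:
            clusters.append((start, i - start))
            start = None
    if start is not None:
        clusters.append((start, len(hits) - start))
    return clusters


def centroids(hits: List[int], offset_half_strips: int = 0,
              max_width: int = MAX_WIDTH) -> List[int]:
    """Centroids in half-strip units; wide clusters dropped, offset applied."""
    result = []
    for first, width in find_clusters(hits):
        if width > max_width:
            continue  # wide clusters are unlikely to come from high-pT tracks
        result.append(2 * first + (width - 1) + offset_half_strips)
    return result


hits = [0] * 120
hits[10] = hits[11] = 1          # 2-strip cluster, kept
for s in range(50, 56):
    hits[s] = 1                  # 6-strip cluster, rejected
hits[119] = 1                    # single-strip cluster at the chip edge
print(centroids(hits))           # [21, 238]
print(centroids(hits, 3))        # [24, 241]
```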

### 1.3 Raw image transmission

The full sensor image (called raw data) is stored every cycle in a radiation-tolerant static RAM [6] located in the periphery of the SSA. This circular pipeline memory keeps the raw data for the latency of the Level-1 (L1) trigger data processing (up to $12.8\,\mu$s [1]). When an L1 trigger decision is received on the differential fast-control input port, the raw data is processed and prepared for transmission. The raw data is transmitted without compression and with a fixed packet size, since no data loss is accepted at this stage. Signals related to HIPs are transmitted in the same frame using a zero-suppression technique, which limits their number to 24 HIP flags per bunch crossing. The HIP flag generation consists of a fast hit clustering logic followed by a detection logic that checks whether the HIP threshold was passed by at least one strip within each cluster. The HIP flags are reordered and the L1 data frame is built. A FIFO of configurable depth queues the frames to be transmitted, permitting L1 trigger rates up to 1 MHz and up to 16 consecutive L1 triggers.
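Two aspects of this data path lend themselves to a short illustration: the pipeline depth implied by the latency, and the per-cluster HIP flag generation. The sketch below is a behavioural model under assumptions. In particular, the order in which flags are kept when more than 24 clusters are present is not specified in the text, so keeping the first 24 in strip order is a hypothetical choice.

```python
from typing import List

LATENCY_NS = 12_800     # maximum L1 latency quoted in the text
BX_PERIOD_NS = 25       # 40 MHz bunch-crossing period
MAX_HIP_FLAGS = 24      # zero-suppression limit per bunch crossing

# Number of bunch crossings the circular pipeline must hold.
pipeline_depth = LATENCY_NS // BX_PERIOD_NS


def hip_flags(hits: List[int], hip_hits: List[int],
              max_flags: int = MAX_HIP_FLAGS) -> List[bool]:
    """One flag per cluster: True if any strip in it crossed the HIP threshold.

    hits[i] is the detection-threshold bit of strip i and hip_hits[i] the
    HIP-threshold bit. At most max_flags flags are produced (assumed policy:
    the first clusters in strip order are kept).
    """
    flags = []
    i, n = 0, len(hits)
    while i < n and len(flags) < max_flags:
        if hits[i]:
            j = i
            while j < n and hits[j]:
                j += 1                     # extend to the end of the cluster
            flags.append(any(hip_hits[i:j]))
            i = j
        else:
            i += 1
    return flags


hits = [0] * 120
hip = [0] * 120
hits[3] = hits[4] = 1
hip[4] = 1                  # one strip of the first cluster exceeds the HIP threshold
hits[20] = 1                # second cluster, detection threshold only
print(pipeline_depth)       # 512
print(hip_flags(hits, hip)) # [True, False]
```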

## 2. ASIC implementation and results

The design was implemented with a fully scripted Digital-On-Top flip-chip methodology. The die dimensions are 11 mm by 3.5 mm. A bump pitch of $270\,\mu$m with a passivation opening of 70 $\mu$m was selected to match the requirements of the bare-die assembly on the front-end module flex. An additional row of wire-bond pads is present for testability and wafer probing. Special attention was given to radiation-tolerant techniques that mitigate the effects of Single Event Upsets (SEUs) in the digital control circuitry while maintaining low power consumption. Full triple modular redundancy (TMR) was applied at RTL level to all control state machines and configuration registers. Specific constraints were developed for synthesis and standard-cell placement to guarantee a minimum distance of 15 $\mu$m between triplicated registers while still allowing optimization. The clock distribution was also triplicated to mitigate the effect of single event transients (SETs) on the clock tree buffers. Radiation-hardening techniques have also been adopted in the analog front-end circuits, biasing modules, DLL and memories to guarantee operation up to a target Total Ionizing Dose (TID) of 100 Mrad (1 MGy). Because of the strong degradation induced by high TID in 65 nm technology, cells with short-channel-length transistors have been avoided.
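The principle behind triple modular redundancy can be shown with a small behavioural model: three copies of a register are kept, reads go through a bitwise majority voter, and a refresh rewrites all copies with the voted value. This is a generic illustration of TMR, not the RTL of the SSA; the class and method names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote of three words."""
    return (a & b) | (a & c) | (b & c)


class TmrRegister:
    """Register stored as three copies with a voted read (illustrative model)."""

    def __init__(self, value: int, width: int = 8) -> None:
        self.mask = (1 << width) - 1
        self.copies = [value & self.mask] * 3

    def read(self) -> int:
        return majority(*self.copies)

    def upset(self, copy: int, bit: int) -> None:
        """Flip one bit in one copy, emulating a single event upset."""
        self.copies[copy] ^= 1 << bit

    def scrub(self) -> None:
        """Rewrite all copies with the voted value to remove accumulated errors."""
        self.copies = [self.read()] * 3


reg = TmrRegister(0xA5)
reg.upset(copy=1, bit=0)     # a single upset corrupts one copy only
print(hex(reg.read()))       # 0xa5, the voter masks the error
reg.scrub()
print(reg.copies)            # [165, 165, 165]
```

A single upset is masked by the voter, whereas the same bit flipped in two copies would not be. This is why the physical separation of the triplicated registers matters: it lowers the chance that one particle corrupts two copies at once.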



Figure 3: Final layout of the SSA ASIC

Due to the strict power requirement, a 65 nm CMOS technology with 7 metal layers plus an aluminum redistribution layer (RDL) was chosen for the chip-set. The analog front-end and the custom SLVS drivers are powered at 1.25 V, while the digital core and the memories are supplied with a typical voltage of 1 V. The digital core is implemented with devices of different threshold voltages to reduce the overall power consumption while locally improving the performance on critical paths. Clock gating is used extensively across the digital domain of the SSA. For this reason, SEU detection logic was necessary to locally re-enable the clock and correct any induced errors.

### 2.1 Post-layout simulation results

Figure 3 shows the final layout of the ASIC. A module-level verification environment, described in [3], allowed extensive simulation of the chip-set with back-annotated delays extracted from the final layout. For a maximum foreseen occupancy close to 4 stubs/module at a pileup of 200, the simulation shows a total inefficiency of the readout chain due to bandwidth limitation below 0.2%. Static timing analysis together with post-layout simulations verified the ASIC functionality and confirmed that the SSA-to-MPA and SSA lateral communication work in all corner combinations. Activity-based power verification provided a detailed study of the voltage drop, which is limited to < 20 mV, and an estimate of the total average power consumption of 44.9 mW (< 50 mW required).

## 3. Conclusions

The first prototype of the SSA, integrating all functionalities required for system-level operation, was submitted in a common full-mask-set TSMC 65 nm engineering run. Timing analysis and post-layout simulations show that the prototype meets the efficiency and power requirements.

#### Acknowledgement

We would like to show our gratitude to Prof. Datao Gong Ye, Yang Dongxu and Jian Wang from SMU for the design of the Phase Aligner and to Prof. Gianluca Traversi and Dr. Francesco De Canio from Bergamo/Pavia University for the design of the SLVS transmitters and receivers.

#### References

- [1] CMS collaboration. "Technical proposal for the Phase-II upgrade of the Compact Muon Solenoid." CERN-LHCC-2015-010, LHCC-P-008 (2015).
- [2] Ceresa, Davide, et al. "A 65 nm pixel readout ASIC with quick transverse momentum discrimination capabilities for the CMS Tracker at HL-LHC." Journal of Instrumentation 11.01 (2016): C01054.
- [3] Ceresa, Davide, et al. "Readout architecture for the Pixel-Strip module of the CMS Outer Tracker Phase-2 upgrade." PoS (2017): 066.
- [4] Adam, W., et al. "P-type silicon strip sensors for the new CMS tracker at HL-LHC." Journal of Instrumentation 12 (2017).
- [5] G. Traversi et al. Design of low-power, low-voltage, differential I/O links for High Energy Physics applications, 2015 JINST 10 C01055.
- [6] R. Brouns et al., Development of a radiation tolerant low power SRAM compiler, AMICSA 2014.