A Prototype of a New Generation Readout ASIC in 65 nm CMOS for Pixel Detectors at HL-LHC

L. Pacher\*, E. Monteil
Università di Torino and INFN Sezione di Torino, Torino, Italy

A. Paternò, S. Panati
Politecnico di Torino and INFN Sezione di Torino, Torino, Italy

L. Demaria, A. Rivetti, M. Da Rocha Rolo, G. Dellacasa, G. Mazza, F. Rotondo, R. Wheadon
INFN Sezione di Torino, Torino, Italy

F. Loddo, F. Licciulli
INFN Sezione di Bari, Bari, Italy

F. Ciciriello, C. Marzocca
Politecnico di Bari and INFN Sezione di Bari, Bari, Italy

L. Gaioni, G. Traversi, V. Re
Università di Bergamo and INFN Sezione di Pavia, Bergamo, Italy

F. De Canio, L. Ratti
Università di Pavia and INFN Sezione di Pavia, Pavia, Italy

S. Marconi, P. Placidi
Università di Perugia and INFN Sezione di Perugia, Perugia, Italy

G. Magazzù
INFN Sezione di Pisa, Pisa, Italy

A. Stabile
Università di Milano and INFN Sezione di Milano, Milano, Italy

S. Mattiazzo
Università di Padova, Italy
Abstract - The foreseen High-Luminosity upgrade of the CERN Large Hadron Collider (HL-LHC) will open a new frontier for particle physics after 2024, demanding the installation of new silicon pixel detectors able to withstand unprecedented track densities and radiation levels in the inner tracking systems of the current general-purpose experiments. This paper describes the implementation of a new-generation pixel chip demonstrator using a commercial 65 nm CMOS technology and targeting HL-LHC specifications. It was designed as part of the Italian INFN CHIPIX65 project and in close synergy with the international CERN RD53 collaboration on 65 nm CMOS.

The prototype is composed of a matrix of 64×64 pixels with 50 μm × 50 μm cells featuring a compact design, low-noise and low-power performance. The pixel array integrates two different analogue front-end architectures working in parallel, one with asynchronous and one with synchronous hit discriminators. Common characteristics are a compact layout able to fit into half the pixel size, low-noise performance (ENC < 100 e− RMS for 50 fF input capacitance), below 5 μW/pixel power consumption, linear charge measurements up to 30 ke− input charge using Time-over-Threshold (ToT) encoding and leakage current compensation up to 50 nA per pixel.

A novel region-based digital architecture has been designed in order to ensure > 99% efficiency for expected 3 GHz/cm² hit rate, 1 MHz trigger rate and 12.5 μs trigger latency at HL-LHC. Pixels have been organized into regions of 4×4 cells and a common synthesized logic shared among all pixels provides a centralized memory for latency buffering, performs the trigger matching and handles the local configuration. The simulated particle inefficiency for this architecture is below 0.1% under nominal HL-LHC conditions.

All global biases and voltages required by analogue front-ends are generated on-chip using 10-bit programmable DACs. Bias currents and voltages can be monitored by a 12-bit ADC. A bandgap voltage reference circuit provides a stable reference voltage for all these blocks. The readout of triggered data is based on replicated FIFOs placed at the chip periphery. Data are finally sent off-chip with 8b/10b encoding using a high-speed serializer. Triggerless and debug operating modes are also supported. Chip configuration and slow-control are performed through full-duplex synchronous Serial Peripheral Interface (SPI) master/slave transactions. The digital I/O interface uses custom-designed JEDEC-compliant SLVS transmitters and receivers.

All blocks and analogue front-ends have been silicon-proven during a previous prototyping phase and were demonstrated to be radiation tolerant up to 580 Mrad Total Ionizing Dose (TID) or beyond. The CHIPIX65 demonstrator was submitted for fabrication in July 2016. It was received back from the foundry in October 2016 and preliminary experimental characterization has started.
1. Introduction

Research and development activities devoted to the design of new pixel Application Specific Integrated Circuits (ASIC) suitable for HL-LHC upgrades have started. A commercial 65 nm CMOS technology has been identified by the pixel ASIC community as a promising choice for the implementation of new-generation readout chips for HL-LHC. Radiation hardness qualification and design activities using such a 65 nm CMOS are now part of the CERN RD53 international collaboration research program [1] and of the Italian INFN CHIPIX65 project [2, 3].

An innovative 64×64 pixel array demonstrator has been designed and fabricated as part of the CHIPIX65 project targeting HL-LHC requirements. The final layout of the prototype is presented in Figure 1. The chip integrates two different architectures of analogue front-ends, one synchronous and one asynchronous, working in parallel. Charge measurement is performed by means of Time-over-Threshold (ToT) encoding with up to 5-bit resolution. Pixels of 50 µm × 50 µm size have been grouped in regions of 4×4 pixels. A novel region-based digital architecture for local buffering and trigger matching was designed to guarantee > 99% efficiency at nominal 3 GHz/cm² hit rate and 1 MHz trigger rate with 12.5 µs trigger latency at HL-LHC. The chip periphery implements a FIFO-based readout architecture. Configuration and slow-control are performed through the Serial Peripheral Interface (SPI) protocol. Silicon-proven IP blocks designed by INFN for RD53 such as biasing DACs, a bandgap voltage reference, a monitoring ADC, a high-speed serializer and custom SLVS drivers have been included. The chip was submitted for fabrication and received back from the foundry in October 2016. Details of the prototype implementation are discussed in the following sections.

Figure 1: CHIPIX65 demonstrator layout, 3.5 mm × 5.1 mm. Pixel regions with synchronous (1) and asynchronous (2) analogue front-ends. Replicated bias cells with current mirrors (3). Global DACs (4), bandgap voltage reference (5) and monitoring ADC (6). Readout/configuration digital block and high-speed serializer at the chip periphery (7). SLVS transmitters/receivers and I/O cells (8).
2. Analogue front-ends

The CHIPIX65 demonstrator integrates two different analogue front-end (AFE) architectures interfacing with a common digital readout and configuration scheme. One half of the pixel array has been equipped with a synchronous architecture, the second one with a continuous-time architecture. Both solutions were successfully validated on silicon using dedicated small-prototypes submitted by INFN and tested after irradiation. In order to minimize area and power consumption each circuit features a shaper-less Charge-Sensitive Amplifier (CSA) with Krummenacher feedback, which provides triangular pulse shaping for linear Time-over-Threshold (ToT) charge measurement and sensor leakage-current compensation. A common calibration circuit is used to inject test charges at the input node.
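
As a rough illustration of the linear ToT charge encoding mentioned above, the following Python sketch maps an input charge to a saturating 5-bit code. It is an idealized model and not a description of the actual circuit: the 600 e− threshold, the assumption that 30 ke− corresponds exactly to full scale, and the function name `tot_code` are hypothetical choices made only for this example. The 5-bit width and the 30 ke− range are the values quoted in this paper.

```python
def tot_code(charge_e, threshold_e=600.0, full_scale_e=30_000.0, n_bits=5):
    """Idealized linear Time-over-Threshold code for a charge in electrons.

    threshold_e and full_scale_e are illustrative assumptions, not chip values.
    Returns None when the charge does not cross the threshold (no hit).
    """
    max_code = (1 << n_bits) - 1  # 31 for a 5-bit counter
    if charge_e <= threshold_e:
        return None
    # With a triangular pulse the time above threshold grows linearly
    # with the charge in excess of the threshold.
    fraction = (charge_e - threshold_e) / (full_scale_e - threshold_e)
    # The counter saturates at its maximum code for very large charges.
    return min(int(fraction * max_code), max_code)


if __name__ == "__main__":
    for q in (500, 3_000, 15_300, 30_000, 60_000):
        print(q, tot_code(q))
```

In this simplified picture any charge above the assumed full scale returns the saturated code, which is why a linear range has to be specified together with the counter width.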

A block diagram of the synchronous front-end chain [4, 5] is shown in Figure 2. The input stage uses a core amplifier implemented as a telescopic-cascode inverting amplifier with two selectable feedback capacitors. A track-and-latch voltage comparator is adopted for hit discrimination. The generation of a CMOS digital pulse when a signal is found above the nominal threshold is synchronized with a 40 MHz clock strobing the latch. A fast ToT charge encoding can be retrieved at the pixel level by exploiting a high-frequency self-generated clock signal of up to 800 MHz. This is obtained by turning the latch into a voltage-controlled oscillator (VCO) using asynchronous logic. Finally, pixel-to-pixel threshold variations are compensated by means of autozeroing based on Output Offset Storage (OOS) between the differential amplifier and the positive-feedback latch. As a result, there is no need for a per-pixel DAC for local threshold adjustment. AC coupling is also used to prevent baseline fluctuations from propagating to the discriminator. Test measurements performed on this architecture have shown very promising results. On the one hand, low-noise performance with an ENC of about 90 e− RMS at 100 fF input capacitance is guaranteed despite the intense digital switching activity of the latch. On the other hand, both autozeroing and operation of the latch as a local oscillator have been fully validated. Irradiation tests show that the front-end is still fully working after 600 Mrad TID, with negligible degradation of key analogue parameters.

Figure 2: Synchronous front-end chain. From left to right: charge-sensitive amplifier with Krummenacher feedback, AC coupling and track-and-latch voltage comparator with autozeroing by means of Output Offset Storage (OOS) technique.

Figure 3: Asynchronous front-end chain. From left to right: charge-sensitive amplifier with Krummenacher feedback, voltage-to-current transconductor and fast current comparator.

The asynchronous front-end solution [6, 7] is depicted in the schematic of Figure 3. Although it shares the choice of a Krummenacher-based solution, the design features different CSA and feedback-network optimizations. The core amplifier uses a folded-cascode inverting amplifier with selectable charge sensitivity. A current comparator is then adopted for hit discrimination. A transconductor stage provides the voltage-to-current conversion, feeding its output current to a transimpedance amplifier. CMOS inverters are then used to obtain a full-swing rail-to-rail digital pulse. The threshold value is defined by a current flowing into a diode-connected MOS device. A global reference current is distributed to all pixels, whereas a local 4-bit binary-weighted DAC compensates pixel-to-pixel threshold variations. Measurements on test prototypes indicate that the low-noise performance, with an ENC of about 80 e− RMS at zero input capacitance, is fully compliant with specifications. Leakage currents up to 15 nA are efficiently compensated without affecting preamplifier operation. Irradiation tests up to 800 Mrad TID have shown no significant degradation in the preamplifier signal shape and a 20% increase of the noise at 50 fF input capacitance.
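
The per-pixel threshold trimming performed by the local 4-bit DAC can be pictured as a small search over its 16 codes. The sketch below assumes, purely for illustration, that the trim DAC shifts the effective threshold linearly around a mid-scale code and that both the pixel offset and the DAC step are known in electrons. Neither the step size nor the tuning procedure is specified in this paper, and `best_trim_code` is a hypothetical helper.

```python
def best_trim_code(offset_e, lsb_e, n_bits=4):
    """Pick the trim code that minimizes the residual threshold offset.

    offset_e: threshold offset of one pixel, in electrons (assumed known).
    lsb_e:    threshold shift per DAC step, in electrons (assumed value).
    The DAC is modelled as linear and centred on mid-scale; this is an
    illustrative assumption, not the measured transfer function.
    """
    mid = (1 << n_bits) // 2  # code that applies no correction in this model

    def residual(code):
        return abs(offset_e + (code - mid) * lsb_e)

    return min(range(1 << n_bits), key=residual)


if __name__ == "__main__":
    for offset in (-120.0, 0.0, 90.0, 1000.0):
        print(offset, best_trim_code(offset, lsb_e=40.0))
```

Offsets larger than the DAC range simply end up at the first or last code, which is the situation a global threshold setting is meant to avoid.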

3. Regional digital architecture

A novel region-based digital architecture for latency buffering and trigger matching, able to withstand extended trigger latencies and unprecedented data rates at HL-LHC, has been investigated and implemented. As shown in Figure 4, pixels have been grouped in pixel regions (PR) composed of 4×4 pixels. Each region includes 16 analogue front-ends arranged in so-called analogue islands. A common digital logic shared among pixels stores hit information for the whole trigger latency, handles the local configuration, performs trigger matching and sends zero-suppressed hit data to the readout block at the chip periphery upon a trigger request. Unlike a traditional double-column approach, automated digital synthesis and place-and-route were performed on an entire pixel region, with the analogue front-ends treated as macros. The advantage of a region-based digital architecture lies in the possibility of sharing common functionalities and temporary data-storage capabilities among pixels. A single 4×4 pixel region is therefore the basic layout unit replicated for clock distribution and pixel array assembly.
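
The latency buffering and trigger matching performed by the shared region logic can be sketched behaviourally as follows. This is a simplified Python model under stated assumptions, not the synthesized logic: the class name `RegionBuffer`, the use of a 16-bit hit map per entry and the expiry policy are hypothetical. The default latency of 500 bunch crossings corresponds to 12.5 µs at a 40 MHz bunch-crossing clock, and the default depth of 16 matches the number of memory locations discussed in this section.

```python
from collections import deque


class RegionBuffer:
    """Behavioural model of a shared latency buffer for one 4x4 pixel region."""

    def __init__(self, depth=16, latency=500):
        self.depth = depth        # number of memory locations
        self.latency = latency    # trigger latency in bunch crossings
        self.entries = deque()    # (timestamp, hit_map) pairs, oldest first
        self.overflows = 0        # hits lost because the buffer was full

    def _expire(self, now):
        # Entries older than the trigger latency can no longer be requested.
        while self.entries and now - self.entries[0][0] > self.latency:
            self.entries.popleft()

    def write(self, now, hit_map):
        """Store a 16-bit hit map with its timestamp; False on overflow."""
        self._expire(now)
        if len(self.entries) >= self.depth:
            self.overflows += 1
            return False
        self.entries.append((now, hit_map))
        return True

    def trigger(self, now):
        """Return hit maps whose timestamp matches the triggered crossing."""
        target = now - self.latency
        self._expire(now)
        return [hit_map for ts, hit_map in self.entries if ts == target]


if __name__ == "__main__":
    buf = RegionBuffer()
    buf.write(100, 0b0000_0000_0011_0000)
    print(buf.trigger(600))  # trigger arriving 500 crossings later
    print(buf.trigger(601))  # no hit was stored for crossing 101
```

Only crossings that contain hits occupy a memory location in this model, which mirrors the zero-suppressed storage described above.
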
Figure 4: Layout view of a 4×4 pixel region (PR). The design includes 16 analogue front-ends arranged into 2×2 analogue islands surrounded by a common digital logic for data readout and configuration.

Figure 5: Selected analytical and physics-driven simulation results from high-level architecture studies using SystemVerilog/UVM and assuming a 4×4 pixel region arrangement. Buffer overflow probability versus chosen number of memory locations for 3 GHz/cm² hit rate and 12.5 µs trigger latency (left) and percentage of binary information versus number of ToT words saved per pixel region (right). A buffer depth of 16 memory locations guarantees an overall inefficiency below 0.1%, whereas the ToT information is registered for more than 99.5% of hits despite a limited number of 6 ToT codes saved per region.

The choice of a 4×4 pixel region organization was assessed by means of extensive analytical and physics-driven high-level simulations using the verification environment based on SystemVerilog and Universal Verification Methodology (UVM) classes developed within RD53 [8, 9]. As shown in Figure 5, a common centralized buffer with 16 memory locations, shared among pixels, ensures an inefficiency below 0.1% assuming a 4×4 pixel arrangement and a nominal 12.5 µs trigger latency. The charge information is retrieved from each pixel using 5-bit local ToT counters. Data compression based on priority queues is then performed in the pixel region: the ToT information is saved for only a fixed number of 6 fired pixels, discarding codes from other pixels not in the priority queue. ToT words from pixels in the queue are temporarily stored in a common latch-based circular buffer along with a Gray-encoded timestamp sent from the chip periphery, waiting for trigger validation. Additionally, binary information is always available for all 16 pixels in the pixel region for the whole trigger latency. Trigger matching is finally performed using a set of comparators. Proper synchronization in accessing the shared buffer during write operations is guaranteed by a configurable per-pixel fixed-deadtime counter, which can be set to either 5 or 15 clock cycles. With such an architecture the memory usage is optimized without writing unnecessary zeroes, which in turn saves about 60% of the area in terms of number of latches in the shared memory. According to simulations, the ToT information is registered for more than 99.5% of hits despite the limited number of ToT words saved per region. Besides the default triggered operation, a triggerless readout is also supported. A debug mode that bypasses selectable analogue front-end outputs can be adopted in both triggered and triggerless configurations, thus increasing the testability of the overall digital design.
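
To give a feeling for the numbers involved, the following Python sketch computes the average number of pixel hits falling in one 4×4 region during one trigger latency and then evaluates a simple Poisson estimate of the buffer overflow probability. It is not the SystemVerilog/UVM study cited above and does not attempt to reproduce Figure 5: the Poisson model and the parameter `pixels_per_entry` (average number of fired pixels sharing one buffer entry) are assumptions introduced only for this example.

```python
import math

PIXEL_PITCH_CM = 50e-4       # 50 um pixel pitch, from the text
REGION_SIDE_PIXELS = 4       # 4x4 pixel region, from the text
HIT_RATE_HZ_PER_CM2 = 3e9    # nominal hit rate, from the text
LATENCY_S = 12.5e-6          # nominal trigger latency, from the text


def pixel_hits_per_latency():
    """Mean number of pixel hits in one region during one trigger latency."""
    area_cm2 = (REGION_SIDE_PIXELS * PIXEL_PITCH_CM) ** 2
    return HIT_RATE_HZ_PER_CM2 * area_cm2 * LATENCY_S


def poisson_tail(mean, depth):
    """P(N > depth) for a Poisson-distributed N with the given mean."""
    term = math.exp(-mean)   # P(N = 0)
    cdf = term
    for k in range(1, depth + 1):
        term *= mean / k     # P(N = k) from P(N = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def overflow_probability(pixels_per_entry, depth=16):
    """Illustrative overflow estimate; pixels_per_entry is an assumption."""
    mean_entries = pixel_hits_per_latency() / pixels_per_entry
    return poisson_tail(mean_entries, depth)


if __name__ == "__main__":
    print(pixel_hits_per_latency())
    for share in (1.0, 2.0, 4.0):
        print(share, overflow_probability(share))
```

With the figures quoted in this paper the first function gives about 15 pixel hits per region and latency window, which shows why grouping the pixels fired in the same bunch crossing into a single shared entry matters for a 16-location buffer.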

The implemented pixel configuration scheme uses a parallel approach in which every pixel to be configured is fully addressable. Payload data are written into 16-bit-wide Pixel Configuration Registers (PCR) according to commands and addresses received from the chip periphery. Configuration data can also be read back for debug purposes.

4. Bias network and monitoring

All bias currents and voltages featuring programmability or fine-tuning requirements in the pixel array are generated using programmable DACs and distributed to pixel columns. On-chip monitoring capabilities have also been included in the prototype by means of an ADC. A schematic block diagram of the adopted bias distribution and monitoring network placed at the chip periphery is presented in Figure 6. Synchronous front-ends use 9 DACs, whereas pixels equipped with continuous-time analogue front-ends need 6 DACs. An additional DAC common to both synchronous and asynchronous front-end architectures provides a programmable calibration level to inject a test charge in selected pixels. Each block has been implemented as a segmented 10-bit current-steering DAC [10]. Irradiation tests with both X-rays and protons up to 1 Grad TID have shown that the circuit still works with acceptable degradation in terms of INL and DNL performance metrics. Replicated bias cells placed at the bottom of the pixel array provide biases to pixel columns. Each cell uses highly optimized current mirrors to obtain the necessary currents and voltages with the requested resolution and linearity over the full operating range of the analogue front-ends.
All global DACs use a 4 µA reference current obtained from a well-defined DC level generated by a bandgap voltage reference (BGR) [11]. In order to ensure stability against radiation damage, the circuit uses a MOS device biased in weak inversion instead of a bipolar transistor. Radiation hardening by design (RHBD) using enclosed-layout transistors (ELT) was also adopted. This block was successfully characterized with X-rays up to 580 Mrad TID. Mismatches and process variations affecting this reference voltage can be compensated by a programmable 5-bit trimming resistor. If required, an additional 5-bit current DAC can be used to further adjust, with 250 nA resolution, the nominal reference current fed to the global DACs.
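
The INL and DNL metrics quoted for the global DACs can be computed from a measured transfer curve as in the Python sketch below. It uses the common endpoint-fit definition, which is an assumption here since the paper does not state which definition was used for the irradiation measurements; the function name `dnl_inl` is likewise hypothetical.

```python
def dnl_inl(levels):
    """Endpoint-fit DNL and INL, in LSB, from measured DAC output levels.

    levels[k] is the output measured for input code k, for all codes in
    increasing order (1024 values for a 10-bit DAC).
    """
    n = len(levels)
    if n < 2:
        raise ValueError("at least two output levels are required")
    # Average step between the first and last code defines one LSB.
    lsb = (levels[-1] - levels[0]) / (n - 1)
    # DNL: deviation of each step from one ideal LSB.
    dnl = [(levels[k + 1] - levels[k]) / lsb - 1.0 for k in range(n - 1)]
    # INL: deviation of each level from the straight line through the endpoints.
    inl = [(levels[k] - (levels[0] + k * lsb)) / lsb for k in range(n)]
    return dnl, inl


if __name__ == "__main__":
    ideal = [0.5 * code for code in range(1024)]
    dnl, inl = dnl_inl(ideal)
    print(max(abs(x) for x in dnl), max(abs(x) for x in inl))
```

With this definition the INL is zero at both endpoints by construction, so gain and offset errors are excluded from the linearity figure.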

The bandgap reference voltage and all global currents/voltages are accessible on dedicated test pads through internal multiplexing. They can also be monitored on-chip using a 12-bit ADC, for which a dual-slope integrating architecture has been adopted. An automated calibration algorithm has also been implemented to finely adjust the internal analogue parameters of the ADC.
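
The principle of a dual-slope integrating conversion can be illustrated with the idealized Python model below. It is a textbook sketch, not the implemented circuit: the run-up length of 2^12 clock cycles, the ideal integrator and the function name `dual_slope_code` are assumptions made for the example, and the calibration algorithm mentioned above is not modelled.

```python
def dual_slope_code(v_in, v_ref, n_bits=12):
    """Output code of an idealized dual-slope conversion.

    Phase 1 integrates v_in for a fixed number of clock cycles; phase 2
    discharges the integrator with v_ref while a counter runs. The final
    count is proportional to v_in / v_ref.
    """
    n_fixed = 1 << n_bits      # assumed run-up length in clock cycles
    max_code = n_fixed - 1
    if v_in <= 0.0:
        return 0
    # Phase 1 (run-up): charge accumulated after n_fixed cycles.
    integrator = v_in * n_fixed
    # Phase 2 (run-down): remove v_ref per cycle and count the cycles.
    count = 0
    while integrator >= v_ref and count < max_code:
        integrator -= v_ref
        count += 1
    return count


if __name__ == "__main__":
    for v in (0.0, 0.25, 0.5, 2.0):
        print(v, dual_slope_code(v, v_ref=1.0))
```

Because the same integrator is used in both phases, its time constant cancels in the ratio of the two counts, which is the usual reason for choosing this architecture for slow monitoring tasks.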

5. End of Column (EoC) readout, chip configuration and I/O

A digital block placed at the chip periphery implements all readout and chip configuration functionalities. The readout of triggered data coming from pixel regions is based on replicated modules referred to as Macro-Column Drainers (MCD). Each MCD module handles triggers and pixel outputs for a macro-column composed of 16 pixel regions. A first FIFO is used to buffer hit packets drained from pixel regions upon a trigger request. A second FIFO is used to temporarily store triggers and trigger timestamps to be sent to pixel regions if a macro-column readout is still ongoing. Triggerless readout operations are also supported by the logic. Binary counters are used to generate bunch-crossing and trigger timestamp buses distributed to macro-columns. The trigger latency and the fixed deadtime introduced for pixel regions are properly accounted for when generating the trigger timestamp. Timestamp words can be distributed to pixel regions with either binary or Gray-encoded formatting. Data stored in the replicated MCDs are polled and read by a finite state machine, which loops over the MCD FIFOs and pushes their contents into a common output FIFO. Data packets with 8b/10b encoding are finally assembled and fed to a high-speed serializer integrated as a standalone macro. This block was designed to sustain 2 Gb/s readout speed and was successfully prototyped and tested after irradiation. A 320 MHz clock has been adopted for the serializer, providing enough bandwidth to process nominal HL-LHC hit rates re-scaled to a 64×64 pixel matrix. Design-for-Test (DFT) has also been included for all readout components, with scan-chain synthesis and shadow-logic insertion with scannable registers around FIFOs.
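
The Gray-encoded timestamp option mentioned above relies on the standard reflected binary Gray code, in which consecutive values differ in exactly one bit. A minimal Python sketch of the conversion in both directions is given below; the function names are hypothetical and the timestamp width used on the chip is not stated in this paper, so the 10-bit range in the usage example is only illustrative.

```python
def binary_to_gray(value):
    """Convert a non-negative integer to its reflected binary Gray code."""
    return value ^ (value >> 1)


def gray_to_binary(gray):
    """Convert a reflected binary Gray code back to a plain integer."""
    value = 0
    while gray:
        value ^= gray   # accumulate the XOR of all right-shifted copies
        gray >>= 1
    return value


if __name__ == "__main__":
    for ts in range(8):
        print(ts, format(binary_to_gray(ts), "03b"))
```

Since only one bit toggles per increment, a Gray-coded bus distributed across the pixel array switches fewer lines per clock cycle than a binary one and cannot be sampled at an intermediate multi-bit value.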

Chip configuration and slow-control are performed at 13.33 MHz through a full-duplex synchronous Serial Peripheral Interface (SPI) protocol. A dedicated control logic interfaces with a customized SPI slave port which decodes commands, addresses and payload data from 24-bit-wide serial streams. Configuration data are then distributed to Global Configuration Registers (GCR), End-of-Column Configuration Registers (ECCR) or to the pixel array according to SPI commands and addresses. Automated internal generation of register addresses has also been implemented using counters. Triple Modular Redundancy (TMR) against radiation-induced Single-Event Upsets (SEU) was included for all configuration registers at the chip periphery. Additional SPI commands are used to control ADC start/read operations, autozeroing operations for synchronous pixels and proper serializer synchronization.
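
The protection offered by Triple Modular Redundancy can be illustrated with a bitwise majority voter, sketched below in Python. The register width, the helper names and the refresh step are assumptions made for the example; the paper does not detail how voting or refreshing is implemented on the chip.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three register copies."""
    return (a & b) | (a & c) | (b & c)


def tmr_refresh(copies):
    """Return three fresh copies of the voted value (illustrative scrubbing)."""
    voted = tmr_vote(*copies)
    return (voted, voted, voted)


if __name__ == "__main__":
    stored = 0xBEEF
    # A single-event upset flips bit 4 in one of the three copies.
    copies = (stored, stored ^ (1 << 4), stored)
    print(hex(tmr_vote(*copies)))
    print(tmr_refresh(copies))
```

A single upset in any one copy is masked by the vote; two upsets hitting the same bit in different copies are not, which is why the corrupted copy should be rewritten before a second upset can accumulate.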

Scalable Low-Voltage Signaling (SLVS) has been adopted for all digital I/O signals. Custom JEDEC-compliant SLVS-400 transmitters and receivers were therefore used to interface the core logic with I/O pads. These blocks were successfully characterized and tested after irradiation.

6. Conclusions

A prototype of a new-generation pixel readout ASIC has been designed and fabricated as part of the Italian INFN CHIPIX65 project using a commercial 65 nm CMOS technology. The chip is composed of a matrix of 64×64 pixels with 50 µm × 50 µm pixel size embedding two different architectures of analogue front-ends working in parallel. A novel region-based digital architecture for latency buffering and trigger matching has been investigated and implemented in order to fully address HL-LHC requirements. All blocks included in the chip to provide biasing, monitoring and I/O functionalities are silicon-proven and were characterized through small prototypes, demonstrating their reliability also after irradiation up to 580 Mrad TID or beyond. The final layout of the chip was submitted and accepted for fabrication in July 2016. Samples were received back from the foundry in October 2016 and experimental characterization has started.

This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement n. 654168 and from the Italian Ministry of Education and Research (MIUR) under grant agreement n. 2012Z23ERZ, PRIN 2012.
References


