A System-Verilog Verification Environment for the CIC Data Concentrator ASIC of the CMS Outer Tracker Phase-2 Upgrades

Simone Scarfi, Alessandro Caratelli, Luigi Caponetto, Davide Ceresa, Geoffrey Galbit, Kostas Kloukinas, Yusuf Leblebici, Benedetta Nodari and Sebastien Viret on behalf of the CMS Tracker Group

a CERN, Geneva, Switzerland.
b Microelectronic System Laboratory (LSM), École polytechnique fédérale de Lausanne (EPFL), Lausanne, Switzerland
c Institut de Physique Nucléaire de Lyon (IPNL), IN2P3, CNRS, Lyon, France

E-mail: Simone.Scarfi@cern.ch, Alessandro.Caratelli@cern.ch

The foreseen Phase-2 upgrades at the LHC present very challenging requirements for the front-end readout electronics of the CMS Outer Tracker detector. High data rates, combined with a novel technique for locally rejecting low transverse momentum particles and with strict low-power constraints, require an optimized readout architecture and specific interconnect synchronization schemes for its components. This work focuses on the development and verification of the Concentrator IC (CIC) ASIC, a 65 nm digital chip featuring high input and output data rates, in the context of the readout chains incorporating all front-end ASICs: the Macro Pixel ASIC (MPA) and Short Strip ASIC (SSA) for the Pixel-Strip (PS) modules, and the CMS Binary Chip (CBC) for the Strip-Strip (2S) modules. The CIC ASIC receives high data rate (320 MHz) digital streams from eight front-end ASICs via a total of 48 differential lines and transmits them through seven differential lines operating at 320 MHz or 640 MHz, depending on the occupancy of the detector module. A complex system-level simulation environment based on the System-Verilog hardware description language and on the Universal Verification Methodology (UVM) platform has been adapted and extended to support the CIC development and verification by simulating the complete readout chains, from the particle event to the output of the modules. The paper is composed of four sections: the first describes the $p_T$ module concept, the second presents the UVM environment for the MPA/SSA ASICs, adapted and extended to include the CIC, and the third shows the forecasted performance of the readout chain together with some examples of usage of this framework. The last section presents the PS module efficiency as a function of the stub occupancy for different CIC output frequencies.

Topical Workshop on Electronics for Particle Physics (TWEPP2018)
17 - 21 September 2018,
KU Leuven - Campus Carolus, Antwerpen, Belgium

*Speaker.
†Main authors
1. CMS Outer Tracker readout chains description

For the CMS Outer Tracker Upgrade [1] two different modules will be used: the PS module and the 2S module. Both modules are composed of two closely spaced (a few mm apart) silicon sensors. The particle trajectories are bent by the high magnetic field (3.8 T), so correlating the information from the two layers makes it possible to evaluate the transverse momentum ($p_T$) of the incident particle. High-$p_T$ particles, above a certain threshold, are more interesting for the scope of the research. The pairs of hits in the two sensors of a module, called stubs, are sent out synchronously at 40 MHz to be analyzed, and kept in a memory for 12.6 $\mu$s waiting for the Level-1 (L1) trigger decision. Figure 1 shows a cross section of the PS module and of the 2S module. The readout of the two sensor layers is done by 16 SSA [2] and 16 MPA ASICs [3] in the PS module and by 16 CBC ASICs [4] in the 2S module. To reduce the module output bandwidth a concentrator chip, the CIC, has been developed. The CIC has two independent paths, called the stub data path and the L1 raw data path, and receives 48 input lines operating at 320 MHz:

- 40 lines (5 from each FE-Chip) for stub data, sent out synchronously at each Bunch Crossing (BX) for the L1 trigger decision. The CIC sorts the stub data according to the stub bend, collecting stubs from 8 FE-Chips (MPA or CBC) over 8 BXs.
- 8 lines (1 from each FE-Chip) for full sensor raw data, sent only when an L1 trigger is received. The CIC stores L1 words coming from 8 FE-Chips over 8 BXs in 8 FIFOs and sends out the full packet of L1 raw data when ready. The maximum L1 trigger frequency is 750 kHz.
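
The stub data path described in the first item can be illustrated with a small Python sketch. It is not the CIC implementation: the `Stub` fields, the choice of sorting by increasing absolute bend, and the `max_stubs` limit per 8-BX block are all assumptions made for illustration; the text only states that stubs from 8 FE-Chips are collected over 8 BXs and sorted according to the bend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stub:
    bx: int       # bunch-crossing offset inside the 8-BX block (0..7)
    chip: int     # front-end chip index (0..7)
    address: int  # stub position on the sensor (hypothetical field)
    bend: int     # signed bend code (hypothetical encoding)


def concentrate(stubs, max_stubs):
    """Sort the stubs of one 8-BX block and keep at most max_stubs of them.

    Assumption: a smaller |bend| is given higher priority. The secondary
    keys only make the ordering deterministic.
    """
    ordered = sorted(stubs, key=lambda s: (abs(s.bend), s.bx, s.chip, s.address))
    return ordered[:max_stubs]
```

For example, `concentrate([Stub(0, 0, 10, 3), Stub(1, 2, 20, -1), Stub(7, 5, 30, 0)], 2)` keeps the stubs with bend 0 and -1 and drops the one with bend 3.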

The clock and fast commands for the FE-Chips and the CIC are provided at the module level by the Low-power GigaBit Transceiver (LpGBT) ASIC [5], which operates as a serializer at the module output and sends out data via the optical fiber. The CIC ASIC is mostly a digital chip, but it incorporates 12 phase aligners (analog blocks) in order to sample the incoming data streams properly. A custom System-Verilog model emulates the operation of the phase aligner. Each CIC can be configured for different output flavors. For instance, according to the supplied clock it can use a data output frequency of 320 or 640 MHz per line (this is valid both for stub data and L1 raw data). In addition, only for stub data, it can be configured via I$^2$C to use 5 or 6 output lines, and either to transmit information related to the particle curvature in the CMS magnetic field or to use the available bandwidth to provide a larger number of stub coordinates. This approach allows the system to be optimized in terms of power budget and bandwidth requirements with respect to the module position within the CMS Outer Tracker.
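
The rates quoted in this section can be turned into per-bunch-crossing bit budgets with a few lines of Python. This is only back-of-the-envelope arithmetic under one stated assumption: each line carries one bit per clock period, so a 320 MHz line is treated as 320 Mb/s. The function names are illustrative and do not come from the framework.

```python
BX_RATE_MHZ = 40  # bunch-crossing rate quoted in the text


def bits_per_bx(lines, line_rate_mhz, bx_rate_mhz=BX_RATE_MHZ):
    """Bits carried by `lines` serial lines during one bunch crossing.

    Assumes one bit per clock period on every line.
    """
    return lines * line_rate_mhz // bx_rate_mhz


def l1_buffer_depth_bx(latency_us, bx_rate_mhz=BX_RATE_MHZ):
    """Number of bunch crossings that fit in the L1 latency."""
    return round(latency_us * bx_rate_mhz)
```

Under that assumption, `bits_per_bx(5, 320)` gives 40 bits of stub data per FE-Chip per BX, `bits_per_bx(48, 320)` gives 384 bits per BX over all input lines, and `l1_buffer_depth_bx(12.6)` gives 504 BXs for the 12.6 $\mu$s latency. The stub output options range from `bits_per_bx(5, 320)` = 40 to `bits_per_bx(6, 640)` = 96 bits per BX, before any packet overhead.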

Figure 1: PS module/2S module cross section.
2. UVM framework for PS module readout chain

An existing UVM framework has been adapted and extended to add the CIC ASIC to the MPA/SSA simulation environment [6] in order to complete the PS module readout chain. Figure 2 shows a functional block diagram of the UVM framework for the PS module chain. The Design Under Verification (DUV) is represented by the PS module ASICs, composed of a configurable number of MPAs and SSAs (up to 16) plus 1 or 2 CICs. The simulation can be run on the DUV RTL code, to verify the implemented algorithm, or on the DUV post-layout netlist with back-annotated delays. The latter case has more realistic timing: it makes it possible to verify the operation of the ASICs in all corners and to generate activity information for accurate power analysis that takes all the parasitics into account. To verify the DUV, the ideal behaviour of the readout chain is described (in System-Verilog) in a block called the Reference model. The Configuration block performs I$^2$C operations on the DUV and at the same time configures the Reference model accordingly; the Reference model therefore has to be developed carefully, taking the several operating modes into account. The Fast command block follows the same principle and continuously sends the encoded fast commands to all the ASICs and to the Reference model itself. Lastly, particle hit generation is needed to emulate particles hitting the two layers of the module in order to create stubs and clusters. The hits can come either from the Monte Carlo generation block, based on real physics data-sets containing event samples for the entire CMS Outer Tracker, or from the Stub generation block, which creates randomized hits that emulate high transverse momentum particles, the main primitives expected to be identified in the CMS Outer Tracker. Moreover, a third block called Combinatorial generation has been developed to create randomized hits with a uniform distribution to emulate detector noise. It can be added on top of the stub generation or be used separately.
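
The combination of the Stub generation and Combinatorial generation blocks can be pictured with the following Python sketch. It is a simplified stand-in, not the System-Verilog code of the framework: the one-dimensional strip geometry, the default of 120 strips, the `max_offset` correlation window and all function names are assumptions chosen for illustration.

```python
import random


def generate_stub_hits(rng, n_stubs, n_strips, max_offset):
    """Correlated hit pairs: one hit per layer, at most max_offset strips apart."""
    pairs = []
    for _ in range(n_stubs):
        bottom = rng.randrange(n_strips)
        offset = rng.randint(-max_offset, max_offset)
        top = min(max(bottom + offset, 0), n_strips - 1)  # clamp to the sensor edge
        pairs.append((bottom, top))
    return pairs


def generate_noise_hits(rng, n_hits, n_strips):
    """Uncorrelated hits, uniformly distributed over layer (0 or 1) and strip."""
    return [(rng.randrange(2), rng.randrange(n_strips)) for _ in range(n_hits)]


def build_event(seed, n_stubs, n_noise, n_strips=120, max_offset=2):
    """Overlay noise on top of stub-like hits and return the hit strips per layer."""
    rng = random.Random(seed)  # seeded, so a failing test case can be replayed
    bottom, top = set(), set()
    for b, t in generate_stub_hits(rng, n_stubs, n_strips, max_offset):
        bottom.add(b)
        top.add(t)
    for layer, strip in generate_noise_hits(rng, n_noise, n_strips):
        (bottom if layer == 0 else top).add(strip)
    return sorted(bottom), sorted(top)
```

Calling `build_event(seed, n_stubs, 0)` corresponds to using the stub generation alone, while `build_event(seed, 0, n_noise)` corresponds to using the noise generation separately.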
In the UVM environment the generated hits are considered to be the simulation stimuli. Each block composing the environment is described as a UVM Verification Component (UVC) and connected to the DUV via interfaces. The communication among different UVCs (green lines in Figure 2) is implemented at TLM level in order to reduce the simulation run time. At the output of the DUVs, signals are parsed at run time and converted to data packets, which are compared at a higher level of abstraction with the predicted output from the Reference model. This comparison is carried out by another UVC called the Scoreboard. There is a dedicated scoreboard for each ASIC. Thanks to this approach it is possible not only to find mismatches at run time between the DUV and the Reference model, but also to know whether these mismatches are related to an ASIC hardware limitation (for instance, the ASIC reaching bandwidth saturation or the transmitting FIFOs being full) or to a bug in the RTL code. In the latter case the failing ASIC can be identified thanks to the display of the output data from the Reference model and the DUV. Moreover, a report for each ASIC is created at the end of the simulation, summarizing the total number of data packets processed by the DUV. In this way the single ASIC efficiency and the total readout chain efficiency for particle recognition can be computed. The verification environment supports clock-cycle accurate behavioral simulations. While the framework itself stays unchanged, the user can change the above-mentioned stimuli by applying different test cases chosen from the developed test library and can create a list of them in the Cadence vManager tool, which keeps track of the passed and failed tests. The most complete test case, which exercises 8 SSA and 8 MPA ASICs and 1 CIC with the maximum activity expected in the module, requires almost 3 GB of memory and a CPU time of 360 seconds to elaborate the design (RTL), a further 2 GB and a CPU time of 125 seconds to elaborate the testbench, and around 2.4 GB and a CPU time of 110 seconds to simulate 1000 BXs.

![Figure 2: PS module (MPA, SSA, CIC) readout chain.](image-url)
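
The scoreboard comparison described above can be sketched in Python as follows. This is a language-neutral illustration rather than the UVM scoreboard of the framework: the rule used to separate a hardware limitation from a candidate RTL bug (a predicted packet ranked beyond a hypothetical `capacity` counts as lost to saturation) and all names are assumptions.

```python
from collections import Counter


def score(expected, observed, capacity):
    """Compare the packets of one block.

    expected: packets predicted by the reference model, in priority order
    observed: packets parsed from the DUV output
    capacity: packets the output can carry for this block (hypothetical)
    """
    report = Counter()
    observed_left = Counter(observed)
    for rank, pkt in enumerate(expected):
        if observed_left[pkt] > 0:
            observed_left[pkt] -= 1
            report["matched"] += 1
        elif rank >= capacity:
            report["lost_to_saturation"] += 1  # attributed to a hardware limit
        else:
            report["missing"] += 1  # candidate RTL bug
    report["unexpected"] = sum(observed_left.values())  # DUV output nobody predicted
    return report


def efficiency(report):
    """Fraction of predicted packets that reached the output."""
    total = report["matched"] + report["lost_to_saturation"] + report["missing"]
    return report["matched"] / total if total else 1.0
```

With `score(["a", "b", "c", "d"], ["a", "b", "x"], capacity=3)`, two packets match, `"c"` is reported as missing, `"d"` as lost to saturation and `"x"` as unexpected, giving an efficiency of 0.5. Keeping the two loss categories apart is what lets a per-ASIC report distinguish expected bandwidth losses from functional failures.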

3. DUV forecasted performances

This section describes the benefits of using a UVM framework for such a large DUV. Firstly, it is important to note that the DUV is composed of many ASICs communicating with each other. Correct synchronization among all the ASICs is fundamental, taking into account the possible delays of the hybrid signal distribution. The developed framework detects synchronization issues and can send a synchronization request via a fast command signal to all the ASICs to recover synchronization. Another feature that has been implemented is the proper sampling at the input of the CIC: when the CIC receives a data stream, it has to sample it in the middle of the eye diagram to read the information correctly. This is solved in the CIC by using phase aligners surrounded by digital control circuitry. In order to align the incoming data with the internal CIC clock, an automatic procedure that requires specific patterns from the FE-Chips has been set up and verified. Moreover, the CIC stub data path has to collect data over 8 BXs. Therefore, the CIC must identify exactly the first BX out of 8 BXs. An 8-BX data stream is called a word. A digital block, the word alignment controller, has been developed to perform an automatic word alignment that needs specific training patterns from the FE-Chips. Both the phase alignment and the word alignment procedures have been verified via the UVM framework.
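
The two alignment steps can be illustrated with a minimal Python sketch. It does not reproduce the CIC phase aligner or the word alignment controller: selecting the centre of the longest error-free run of sampling phases is a common generic approach used here as a stand-in, and the training pattern (a single marker frame in the first BX of every 8-BX word) is a hypothetical choice.

```python
def pick_phase(tap_ok):
    """Return the index at the centre of the longest run of error-free taps.

    tap_ok[i] is True when sampling with phase tap i reproduced the training
    pattern without errors. Wrap-around between the last and the first tap
    is not handled in this sketch.
    """
    best_start, best_len = 0, 0
    start = None
    for i, ok in enumerate(list(tap_ok) + [False]):  # sentinel closes the last run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_len == 0:
        raise ValueError("no error-free sampling phase found")
    return best_start + (best_len - 1) // 2


def find_word_offset(bx_frames, marker, word_len=8):
    """Return the BX offset (0..word_len-1) at which `marker` repeats every word.

    Assumes the (hypothetical) training pattern places `marker` only in the
    first BX of each word.
    """
    for offset in range(word_len):
        candidates = bx_frames[offset::word_len]
        if candidates and all(frame == marker for frame in candidates):
            return offset
    raise ValueError("training pattern not found")
```

For instance, `pick_phase([False, True, True, True, False, True])` returns 2, the middle of the three-tap window, and a frame list that starts three BXs before the first marker yields `find_word_offset(...) == 3`.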

4. CIC performance evaluation

The scoreboard, thanks to the UVM environment, makes it possible to perform efficiency studies on the complete readout chain. For instance, by applying two different test cases it is possible to simulate the CIC working at two different output frequencies. Figure 3 reports the efficiency results for the PS module readout chain: the MPA (blue line) has minimal losses, while a drop in CIC efficiency is expected at higher stub occupancy when the CIC output frequency is set to 320 MHz. The drop seen in simulation is around 40% at the maximum expected stub occupancy. In the other test case, doubling the CIC output bus frequency to 640 MHz, and consequently the bandwidth, overcomes these limitations and the readout chain works as expected with an efficiency close to 100% (green line). The green area represents the CMS Outer Tracker expected stub occupancy. Therefore, depending on the position of the PS module in the tracker, the CIC will need to be configured for the best trade-off between power budget and data bandwidth.

Figure 3: PS module efficiency at the MPA output and CIC output for two different output frequencies.
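
The qualitative effect of doubling the output bandwidth can be reproduced with a toy model in Python. It is not the simulation behind Figure 3: it assumes that the concentrator simply drops every stub beyond a fixed per-block `capacity`, and the numbers in the example are arbitrary illustrations, not measured occupancies or real CIC limits.

```python
def truncation_efficiency(stubs_per_block, capacity):
    """Fraction of offered stubs that fit when each block is cut at `capacity`."""
    offered = sum(stubs_per_block)
    if offered == 0:
        return 1.0
    sent = sum(min(n, capacity) for n in stubs_per_block)
    return sent / offered
```

With the arbitrary input `[10, 20, 30]`, a capacity of 16 stubs per block transmits 42 of the 60 offered stubs (efficiency 0.7), while a capacity of 32 transmits all of them. Treating a doubled line rate as a doubled capacity ignores any fixed packet overhead, so the model only indicates the trend.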

5. Conclusions

A system-level simulation environment has been developed to simulate the entire PS module and 2S module readout chains in order to extract ASIC efficiencies for different test cases, perform Monte Carlo analysis, and verify the final version of each ASIC in the complete readout chain. In the case of the 2S module, a behavioral model of the CBC ASIC was developed to enable the simulation of the readout chain and the verification of the CIC ASIC. The simulation environment was extensively used to verify inter-chip communication, alignment and synchronization.

References