The CMS Drift Tubes readout system was upgraded during the 2017-2018 technical stop to a new MicroTCA-based system, in order to deliver the performance required by the increase in LHC luminosity. It comprises 3 µTCA crates hosting a total of 25 boards, most of which process 3 sectors of one CMS wheel. The µROS board is built around a Virtex-7 FPGA and can receive 72 input links. The 240 Mbps inputs are deserialized using oversampling and adaptive phase detection. Event building, synchronization, data integrity monitoring and error correction have been implemented. The µROS system is fully operational and is taking collision data satisfactorily.
1. **Introduction**

The Compact Muon Solenoid (CMS) Drift Tubes (DT) system [2] is responsible for tracking and triggering muons in the CMS barrel. The electrical signal from 172200 wires is amplified, shaped and discriminated by the front-end boards to produce a digital pulse, whose leading-edge timestamp carries information on the trajectory of the muons traversing the detector. The pulse’s time of arrival with respect to the LHC clock is measured by CERN's HPTDC [1] (High-Performance Time-to-Digital Converter) chips in the Read-Out Boards (ROB). Each ROB processes the information from 128 tubes, performs window-matching after reception of the Level-1 Accept signal, and delivers this information over a 240 Mbps link.

The data then pass through a series of concentration electronics, ultimately reaching the DAQ (data acquisition system) for storage and later analysis. Originally, this chain was composed of 60 Read-Out Server (ROS) boards, each receiving the information from the 25 ROBs of one detector sector (there are 12 sectors in each of the 5 barrel wheels), and 5 Detector-Dependent Units (DDU), each concentrating the information generated in one wheel [3].

Studies showed that the ROS board suffered from inefficiencies arising both from the increase in luminosity and from the way the input links were handled. An upgrade was therefore necessary.

2. **System architecture**

The architecture adopted in CMS for all Phase-1 upgrades consists of a µTCA (micro Telecommunications Computing Architecture) crate for up to 12 AMC-compatible boards, in which the redundant MCH (MicroTCA Carrier Hub) slot is populated by a custom-designed board called AMC13 [4]. The AMC13 interfaces to the TCDS (Timing and Control Distribution System), receiving timing and trigger information and distributing it to the AMC (Advanced Mezzanine Card) slots, and reporting the system status to the trigger throttling system. It also acts as a data concentrator, receiving the event data produced by each of the AMC boards and building a crate-level event which is then delivered to the DAQ. The MCH in the main slot contains an Ethernet switch, connecting each AMC to the CMS network. IP communication between the control hosts and the AMC slots and AMC13 is based on the IPbus protocol [5].

![Figure 1: (a) TM7 board with µROS configuration for the 3-sector processor, with the 6 MTP receivers (72 inputs), and the gigabit minipod sockets unpopulated. (b) The positive wheels’ crate with 10 µROS boards, the MCH on the left and AMC13 on the right.](image-url)

The boards that populate the AMC slots are subsystem-specific. The DT system uses the TM7 board for both its trigger (TwinMux [6]) and readout (µROS) upgrades. The TM7 is a single-slot, double-width, full-height AMC designed around a Xilinx Virtex-7 FPGA. It includes optical transceivers for up to 72 low-speed inputs and 12 high-speed bidirectional communication links running at up to 13 Gbps. The function of a TM7 board is determined by its firmware and by the optical transceivers mounted on it.

The µROS system has replaced both ROS and DDU systems. Each wheel’s data is processed by 5 µROS boards, 4 of them receiving 3 sectors each (24 channels per sector, 72 links), with the 5th receiving the 25th channel for each of the 12 sectors. The production system comprises three µTCA crates (central, positive and negative wheels) and 25 µROS boards.
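The link and board counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative and do not come from the µROS software.

```python
# Link bookkeeping for the µROS system, using only numbers quoted in the text.
WHEELS = 5
SECTORS_PER_WHEEL = 12
ROBS_PER_SECTOR = 25          # one 240 Mbps link per ROB
MAX_LINKS_PER_BOARD = 72      # low-speed optical inputs on a TM7

# Four boards per wheel take the first 24 links of 3 sectors each.
SECTORS_PER_MAIN_BOARD = 3
LINKS_PER_SECTOR_MAIN = 24
main_boards = SECTORS_PER_WHEEL // SECTORS_PER_MAIN_BOARD
links_per_main_board = SECTORS_PER_MAIN_BOARD * LINKS_PER_SECTOR_MAIN

# The fifth board collects the remaining (25th) link of every sector.
links_fifth_board = SECTORS_PER_WHEEL * (ROBS_PER_SECTOR - LINKS_PER_SECTOR_MAIN)

boards_per_wheel = main_boards + 1
links_per_wheel = main_boards * links_per_main_board + links_fifth_board
total_boards = WHEELS * boards_per_wheel
total_links = WHEELS * links_per_wheel

print(boards_per_wheel, links_per_main_board, links_fifth_board)
print(total_boards, total_links)
```

The four 3-sector boards use all 72 inputs, the fifth board uses only 12, and the totals agree with the 25 boards and 1500 ROB links mentioned elsewhere in the text.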

2.1 Hardware installation

During the 2017 data taking, the so-called “slice crate” was installed. The slice crate receives the optically split signals from a slice of the detector. It was used for the µROS system development and is currently being used for the development of Phase-2 upgrades.

The installation and commissioning of the new system were carried out over the 2017-2018 YETS (year-end technical stop). Before this, the ROS/DDU system had been moved from the experimental cavern to the service cavern as part of the Sector Collector Relocation project [7].

3. **Firmware**

The firmware for shared TM7 functionality (IPbus, TTC, AMC13 interface, flash reprogramming) of µROS is inherited from TwinMux. This section highlights the most relevant aspects of the µROS-specific part and its comparison to the ROS/DDU system.

Special care has been taken to design deserialization firmware capable of recovering the input data stream with high quality and minimal data loss. For improved performance, the main logic of deserialization and bit alignment was implemented in the main FPGA fabric. It adapts automatically to phase shifts without data loss, and can also perform asynchronous reception and some transmission error correction.

The incoming signal is oversampled by a factor of 5. The ISERDES output is continuously monitored to detect bit transitions, so that samples can be grouped together even during phase shifts (e.g. those caused by variations in the LHC clock frequency as the beam energy increases). The samples corresponding to a bit are merged using a majority criterion. Bits whose value is decided by a weak majority (3 vs 2) are marked as transmission error candidates. A gearbox prevents missing or duplicated bits during edge shifts. If the parity check of a data frame fails and exactly one bit of the frame is marked as a transmission error candidate, the logic corrects that bit.
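The majority vote and the parity-assisted correction described above can be illustrated with a small software model. This is a minimal sketch and not the µROS firmware: it assumes the samples are already grouped into 5 per bit (so edge tracking and the gearbox are left out), and the even-parity convention, the 8-bit frame and all function names are assumptions made for illustration.

```python
OVERSAMPLING = 5  # samples per bit, as stated in the text


def decide_bit(samples):
    """Majority vote over the samples of one bit; returns (bit, is_weak)."""
    ones = sum(samples)
    zeros = len(samples) - ones
    bit = 1 if ones > zeros else 0
    is_weak = abs(ones - zeros) == 1  # a 3 vs 2 split
    return bit, is_weak


def decode(samples):
    """Turn a phase-aligned sample stream into bits plus weak-bit positions."""
    bits, weak = [], []
    for start in range(0, len(samples), OVERSAMPLING):
        bit, is_weak = decide_bit(samples[start:start + OVERSAMPLING])
        if is_weak:
            weak.append(len(bits))
        bits.append(bit)
    return bits, weak


def parity_ok(bits):
    """ASSUMPTION: even parity over the whole frame, parity bit included."""
    return sum(bits) % 2 == 0


def check_and_correct(bits, weak):
    """Flip the single weak bit if parity fails; otherwise report an error."""
    if parity_ok(bits):
        return bits, "ok"
    if len(weak) == 1:
        fixed = list(bits)
        fixed[weak[0]] ^= 1
        return fixed, "corrected"
    return bits, "error"


def oversample(bits):
    """Ideal transmitter model: repeat every bit OVERSAMPLING times."""
    return [b for b in bits for _ in range(OVERSAMPLING)]


# Hypothetical 8-bit frame with even parity.
frame = [1, 0, 1, 1, 0, 0, 1, 0]
samples = oversample(frame)
samples[10:15] = [1, 0, 0, 1, 0]  # corrupt bit 2: the vote now says 0, weakly

bits, weak = decode(samples)
result, status = check_and_correct(bits, weak)
print(weak, status, result == frame)
```

In this example the corrupted bit loses the vote 2 to 3, so it is decoded incorrectly but flagged as weak. The parity check then fails, and because exactly one candidate exists, flipping it restores the original frame.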

The FW implements full verification of the ROB protocol and provides detailed statistics on every failure cause. It has become a useful tool for diagnosing fault conditions in the ROBs.

While the legacy system masked channels on transmission errors, only recovering after a resync, the event builder in the µROS is able to properly recover from all types of errors in data reception automatically when the condition disappears.
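The practical difference between the two recovery policies can be shown with a toy model. The sketch below is purely illustrative: the per-event link status, the resync flags and both function names are hypothetical, and the real event builders are considerably more involved.

```python
def legacy_readout(link_ok, resync):
    """Legacy-style policy: mask the channel on the first error and
    unmask it only when a resync arrives."""
    masked = False
    read_out = []
    for ok, rs in zip(link_ok, resync):
        if rs:
            masked = False
        if not ok:
            masked = True
        read_out.append(ok and not masked)
    return read_out


def auto_recover_readout(link_ok, resync):
    """Automatic recovery: the channel is read out again as soon as the
    error condition disappears, with or without a resync."""
    return list(link_ok)


# Hypothetical sequence: one transient error at event 2, resync at event 6.
link_ok = [True, True, False, True, True, True, True, True]
resync = [False, False, False, False, False, False, True, False]

legacy = legacy_readout(link_ok, resync)
auto = auto_recover_readout(link_ok, resync)
print(sum(legacy), sum(auto))
```

In this toy sequence a single transient error costs the legacy-style policy every event until the next resync, whereas automatic recovery loses only the event in which the error occurred.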

4. **Online Software**

The online SW applies the CMS state transitions to the system and gathers monitoring information. Communication with the HW in the CMS µTCA architecture takes place over an Ethernet network, using the IPbus protocol. This makes the µROS system more robust than the previous system, in which a failure in the hardware-connected hosts required physical intervention in the service cavern.

The monitoring information is rendered in a way that is useful both for quick-glance evaluation of the system status and for in-depth expert analysis of failures. The main screen presents general run information, as well as metrics that allow some error conditions to be diagnosed. Data is presented graphically and color-coded to quickly identify the boards in the system as well as the status of each FW module. Additionally, the status of each of the 1500 ROB links is given in several color mappings (link status, link occupancy and ratio of lost hits, each for both recent performance and the whole run). Clicking on a FW block or ROB link displays all the available information, including monitoring register values, below the diagrams.

The status information is updated every few seconds and stored to the database every 2 minutes. All the historical information can be navigated and visualized from the same interface.

Figure 2: The web monitoring interface. The black bar shows an indicator of the data refresh age and gives access to historical data. The pale blue boxes show general run information and the main system metrics. The main area shows a representation of the 3 crates with their boards and the status of their main functional blocks. The pink diagram represents the relative hit occupancy in each of the 1500 ROB links. The full monitoring information is shown at the bottom.

5. **Performance**

During the 2017 data taking, the slice crate was used for development. It was intermittently included in the global runs, and it underwent several performance tests under different high load scenarios (DAQ stress test): high trigger rate, high backpressure, etc.

The system ran stably during the 2018 proton-proton collisions, with no data losses and no significant problems. The detector inactive fraction for the DT system decreased in 2018 to roughly one third of its 2017 value (from 1.5% to 0.55%), mainly due to the automatic recovery of lost links, as well as to minicrate repairs carried out over the YETS.

Figure 3: Detector Active Fraction for the different CMS subsystems from 2017 to 2018 (blue).

6. **Conclusions**

The second level of the readout electronics for the DT subdetector at CMS had to be upgraded to maintain its performance under the increasing LHC luminosity. The new system, µROS, was successfully installed and commissioned during the 2017-2018 YETS.

The µROS board is able to receive 72 ROB links and features an adaptive deserializer able to lock to an asynchronous signal. It also performs a complete ROB protocol analysis and reports detailed statistics, providing useful information for debugging. The online software interface was designed to convey the maximum amount of information on the system at a glance, by color-mapping the most useful information into a table of indicators. It also allows easy navigation and inspection of the full set of monitoring registers and of the historical data.

The system ran stably during the 2018 proton-proton collisions, with no data losses and no significant problems.

References


