A scalable gigabit data acquisition system for calorimeters for linear collider

F. GASTALDI¹, R. CORNAT, F. MAGNIETTE, V. BOUDRY

Laboratoire Leprince-Ringuet - CNRS
École Polytechnique, 91128 Palaiseau, FRANCE

E-mail: gastaldi@llr.in2p3.fr, cornat@in2p3.fr,
frederic.magniette@llr.in2p3.fr, Vincent.Boudry@in2p3.fr

Abstract: This article presents the scalable Data Acquisition (DAQ) system designed for prototypes of ultra-granular calorimeters for the International Linear Collider (ILC). The design is generic enough to serve other applications with minor adaptations. The DAQ is made up of four different modules, including an optional concentrator. A Detector InterFace (DIF) is placed at one end of each detector element (SLAB), which holds up to 160 ASICs. It is connected by a single HDMI cable, which carries both slow-control and readout data over a serial link using 8b/10b encoded characters at 50 Mb/s, to the Gigabit Concentrator Card (GDCC). One GDCC controls up to 7 DIFs, distributes the system clock and the ASIC configuration, and collects data from them. Each DIF data packet is encapsulated in Ethernet format and sent out via an optical or copper link. The Data Concentrator Card (DCC) is a 1-to-8 multiplexer that can optionally be inserted between the GDCC and the DIFs, increasing the number of ASICs managed by the GDCC. Using a single GDCC and 7 DCCs allows a single PC to control and read out up to 8960 ASICs (~500000 channels). The fourth card is the Clock and Control Card (CCC), which provides a clock and control fan-out to up to 8 GDCCs and therefore to the entire system. A software suite named Calicoes, written in C and Python, manages the overall system. This system has been used for several tests on the Silicon-Tungsten Electromagnetic Calorimeter (SiW-ECAL) prototype detector (1800 channels). The full design and test results are detailed.

Technology and Instrumentation in Particle Physics 2014
2-6 June, 2014
Amsterdam, the Netherlands

¹Speaker

1. Introduction

Calorimeters planned for a future electron-positron linear collider [1] will be highly granular, with cell sizes as small as 5×5 mm$^2$ [2]. This fine granularity results in calorimeters with about 100 million cells that need to be read out. To handle such a flow of data, a high-performance DAQ is needed that can scale to the size of the detector.

2. ILD Calorimetry

The ILD, one of the detector concepts for the ILC, features both a large tracker and highly granular, compact calorimeters inside a magnetic coil to minimize dead material [3]. To optimise both cost and physics performance, the calorimeters must be as thin (and dense) as possible. Figure 1 shows a possible geometry of the calorimeter system.

![Figure 1: The octagonal geometry of the barrel of ILD detector with (from inner to outer) the ECAL, the HCAL and the coil. The inner radius of the ECAL barrel is typically 1.8m for a length of about 4 meters.](image)

The density is achieved by making the active sensors and associated readout electronics as thin as possible. Reading out such a huge number of channels (~100 million) requires the readout electronics to be embedded in the detector in the form of ASICs handling the amplification, auto-trigger, zero-suppression, storage and delayed readout over serial links. For the SiW-ECAL [4] designed at LLR, the SKIROC2 ASIC [5] developed by the OMEGA group is used. For ILD, the SiW-ECAL will typically feature between 20 and 30 instrumented layers, interleaved with 24 radiation lengths of tungsten (one radiation length being about 3.5 mm).

The structure of the calorimeters has been designed to be highly modular for easier industrialization and handling: the instrumented layers are placed in drawers, dubbed SLABs, which are powered, cooled and read out from a single end. Each instrumented layer is made of up to 10 interconnected sub-units holding between 16 and 48 ASICs each.

3. Data acquisition system

A block diagram of the overall structure of the DAQ is shown in Figure 2. It shows the modular structure of the system and the different levels of concentration as data pass from the SLABs to the PC. Each component is discussed briefly below to give an overall picture of the DAQ system, with detailed functionality covered in the following sections. The system is bi-directional: although the description below follows the data collected by the detector and sent to the PC, it also applies to control data sent in the reverse direction from a control PC to the detector units.

![Figure 2: Overview of the DAQ system](image)

At the end of the calorimeter layer, the DIF card aggregates the data from the SLABs. As the rest of the system after the DIF is calorimeter independent, the data are converted into a common format by unifying certain aspects of the firmware.

A DIF is connected to a Data Concentrator Card (DCC), which essentially acts as a hub and is optional in the DAQ chain. A DCC takes up to 8 DIFs as input and passes the data over one link to the next stage of the DAQ. The link between the DIFs and the DCC is an HDMI cable of around 5 meters in length. As HDMI is a commercial standard for consumer electronics, high-bandwidth connections (50 MHz in our case) are achieved at low cost. The following stage is the GDCC, which can be seen as a switch that encapsulates or decapsulates the data exchanged, over an optical fiber or copper cable via Gigabit Ethernet, with a commercial network card housed in a PC. A GDCC can communicate with up to 7 DIFs or DCCs.
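The fan-out figures of this chain can be checked with a few lines of arithmetic. The sketch below uses the numbers quoted in the text (160 ASICs per DIF, 8 DIFs per DCC, 7 links per GDCC); the figure of 64 channels per ASIC is an assumption that is not stated in this article, and the function name is purely illustrative.

```python
# Fan-out figures taken from the text.
ASICS_PER_DIF = 160   # maximum number of ASICs on one SLAB, read by one DIF
DIFS_PER_DCC = 8      # a DCC multiplexes up to 8 DIFs
LINKS_PER_GDCC = 7    # a GDCC serves up to 7 DIFs or DCCs

# Assumption (not from the text): 64 channels per readout ASIC.
CHANNELS_PER_ASIC = 64


def asics_per_gdcc(use_dcc: bool) -> int:
    """Maximum number of ASICs reachable through a single GDCC."""
    difs = LINKS_PER_GDCC * (DIFS_PER_DCC if use_dcc else 1)
    return difs * ASICS_PER_DIF


if __name__ == "__main__":
    for use_dcc in (False, True):
        asics = asics_per_gdcc(use_dcc)
        print(f"DCC stage: {use_dcc!s:5}  ASICs: {asics:5d}  "
              f"channels: {asics * CHANNELS_PER_ASIC}")
```

With the DCC stage the count is 7 x 8 x 160 = 8960 ASICs, the figure given in the abstract; under the 64-channel assumption this corresponds to roughly 570000 channels, the same order as the ~500000 quoted there.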

To ensure that the electronics capture data from actual events, all GDCCs and DCCs, and hence all components on the detector, are synchronized with a clock distributed by the CCC. The value of 50 MHz was chosen because it can also be used for the data links, where it is sufficient for the bandwidth required at the ILC.

4. Functionality of each card

4.1 The DIF

At the front-end level, close to the detector, the DIF connects the detector modules to the control and DAQ system. While the shape of the DIF card can be adapted to the integration constraints of each sub-detector, the firmware can be common to the readout ASICs. The DIF is connected to the DAQ and control system using a customized 8b/10b serial link. All functions are carried by the same cable and the same protocol: fast control, slow control (configuration) and data readout. This link is synchronized using the 50 MHz beam clock. For compatibility with the test beam environment, two other signals are distributed isochronously: an external trigger and a detector busy signal. Stringent height constraints led to the choice of the HDMI standard, as it provides 5 differential pairs and some power connections within a rather small cross section. The same transceiver blocks (MAC layer) are used in every component of the system (firmware). The DIF prototypes (Figure 3) are based on a low-cost FPGA (XILINX Spartan3 1000), and the room allocated to the DIF in the current design is quite small: less than a credit card for the connectors, power storage, regulator and buffers. A dedicated chip, which could be shared with other detectors based on similar very-front-end chips, would be desirable to integrate a simple SER/DES function, buffers and power management features. This chip could be power-pulsed and could therefore contribute to both power and space savings. The DIF architecture (Figure 4) is versatile and extremely modular so that functionality can easily be updated. A specific internal bus interconnects all the functions. The architecture resembles a lightweight system-on-chip design, with well-separated communication ports, storage structures, peripherals (detector bus, external memory, ...) and a central supervisor (packet analysis, chip management, resource sharing, command handling). A DIF board is expected to handle several thousand detector channels. It can read and write configuration memories, send control orders and receive acknowledgements, and finally read physics data and send them through the DAQ system.

4.2 The GDCC

The GDCC is an intermediate card in the DAQ system that allows multiple DIFs to be connected to a single PC. The link to the PC is expected to run at 1 Gb/s; the links to the DCCs or the DIFs are expected to run at 50 Mb/s. The board (Figure 5) is designed in 6U VME format and split into two parts, the main card and the mezzanine (Figure 6). The VME size was chosen so that power can be supplied by a standard chassis used in particle physics environments.
The main card essentially contains two components that are the “heart” of the GDCC: the XILINX Spartan 6 XC6SLX75 FPGA and the MARVELL 88E1111. The mezzanine carries the connectors for the DIF connections. We decided to put these connectors on a mezzanine card so that future changes of connection type can be followed if needed. The Marvell 88E1111 was chosen as the transceiver with the lowest power consumption (0.75 W). It performs all the physical layer functions for half- and full-duplex 1000BASE-T Ethernet on CAT 5 twisted-pair cable.
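A rough bandwidth budget shows why a single Gigabit Ethernet uplink is comfortable for one GDCC. The sketch below assumes that the quoted 50 Mb/s is the line rate of each 8b/10b link, so that only 8 out of every 10 transmitted bits carry payload; Ethernet framing overhead on the uplink is ignored, and the function names are illustrative.

```python
LINE_RATE_MBPS = 50.0        # per DIF or DCC link, from the text
UPLINK_MBPS = 1000.0         # Gigabit Ethernet link to the PC, from the text
LINKS_PER_GDCC = 7           # from the text
CODING_EFFICIENCY = 8 / 10   # 8b/10b: 8 payload bits per 10 line bits


def aggregate_payload_mbps(links: int = LINKS_PER_GDCC) -> float:
    """Payload rate arriving at the GDCC when all links are saturated.

    Assumes the 50 Mb/s figure is a line rate (an assumption of this sketch).
    """
    return links * LINE_RATE_MBPS * CODING_EFFICIENCY


def uplink_occupancy(links: int = LINKS_PER_GDCC) -> float:
    """Fraction of the uplink used, ignoring Ethernet framing overhead."""
    return aggregate_payload_mbps(links) / UPLINK_MBPS


if __name__ == "__main__":
    print(f"aggregate payload: {aggregate_payload_mbps():.0f} Mb/s")
    print(f"uplink occupancy:  {uplink_occupancy():.0%}")
```

Under these assumptions the seven links together deliver at most about 280 Mb/s, well below the 1 Gb/s uplink.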

On the FPGA side, the firmware is built on XILINX reference designs and, for a few parts, on existing work done within the collaboration. The bulk of the GEMAC interface is made up of VHDL blocks and FIFOs associated with that work. It handles the sending and receiving of raw Ethernet frames to and from the PHY layer. Its main purpose is to deal with speed auto-negotiation of the connection, the preamble bits, and the trailer checksums.
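For readers less familiar with raw Ethernet, the following software sketch illustrates what building and checking a frame involves: header, payload padding and the trailing frame check sequence (FCS). It is not the firmware implementation. The EtherType is a placeholder (the IEEE "Local Experimental" value) and the MAC addresses are invented; the preamble and start-of-frame delimiter are left out.

```python
import struct
import zlib

ETH_MIN_PAYLOAD = 46    # shorter payloads are zero-padded
ETH_MAX_PAYLOAD = 1500
# Placeholder: IEEE "Local Experimental" EtherType, not the one used by the GDCC.
ETHERTYPE_PLACEHOLDER = 0x88B5


def _fcs(body: bytes) -> bytes:
    # The Ethernet FCS uses the same CRC-32 as zlib; on the wire it appears
    # as the little-endian bytes of that value.
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)


def build_frame(dst: bytes, src: bytes, payload: bytes,
                ethertype: int = ETHERTYPE_PLACEHOLDER) -> bytes:
    """Return header + padded payload + FCS (no preamble)."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes long")
    if len(payload) > ETH_MAX_PAYLOAD:
        raise ValueError("payload too large for a standard frame")
    padded = payload.ljust(ETH_MIN_PAYLOAD, b"\x00")
    body = dst + src + struct.pack("!H", ethertype) + padded
    return body + _fcs(body)


def parse_frame(frame: bytes):
    """Check the FCS and return (dst, src, ethertype, padded payload)."""
    body, fcs = frame[:-4], frame[-4:]
    if _fcs(body) != fcs:
        raise ValueError("bad frame check sequence")
    (ethertype,) = struct.unpack("!H", body[12:14])
    return body[:6], body[6:12], ethertype, body[14:]


if __name__ == "__main__":
    pc = bytes.fromhex("020000000001")     # invented, locally administered
    gdcc = bytes.fromhex("020000000002")   # invented, locally administered
    frame = build_frame(pc, gdcc, bytes(1024))
    print(len(frame), "bytes on the wire (without preamble)")
```

A 1024-byte DIF packet carried this way gives a 1042-byte frame: 14 bytes of header, 1024 bytes of payload and 4 bytes of FCS.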

The “control unit” block consists of a master state machine that manages the others. The master state machine extracts each field of the Ethernet frame and, depending on the command received, activates the corresponding state machine. Two kinds of state machine can be activated: one manages the registers of the GDCC (version, link status, value of the MAC address, ...), the other handles commands to read or write a register on the DIF or to send a configuration to the readout ASICs. The master state machine also manages data encapsulation to start building the Ethernet frame. As each packet from the DIFs has a size of 1024 bytes, a token system working as a circular round robin places each packet in an Ethernet frame before it is sent to the GEMAC block.
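The circular round-robin token can be illustrated in software. The sketch below is a behavioural model under simple assumptions, not the VHDL implementation: each link has a queue of complete packets, and the token moves to the next link after every visit whether or not that link had data.

```python
from collections import deque
from typing import Deque, Iterator, List, Tuple

PACKET_SIZE = 1024  # bytes per DIF packet, from the text


def round_robin(queues: List[Deque[bytes]]) -> Iterator[Tuple[int, bytes]]:
    """Yield (link_index, packet) pairs, one packet per token visit.

    The token moves to the next link after each visit, so a busy link
    cannot starve the others.
    """
    token = 0
    pending = sum(len(q) for q in queues)
    while pending:
        queue = queues[token]
        if queue:
            yield token, queue.popleft()
            pending -= 1
        token = (token + 1) % len(queues)


if __name__ == "__main__":
    # Three links with 2, 0 and 1 packets waiting (packet contents invented).
    links = [
        deque([bytes([0xA0]) * PACKET_SIZE, bytes([0xA1]) * PACKET_SIZE]),
        deque(),
        deque([bytes([0xC0]) * PACKET_SIZE]),
    ]
    for link, packet in round_robin(links):
        print(f"link {link}: packet starting with 0x{packet[0]:02X}")
```

In this model the packets leave in the order link 0, link 2, link 0: the second packet of link 0 waits until the token has gone around once.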

For the interface with the DIFs, a custom protocol was implemented. It uses 8b/10b coding for transmitting and receiving the DIF data. The protocol relies on K control characters to mark the start of frame (SOF) and end of frame (EOF) of each packet or to encode a specific command such as reset, and a state machine is used to lock each link between the GDCC and the DIFs.
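The role of the K characters can be sketched at the symbol level. In 8b/10b, control characters are distinct from data bytes even when they share the same 8-bit value, which is modelled below with an explicit flag. The particular codes chosen for idle, SOF and EOF (K28.5, K27.7, K29.7) are valid 8b/10b control characters, but their assignment here is hypothetical; the actual protocol may use others.

```python
from typing import Iterable, List, Tuple

Symbol = Tuple[bool, int]  # (is_control_character, 8-bit value)

# Hypothetical assignment of 8b/10b control characters.
K_IDLE: Symbol = (True, 0xBC)  # K28.5
K_SOF: Symbol = (True, 0xFB)   # K27.7
K_EOF: Symbol = (True, 0xFD)   # K29.7


def frame(payload: bytes) -> List[Symbol]:
    """Wrap payload bytes between a SOF and an EOF control character."""
    return [K_SOF] + [(False, b) for b in payload] + [K_EOF]


def deframe(stream: Iterable[Symbol]) -> List[bytes]:
    """Extract the packets delimited by SOF/EOF from a symbol stream."""
    packets: List[bytes] = []
    current = None  # None means "outside a packet"
    for symbol in stream:
        if symbol == K_SOF:
            current = bytearray()
        elif symbol == K_EOF:
            if current is not None:
                packets.append(bytes(current))
            current = None
        elif current is not None and not symbol[0]:
            current.append(symbol[1])
        # Idle and any other control characters are skipped.
    return packets


if __name__ == "__main__":
    # The data byte 0xFB is not mistaken for SOF: it is not a K character.
    first, second = bytes([0xFB, 0x01, 0xFD]), b"\x10\x20"
    stream = [K_IDLE] * 3 + frame(first) + [K_IDLE] + frame(second)
    print(deframe(stream) == [first, second])
```

Because the delimiters live outside the data alphabet, no byte stuffing or escaping of the payload is needed.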

The ser/des functionality comes mainly from Xilinx application notes [6], [7]. It runs with two clocks: the bit clock and a 90-degree shifted version of the bit clock, which is used in the phase recovery module. This module makes the design insensitive to the phase of the incoming data. This is required because, although the transmission is synchronous with the clock sent to the DIF, no clock is received back. We therefore assume that the bit clock used by the DIF to send data back is the same one we sent with the TX data. We also assume that all logic in the ser/des region runs at the same rate as the bit clock.
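The idea behind phase recovery can be illustrated with a simplified software model. It assumes that using both edges of the bit clock and of its 90-degree shifted copy gives four sampling points per bit, finds where the data transitions fall among those four points, and then keeps the sample lying two positions after the edge, inside the stable part of the bit. This is only a behavioural sketch under that assumption, not the circuit of the application notes.

```python
from collections import Counter
from typing import List

OVERSAMPLING = 4  # assumed: both edges of the 0-degree and 90-degree clocks


def recover_bits(samples: List[int]) -> List[int]:
    """Recover one value per bit from a 4x oversampled stream.

    Transitions between consecutive samples reveal the bit boundaries;
    the sample two positions after the boundary is kept.
    """
    edges = Counter(
        i % OVERSAMPLING
        for i in range(len(samples) - 1)
        if samples[i] != samples[i + 1]
    )
    if not edges:
        raise ValueError("no transition seen, cannot locate bit boundaries")
    edge_phase = edges.most_common(1)[0][0]
    centre = (edge_phase + 2) % OVERSAMPLING
    return samples[centre::OVERSAMPLING]


def oversample(bits: List[int], shift: int) -> List[int]:
    """Test helper: ideal 4x oversampling with an unknown phase offset."""
    full = [b for b in bits for _ in range(OVERSAMPLING)]
    return full[shift:]


if __name__ == "__main__":
    sent = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    for shift in range(OVERSAMPLING):
        got = recover_bits(oversample(sent, shift))
        print(shift, got == sent[len(sent) - len(got):])
```

The model needs transitions in the data to locate the bit boundaries; 8b/10b coding provides them by construction. Depending on the offset, the first (partial) bit may be dropped, which is why the example compares against the tail of the transmitted sequence.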

On the mezzanine, two fan-out components are mounted. The first distributes the clock from the CCC card, forwarding it to each HDMI connector and to the FPGA. The second does the same for the trigger signal.

We are currently improving our firmware by adding a UDP block in association with the MAC block. This will simplify the software interface by basing communication on a simple and fast protocol with an easy hardware description.

4.3 The DCC

The DCC card (Figure 7) increases the number of DIFs that can be connected to a GDCC. It can multiplex up to 8 DIFs onto a single HDMI cable. The upstream data rate is identical to the downstream one, i.e. 50 Mb/s. The main requirement for this card is that it can be inserted or removed without changing the behavior of the DAQ chain. To maintain this compatibility, the VHDL source code of the interface blocks of the GDCC and the DIF has been reused (Figure 8). The card consists of a single low-cost Xilinx Spartan 3 FPGA with a minimum of external components. The card is in 6U VME format, so it can be plugged into a VME chassis for power. The signals and connectors are based on HDMI and are therefore compatible with the DIF and the GDCC.

4.4 The CCC

This custom-made card was designed by the UK collaboration and takes into account the needs of the different calorimeters to which it provides signals. It is shown in Figure 9 and has dimensions of 234 x 220 mm². The card carries various connectors, the main ones being 8 HDMI connectors that fan out to 8 GDCCs.

The main functions of the CCC are:

- Fanning out the common machine clock to all detectors
- Fanning out a standalone clock
- Fanning out triggers
- Receiving a busy signal, if needed, depending on the calorimeter connected

5. The DAQ software

The software suite, named Calicoes, is based on the Pyrame framework developed at LLR. It is composed of three modules: a multi-media high-performance acquisition chain, a set of control-command modules for all the hardware components of the detector (GDCCs, DIFs, SKIROC chips...) and a centralized configuration system that dispatches all the parameters to the hardware through the modules.

The acquisition chain (Figure 10) receives real-time data from multiple media at the same time. For convenience, a plug-in system allows the media in use to be selected (currently implemented: raw Ethernet, TCP server or client, UDP or USB). All these data go through a workflow in which they are verified, decapsulated and dispatched to files, TCP sockets or shared memory for the converting programs. At the top level, an event builder assembles the data to produce various data formats through a plug-in system (ASCII, LCIO...).

The control-command modules (Figure 11) for all the hardware components are written in Python and allow any register of the electronics to be programmed. On top of them, a high-level abstraction layer allows the whole system to be managed with a single state machine. This high-level system can be controlled via a GUI or via a scripting system for complicated and automated data taking (for example calibration scripts). It can also be interfaced with a SCADA system such as Tango, XDAQ or OPC-UA, or with any system written in C/C++, Python, R or LabVIEW.
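The idea of driving the whole system through a single state machine can be sketched as follows. The state and command names are hypothetical and chosen only for illustration; they are not taken from Calicoes or Pyrame.

```python
from typing import Iterable, List

# Hypothetical states and commands; the real state machine may differ.
TRANSITIONS = {
    ("unloaded", "initialize"): "ready",
    ("ready", "configure"): "configured",
    ("configured", "start_acquisition"): "acquiring",
    ("acquiring", "stop_acquisition"): "configured",
    ("configured", "invalidate"): "ready",
    ("ready", "deinitialize"): "unloaded",
}


class RunControl:
    """Single state machine standing in for the whole detector."""

    def __init__(self) -> None:
        self.state = "unloaded"

    def fire(self, command: str) -> str:
        """Apply a command, or refuse it if not allowed in this state."""
        key = (self.state, command)
        if key not in TRANSITIONS:
            raise ValueError(
                f"'{command}' is not allowed in state '{self.state}'")
        self.state = TRANSITIONS[key]
        return self.state


def run_script(commands: Iterable[str]) -> List[str]:
    """Scripted data taking: apply the commands in order."""
    control = RunControl()
    return [control.fire(command) for command in commands]


if __name__ == "__main__":
    print(run_script(["initialize", "configure",
                      "start_acquisition", "stop_acquisition"]))
```

A GUI and a calibration script would both go through the same `fire` entry point, so neither can start an acquisition on an unconfigured system.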

Such a detector is a complex system. To configure it easily, a centralized configuration program sets up all the hardware and software from a single XML file describing all the parameters of the system. To reduce the size and complexity of this file, implicit declaration and default value mechanisms are implemented.
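One way default values can keep such a file short is to let each element inherit the parameters of its parents and fall back on built-in defaults. The sketch below illustrates that idea only; the XML schema, the parameter names and the default values are all hypothetical and do not reproduce the Calicoes file format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Tuple

# Hypothetical built-in defaults, used when the file says nothing.
DEFAULTS = {"gain": "1", "threshold": "200"}

# Hypothetical configuration file: only deviations from defaults are written.
EXAMPLE = """
<detector threshold="230">
  <gdcc name="gdcc_1">
    <dif name="dif_1"/>
    <dif name="dif_2" gain="4"/>
  </gdcc>
</detector>
"""


def resolve(node: ET.Element, inherited: Dict[str, str],
            path: str = "") -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (path, parameters) for every leaf element of the tree.

    Attributes set on an element override those inherited from its parents.
    """
    params = dict(inherited)
    params.update({k: v for k, v in node.attrib.items() if k != "name"})
    here = f"{path}/{node.get('name', node.tag)}".lstrip("/")
    children = list(node)
    if not children:
        yield here, params
    for child in children:
        yield from resolve(child, params, here)


def load(xml_text: str) -> Dict[str, Dict[str, str]]:
    """Return the fully expanded parameters of each leaf component."""
    return dict(resolve(ET.fromstring(xml_text), DEFAULTS))


if __name__ == "__main__":
    for component, parameters in load(EXAMPLE).items():
        print(component, parameters)
```

In this example the threshold is written once at the top and applies to both DIFs, while only `dif_2` overrides the default gain.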

This software suite has been designed to be stable and reliable (working for weeks without error). The acquisition chain has been optimized for high performance. The control-command has been designed to be naturally distributed and flexible.

6. Results

This DAQ has been used on the SiW-ECAL technical prototype during a beam test at DESY. The setup was composed of 10 detection layers, each with its own DIF, connected to 2 GDCCs. The DCC was not used in this setup. This configuration generated around 250 GB of data, confirming the ability of the DAQ to handle large data volumes. During this beam test, the system was validated at a 10 Hz spill frequency (as a reminder, the ILC requirement is 5 Hz). We performed a channel-by-channel calibration, injecting 120000 configurations into the system, which remained totally stable throughout.

7. Conclusion

Our DAQ worked well during the beam test. Our goal was to develop a generic DAQ system using commercial components, and this goal has largely been reached. In the context of the International Linear Collider detector, the full set of cards will certainly have to be optimised to fit within the limited space and power budget. For an ECAL detector with 100 million channels, the DAQ would require 12500 DCCs, 2000 GDCCs and 200 acquisition PCs. To reduce these numbers, the main work must be done on the front-end modules for ease of integration. As our architecture is built on reusable functions deployed at different levels of the DAQ, the system can evolve easily and could be used in other HEP projects.

Acknowledgements

This project has been funded by the French National Research Agency (ANR) under the name CALIIMAX-HEP (ANR-10-BLAN-0429). This work follows initial R&D done at University College London, Manchester and Cambridge University, which continued at LLR (Ecole polytechnique / IN2P3-CNRS). We would also like to thank DESY for the beam tests.

References


[6] N. Sawyer, Data Recovery, XILINX Application Note XAPP224.

[7] N. Sawyer, Data to Clock Phase Alignment, XILINX Application Note XAPP225.