



# **ATLAS FTK: The Fast Tracker**

# T. Iizawa for ATLAS FTK Group\*

Waseda Research Institute for Science and Engineering, Waseda University, Tokyo, Japan E-mail: Tomoya.Iizawa@cern.ch

The Fast TracKer (FTK) will perform global track reconstruction after each Level-1 trigger accept, giving the software-based High Level Trigger (HLT) early access to tracking information. FTK is a dedicated system based on a mixture of advanced technologies. Modern, powerful Field Programmable Gate Arrays (FPGAs) form an important part of the system architecture, and the large computing power required for pattern recognition is provided by standard-cell ASICs named Associative Memory (AM).

FTK provides global track reconstruction in the full inner silicon detector in approximately 100 microseconds with resolution comparable to the offline algorithms. The track and vertex information is then used by the HLT algorithms, allowing highly improved trigger performance for important signatures such as b-jets.

This paper describes the architecture and hardware development status of the FTK system, along with the development of its simulation.

The 23rd International Workshop on Vertex Detectors, 15-19 September 2014, Macha Lake, The Czech Republic

\*Speaker.

#### 1. Introduction

After a very successful data-taking run, the ATLAS experiment [1] is being upgraded to cope with the higher luminosity and higher center-of-mass energy that the Large Hadron Collider (LHC) will provide in the coming years. The higher instantaneous luminosity expected in LHC Run 2 will pose challenges for the trigger system. The existing ATLAS trigger system, consisting of a hardware-based Level-1 trigger and a CPU-based High Level Trigger (HLT), was designed to work well at the LHC design luminosity, $10^{34}\,\mathrm{cm}^{-2}\,\mathrm{s}^{-1}$. However, after the planned luminosity upgrade, the detector environment will be complicated by the increase in detector activity arising from many simultaneous interactions. Additionally, the proposed upgrades to the Level-1 trigger will allow an increased rate into the HLT. Because of its fine resolution, tracking information is critical for deciding which events accepted by Level-1 should be kept for further processing.

We propose to build a system of electronics, the Fast TracKer (FTK) [2], which will perform global track reconstruction after each Level-1 trigger accept so that the HLT has early access to tracking information. Figure 1 shows the functional overview of the FTK system. FTK will use data from the pixel and semiconductor tracker (SCT) detectors as well as the new Insertable B-Layer (IBL) pixel detector [3]. FTK moves track reconstruction into a massively parallel hardware system that delivers global tracks with good resolution just after the start of HLT processing.



Figure 1: FTK functional overview

# 2. The FTK architecture

To deal with the large input rate as well as the large number of hit combinations at high luminosity, FTK is highly parallel: the system is segmented into 64 $\eta$-$\phi$ towers, each with its own pattern recognition hardware and track fitters. FTK uses 12 logical detector layers (4 pixel layers including the IBL, and 8 SCT layers) over the full rapidity range covered by the barrel and the disks. It operates in two stages. In the first stage, 8 of the 12 silicon layers are used for pattern recognition and initial track fitting. In the second stage, track fitting is performed with all 12 silicon layers.

Each subsystem is described in more detail below.

#### 2.1 FTK Input Mezzanine

#### 2.1.1 System description

The functions of the FTK Input Mezzanine (FTK\_IM) are to receive the pixel and SCT data from the Inner Detector ReadOut Drivers (RODs), to perform clustering, and then to forward the data to the Data Formatter main board. Clustering is performed on both pixel and SCT data with the dual purpose of reducing the amount of data to be processed by the rest of FTK and improving the spatial resolution by determining the cluster center. The pixel clustering is described in detail in [4].

Each FTK\_IM will receive up to 4 S-LINK fibers from the Inner Detector RODs. Each board carries 2 FPGAs, each receiving data from 2 S-LINK channels. The four data streams received by the FTK\_IM will be processed independently and sent over independent channels to the Data Formatter. Event synchronization is therefore performed on the Data Formatter board.
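The idea of clustering can be illustrated with a minimal sketch. It is not the firmware algorithm of [4]; it assumes that pixel hits are given as integer (column, row) addresses, that adjacency is 8-connected, and that the cluster center is the unweighted mean of the member pixels. The function name `cluster_pixels` is hypothetical.

```python
from collections import deque


def cluster_pixels(hits):
    """Group adjacent pixel hits and return one centroid per cluster.

    hits: iterable of (column, row) integer pixel addresses.
    Assumption of this sketch: two pixels belong to the same cluster if
    they share an edge or a corner (8-connectivity).
    Returns a sorted list of (mean_column, mean_row, n_pixels).
    """
    remaining = set(hits)
    centroids = []
    while remaining:
        seed = remaining.pop()
        queue = deque([seed])
        members = [seed]
        # Breadth-first search over neighbouring pixels that were hit.
        while queue:
            col, row = queue.popleft()
            for dc in (-1, 0, 1):
                for dr in (-1, 0, 1):
                    neighbour = (col + dc, row + dr)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        queue.append(neighbour)
                        members.append(neighbour)
        n = len(members)
        centroids.append((sum(c for c, _ in members) / n,
                          sum(r for _, r in members) / n,
                          n))
    return sorted(centroids)


# Four hits: three touching pixels and one isolated pixel.
clusters = cluster_pixels([(10, 5), (11, 5), (11, 6), (40, 20)])
```

Four input hits become two output clusters, which shows both effects named above: fewer words to forward, and a position that is no longer restricted to the pixel pitch.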

#### 2.1.2 Hardware development and test status

Five prototype boards of the production version are ready and being tested. The FTK\_IM has been confirmed to link up with the Inner Detector RODs and receive data. The clustering firmware is implemented for both SCT and pixel data. Several monitoring registers are defined and in use, and more will be added as necessary. The Double Data Rate (DDR) data transfer to the Data Formatter is running at 200 MHz. The dataflow works stably at a 100 kHz event rate with single-channel input. The dataflow with multi-channel input is being tested.

#### 2.2 Data Formatter

#### 2.2.1 System description

As noted previously, FTK is organized as a set of parallel processor units within an array of 64  $\eta$  -  $\phi$  towers. To avoid inefficiency at tower boundaries, the towers must overlap because of the finite size of the beam's luminous region in z and the finite curvature of charged particles in the magnetic field.

The Data Formatter (DF) system receives the hits from the FTK\_IM, remaps the ATLAS Inner Detector geometry to match the FTK $\eta$-$\phi$ tower structure, performs data switching in overlap regions, and delivers the hits to the Processor Units. 32 Data Formatter boards will be used to handle the 64 FTK $\eta$-$\phi$ towers. Based on the design requirements, a system based on Advanced Telecommunications Computing Architecture (ATCA) technology with a full-mesh backplane interconnect was found to be a natural solution for the Data Formatter design. Each board is connected to every other board with multiple point-to-point links in the full-mesh ATCA backplane. The Rear Transition Module (RTM) is used to send data to the downstream FTK boards as well as to perform inter-crate data switching. Figure 2 shows the DF board equipped with FTK\_IMs (left) and the ATCA full-mesh interconnect (right).
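The effect of overlapping towers on hit routing can be sketched in one dimension. The example below considers only the $\phi$ direction and assumes 16 $\phi$ slices and an overlap half-width of 0.05 rad; both numbers, and the function name `phi_towers`, are illustrative assumptions rather than FTK parameters.

```python
import math

N_PHI = 16       # assumed number of phi slices (illustrative only)
OVERLAP = 0.05   # assumed overlap half-width in radians (illustrative only)


def phi_towers(phi):
    """Return the sorted phi-tower indices that must receive a hit at angle phi."""
    two_pi = 2 * math.pi
    width = two_pi / N_PHI
    phi = phi % two_pi
    if phi >= two_pi:            # guard against rounding for tiny negative inputs
        phi = 0.0
    home = int(phi // width)
    towers = {home}
    offset = phi - home * width  # distance from the lower edge of the home tower
    if offset < OVERLAP:
        towers.add((home - 1) % N_PHI)   # hit is duplicated to the lower neighbour
    if width - offset < OVERLAP:
        towers.add((home + 1) % N_PHI)   # hit is duplicated to the upper neighbour
    return sorted(towers)


central = phi_towers(0.20)    # well inside tower 0
low_edge = phi_towers(0.01)   # near the boundary with tower 15
high_edge = phi_towers(0.38)  # near the boundary with tower 1
```

A hit near a boundary is delivered to two towers, which is the data switching that the full-mesh interconnect has to carry.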

## 2.2.2 Hardware development and test status

A prototype board is being tested in an ATCA shelf at the CERN test lab. Tests with 8 boards in a crate are ongoing at FNAL. Initial switching firmware is ready and being tested. The interface to the AUXiliary card has been established at the required rate of 6.4 Gb/s.



**Figure 2:** The Data Formatter board with 4 FTK Input Mezzanines (left). A graphical depiction of the 32 boards (in green) and the high-speed interconnect lines in the four-crate system (right). Blue lines represent backplane data paths; orange lines represent inter-crate fiber links.

#### 2.3 AUXiliary board

#### 2.3.1 System description

The AUXiliary board (AUX) receives hits from the Data Formatters for the 8 silicon layers used in the first stage of track reconstruction. The AUX has 3 main functions: Data Organizer (DO), Track Fitter (TF), and Hit Warrior (HW). It stores the hits in the DO, which sends them to the Associative Memory Board (AMB) at a coarser resolution appropriate for pattern recognition (Super-Strip, SS). When the AMB finds a matched pattern, called a "road", with hits on at least 7 of the 8 layers, the road number is sent to the AUX, and the DO retrieves all of the hits in the road. The hits, the road number, and the sector<sup>1</sup> number are transferred to the TF.

Next, track fitting is performed. Instead of performing an actual fit, the $\chi^2$ components are estimated from a linear calculation. The calculation is a set of scalar products of the hit coordinates and pre-calculated constants that take into account the detector geometry and alignment.
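The role of the Data Organizer can be pictured as a small hit database keyed by layer and Super-Strip. The sketch below assumes a one-dimensional hit coordinate and a Super-Strip width of 16 full-resolution units; the class, its methods, and the width are hypothetical and serve only to illustrate the store-then-retrieve behavior described above.

```python
from collections import defaultdict

SS_WIDTH = 16   # assumed Super-Strip width in full-resolution units (hypothetical)


class DataOrganizer:
    """Toy model: store full-resolution hits, index them by (layer, Super-Strip)."""

    def __init__(self):
        self._store = defaultdict(list)

    def add_hit(self, layer, coord):
        """Store a hit and return the Super-Strip id that would go to the AMB."""
        ss = coord // SS_WIDTH
        self._store[(layer, ss)].append(coord)
        return ss

    def hits_in_road(self, road):
        """Return the stored hits for a road.

        road: one Super-Strip id per layer, or None for a layer without a hit.
        """
        return [list(self._store.get((layer, ss), [])) if ss is not None else []
                for layer, ss in enumerate(road)]


do = DataOrganizer()
ss_a = do.add_hit(0, 35)    # layer 0, Super-Strip 2
ss_b = do.add_hit(0, 40)    # layer 0, Super-Strip 2 again
ss_c = do.add_hit(1, 100)   # layer 1, Super-Strip 6
road_hits = do.hits_in_road([ss_a, ss_c])
```

The pattern recognition only ever sees the coarse Super-Strip ids, while the fit receives the full-resolution hits retrieved for the matched road.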
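A minimal sketch of such a linear calculation is given below. It assumes that each $\chi^2$ component is a scalar product of the hit coordinates with a row of constants plus an offset, and that the components are summed in quadrature. The constants used here are a toy example (three equally spaced planes and a straight-line constraint), not FTK constants, and the function name is hypothetical.

```python
def chi2_linear(hits, A, k):
    """Return sum_i (sum_j A[i][j] * hits[j] + k[i])**2.

    hits: hit coordinates of one track candidate.
    A, k: pre-calculated constants (toy values in this sketch).
    """
    total = 0.0
    for row, offset in zip(A, k):
        component = sum(a * x for a, x in zip(row, hits)) + offset
        total += component * component
    return total


# Toy constants: for three equally spaced planes, a straight line satisfies
# x0 - 2*x1 + x2 = 0, so this single component vanishes for a perfect track.
A_TOY = [[1.0, -2.0, 1.0]]
K_TOY = [0.0]

chi2_good = chi2_linear([1.0, 2.0, 3.0], A_TOY, K_TOY)
chi2_bad = chi2_linear([1.0, 2.0, 3.5], A_TOY, K_TOY)
```

Only multiplications and additions are needed, which is why this form suits an FPGA implementation better than an iterative fit.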

The tracks forwarded from the Track Fitter go to the HW function for duplicate track removal. If two tracks in the same road share more than a certain number of hits, only the higher quality track is kept. The quality is defined based on  $\chi^2$  and the number of hits in pixel and SCT layers. Tracks exiting the HW are forwarded to the Second Stage Board where hits on the other 4 detector layers are added.
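The duplicate removal can be sketched as follows. The shared-hit threshold (`max_shared=5`) and the exact quality ordering (more hits first, then lower $\chi^2$) are assumptions made for illustration; the text above only states that the quality depends on $\chi^2$ and on the number of hits.

```python
def hit_warrior(tracks, max_shared=5):
    """Keep only the best track among tracks in one road that share many hits.

    tracks: list of dicts with keys "road", "hits" (one hit id per layer,
            None for a missing layer) and "chi2".
    max_shared: hypothetical threshold; two tracks in the same road sharing
            more hits than this are treated as duplicates.
    """
    def quality(track):
        n_hits = sum(h is not None for h in track["hits"])
        return (-n_hits, track["chi2"])   # assumed ordering: best track first

    kept = []
    for cand in sorted(tracks, key=quality):
        duplicate = False
        for best in kept:
            if best["road"] != cand["road"]:
                continue
            shared = sum(a is not None and a == b
                         for a, b in zip(best["hits"], cand["hits"]))
            if shared > max_shared:
                duplicate = True
                break
        if not duplicate:
            kept.append(cand)
    return kept


candidates = [
    {"road": 7, "hits": (1, 2, 3, 4, 5, 6, 7, 8), "chi2": 3.0},
    {"road": 7, "hits": (1, 2, 3, 4, 5, 6, 7, 9), "chi2": 1.0},
    {"road": 8, "hits": (1, 2, 3, 4, 5, 6, 7, 8), "chi2": 9.0},
]
survivors = hit_warrior(candidates)
```

In this example the two candidates in road 7 share seven hits, so only the one with the lower $\chi^2$ survives; the candidate in road 8 is untouched because the comparison is restricted to tracks in the same road.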

#### 2.3.2 Hardware development and test status

A prototype board is ready and being tested in a VME crate at the CERN test lab. Figure 3 shows a picture of the AUX (left) and a functional sketch of the AUX (right). The DO and TF functions are implemented in the firmware, and a combined test is underway. Implementation of the HW function is ongoing. The interface to the Associative Memory Board is being tested.

<sup>1</sup> A "sector" is a unit used in FTK, which consists of one silicon module in each layer, typically a few centimeters wide.



Figure 3: AUX board (left) and functional diagram showing the data flow on the board (right).

# 2.4 Associative Memory

### 2.4.1 System description

The Associative Memory (AM) system carries out pattern recognition at the high silicon detector readout rate by comparing hits at reduced resolution with a very large number of roads nearly simultaneously. The AM system consists of the Associative Memory chip (AMchip), an ASIC designed and optimized for this particular application, and a VME board, Associative Memory Board (AMB), on which are mounted local associative memory boards (LAMB). Figure 4 shows the AMBoard (left) and LAMB (right).
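The matching logic can be emulated in software with a simple loop, keeping in mind that the AMchip compares all patterns nearly simultaneously while the sketch below visits them one by one. The data layout and the function name `match_roads` are assumptions; the majority requirement of 7 out of 8 layers is the one quoted for the first stage.

```python
def match_roads(pattern_bank, fired, min_layers=7):
    """Return ids of patterns with a fired Super-Strip on at least min_layers layers.

    pattern_bank: dict mapping road id -> tuple of 8 Super-Strip ids (one per layer).
    fired: list of 8 sets holding the Super-Strips hit in each layer of the event.
    """
    roads = []
    for road_id, pattern in pattern_bank.items():
        n_matched = sum(ss in fired[layer] for layer, ss in enumerate(pattern))
        if n_matched >= min_layers:
            roads.append(road_id)
    return roads


bank = {
    0: (1, 2, 3, 4, 5, 6, 7, 8),
    1: (1, 2, 3, 4, 9, 9, 9, 9),
}
# Event with hits compatible with road 0 on the first seven layers only.
event = [{1}, {2}, {3}, {4}, {5}, {6}, {7}, set()]
found = match_roads(bank, event)
```

Road 0 is accepted although one layer is missing, which is how the 7-of-8 requirement keeps the efficiency high in the presence of detector inefficiency.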





**Figure 4:** AM Board (left) and LAMB (right). AMchips are mounted on the LAMB, and LAMBs are mounted on the AM Board. The size of a LAMB matches the orange square shown in the left picture, so 4 LAMBs can be mounted on 1 AM Board.

The parameters of the 3 versions of the AMchip are shown in Table 1. The AMchip04 uses a parallel input link, which is replaced with a serial input link in the later versions. The AMchip05 has almost the same functionality as the AMchip06, except for the smaller pattern bank. The AMchip06 maximizes pattern density, minimizes power consumption and improves functionality with respect to previous versions. A full custom cell is the most important design change for the AMchip06. It includes all the hardware necessary for the elementary functions of a single pattern layer. The full custom cell also offers the opportunity to implement important new strategies to reduce the power consumption of the chip. This is a crucial issue because the pattern density growth will eventually be limited by power consumption.

| Version  | Technology | Area                | Patterns | Detector layers | Size         | Clock (MHz) | I/O      |
|----------|------------|---------------------|----------|-----------------|--------------|-------------|----------|
| AMchip04 | 65 nm      | 14 mm<sup>2</sup>   | 8k       | 8 (18 b/layer)  | 1.2 Mb/chip  | 100         | Parallel |
| AMchip05 | 65 nm      | 12 mm<sup>2</sup>   | 3k       | 8 (18 b/layer)  | 440 kb/chip  | 100         | SerDes   |
| AMchip06 | 65 nm      | 160 mm<sup>2</sup>  | 128k     | 8 (18 b/layer)  | 19 Mb/chip   | 100         | SerDes   |

Table 1: Parameters of the 3 versions of the AMchip
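The size column of Table 1 appears to follow from the other columns as patterns × layers × bits per layer. The short check below makes that arithmetic explicit, under the assumption that "k" denotes 1024 patterns; the function name is hypothetical.

```python
LAYERS = 8            # detector layers per pattern (Table 1)
BITS_PER_LAYER = 18   # bits stored per layer (Table 1)


def pattern_memory_bits(n_patterns):
    """Number of bits needed to store n_patterns patterns."""
    return n_patterns * LAYERS * BITS_PER_LAYER


# Assumption: "k" in the table means 1024 patterns.
chips = {"AMchip04": 8 * 1024, "AMchip05": 3 * 1024, "AMchip06": 128 * 1024}
sizes_mb = {name: pattern_memory_bits(n) / 1e6 for name, n in chips.items()}

for name, size in sizes_mb.items():
    print(f"{name}: {size:.2f} Mb")
```

Under this assumption the results round to the 1.2 Mb, 440 kb and 19 Mb listed in the table.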

#### 2.4.2 Hardware development and test status

A prototype AMB is running in a VME crate at the CERN test lab. The LVDS links are working at 2 Gb/s with a Bit Error Rate (BER) satisfying the ATLAS requirement. For the AMchip development, the AMchip04 was fully tested at 100 MHz. The AMchip05 is being tested to validate the AMchip06 design. The firmware can be downloaded via the JTAG connector, and the input/output serial link is working. The AMchip06 draft layout is ready.

# 2.5 Second Stage board

# 2.5.1 System description

In the second stage, tracks from the first stage are received from the AUX card and combined with hits from Data Formatters for the remaining layers. The second stage is needed to reduce fake tracks. It also improves helix parameter resolution since it performs fits using all 12 silicon layers. Each Second-Stage Board (SSB) receives the output through RTM from 4 AUX cards and the hits on the additional layers from the Data Formatter board for the 2  $\eta$  -  $\phi$  towers associated with those AUX cards. After the track fitting, the SSBs perform duplicate track removal based on the number of shared hits between tracks. SSBs also share track data with other SSBs for  $\eta$  -  $\phi$  overlap removal. The SSBs send FTK data within a core crate for output to the FTK to LVL2 Interface Crate (FLIC) via two fiber-optic connections on the RTM.



**Figure 5:** Prototype of Second Stage Board (left) and block diagram showing the functional layout of the SSB (right).

#### 2.5.2 Hardware development and test status

A prototype board has recently become available and is running in the VME crate at the CERN test lab. Figure 5 shows the SSB (left) and its functional sketch (right). Communication with the FLIC is being tested and has been established at 3 Gb/s.

# 2.6 FTK to LVL2 Interface Crate

## 2.6.1 System description

After the SSB processing, the data are sent from the SSBs to the FTK to LVL2 Interface Crate (FLIC). Figure 6 shows the FLIC (left) and its design concept (right). Each core crate has two data links to the FLIC. The FLIC receives the two data streams from each core crate, translates them into the ATLAS ROD format, and then sends the data directly to the ReadOut Systems (ROS). The HLT requests these fragments individually and assembles them as necessary.



Figure 6: Prototype of FLIC board (left) and design concept of the FLIC (right)

# 2.6.2 Hardware development and test status

A prototype board is running in the ATCA shelf at the CERN test lab. The connection to the downstream ROS is established, and the FLIC can send data stably at a 50 kHz event rate. Improvements are ongoing to establish stable dataflow at the design event rate of 100 kHz.

# 3. FTK Simulation

To evaluate FTK performance and test its algorithms, a software emulation of FTK has been developed. This is challenging because it has to emulate a massively parallel hardware system. It is a functional emulation of the hardware that reproduces the logic of each stage of FTK processing in detail, and it has been invaluable in designing the system and maximizing its capabilities. In this paper, the timing simulation is described in detail as an example.

# 3.1 Timing Simulation

A timing simulation was developed for tuning the system architecture and parameters and to ensure that FTK can handle a 100 kHz Level-1 trigger rate at high luminosity. For each functional block, the times of the first and last words into and out of the block are calculated. Since each core crate operates independently, FTK event execution time ends when the last word exits the busiest crate for that event. The execution time for a block depends on the number of input words, the processing time per word, and the number of output words. The processing time per word is estimated for each block type from the architecture and our experience with prototypes.

Figure 7 shows the results of the timing simulation. If the input rate were too large for our system, the event latency would increase as FTK falls behind, working on a stack of previous events before getting to the current one. This does not happen, as seen in the right panel of Figure 7. Some events take longer than others to complete global tracking, but the latency quickly returns to the typical range.
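A per-block estimate of this kind could look like the sketch below. The model is an assumption and not the actual FTK timing simulation: the block is taken to spend a fixed time per word on the larger of its input and output word counts, and it cannot finish before its last input word has arrived. All names and numbers are hypothetical, with times in microseconds.

```python
def block_timing(t_first_in, t_last_in, n_in, n_out, t_per_word, t_latency=0.0):
    """Estimate when the first and last words leave a functional block.

    t_first_in, t_last_in: arrival times of the first and last input words.
    n_in, n_out: numbers of input and output words.
    t_per_word: assumed processing time per word for this block type.
    t_latency: assumed fixed pipeline delay of the block.
    """
    t_first_out = t_first_in + t_latency + t_per_word
    busy = max(n_in, n_out) * t_per_word
    t_last_out = max(t_first_in + t_latency + busy,       # limited by processing
                     t_last_in + t_latency + t_per_word)  # limited by the input
    return t_first_out, t_last_out


# Processing-limited block: 8 input words, 0.25 us each, input done after 1 us.
processing_limited = block_timing(0.0, 1.0, n_in=8, n_out=4, t_per_word=0.25)
# Input-limited block: same block, but the last input word arrives at 10 us.
input_limited = block_timing(0.0, 10.0, n_in=8, n_out=4, t_per_word=0.25)
```

Chaining such estimates, with the output times of one block used as the input times of the next, gives the execution time of a crate; the event time is then set by the slowest crate.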
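The falling-behind argument can be illustrated with a toy queue. The sketch assumes a single bottleneck stage that handles one event at a time, with events arriving every 10 µs (100 kHz); `busy_times_us` is the hypothetical time that stage is occupied by each event, not the full FTK latency, and the values are invented for illustration.

```python
def backlog_latencies(busy_times_us, period_us=10.0):
    """Time from arrival to completion at one stage, for each event.

    Assumption of this sketch: the stage processes events strictly in order,
    so an event starts once it has arrived and the previous one is finished.
    """
    latencies = []
    free_at = 0.0
    for i, busy in enumerate(busy_times_us):
        arrival = i * period_us
        start = max(arrival, free_at)
        free_at = start + busy
        latencies.append(free_at - arrival)
    return latencies


# One slow event (25 us) among fast ones (5 us): the backlog drains again.
recovering = backlog_latencies([5.0, 5.0, 25.0, 5.0, 5.0, 5.0, 5.0])
# Every event slower than the arrival period: the latency grows without bound.
overloaded = backlog_latencies([12.0] * 5)
```

In the first case the latency rises after the slow event and then returns to its typical value, as described for Figure 7. In the second case it grows with every event, which is the signature of an input rate that is too large.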



**Figure 7:** FTK latency for $Z \rightarrow \mu\mu$ events at 69 pile-up. The left plot is an example of 1 event. The timing of the functional blocks is given for the core crate (region) that takes the most time. The time for each of the 64 regions is shown below that, with the total latency shown in the bottom bar. The right plot shows the event-by-event latency for 1000 events. For each event, the execution time starts when the event is available (10 $\mu$s after the previous event, corresponding to a 100 kHz Level-1 trigger rate) and ends when the FTK has completed analyzing that event.

# 4. Conclusion

FTK provides all track information for events accepted by the Level-1 trigger, leaving more processing capacity available to the HLT. Prototype boards are ready and integration tests are ongoing. The FTK simulation is being developed to validate hardware performance and physics impact.

A part of FTK will be installed for the barrel region ($|\eta| < 1.0$) in late 2015, and full coverage ($|\eta| < 2.5$) will be established in 2016.

#### References

- [1] ATLAS Collaboration, The ATLAS Experiment at the CERN Large Hadron Collider, 2008 JINST 3 S08003.
- [2] ATLAS Collaboration, Fast TracKer Technical Design Report, https://cds.cern.ch/record/1552953/files/ATLAS-TDR-021.pdf, 2013.
- [3] ATLAS Collaboration, ATLAS Insertable B-Layer Technical Design Report, https://cds.cern.ch/record/1451888/files/ATLAS-TDR-019-ADD-1.pdf, 2012.
- [4] C.-L. Sotiropoulou, A. Annovi, M. Beretta, P. Luciano, S. Nikolaidis and G. Volpi, A multi-core FPGA-based clustering algorithm for real-time image processing, Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2013 IEEE, pp.1,5, Oct. 27 2013-Nov. 2 2013.