WIDEFIELD SCIENCE AND TECHNOLOGY FOR THE SKA SKADS CONFERENCE 2009 S.A. Torchinsky, A. van Ardenne, T. van den Brink-Havinga, A.J.J. van Es, A.J. Faulkner (eds.) 4-6 November 2009, Château de Limelette, Belgium

# **BEST** back end

S. Montebugnoli<sup>1</sup>, M. Bartolini<sup>1</sup>, G. Bianchi<sup>1</sup>, G. Naldi<sup>2</sup>, J. Manley<sup>3</sup>, and A. Parsons<sup>4</sup>

- <sup>1</sup> Istituto Nazionale di Astrofisica, Istituto di Radioastronomia s.montebugnoli@ira.inaf.it mbartolini@med.ira.inaf.it g.bianchi@ira.inaf.it
- <sup>2</sup> University of Bologna gnaldi@med.ira.inaf.it
- <sup>3</sup> University of Cape Town j.manley@hotmail.com
- <sup>4</sup> University of California, Berkeley aparsons@astron.berkeley.edu

**Abstract.** A modular programmable data processing system, developed by the CASPER group at the University of California, Berkeley, has been adopted for the BEST demonstrator. The system is composed of a 1.2 GS/s A/D converter for each receiver, connected to a serializer board (IBOB) that can host up to 4 A/D boards. At present, the configuration of the BEE2 FPGA cluster offers approximately 500 Gop/s, which provides enough computing power to implement a full 32-station correlator. A beamformer for the BEST pathfinder is currently under development.

# 1. Introduction

Developing a digital backend for the BEST demonstrator has been a technological challenge that raised several questions from the earliest stages of the project. In this paper we outline the main issues, the choices they led us to, and the development work done in Medicina to implement a correlator and a beamforming application for BEST. We first consider the performance required to implement real-time correlation and beamforming on the BEST array; on the basis of these requirements we chose the technology on which to build the backend.
Another important factor is the development time required, together with the reusability of the software, firmware and hardware developed or acquired for the BEST backend. The second part of the paper explains the application design for both correlation and beamforming.

# 2. Requirements

The BEST array is composed of 32 single-polarization receivers acquiring a signal bandwidth of 16 MHz centered at 408 MHz; the signal is then downconverted to an intermediate frequency (IF) of 30 MHz. As shown in Parsons et al. (2008), the computational demand of correlation and beamforming imaging systems scales as

$$BMN^2 \tag{1}$$

where $B$ is the signal bandwidth, $M$ is the number of independent beams and $N$ is the number of antennas. In the BEST case, for the FX correlation application (the first project we developed), we had $B = 16$ MHz and $N^2 \cong \text{baselines} = \frac{32 \times 31}{2} = 496$, and we wanted the signal channelized into 2048 channels. This leads to a computational load of about 32 GFlops for the X side of the correlator alone. Along with the X computation, we also needed 32 A/D converters working in parallel and 32 F engines to channelize each receiver's signal.

Fig. 1: IBOB and iADC boards.

# 3. Hardware

Our computational demand cannot be met by commodity hardware alone, so an ad hoc solution is needed. We decided to adopt the approach proposed by the CASPER<sup>a</sup> group, led by the University of California, Berkeley, and now extended into an international collaboration involving major institutions from many countries in the astrophysics and engineering community. The aim of this consortium is to give scientists a timely way to develop DSP solutions by providing developers with open-source, library-formatted hardware and firmware components that communicate with each other through standard commercial protocols, as described in Parsons et al. (2009).
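As a quick check on the load quoted in Section 2, the short Python sketch below counts the baselines and reproduces a figure of the same order as the 32 GFlops estimate. The accounting is an assumption, since the text does not spell it out: one complex product per baseline per sample, at a complex sample rate equal to the bandwidth, counted as 4 real operations. The function names are hypothetical.

```python
def baseline_count(n_antennas):
    """Number of distinct antenna pairs (cross-correlations only)."""
    return n_antennas * (n_antennas - 1) // 2


def x_engine_ops_per_second(bandwidth_hz, n_antennas, ops_per_product=4):
    """Assumed accounting (not stated in the paper): one complex product per
    baseline per sample, at a complex sample rate equal to the bandwidth,
    with each complex product counted as `ops_per_product` real operations."""
    return bandwidth_hz * baseline_count(n_antennas) * ops_per_product


n_baselines = baseline_count(32)            # BEST: 32 receivers
load = x_engine_ops_per_second(16e6, 32)    # BEST: 16 MHz bandwidth

print(n_baselines)   # 496
print(load / 1e9)    # 31.744
```

Counting the accumulation as well (8 real operations per complex multiply-accumulate) would double the figure, so the result should be read as an order-of-magnitude estimate only.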
In the BEST backend, A/D conversion is performed by 16 iADC boards, each equipped with an Atmel/e2v AT84AD001B 8-bit dual 1 Gsps ADC chip that can digitize two inputs in parallel at 8 bits each. Each pair of iADC boards is connected to an IBOB (Interconnect Break-out Board) via a standard Z-DOK interface. The IBOB is equipped with a Virtex-II Pro FPGA (Field Programmable Gate Array), a programmable chip that can be used to preprocess data from the iADCs in many different ways. Data are output from the IBOBs through standard 10 Gigabit Ethernet CX4 connections. Further processing is then executed on the BEE2 board, a powerful FPGA cluster with 5 Virtex-II Pro chips. The BEE2 also receives and outputs data through 10 Gigabit Ethernet connections.

<sup>a</sup> http://casper.berkeley.edu

Fig. 2: BEE2 board

This modular architecture lets us plug a standard PC into the 10GbE network using a simple Myricom 10G-PCIE-8A-C+E PCIe network card, which can be programmed as a standard network interface, so that any PC can act as data storage for our backend. Many other devices, such as GPUs or ASIC logic, could be plugged into this kind of network in the same way.

The IBOB and BEE2 FPGA chips can be programmed easily and quickly using the graphical tools developed by the CASPER consortium itself. There is no need to write code: developers use a graphical programming environment in which they insert and connect blocks to design their system. Over the years the CASPER community has developed a full set of pre-built components, such as FFT blocks, polyphase filter banks, accumulator engines, etc., which only need to be plugged into the design and work out of the box. Even the interaction with FPGA I/O, such as the Z-DOK interface or the CX4 connection, is made simple by pre-built blocks, which constitute the core of the CASPER library.
The obvious advantage of this collaborative approach is that components are tested and used by many users; even the inevitable problems met during development become easier to solve thanks to the help of people who have already faced the same issues. Moreover, the group includes many people working on similar projects, which enhances international collaboration and the exchange of methodologies and ideas.

# 4. Packetized Correlator

The first application developed on the BEST digital backend was an FX correlator for the 32 inputs. In the correlator architecture, data are sampled by the iADC boards and the F engines are implemented on the IBOBs. In our configuration each IBOB captures 4 digital signals in parallel; the signals are then downconverted at 1/4 of the sample frequency. Data are then pipelined into a polyphase filter bank, which channelizes the input into 2048 frequency channels. Channelized data are sent to the BEE2 board over the 10 Gigabit Ethernet connection.

Fig. 3: Correlation fringes for a single baseline after RFI extraction

Fig. 4: Radio map of Virgo A obtained with the packetized correlator

Cross multiplication of all baselines, frequency channel by frequency channel, is implemented on the BEE2 board, where packets are received from the IBOBs by exploiting the addressing capabilities of the UDP network protocol. Simplifying a little, the antenna number together with the frequency channel forms the network address of the X engine to be addressed. In this way the data flow after the acquisition stage does not need to be synchronized: it is independent and data-driven, once each X engine knows which baselines to calculate for which frequency channels. Once the X engines have completed an integration period, which can be set at run time, data are output to the PC, which stores the correlation result in the MIRIAD data format. Data can then be visualized as in Fig. 3 and processed using a Python package called AIPY<sup>b</sup>.
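The F and X stages described in this section can be illustrated with a minimal pure-Python sketch. It is not the CASPER gateware: a direct DFT stands in for the polyphase filter bank, the UDP packet routing and the MIRIAD output are omitted, and the function names and toy sizes (4 antennas, 8 channels) are assumptions chosen only for illustration.

```python
import cmath
import math
import random
from itertools import combinations


def f_engine(samples, n_chan):
    """F stage for one antenna: cut the real-valued stream into blocks of
    2 * n_chan samples and take the first n_chan DFT bins of each block.
    A direct DFT replaces the polyphase filter bank (simplification)."""
    block = 2 * n_chan
    spectra = []
    for start in range(0, len(samples) - block + 1, block):
        chunk = samples[start:start + block]
        spectra.append([
            sum(x * cmath.exp(-2j * math.pi * k * n / block)
                for n, x in enumerate(chunk))
            for k in range(n_chan)
        ])
    return spectra


def x_engine(all_spectra):
    """X stage: for every antenna pair i < j and every channel, accumulate
    S_i * conj(S_j) over all spectra (the integration period).
    all_spectra[antenna][spectrum][channel] -> {(i, j): [visibility, ...]}"""
    n_chan = len(all_spectra[0][0])
    visibilities = {}
    for i, j in combinations(range(len(all_spectra)), 2):
        acc = [0j] * n_chan
        for spec_i, spec_j in zip(all_spectra[i], all_spectra[j]):
            for c in range(n_chan):
                acc[c] += spec_i[c] * spec_j[c].conjugate()
        visibilities[(i, j)] = acc
    return visibilities


# Toy run: a common noise-like "sky" signal plus weak independent receiver noise.
random.seed(0)
N_ANT, N_CHAN, N_SPECTRA = 4, 8, 16
common = [random.gauss(0.0, 1.0) for _ in range(2 * N_CHAN * N_SPECTRA)]
inputs = [[s + 0.1 * random.gauss(0.0, 1.0) for s in common]
          for _ in range(N_ANT)]

vis = x_engine([f_engine(stream, N_CHAN) for stream in inputs])
print(len(vis), len(vis[(0, 1)]))   # 6 8
```

Because every product depends only on one channel and one antenna pair, the work can be split across independent X engines by channel, which is the property the packetized design relies on.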
Our correlator can process a bandwidth of 40 MHz at the 30 MHz IF, dividing the signal into 2048 frequency channels for 32 single-polarization inputs. Fig. 4 shows the radio map obtained from a full correlation of a transit of Virgo A, deconvolved and cleaned.

<sup>b</sup> http://setiathome.ssl.berkeley.edu/~aparsons/aipy/aipy.cgi

# 5. Beamformer Project

A beamformer for 16 single-polarization antennas is under development. Its implementation requires a system composed of:

- 4 IBOB boards (8 iADC boards);
- 1 BEE2 board;
- 1 10GbE Fujitsu switch;
- 1 PC equipped with a 10GbE network card.

The architecture of the system is presented in Fig. 5. The IF analogue signals are digitized by the A/D converters (ADCs) and then enter the IBOB boards, where they are pre-processed. In particular, the signals are down-converted to baseband by the Digital Down Converter (DDC) and then multiplied by the complex beamformer coefficients *w*. Before the signals are transmitted to the BEE2 board through CX4 cables, they are filtered by FIR filters and downsampled. All the signal operations performed inside the IBOB boards are represented schematically in Fig. 6.

Fig. 5: Architecture of the beamformer for 16 single polarization antennas

Fig. 6: Block diagram representing the signal processing performed inside the IBOB boards.

Fig. 7: Block diagram representing the signal processing performed inside the BEE2 board in the beamformer design

The signal processing core of the beamformer resides in the BEE2 board. Two FPGA devices are each in charge of summing the signals coming from two IBOB boards (8 signals); a third FPGA device then adds these two partial sums to produce a single signal, which is the beamformer output. Finally, data are packetized and transmitted to the PC through the 10GbE switch.
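The processing chain described in this section (DDC, complex weighting, FIR filtering and downsampling on the IBOB side, then a two-level sum on the BEE2 side) can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the actual firmware: the 120 MS/s sample rate, the 63-tap Hamming-windowed filter, the decimation factor of 8, the test tone and all function names are hypothetical; only the 30 MHz IF, the 16 inputs and the grouping into sums of 8 come from the text.

```python
import cmath
import math
import random


def ddc(samples, f_if, f_s):
    """Digital down-conversion: mix the real IF signal to complex baseband."""
    return [x * cmath.exp(-2j * math.pi * f_if / f_s * n)
            for n, x in enumerate(samples)]


def lowpass_taps(n_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff in cycles/sample, unit DC gain."""
    mid = (n_taps - 1) / 2
    taps = []
    for k in range(n_taps):
        arg = 2 * math.pi * cutoff * (k - mid)
        sinc = 1.0 if arg == 0 else math.sin(arg) / arg
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (n_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_decimate(signal, taps, decim):
    """FIR filter and downsample, computing only the outputs that are kept.
    The taps are symmetric, so no time reversal is needed."""
    return [sum(t * signal[start + k] for k, t in enumerate(taps))
            for start in range(0, len(signal) - len(taps) + 1, decim)]


def ibob_stage(samples, weight, f_if, f_s, taps, decim):
    """Per-antenna processing: DDC, multiply by coefficient w, filter, decimate."""
    weighted = [weight * z for z in ddc(samples, f_if, f_s)]
    return fir_decimate(weighted, taps, decim)


def bee2_stage(streams, group_size=8):
    """Two partial sums of `group_size` signals each, then the final sum."""
    partial = [[sum(vals) for vals in zip(*streams[g:g + group_size])]
               for g in range(0, len(streams), group_size)]
    return [sum(vals) for vals in zip(*partial)]


# Toy run: a tone 1 MHz above the IF, arriving with a different phase per antenna.
random.seed(1)
F_S, F_IF, F_TONE = 120e6, 30e6, 31e6   # F_S is an assumed sample rate
N_ANT, N_SAMPLES = 16, 1024
phases = [random.uniform(0, 2 * math.pi) for _ in range(N_ANT)]
inputs = [[math.cos(2 * math.pi * F_TONE / F_S * n + p) for n in range(N_SAMPLES)]
          for p in phases]
taps = lowpass_taps(63, 0.04)


def run(weights):
    streams = [ibob_stage(x, w, F_IF, F_S, taps, 8)
               for x, w in zip(inputs, weights)]
    return bee2_stage(streams)


beam = run([cmath.exp(-1j * p) for p in phases])   # weights cancel the phases
unsteered = run([1.0] * N_ANT)                     # no phase correction
print(abs(beam[0]), abs(unsteered[0]))
```

With weights that cancel the per-antenna phases, the 16 baseband tones add coherently and the output magnitude approaches 16 × 0.5 = 8 (each unit-amplitude real cosine contributes half its amplitude to the baseband component); without phase correction the sum is much smaller.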
In the architecture described above, synchronization of the system is achieved with first-in first-out (FIFO) buffers: all input data arriving through the XAUI interfaces are aligned in every FPGA to compensate for possible alignment errors.

Before the beamformer output can be calculated, the array has to be calibrated to avoid pointing errors. A calibration system, based on a radio astronomical calibration technique, is therefore under development. In essence, the time evolution of the measured interference fringes is compared with the expected one for a set of pre-determined baselines; from this comparison the phase correction for every antenna is calculated and then applied. The architecture of the calibration system is the same as that of the beamformer as far as the IBOB design is concerned, but it differs in the BEE2 design. For calibration, 15 signal multiplications (corresponding to 15 baselines) have to be calculated, and a single FPGA device is enough to perform these operations. In this case the data resulting from the multiplications can be stored on board and sent to the PC via a standard 10/100 Mbit Ethernet connection.

Fig. 8: Block diagram representing the signal processing performed inside the BEE2 board in the calibrator design

Some preliminary calibration tests with a small number of antennas have been carried out. Figs. 9 and 10 refer to the calibration of an entire cylinder of the BEST-2 array (4 receivers). The correlated (multiplied) baseband signals have a bandwidth of 3 MHz, and the correlation products are integrated for 1 s. Fig. 9 shows the interference fringes obtained during a transit of Cassiopeia A (3C 461) with no phase correction applied to the signals. Fig. 10 shows the fringes after the proper phase correction has been applied to each signal.

Fig. 9: Interference fringes of 4 not-calibrated receivers of one BEST-2 cylinder

Fig. 10: Interference fringes of 4 calibrated receivers of one BEST-2 cylinder

# 6. Conclusions

Very good results have been obtained in the development of the digital backend for the BEST array: a radio map of Virgo A has been obtained with 32 receivers, and a subsection of the array has been calibrated for the beamforming application. Moreover, we expect good performance from the beamformer system, which will be running shortly. Our choice of digital backend hardware has proved effective: it hosts different applications on the same devices, with reasonably fast development times and good support from the CASPER community. The system is well suited to a number of possible future applications, for instance testing the multi-beaming capabilities of the BEST array, and testing post-processing hardware and algorithms while conducting a survey for dispersed transient signals of astrophysical origin.

# References

Parsons, A. et al. 2009, Digital Instrumentation for the Radio Astronomy Community, arXiv:0904.1181v1

Parsons, A. et al. 2008, A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization, PASP, 120 (873), 1207-1221, arXiv:0809.2266v3