Graphical Processing Units (GPUs) have recently become a valuable computing tool for the acquisition of data at high rates and for a relatively low cost. The devices work by parallelizing the code into thousands of threads, each executing a simple process, such as identifying pulses from a waveform digitizer. The CUDA programming library can be used to effectively write code to parallelize such tasks on Nvidia GPUs, providing a significant upgrade in performance over CPU based acquisition systems.
The muon $g$-$2$ experiment at Fermilab is heavily relying on GPUs to process its data. The data acquisition system for this experiment must have the ability to create deadtime-free records from 700 $\mu$s muon spills at a raw data rate 18 GB per second. Data will be collected using 1296 channels of $\mu$TCA-based 800 MSPS, 12 bit waveform digitizers and processed in a layered array of networked commodity processors with 24 GPUs working in parallel to perform a fast recording of the muon decays during the spill. The described data acquisition system is currently being constructed, and will be fully operational before the start of the experiment in 2017.