RIFFA: Reusable Integration Framework for FPGA Accelerators

Before FPGA vendors provided standardized AXI interfaces to PCI Express (PCIe) IP, researchers were responsible for writing their own communication solutions for collecting data and publishing their work. Writing a PCIe solution is a significant amount of work that crosses stack abstractions and the hardware/software interface. At a minimum, any solution must consider system calls, interrupts, virtual-to-physical memory translation, PCIe packet format, memory request reordering, and static timing analysis. When faced with this challenge, most chose to pay to license an existing solution from a vendor instead of writing their own.

RIFFA (Reusable Integration Framework for FPGA Accelerators) is a high-bandwidth open source framework for communicating data between a host CPU and an FPGA via a PCI Express bus. The framework requires a PCIe-enabled workstation and an FPGA on a board with a PCIe connector. RIFFA supports Windows and Linux, Altera and Xilinx, with bindings for C/C++, Python, MATLAB and Java. RIFFA allows users to focus on research and applications, not plumbing.

Software API

On the software side there are two main functions: data send and data receive. These functions are exposed via user libraries in C/C++, Python, MATLAB, and Java. The driver supports multiple FPGAs (up to 5) per system. The software bindings work on Linux and Windows operating systems. Users can communicate with FPGA IP cores by writing only a few lines of code. 

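#include <stdio.h>
#include <stdlib.h>
#include <riffa.h>

#define BUF_SIZE (1*1024*1024) // buffer length in 32-bit words
unsigned int buf[BUF_SIZE];

int main(int argc, char* argv[]) {
  fpga_t * fpga;
  int fid = 0; // FPGA id
  int channel = 0; // FPGA channel

  fpga = fpga_open(fid);
  if (fpga == NULL) {
    fprintf(stderr, "Could not open FPGA %d\n", fid);
    return EXIT_FAILURE;
  }
  // Send the buffer to the FPGA (offset 0, last transaction = 1, timeout 0).
  fpga_send(fpga, channel, (void *)buf, BUF_SIZE, 0, 1, 0);
  // Receive the FPGA's response into the same buffer (timeout 0).
  fpga_recv(fpga, channel, (void *)buf, BUF_SIZE, 0);
  fpga_close(fpga);
  return 0;
}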
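The example above passes BUF_SIZE, a count of unsigned int elements, as the transfer length, which matches the hardware interface's convention of measuring lengths in 4-byte words. When the data lives in a byte buffer, the length has to be converted first. The helper below is a minimal sketch of that conversion; riffa_words_for_bytes is a hypothetical name and is not part of riffa.h.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of the RIFFA library.
 * Returns the number of 4-byte words needed to hold `bytes` bytes,
 * rounding up so a trailing partial word is still counted. */
size_t riffa_words_for_bytes(size_t bytes) {
    return bytes / 4 + (bytes % 4 != 0 ? 1 : 0);
}
```

A buffer whose size is not a multiple of 4 bytes would need to be padded up to the returned word count before being handed to the send call.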

Hardware API

On the hardware side, users access an interface with independent transmit and receive signals. The signals provide transaction handshaking and a first-word fall-through (FWFT) FIFO interface for reading and writing data. No knowledge of bus addresses, buffer sizes, or PCIe packet formats is required: simply send data on one FIFO interface and receive data on the other. RIFFA does not rely on a PCIe bridge and is therefore not subject to the limitations of a bridge implementation. Instead, RIFFA works directly with the PCIe endpoint and can run fast enough to saturate the PCIe link. Both the software and hardware interfaces have been greatly simplified.

Name                        I/O  Description

RX Interface
CHNL_RX_CLK                 O    Provide the clock signal to read data from the incoming FIFO.
CHNL_RX                     I    Goes high to signal incoming data. Will remain high until all incoming data is written to the FIFO.
CHNL_RX_ACK                 O    Must be pulsed high for at least 1 cycle to acknowledge the incoming data transaction.
CHNL_RX_LAST                I    High indicates this is the last receive transaction in a sequence.
CHNL_RX_LEN[31:0]           I    Length of receive transaction in 4-byte words.
CHNL_RX_OFF[30:0]           I    Offset in 4-byte words indicating where to start storing received data (if applicable in design).
CHNL_RX_DATA[DWIDTH-1:0]    I    Receive data.
CHNL_RX_DATA_VALID          I    High if the data on CHNL_RX_DATA is valid.
CHNL_RX_DATA_REN            O    When high and CHNL_RX_DATA_VALID is high, consumes the data currently available on CHNL_RX_DATA.

TX Interface
CHNL_TX_CLK                 O    Provide the clock signal to write data to the outgoing FIFO.
CHNL_TX                     O    Set high to signal a transaction. Keep high until all outgoing data is written to the FIFO.
CHNL_TX_ACK                 I    Will be pulsed high for at least 1 cycle to acknowledge the transaction.
CHNL_TX_LAST                O    High indicates this is the last send transaction in a sequence.
CHNL_TX_LEN[31:0]           O    Length of send transaction in 4-byte words.
CHNL_TX_OFF[30:0]           O    Offset in 4-byte words indicating where to start storing sent data in the PC thread's receive buffer.
CHNL_TX_DATA[DWIDTH-1:0]    O    Send data.
CHNL_TX_DATA_VALID          O    Set high when the data on CHNL_TX_DATA is valid. Update when CHNL_TX_DATA is consumed.
CHNL_TX_DATA_REN            I    When high and CHNL_TX_DATA_VALID is high, consumes the data currently available on CHNL_TX_DATA.
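The DATA, DATA_VALID and DATA_REN signals follow first-word fall-through semantics: the oldest word is already visible on DATA whenever DATA_VALID is high, and it is consumed only on a cycle where DATA_REN is also high. The following is a small software model of that rule, written in C for illustration only. It is not RIFFA code; the type fwft_fifo, the depth FIFO_DEPTH and all function names are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 16 /* arbitrary depth chosen for this sketch */

/* Software model of a first-word fall-through FIFO (illustrative only). */
typedef struct {
    uint32_t slots[FIFO_DEPTH];
    size_t head;  /* index of the oldest stored word */
    size_t count; /* number of words currently stored */
} fwft_fifo;

/* Producer side: store one word; returns false when the FIFO is full. */
bool fifo_push(fwft_fifo *f, uint32_t word) {
    if (f->count == FIFO_DEPTH) return false;
    f->slots[(f->head + f->count) % FIFO_DEPTH] = word;
    f->count++;
    return true;
}

/* Models DATA_VALID: high whenever at least one word is waiting. */
bool fifo_data_valid(const fwft_fifo *f) { return f->count > 0; }

/* Models DATA: the oldest word is visible without a prior read request.
 * The value is only meaningful while fifo_data_valid() is true. */
uint32_t fifo_data(const fwft_fifo *f) { return f->slots[f->head]; }

/* Models one clock edge: the word is consumed only when DATA_VALID and
 * DATA_REN are both high. Returns true if a word was consumed. */
bool fifo_clock(fwft_fifo *f, bool data_ren) {
    if (!(data_ren && fifo_data_valid(f))) return false;
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

/* Consumer that holds DATA_REN high every cycle: copies up to max_words
 * words into out and returns how many were consumed. */
size_t fifo_drain(fwft_fifo *f, uint32_t *out, size_t max_words) {
    size_t n = 0;
    while (n < max_words && fifo_data_valid(f)) {
        out[n++] = fifo_data(f); /* sample DATA while VALID is high */
        fifo_clock(f, true);     /* REN high: word consumed on this edge */
    }
    return n;
}
```

In a real design the consumer would count consumed words against the transaction length announced on the LEN signal; the model leaves that bookkeeping to the caller through max_words.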

RIFFA communicates data using direct memory access (DMA) transfers and interrupt signaling. This achieves high bandwidth over the PCIe link. In all our tests we are able to saturate (or nearly saturate) the link. Details regarding DMA transfers can be found in our paper.

We have implemented RIFFA on the Avnet Spartan LX150T, Xilinx ML605, and Xilinx VC707, as well as the Altera DE5-Net, DE4 and DE2i boards. The RIFFA distribution contains examples and guides for setting up designs on the development boards listed above. In addition, this website has examples of how to access your design from all the software bindings.

RIFFA has been tested on Fedora 13 and 17 (32-bit and 64-bit versions) and Ubuntu Desktop 10.04 LTS and 12.04 LTS (32-bit and 64-bit versions). RIFFA relies on a custom Linux kernel driver which is supported on Linux kernels 2.6.27+ (tested on versions from 2.6.32 to 3.X). The Windows driver is supported on Windows 7 in both 32-bit and 64-bit variants.

RIFFA 2.1 is significantly more efficient than its predecessors. RIFFA 2.1 is able to saturate the PCIe link for nearly all supported link configurations. The chart to the right shows the performance of designs using the 32-bit, 64-bit, and 128-bit interfaces. The colored bands show the bandwidth region between the theoretical maximum and the maximum achievable. PCIe Gen 1 and Gen 2 use 8b/10b encoding, which limits the maximum achievable bandwidth to 80% of the theoretical maximum. Our experiments show that RIFFA can achieve 80% of the theoretical bandwidth in nearly all cases. The 128-bit interface achieves 76% of the theoretical maximum.
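The 80% figure follows directly from the encoding: 8b/10b places 8 payload bits in every 10 bits on the wire. The sketch below works through the arithmetic for a given generation and lane count. It assumes the standard per-lane signalling rates of 2.5 GT/s for Gen 1 and 5.0 GT/s for Gen 2, reports megabytes as 10^6 bytes, and ignores packet header overhead. The function names are made up for this example.

```c
/* Per-lane signalling rate in megatransfers per second.
 * Gen 1 = 2500, Gen 2 = 5000; other generations are out of scope here
 * (they do not use 8b/10b encoding) and yield 0. */
double pcie_rate_mtps(int gen) {
    if (gen == 1) return 2500.0;
    if (gen == 2) return 5000.0;
    return 0.0;
}

/* Theoretical bandwidth in MB/s, counting every bit on the wire:
 * rate * lanes, divided by 8 bits per byte. */
double pcie_theoretical_mbytes(int gen, int lanes) {
    return pcie_rate_mtps(gen) * lanes / 8.0;
}

/* Maximum achievable bandwidth in MB/s after 8b/10b encoding:
 * only 8 of every 10 transferred bits carry payload. */
double pcie_encoded_max_mbytes(int gen, int lanes) {
    return pcie_theoretical_mbytes(gen, lanes) * 8.0 / 10.0;
}
```

For a Gen 2 link with 8 lanes this gives a theoretical 5000 MB/s and an achievable maximum of 4000 MB/s; for Gen 1 with a single lane, 312.5 MB/s and 250 MB/s.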

RIFFA is funded by Cognex, Altera, Xilinx and the National Science Foundation.