Recently I have implemented an FPGA LDPC decoder for a commercial project. The belief propagation LDPC decoder algorithm admits many different approximations in the arithmetic, and other tricks that can be used to trade off between decoding sensitivity (BER versus Eb/N0 performance) and computational complexity. To help me benchmark the different belief propagation algorithms, I have extended my ldpc-toolbox project to implement many different LDPC decoding algorithms and perform BER simulations.
ldpc-toolbox is a Rust library and command line tool for the design of LDPC codes. I initially created this project when I was trying to design a suitable LDPC code for a narrowband 32APSK modem to be used over the QO-100 amateur GEO transponder. The tool so far supported some classical pseudorandom constructions of LDPC codes, computed Tanner graph girths, and could construct the alists for all the DVB-S2 and CCSDS LDPC codes. Extending this tool to support LDPC encoding, decoding and BER simulation is a natural step.
Currently, the BER simulation only allows a BPSK AWGN channel. The simulation output format is heavily inspired by the AFF3CT library. For example, after generating an alist file with ldpc-toolbox ccsds
, the CCSDS r=1/2 k=1024 code that was used in Artemis 1 can be simulated with
ldpc-toolbox ber --min-ebn0 0.0 --max-ebn0 2.05 --step-ebn0 0.1 \ --puncturing 1,1,1,1,0 ar4ja_1_2_1024.alist
The output is printed to the terminal, updating the partial progress in real time as the simulation progresses. At the end of the run, the output will look similar to this:
BER TEST PARAMETERS ------------------- Simulation: - Minimum Eb/N0: 0.00 dB - Maximum Eb/N0: 2.05 dB - Eb/N0 step: 0.10 dB - Number of frame errors: 100 LDPC code: - alist: /home/daniel/msss/ldpc-toolbox/ar4ja_1_2_1024.alist - Puncturing pattern: 1,1,1,1,0 - Information bits (k): 1024 - Codeword size (N_cw): 2560 - Frame size (N): 2048 - Code rate: 0.500 LDPC decoder: - Implementation: Phif64 - Maximum iterations: 200 Eb/N0 | Frames | Bit errs | Frame er | False de | BER | FER | Avg iter | Avg corr | Throughp | Elapsed --------|----------|----------|----------|----------|---------|---------|----------|----------|----------|---------- 0.00 | 100 | 15749 | 100 | 0 | 1.54e-1 | 1.00e0 | 200.0 | NaN | 0.124 | 0s 0.10 | 100 | 15051 | 100 | 0 | 1.47e-1 | 1.00e0 | 200.0 | NaN | 0.122 | 0s 0.20 | 102 | 15022 | 100 | 0 | 1.44e-1 | 9.80e-1 | 197.3 | 61.5 | 0.123 | 0s 0.30 | 101 | 14650 | 100 | 0 | 1.42e-1 | 9.90e-1 | 198.9 | 90.0 | 0.121 | 0s 0.40 | 107 | 13960 | 100 | 0 | 1.27e-1 | 9.35e-1 | 191.5 | 70.3 | 0.127 | 0s 0.50 | 123 | 13603 | 100 | 0 | 1.08e-1 | 8.13e-1 | 175.1 | 66.6 | 0.145 | 0s 0.60 | 137 | 12947 | 100 | 0 | 9.23e-2 | 7.30e-1 | 161.6 | 57.8 | 0.156 | 0s 0.70 | 168 | 13234 | 100 | 0 | 7.69e-2 | 5.95e-1 | 137.6 | 45.9 | 0.183 | 0s 0.80 | 211 | 12785 | 100 | 0 | 5.92e-2 | 4.74e-1 | 117.3 | 42.9 | 0.217 | 0s 0.90 | 333 | 11970 | 100 | 0 | 3.51e-2 | 3.00e-1 | 88.8 | 41.1 | 0.288 | 1s 1.00 | 575 | 11872 | 100 | 0 | 2.02e-2 | 1.74e-1 | 65.3 | 36.9 | 0.400 | 1s 1.10 | 923 | 12132 | 100 | 0 | 1.28e-2 | 1.08e-1 | 49.7 | 31.4 | 0.530 | 1s 1.20 | 1776 | 11960 | 100 | 0 | 6.58e-3 | 5.63e-2 | 38.1 | 28.5 | 0.696 | 2s 1.30 | 4355 | 12304 | 100 | 0 | 2.76e-3 | 2.30e-2 | 29.0 | 24.9 | 0.925 | 4s 1.40 | 9692 | 11682 | 100 | 0 | 1.18e-3 | 1.03e-2 | 23.7 | 21.9 | 1.127 | 8s 1.50 | 31577 | 12022 | 100 | 0 | 3.72e-4 | 3.17e-3 | 20.2 | 19.6 | 1.305 | 24s 1.60 | 97292 | 11918 | 100 | 0 | 1.20e-4 | 1.03e-3 | 18.0 | 17.8 | 1.434 | 1m 9s 1.70 | 335865 | 12183 | 100 | 0 | 3.54e-5 | 2.98e-4 | 16.3 | 16.3 | 1.516 | 3m 46s 1.80 | 1156333 | 13002 | 100 | 0 | 1.10e-5 | 8.65e-5 | 15.1 | 15.1 | 1.560 | 12m 39s 1.90 | 5794237 | 12374 | 100 | 0 | 2.09e-6 | 1.73e-5 | 14.0 | 14.0 | 1.616 | 1h 1m 11s 2.00 | 23053676 | 11983 | 100 | 0 | 5.08e-7 | 4.34e-6 | 13.2 | 13.2 | 1.708 | 3h 50m 18s
Additionally to the data that AFF3CT gives in its output table, ldpc-toolbox also shows the average number of iterations for all the frames and the average number of iterations for successfully decoded frames. These quantities are very important to determine the maximum throughput that an FPGA implementation will support.
The decoder allows a large number of arithmetic implementations. These are described in the documentation. The choice of available implementations is mainly intended to compare floating-point implementations of the mathematically ideal belief propagation with 8-bit fixed-point approximations suitable for FPGAs. The focus is not so much on decoding speed but on performing a bit exact simulation of what the final FPGA implementation will look like. This goal is what led me to write an LDPC decoder from scratch in ldpc-toolbox instead of extending AFF3CT to include the algorithms I wanted to benchmark. It was easier for me to have full control of the computations by doing this. Nevertheless, the decoding speed of some of the less complex (but not necessarily less sensitive) 8-bit algorithms is reasonably good (specially if the tool is built with RUSTFLAGS='-C target-cpu=native'
which enables some helpful auto-vectorization using AVX2 or AVX-512). The BER simulation is multi-threaded, using all the CPU cores, as AFF3CT does.
I have also implemented a GNU Radio out-of-tree module called gr-ldpc-toolbox that provides an LDPC encoder block and an LDPC decoder block using the implementation in ldpc-toolbox. To my knowledge, this is the first GNU Radio out-of-tree module which is mostly implemented in Rust. The way in which this works is relatively simple (but it doesn’t scale up well for more complex use cases). The LDPC encoder and decoder in ldpc-toolbox are exposed in a C API. This allows ldpc-toolbox to be built as a dynamic library or static library and to be called from any software that can call C. Corrosion is used to build the Rust library from the usual CMake build system of the GNU Radio out-of-tree module. The Rust library is built as a static library, so all the required code ends up in the dynamic library that implements the GNU Radio module. The GNU Radio blocks are simple wrappers that call the C API functions of the Rust library (see for instance the LDPC encoder implementation). It would be interesting to have a tighter integration between GNU Radio C++ block classes and Rust, but at the moment this approach is quite practical for this project.
The main reason for making gr-ldpc-toolbox was that the current LDPC decoder implementation is mostly broken, as I found when trying to use it to decode Artemis I. This will probably change soon, because there is a pull request that fixes these problems. However, I needed to have a way to test my FPGA implementation using GNU Radio, and having the ability to use the encoder and decoder from ldpc-toolbox in GNU Radio also seemed to open up many future possibilities.
An example flowgraph with a BER simulation of the CCSDS r=1/2 k=1024 LDPC code is included in gr-ldpc-toolbox. This is shown here.
The LDPC encoder and decoder blocks work with vectors of the appropriate size. The encoder handles unpacked bits. The decoder expects LLRs at the input (with the convention that a positive LLR means that the bit 0 is more likely) and outputs unpacked bits. The decoder only produces an output vector when the codeword is successfully decoded. If decoding fails, the decoder simply drops the input codeword (recall that successful decoding of an LDPC decoder can be checked by testing whether the candidate codeword satisfies the parity check equations; it is very unlikely for the decoder to converge to a codeword that belongs to the code but differs from the that was transmitted).
By comparing the rates of the two Probe Rate blocks in the flowgraph it is possible to get a crude idea of the frame error rate. Note that the Probe Rate block only produces measurements when it sees data, so if the Eb/N0 is set so low that all the codewords fail to decode, the Probe Rate at the decoder output will not produce any measurements.
Unlike the BER simulation in ldpc-toolbox, these decoders are single-threaded, so the performance is not as good. An improved implementation of the decoder could have a pool of worker threads, implemented either in Rust as part of ldpc-toolbox or in C++ as part of the GNU Radio block.
One comment