The DSN Telecommunications Link Design Handbook is a large document describing many aspects pertaining deep space communications and how they are implemented by the NASA Deep Space Network. One of the many things it contains is a description of a Reed-Solomon encoder for the CCSDS code using the Berlekamp bit-serial architecture. While following this description to implement an encoder, I have found an error. In this post, I explain the error and where I think it comes from.
The Telecommunications Link Design Handbook is divided into several modules, which are really stand-alone documents. Module 208 Telemetry Data Decoding, describes all the telemetry synchronization and coding formats supported by the DSN. In this aspect, it is a counterpart of the CCSDS TM Synchronization and Channel Coding books. Here I’m looking at revision B of this module (year 2013), which I think is the latest.
In page 21 of this module, there is the following figure, which describes a Reed-Solomon encoder for the (255, 223) CCSDS code using the bit-serial architecture invented by Berlekamp. I think the figure is mainly intended as a description of the Reed-Solomon code used by the DSN, but it is very helpful in many other contexts, because it contains essentially all the information that one needs to implement the encoder.
As I explained in my post about the Voyager Reed-Solomon code, in 1982 Berlekamp invented a particular form of Reed-Solomon codes that use a “dual basis representation” and can be encoded very efficiently in hardware, processing the message bit by bit. The CCSDS code originated as part of this work, so it is specially tailored to this encoder architecture. The figure shows all there is to the encoder: just some shift registers and XOR gates. It is also a rather clear description, since it doesn’t really get into any of the math that makes this approach work.
However, it turns out that there is an error in this figure. The XOR gate connecting \(z_3\) to \(T_3\) should actually connect \(z_3\) to \(T_4\), and moreover, the XOR gate connecting \(z_3\) to \(T_{13}\) should be removed. The corrected version of the encoder is shown below, with the changes highlighted in red.
I discovered this problem while validating an encoder that I had implemented following this figure. I was checking if my encoder generated the same results as Phil Karn’s libfec, and it didn’t. The libfec library, as most other software implementations, performs the encoding in a completely different way which works byte by byte. It uses the “conventional basis” and has look-up tables of logarithms and exponentials to implement the field multiplication. A basis change is done before and after the encoding, so the results should be equivalent. This approach is more efficient when implementing the encoder in software.
After testing with a few very simple messages, such as those having zeros everywhere except in one of the last bits, I concluded that the problem was actually with the figure, and that the figure should be corrected as shown above.
I found this error quite interesting, because it’s subtle. The error affects both \(T_3\) and \(T_{13}\) in the same way (note that \(T_3\) and \(T_{13}\) actually perform the same operations), and besides this, it amounts to a single XOR gate being moved. Thinking more about where this error could have originated, I have traced it to the 1983 JPL report The Reed-Solomon encoders: Conventional versus Berlekamp’s architecture, by Perlman and Lee. This report, which I already cited in my post about Voyager, gives a very detailed explanation of the advantages and mathematical background of the encoder architecture proposed by Berlekamp, and gives all the calculations for the code that would eventually be standardised as the CCSDS (255, 223) code.
In page 40 of this report, the following formulas for the calculation of the \(T_\ell\) terms are given (the formulas for \(T_{12}\), \(T_{14}\), \(T_{15}\) and \(T_{16}\) appear in the next page and are not shown here).
Note that \(T_3\) and \(T_{13}\) appear in the same row, since they have the same value (this mentioned earlier in the report, and comes from the fact that the 3rd and 13th coefficients of the code generator polynomial coincide). Also note that the formulas describe exactly the same calculations as the XOR gates as Module 208, including the error. The single mistake here is that the term \(z_3\) in the line for \(T_3\) and \(T_{13}\) should actually appear in the next line, which gives the formula for \(T_4\). It sounds reasonable that this error was introduced when typesetting the report. Or maybe it was already present in the manuscript. It would be fun to try to trace it further if the documents are still around in some library. In any case, it doesn’t seem so unlikely that this kind of mistake would happen, and also that it would get copied into other documents such as Module 208. The report is probably the best reference regarding Berlekamp’s ideas, and I haven’t seen many other places where these formulas for the \(T_\ell\) terms are listed.
This post wouldn’t be complete without actually showing that the correct formulas for \(T_3\) and \(T_4\) are\[T_3 = z_0 + z_2 + z_4 + z_5\]and\[T_4 = z_0 + z_2 + z_3 + z_7,\]as I have claimed above. With the help of Table 4 in page 42 of the report, which lists the traces of all the field elements, this something that can be even done by hand (without this table, the field trace is not something I would enjoy to compute by hand).
For the calculations, we need the values of the dual basis elements, which are given in (20) in page 39 as\[\begin{split}\ell_0 &= \alpha^{125},\quad \ell_1 = \alpha^{88},\quad \ell_2 = \alpha^{226},\quad \ell_3= \alpha^{163},\\ \ell_4 &= \alpha^{46},\quad \ell_5 = \alpha^{184},\quad \ell_6 = \alpha^{67},\quad \ell_7 = \alpha^{242}.\end{split}\]The expressions for the terms \(T_\ell\) can be computed as\[T_\ell = \sum_{j=0}^7 z_j \operatorname{Tr}(\ell_j G_\ell),\]where\[g(x) = \sum_{\ell=0}^{32} G_\ell x^\ell\]is the code generator polynomial. Therefore, we need the values for \(G_3\) and \(G_4\), which are given in (22) in page 40. These are\[G_3 = \alpha^{66},\quad G_4 = \alpha^4.\]
Now, using the fact that \(\alpha^{255} = 1\) and Table 4 to look up the values of the trace, we can simply calculate\[\begin{split}\operatorname{Tr}(\ell_0G_3) &= \operatorname{Tr}(\alpha^{125}\alpha^{66}) = \operatorname{Tr}(\alpha^{191}) = 1,\\ \operatorname{Tr}(\ell_1G_3) &= \operatorname{Tr}(\alpha^{88}\alpha^{66}) = \operatorname{Tr}(\alpha^{154}) = 0,\\ \operatorname{Tr}(\ell_2G_3) &= \operatorname{Tr}(\alpha^{226}\alpha^{66}) = \operatorname{Tr}(\alpha^{37}) = 1,\\ \operatorname{Tr}(\ell_3G_3) &= \operatorname{Tr}(\alpha^{163}\alpha^{66}) = \operatorname{Tr}(\alpha^{229}) = 0,\\ \operatorname{Tr}(\ell_4G_3) &= \operatorname{Tr}(\alpha^{46}\alpha^{66}) = \operatorname{Tr}(\alpha^{112}) = 1,\\ \operatorname{Tr}(\ell_5G_3) &= \operatorname{Tr}(\alpha^{184}\alpha^{66}) = \operatorname{Tr}(\alpha^{250}) = 1,\\ \operatorname{Tr}(\ell_6G_3) &= \operatorname{Tr}(\alpha^{67}\alpha^{66}) = \operatorname{Tr}(\alpha^{133}) = 0,\\ \operatorname{Tr}(\ell_7G_3) &= \operatorname{Tr}(\alpha^{242}\alpha^{66}) = \operatorname{Tr}(\alpha^{53}) = 0,\end{split}\]from which it follows that\[T_3 = z_0 + z_2 + z_4 + z_5.\]In the same way,\[\begin{split}\operatorname{Tr}(\ell_0G_4) &= \operatorname{Tr}(\alpha^{125}\alpha^4) = \operatorname{Tr}(\alpha^{129}) = 1,\\ \operatorname{Tr}(\ell_1G_4) &= \operatorname{Tr}(\alpha^{88}\alpha^4) = \operatorname{Tr}(\alpha^{92}) = 0,\\ \operatorname{Tr}(\ell_2G_4) &= \operatorname{Tr}(\alpha^{226}\alpha^4) = \operatorname{Tr}(\alpha^{230}) = 1,\\ \operatorname{Tr}(\ell_3G_4) &= \operatorname{Tr}(\alpha^{163}\alpha^4) = \operatorname{Tr}(\alpha^{167}) = 1,\\ \operatorname{Tr}(\ell_4G_4) &= \operatorname{Tr}(\alpha^{46}\alpha^4) = \operatorname{Tr}(\alpha^{50}) = 0,\\ \operatorname{Tr}(\ell_5G_4) &= \operatorname{Tr}(\alpha^{184}\alpha^4) = \operatorname{Tr}(\alpha^{188}) = 0,\\ \operatorname{Tr}(\ell_6G_4) &= \operatorname{Tr}(\alpha^{67}\alpha^4) = \operatorname{Tr}(\alpha^{71}) = 0,\\ \operatorname{Tr}(\ell_7G_4) &= \operatorname{Tr}(\alpha^{242}\alpha^4) = \operatorname{Tr}(\alpha^{246}) = 1.\end{split}\]This gives \[T_4 = z_0 + z_2 + z_3 + z_7.\]
While we are at it, it doesn’t hurt to check that all the other formulas in the report by Perlman and Lee are correct, although I’m confident that this is the case because after introducing the correction described here, my Reed-Solomon encoder matches Phil Karn’s libfec. I have done a small notebook using Sage to compute all the quantities used in the CCSDS Reed-Solomon code, starting from the definition. I have checked that the values for the coefficients \(G_\ell\) of the code generator polynomial, the dual basis elements \(\ell_j\), the linear functionals \(T_\ell\) (except for the error in \(T_3\), \(T_4\) and \(T_{13}\)), and the linear functional \(z_8\) all match correctly the values given in the report. I have also generated a table mimicking Table 4, but I haven’t checked that it matches the report, as that would be very tedious. In case of doubt, one could cross-check with the table in the notebook.