I have been preparing the slides for my future talk about reverse-engineering Outernet at FAQin 2018. While doing this, I have been re-reading some of the material about the work done on LDPC code and FEC in Outernet by George Hopkins in January 2017. One of the things I didn't do back then was to read carefully the LDPC decoding function implemented by George.
In my post I explained that the LDPC code used in Outernet followed RFC5170, and I wondered whether it used the staircase scheme or the triangle scheme. I also commented that erasure decoding with an LDPC code (or any other linear code, actually) was mathematically equivalent to solving a linear system where the unknowns correspond to the erased symbols. I observed that the decoding function looked very different from this mathematical procedure, but should do more or less the same thing. Now I have read the decoder implementation carefully and I have the answer to both questions.
Recall that an LDPC code following RFC5170 is an linear code over a field whose parity check matrix is of the form
where is and is . The matrix is generated pseudorandomly and its entries are 0's and 1's. The algorithm that generates ensures that every row of has at least ones. Also, every column of has at least ones, and usually most columns have exactly ones. Here is a parameter (Outernet uses ). The matrix is
for the staircase scheme and
for the triangle scheme.
Also recall the mathematical interpretation for erasure decoding that I gave in my post. A codeword is transmitted and we receive , where each element is either an unknown if the -th symbol was erased or otherwise. Then the equation is a (non-homogeneous) linear system for the unknowns . Decoding amounts to solving this system. The message can be decoded correctly if the system has a unique solution.
The decoding algorithm by George Hopkins works basically as follows:
- Take the smallest index such that is not an unknown (i.e., the first parity check symbol which was not erased).
- Consider the -th equation of the system . Look at the set of symbols with that appear in this equation with coefficient one. If exactly one of these symbols is unknown, go to step 3. Else, go to step 4.
- Precisely one symbol in is unknown, say, . Compute as . Henceforth, is regarded as a known symbol. Go to step 1.
- Take the next such that is not an unknown and go to step 2.
Decoding fails if at some point you loop over all known symbols , without ever getting into step 3, as the algorithm would then enter an infinite loop without making any further progress. If this does not happen, then all unknowns with are eventually solved by step 3, and so the message gets decoded.
From step 3 in the algorithm we note something quite interesting. The matrix is neither the staircase nor the triangle matrices given above. In fact, is the identity matrix. Indeed, note that only the parity check symbol appears in the equation that we use to compute the unknown . I wonder whether taking is a wise choice, i.e., whether there is any noticeable performance difference between this simpler code and the staircase and triangle codes as described in RFC5170.
The decoding algorithm above is best explained in the following manner. We look at the system . If one of the equations has only one unknown with non-zero coefficient, then we solve for that unknown trivially. By repeating this step, eventually we either solve the system or get stuck. The RFC mentions this algorithm, citing an article by Zyablov and Pinsker.
It is quite clear that this algorithm, while being simple and computationally fast, is not optimal, in the sense that it may fail even if the system has a single solution. For instance, consider a linear system where the matrix is
Then is invertible, but all of its rows have more than one , so the system cannot be solved by this algorithm. A system such as this one can arise when writing the system for one of the codes we are considering here.
Therefore, the LDPC decoder in free-outernet could work better if it used a proper algorithm to solve the system , such as Gaussian elimination. Note that this decoder was written by George by reverse-engineering the closed-source binary in the official receiver, so the official receiver also uses the same oversimplified algorithm.
Update: I have just realised that the matrix given above is only invertible when the characteristic of the field is not . Over fields of characteristic , which is the usual case in applications, is not invertible, so the system does not have a single solution. In fact, fact, we can prove the following lemma.
Lemma. Assume that is a field of characteristic , the matrix has at most two ones in each row and that . If the system that arises when considering erasure decoding has a single solution, then this solution can be found by the simplified algorithm described above.
Proof: We proceed by induction on the number of symbols , , that were erased. For there is nothing to prove. Assume that the lemma is proved for erasures. Let us prove the lemma for . Since , solving the system is equivalent to solving , where , is the vector of known parity check symbols and is the matrix formed by the rows of that correspond to the known parity check symbols. We write this system as , where is the matrix formed by the columns of that correspond to unknown symbols in . The system has a single solution if and only if has rank , since there are unknowns.
Note that is a submatrix of and it has columns. If one of the rows of has exactly one 1, then the simplified algorithm can solve an unknown using this row. We obtain a problem with erasures and we can apply the inductive step. Otherwise, each of the rows of has either zero ones or two ones. Since the characteristic of is , then the sum of the columns of is the zero vector, so the rank of is not and the system does not have a single solution. This proves the lemma.
Note that the lemma is no longer true when there are more than two ones in some of the rows of , as the matrix
(which has determinant ) shows. To which extent the matrix can have at most two ones on each row depends on the pseudorandom generation algorithm. An easy application of the pigeonhole principle shows that if the rate is greater than , then there must be some rows with at least three ones. This is the case for Outernet, which uses a rate of approximately and .