I have been preparing the slides for my future talk about reverse-engineering Outernet at FAQin 2018. While doing this, I have been re-reading some of the material about the work done on LDPC code and FEC in Outernet by George Hopkins in January 2017. One of the things I didn’t do back then was to read carefully the LDPC decoding function implemented by George.
In my post I explained that the LDPC code used in Outernet followed RFC5170, and I wondered whether it used the staircase scheme or the triangle scheme. I also commented that erasure decoding with an LDPC code (or any other linear code, actually) was mathematically equivalent to solving a linear system where the unknowns correspond to the erased symbols. I observed that the decoding function looked very different from this mathematical procedure, but should do more or less the same thing. Now I have read the decoder implementation carefully and I have the answer to both questions.
Recall that an LDPC code following RFC5170 is an \((n,k)\) linear code over a field \(F\) whose parity check matrix is of the form\[H = (L | R),\] where \(L\) is \((n-k) \times k\) and \(R\) is \((n-k) \times (n-k)\). The matrix \(L\) is generated pseudorandomly and its entries are 0’s and 1’s. The algorithm that generates \(L\) ensures that every row of \(L\) has at least \(2\) ones. Also, every column of \(L\) has at least \(N_1\) ones, and usually most columns have exactly \(N_1\) ones. Here \(N_1\) is a parameter (Outernet uses \(N_1 = 2\)). The matrix \(R\) is \[R=\begin{pmatrix}1 & 0 & 0 & 0 & \cdots & 0\\1 & 1 & 0 & 0 & \cdots & 0\\0 & 1 & 1 & 0 & \cdots & 0\\\vdots & \cdots &\ddots & \ddots & \cdots & \vdots\\0 & 0 &\cdots & 1 & 1 & 0\\0 & 0 & \cdots & 0 & 1 & 1\end{pmatrix}\]for the staircase scheme and \[R=\begin{pmatrix}1 & 0 & 0 & 0 & \cdots & 0\\1 & 1 & 0 & 0 & \cdots & 0\\1 & 1 & 1 & 0 & \cdots & 0\\\vdots & \cdots &\ddots & \ddots & \cdots & \vdots\\1 & 1 &\cdots & 1 & 1 & 0\\1 & 1 & \cdots & 1 & 1 & 1\end{pmatrix}\]for the triangle scheme.
Also recall the mathematical interpretation for erasure decoding that I gave in my post. A codeword \(c = (c_1,\ldots,c_n)\in F^n\) is transmitted and we receive \(r = (r_1,\ldots,r_n)\), where each element \(r_j\) is either an unknown \(x_j\) if the \(j\)-th symbol was erased or \(r_j = c_j\) otherwise. Then the equation \(Hr = 0\) is a (non-homogeneous) linear system for the unknowns \(x_j\). Decoding amounts to solving this system. The message can be decoded correctly if the system has a unique solution.
The decoding algorithm by George Hopkins works basically as follows:
- Take the smallest index \(j\) such that \(r_{k+j}\) is not an unknown (i.e., the first parity check symbol which was not erased).
- Consider the \(j\)-th equation of the system \(Hr = 0\). Look at the set \(S_j\) of symbols \(r_t\) with \(1 \leq t \leq k\) that appear in this equation with coefficient one. If exactly one of these symbols is unknown, go to step 3. Else, go to step 4.
- Precisely one symbol in \(S_j\) is unknown, say, \(r_l\). Compute \(r_l\) as \(r_l = r_{k+j} – \sum_{s \in S_j \setminus\{r_l\}} s\). Henceforth, \(r_l\) is regarded as a known symbol. Go to step 1.
- Take the next \(j\) such that \(r_{k+j}\) is not an unknown and go to step 2.
Decoding fails if at some point you loop over all known symbols \(r_{k+j}\), \(0 \leq j \leq n-k\) without ever getting into step 3, as the algorithm would then enter an infinite loop without making any further progress. If this does not happen, then all unknowns \(x_j\) with \(1 \leq j \leq k\) are eventually solved by step 3, and so the message gets decoded.
From step 3 in the algorithm we note something quite interesting. The matrix \(R\) is neither the staircase nor the triangle matrices given above. In fact, \(R = I\) is the identity matrix. Indeed, note that only the parity check symbol \(r_{k+j}\) appears in the equation that we use to compute the unknown \(r_l\). I wonder whether taking \(R = I\) is a wise choice, i.e., whether there is any noticeable performance difference between this simpler code and the staircase and triangle codes as described in RFC5170.
The decoding algorithm above is best explained in the following manner. We look at the system \(Hr = 0\). If one of the equations has only one unknown with non-zero coefficient, then we solve for that unknown trivially. By repeating this step, eventually we either solve the system or get stuck. The RFC mentions this algorithm, citing an article by Zyablov and Pinsker.
It is quite clear that this algorithm, while being simple and computationally fast, is not optimal, in the sense that it may fail even if the system \(Hr = 0\) has a single solution. For instance, consider a linear system \(Ax = b\) where the matrix \(A\) is\[A = \begin{pmatrix} 1 & 1 & 0\\ 0 & 1 & 1\\ 1 & 0 & 1\end{pmatrix}.\]Then \(A\) is invertible, but all of its rows have more than one \(1\), so the system cannot be solved by this algorithm. A system such as this one can arise when writing the system \(Hr = 0\) for one of the codes we are considering here.
Therefore, the LDPC decoder in free-outernet could work better if it used a proper algorithm to solve the system \(Hr = 0\), such as Gaussian elimination. Note that this decoder was written by George by reverse-engineering the closed-source binary in the official receiver, so the official receiver also uses the same oversimplified algorithm.
Update: I have just realised that the matrix \(A\) given above is only invertible when the characteristic of the field \(F\) is not \(2\). Over fields of characteristic \(2\), which is the usual case in applications, \(A\) is not invertible, so the system \(Ax = b\) does not have a single solution. In fact, fact, we can prove the following lemma.
Lemma. Assume that \(F\) is a field of characteristic \(2\), the matrix \(L\) has at most two ones in each row and that \(R=I\). If the system \(Hr = 0\) that arises when considering erasure decoding has a single solution, then this solution can be found by the simplified algorithm described above.
Proof: We proceed by induction on \(e\) the number of symbols \(r_j\), \(1\leq j \leq k\), that were erased. For \(e = 0\) there is nothing to prove. Assume that the lemma is proved for \(e = m\) erasures. Let us prove the lemma for \(e = m+1\). Since \(R = I\), solving the system \(Hr = 0\) is equivalent to solving \(\widetilde{L}\widetilde{r} = b\), where \(\widetilde{r} = (r_1,\ldots,r_k)\), \(b\) is the vector of known parity check symbols and \(\widetilde{L}\) is the matrix formed by the rows of \(L\) that correspond to the known parity check symbols. We write this system as \(Ax = c\), where \(A\) is the matrix formed by the columns of \(\widetilde{L}\) that correspond to unknown symbols in \(\widetilde{r}\). The system has a single solution if and only if \(A\) has rank \(e\), since there are \(e\) unknowns.
Note that \(A\) is a submatrix of \(L\) and it has \(e\) columns. If one of the rows of \(A\) has exactly one 1, then the simplified algorithm can solve an unknown using this row. We obtain a problem with \(e = m\) erasures and we can apply the inductive step. Otherwise, each of the rows of \(A\) has either zero ones or two ones. Since the characteristic of \(F\) is \(2\), then the sum of the columns of \(A\) is the zero vector, so the rank of \(A\) is not \(e\) and the system \(Ax = c\) does not have a single solution. This proves the lemma.
Note that the lemma is no longer true when there are more than two ones in some of the rows of \(A\), as the matrix\[\begin{pmatrix} 1 & 1 & 0\\ 0 & 1 & 1\\ 1 & 1 & 1\end{pmatrix}\](which has determinant \(1\)) shows. To which extent the matrix \(L\) can have at most two ones on each row depends on the pseudorandom generation algorithm. An easy application of the pigeonhole principle shows that if the rate \(k/n\) is greater than \(2/(N_1+2)\), then there must be some rows with at least three ones. This is the case for Outernet, which uses a rate of approximately \(5/6\) and \(N_1 = 2\).
2 comments