I’ve been looking at an erasure code by Luigi Rizzo which is based on Vandermonde matrices, since this code is used in Outernet. In fact, it is the code implemented by the zfec library. Luigi Rizzo describes his code in a paper from 1997, but the paper can be very confusing and misleading because it describes the mathematics in very little detail. I needed to go to the source code to understand how it works. Actually, the idea behind this code is very simple. Here I do a mathematical description of the code and show that it is the same as a Reed-Solomon code. This is rather weird, because Luigi Rizzo makes no mention of Reed-Solomon codes, which were first described in 1960.
Let \(F\) be a field. Recall that an \((n,k)\) linear code over \(F\) can be given in terms of a generator matrix \(G\), which is an \(n\times k\) matrix with entries in \(F\) and rank \(k\). A message \((x_1,\ldots,x_k) \in F^k\) is encoded as \(y = (y_1,\ldots,y_n) \in F^n\), where \(y\) is computed as \(y = Gx\) (here I always consider vectors as columns). The code is called systematic if \(G\) has the form\[G = \begin{pmatrix}I_{k\times k}\\G_0\end{pmatrix},\] so that \(y_j = x_j\) for \(j=1,\ldots,k\). Note that this is really a property of the matrix \(G\), since the code is defined just as the image of \(G\).
We are interested in designing an \((n,k)\) systematic code that works as an erasure code and is able to correct up to \(n-k\) erasures. This means that we should be able to compute \(x\) from \(y\) even if only \(k\) of the components of \(y\) are known. This condition is equivalent to the condition that every \(k \times k\) submatrix of \(G\) is invertible. To see this, note that if we regard the equation \(y = Gx\) as a linear system for \(x\) and consider only the \(k\) equations for which we know the value of \(y_j\), then we obtain a reduced linear system \(\widehat{y} = \widehat{G}x\), where \(\widehat{y}\) is the vector of the \(k\) components of \(y\) which we know and \(\widehat{G}\) is the correspondent \(k \times k\) submatrix of \(G\). Therefore, \(\widehat{G}\) needs to be invertible so that we can find \(x\) in terms of \(\widehat{y}\).
An idea to find such a matrix \(G\) is to use Vandermonde matrices. If \(\alpha_1,\ldots,\alpha_m \in F\), the Vandermonde matrix \(V(\alpha_1,\ldots,\alpha_m)\) is\[V(\alpha_1,\ldots,\alpha_m)=\begin{pmatrix}1 & \alpha_1 & \alpha_1^2 & \cdots & \alpha_1^m\\1 & \alpha_2 & \alpha_2^2 & \cdots & \alpha_2^m\\\cdots & \cdots & \cdots & \cdots & \cdots\\1 & \alpha_m & \alpha_m^2 & \cdots & \alpha_m^m\end{pmatrix}.\]Its determinant is\[\prod_{j < k}(\alpha_k – \alpha_j),\]so the Vandermonde matrix is invertible if and only if the elements \(\alpha_1,\ldots,\alpha_m\) are distinct.
The matrix \(G\) is constructed in the following way: we choose \(n\) distinct elements \(\alpha_1,\ldots,\alpha_n \in F\) and we put\[G_0 = \begin{pmatrix}1 & \alpha_1 & \alpha_1^2 & \cdots & \alpha_1^k\\1 & \alpha_2 & \alpha_2^2 & \cdots & \alpha_2^k\\\cdots & \cdots & \cdots & \cdots & \cdots\\1 & \alpha_n & \alpha_n^2 & \cdots & \alpha_n^k\end{pmatrix},\]write \(V = V(\alpha_1,\ldots,\alpha_k)\) and define\[G = G_0 V^{-1}\]so that \(G\) has the required form to give a systematic code. This matrix \(G\) satisfies the condition that every \(k \times k\) submatrix of \(G\) is invertible, because such a submatrix is of the form \(\widehat{G} = \widehat{G}_0 V^{-1}\), where \(\widehat{G}_0\) is the corresponding \(k \times k\) submatrix of \(G_0\). Note that \(\widehat{G}_0\) is a Vandermonde matrix constructed with distinct elements, and hence invertible, so \(\widehat{G}\) is also invertible. In the intended application, \(F\) is the finite field with \(q = p^r\) elements, so up to \(q\) distinct elements \(\alpha_j\) can be chosen, and \(n\) can be at most \(q\).
The keen reader will have noticed that this construction looks very much like a Reed-Solomon code. In fact, the code described by Luigi Rizzo is the same as a Reed-Solomon code. The only difference is that Luigi Rizzo is only concerned with recovering erasures, which is much easier than recovering errors. The fact that Luigi Rizzo doesn’t mention Reed-Solomon codes makes me suspect he didn’t have a good knowledge of coding theory. The authors of zfec seem more knowledgeable and they mention that zfec is a Reed-Solomon code, albeit only in some of the comments in the source code. This doesn’t seem to be widespread knowledge, so I’ll say it again clearly:
zfec is just a Reed-Solomon code used as an erasure code
To see that this code is the same as a Reed-Solomon code, let us recall the systematic encoding of Reed-Solomon codes. As above, \(F\) is the finite field with \(q = p^r\) elements. To construct an \((n,k)\) Reed-Solomon code over \(F\), we choose distinct elements \(\alpha_1,\ldots,\alpha_n \in F\). Given \(x = (x_1,\ldots,x_k)\), there is a unique polynomial \(p_x \in F[z]\) of degree less than \(k\) such that \(p_x(\alpha_j) = x_j\) for \(j=1,\ldots,k\). Indeed, \(p_x(z) = \sum_{j=0}^{k-1} a_j z^j\), where the vector \(a = (a_0,\ldots,a_{k-1})\) is obtained as \(a = V^{-1} x\). Here \(V\) denotes the Vandermonde matrix \(V=V(\alpha_1,\ldots,\alpha_k)\) as above. The message \(x\) is encoded to produce \(y = (y_1,\ldots,y_n)\), where \(y_j = p_x(\alpha_j)\) for \(j = 1,\ldots,n\). Thus, \(y = G_0 a = G x\), where \(G_0\) and \(G\) are defined as above. We see that the systematic generator matrix for this Reed-Solomon code is the same as the generator matrix considered by Luigi Rizzo.
One comment