Reverse-engineering the DSCS-III convolutional encoder

One thing I left open in my post yesterday was the convolutional encoder used for FEC in the DSCS-III X-band beacon data. I haven’t seen that the details of the convolutional encoder are described in Coppola’s Master’s thesis, but in a situation such as this one, it is quite easy to use some linear algebra to find the convolutional encoder specification. Here I explain how it is done.

What makes finding the convolutional encoder specifications quite easy is that the encoder is systematic, which means that its input is a subset of the output. As we saw yesterday, the input bits, which we will call \(x_n\) are sent as the even bits of the output, while the odd bits of the output, which we will call \(y_n\), are computed in terms of \(x_n\) by the encoder.

The general formula for such a convolutional encoder is simple:\[y_n = \sum_{j = 0}^{k-1} a_j x_{n-j},\]where all the arithmetic is done in \(GF(2)\). When applying this to a real world system, the data starts at some point, say at \(n = 0\), and we put formally \(x_n = 0\) for \(n < 0\) so that the formula above makes sense. In the usual implementation of the convolutional encoder by using a shift register, this corresponds to starting with the register filled with zeros.

In the formula above, it is usually assumed that \(a_{k-1} \neq 0\), so that the number \(k\), which is called the constraint length, is uniquely defined. Assume for a moment that we now the number \(k\), or at least have an upper bound \(l\) for it. Then we can try to collect \(l\) row vectors formed by \(l\) consecutive input bits \(v_j = (x_{n_j}, x_{n_j+1}, …, x_{n_j+l-1})\), \(j = 1,\ldots,l\), in such a way that \(\{v_j\}\) are linearly independent. In the unlikely case that this is not possible, then the conclusion is that the input has not “enough variety” to determine the convolutional code uniquely.

If we write the matrix \(A\) which has the vectors \(v_j\) as rows, put \(\alpha = (a_{k-1},\ldots,a_0)^T\) and \(\beta = (y_{n_1+l-1}, y_{n_2+l-1}, \ldots, y_{n_l+l-1})^T\), then \(A\alpha = \beta\). Since the matrix \(A\) is invertible and \(\beta\) is known, we can solve this linear equation to find \(\alpha\), which gives us the convolutional code specification.

What about finding an upper bound \(l\) for the constraint length? Well, first of all maybe this isn’t even necessary. Constraint lengths are usually not very large (the CCSDS convolutional code has a constraint length of 7), so we can proceed by guessing a large enough value for \(l\), say \(l = 20\). This is inefficient, but it will always yield a candidate solution. The candidate solution needs to be checked against the full set of input data \(x_n\) and output data \(y_n\). If it works, then we’ve found the convolutional code. If it doesn’t work, then the constraint length is actually greater than \(l\), so we can use a larger guess for \(l\) and try again.

Alternatively, we can use the following approach. We consider a large integer \(M\) (that should be guaranteed to be larger than the constraint length), an index \(t\), and the vectors \(\gamma = (y_t, y_{t+1}, \ldots, y_{t+M})^T\), and \(w_j = (x_{t – j}, x_{t-j+1},\ldots, x_{t-j+M})^T\), for \(j = 0, 1, \ldots\). Note that \(\gamma\) is a linear combination of \(w_0, \ldots, w_{k-1}\). So to find \(k\) we start only with \(w_0\) and keep adding vectors \(w_1, w_2, \ldots\) as necessary until we obtain a set of vectors such that \(\gamma\) is a linear combination of them. Note that this condition corresponds to whether an overdetermined linear system has a solution, so it can be checked by the usual methods of Gaussian elimination or computing determinants.

In the case of the DSCS-III beacon, finding the constraint length is more simple, since the data has lots of zeros. In this case, we can look at one point in the input where a one appears followed by many zeros (looking at the figures in the last post, the bit at position 80 is a good choice), and then look at which position in the output after this point a one appears for the last time (which happens at position 89). By doing so, we find the length of the shift register, which is precisely the constraint length.

So for DSCS-III the constraint length is 10. It is easy to choose a linearly independent set of 10 vectors formed by 10 consecutive input bits (choosing them about the point where ones start to appear in the input is enough). By doing so, we can solve the linear system introduced above and find that the convolutional code is\[y_n = x_n + x_{n-1} + x_{n-2}+ x_{n-5} + x_{n-6} + x_{n-9}.\]I have checked that this is correct by going over the full dataset. Since all the frames start by and end with a bunch of zeros, we need not care about (and cannot distinguish) whether this code is applied in a streaming fashion, or separately per data frame. I have updated the Jupyter notebook to include the relevant calculations.

The fact that the convolutional code was systematic has made our life much easier. Usually, convolutional codes are not systematic, so we don’t have direct access to the input of the encoder. Still, similar linear algebra techniques can be used to reverse-engineer the code. For more details see this post by Gonzalo Carracedo EA1IYR about some work regarding reverse-engineering of convolutional codes that he originally presented in STARcon 2019.

2 comments

Scott -- K4KDR says:

June 21, 2020 at 20:06 UTC

Thanks very much for the interesting report! I do not understand all of the theory (or math), but appreciate the explanation very much.

Do we know, or is it reasonable to assume, that the same polynomial might be used on other satellites from the same family?

In general is it common for the same code to be used across spacecraft within a generation of similar units?

1. destevez says:
  
  June 22, 2020 at 17:34 UTC
  
  From reading the Master’s thesis, I understand that all the DSCS-III satellites have the same kind of beacon. This type of convolutional code might also have been used in similar spacecraft, or maybe not. It’s the first time I find a systematic convolutional code being used in practise.
  
  It’s common for an engineering solution or technique to be re-used in related designs. This not only applies to convolutional codes, but to almost anything you can think of.
  
  Finally, these days pretty much everybody uses the same convolutional code: the CCSDS r=1/2, k=7 code. This code is sometimes called the Voyager code, because, well, it dates back to the Voyager missions.

2 comments

Leave a Reply to Scott -- K4KDR Cancel reply