While DELFI-PQ worked well, neither AMSAT-EA nor other amateur operators were able to receive signals from EASAT-2 or HADES during the first days after launch. Because of this, I decided to help AMSAT-EA and use some antennas from the Allen Telescope Array over the weekend to observe these satellites and try to find more information about their health status. I conducted an observation on Saturday 15 and another on Sunday 16, both during daytime passes. Fortunately, I was able to detect EASAT-2 and HADES in both observations. AMSAT-EA could decode some telemetry from EASAT-2 using the recordings of these observations, although the signals from HADES were too weak to be decoded. After my ATA observations, some amateur operators with sensitive stations have reported receiving weak signals from EASAT-2.

AMSAT-EA suspects that the antennas of their satellites haven’t been able to deploy, and that this is what causes the signals to be much weaker than expected. However, it is not trivial to determine the exact status of the antennas, or whether this is the only failure that has affected the RF transmitter.

Readers are probably familiar with the concept of telemetry, which involves sensing several parameters on board the spacecraft and sending this data with a digital RF signal. A related concept is radiometry, where the physical properties of the RF signal, such as its power, frequency (including Doppler) and polarization, are directly used to measure parameters of the spacecraft. Here I will perform a radiometric analysis of the recordings I did with the ATA.

The recordings are already published in Zenodo, in the following datasets:

- Recording of DELFI-PQ/EASAT-2/HADES PocketQubes and other amateur satellites in the SpaceX Transporter 3 launch with the Allen Telescope Array on 2022-01-15 (antenna 1a)
- Recording of DELFI-PQ/EASAT-2/HADES PocketQubes and other amateur satellites in the SpaceX Transporter 3 launch with the Allen Telescope Array on 2022-01-15 (antenna 1f)
- Recording of DELFI-PQ/EASAT-2/HADES PocketQubes and other amateur satellites in the SpaceX Transporter 3 launch with the Allen Telescope Array on 2022-01-16

Here I will mainly be analysing the recording from 2022-01-15. I haven’t done an analysis of the recording from 2022-01-16 with the same level of detail yet, but I will show a few new things from this recording at the end of the post.

To do these recordings I used antennas 1a and 1f. The Allen Telescope Array antennas are 6.1 m dishes with a Gregorian structure and a logperiodic feed, giving very wide frequency coverage. The feed has dual linear polarization. The two polarization channels are called X and Y, which correspond to horizontal and vertical (or maybe the other way around, as I tend to forget).

Antennas 1a and 1f in particular have “old feeds”, which have a coverage between roughly 0.5 and 10 GHz. The newer feeds, which have been installed in most of the antennas, cover from 1 to 14 GHz, with much better performance at high frequencies, because the whole feed is cryogenic instead of only the LNA. However, for observations in the 435 MHz amateur satellite band, the old feeds are much better (though this is still outside the design frequency coverage of the feed).

Something to keep in mind is that the ATA antennas can only track above 16.8 degrees elevation. Moreover, when tracking a TLE, tracking will start fixed at the location where the skytrack crosses 16.8 degrees elevation when the satellite rises, and will also stop at the location where the skytrack crosses 16.8 degrees elevation as it sets. Both the azimuth and the elevation of the antennas will stay fixed when this happens, so there will be a pointing error both in azimuth and elevation.

The recordings I did used a sample rate of 3.84 Msps in order to cover the full 435 – 438 MHz amateur satellite band. Besides the three satellites we study in this post, many other satellites from the Transporter-3 launch, or that just happened to be in the sky at that moment, appear in the recordings, so they could be of interest to other satellite operators. The first step in processing the data is to channelize and downsample the recording for each of the three satellites we are studying. This GNU Radio flowgraph is used to downconvert the signals from each satellite (using centre frequencies of 436.650, 436.680 and 436.895 MHz for DELFI-PQ, EASAT-2 and HADES respectively) and decimate to 40 ksps. The resulting files are analysed in a Jupyter notebook.
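The same channelization can be reproduced offline with plain NumPy/SciPy instead of GNU Radio. This is a minimal sketch: the function name and the particular two-stage decimation split are my own choices, not those of the flowgraph.

```python
import numpy as np
from scipy import signal

def channelize(iq, fs, f_center_rec, f_sat):
    """Mix one satellite channel to baseband and decimate to 40 ksps.

    iq: complex samples of the wideband recording at rate fs (3.84 Msps here)
    f_center_rec: centre frequency of the recording
    f_sat: downlink centre frequency of the satellite of interest
    """
    n = np.arange(len(iq))
    # frequency-shift the satellite channel to 0 Hz
    lo = np.exp(-2j * np.pi * (f_sat - f_center_rec) / fs * n)
    shifted = iq * lo
    # low-pass filter and decimate by 96 in two stages (8 x 12) to keep
    # the FIR filters short; 3.84 Msps / 96 = 40 ksps
    out = signal.decimate(shifted, 8, ftype='fir')
    out = signal.decimate(out, 12, ftype='fir')
    return out
```

A tone placed 1 kHz above the satellite centre frequency in the wideband recording comes out at 1 kHz in the 40 ksps baseband output.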

The first step in the processing is to perform Doppler correction. One of the other things I have done with these recordings is to use STRF (see also my crash course) to measure the Doppler of these satellites and fit TLEs. The results are in this repository. Using these fitted TLEs and Skyfield, we can compute the Doppler for each of the satellites and perform Doppler correction.
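The correction step itself is straightforward once the Doppler curve is known. In the sketch below, `doppler_correct` is a name of my choosing, and the Doppler values would come from the Skyfield range-rate computation as \(f_d = -\dot{r} f / c\).

```python
import numpy as np

def doppler_correct(iq, fs, t_dopp, f_dopp):
    """Remove time-varying Doppler from a baseband recording.

    iq: complex baseband samples at rate fs
    t_dopp, f_dopp: Doppler frequency (Hz) sampled at times t_dopp (s),
      e.g. -range_rate / c * f_downlink computed from the fitted TLEs
    """
    t = np.arange(len(iq)) / fs
    f = np.interp(t, t_dopp, f_dopp)
    # integrate frequency to phase so the correction is phase-continuous
    phase = 2 * np.pi * np.cumsum(f) / fs
    return iq * np.exp(-1j * phase)
```

Integrating the interpolated frequency to a phase, rather than multiplying by a piecewise-constant frequency shift, avoids phase jumps at the interpolation points.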

When the recordings were done, the three satellites were still quite close together. In fact, their Doppler curves are only a few seconds apart, so it is difficult to tell them apart just by using Doppler measurements. On the other hand, this is helpful, because the half-beamwidth of the ATA antennas at 436 MHz is around 4 degrees, so we know that we had the three satellites close to the centre of the beam. Other satellites from the Transporter-3 launch were minutes behind these, putting them many degrees away on the sky, so they were received with the antenna sidelobes.
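As a sanity check on the beamwidth figure, the common rule of thumb for a parabolic dish (FWHM ≈ 70° λ/D) reproduces it:

```python
import numpy as np

c = 299792458.0
f = 436e6
D = 6.1                    # ATA dish diameter in metres
lam = c / f                # ~0.69 m wavelength
fwhm_deg = 70 * lam / D    # rule-of-thumb full width at half maximum
half_beamwidth = fwhm_deg / 2
print(f'{half_beamwidth:.1f} degrees')   # ~3.9 degrees, matching the ~4 quoted
```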

The next step is to do an amplitude calibration of the 4 channels (two antennas times two polarizations), which is necessary because the electronics of each channel have somewhat different gains. In radio astronomy this calibration is usually done by observing a (mostly unpolarized) radio source that might also serve for absolute amplitude calibration. Other sources such as a satellite with a stable, circularly polarized signal, could be used as well.

In our case, we do not have such a signal. An alternative way to perform the calibration is to assume that all the channels have the same noise temperature, and calibrate the amplitudes so that the noise floor level is the same in all the channels. This is just an approximation, because in reality each of the channels will have slightly different noise temperatures. Also, any polarized interference, or interference that is local to one of the antennas, will throw off this technique.

Without a better alternative, this is the approach that we follow here. We use a frequency segment in the DELFI-PQ channel that is clean of interference and measure the average power for each channel over all the duration of the recording. This is used to determine a constant amplitude scaling for each channel.
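A sketch of this calibration step follows. The function name and FFT size are my own choices; the idea is simply to equalize the average noise power per FFT bin across channels.

```python
import numpy as np

def amplitude_calibration(channels, clean_slice, nfft=4096):
    """Amplitude scale for each channel so that a clean noise region
    has unit average power per FFT bin.

    channels: list of complex sample arrays (two antennas x two polarizations)
    clean_slice: slice of (fftshifted) frequency bins free of signals
    """
    scales = []
    for x in channels:
        x = x[:x.size // nfft * nfft].reshape(-1, nfft)
        spec = np.abs(np.fft.fftshift(np.fft.fft(x, axis=1), axes=1))**2
        noise = spec[:, clean_slice].mean()   # average noise power per bin
        scales.append(1 / np.sqrt(noise))     # amplitude (not power) scaling
    return scales
```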

We compute waterfall data for each of the channels using an FFT resolution of around 9 Hz. Adding the calibrated waterfalls from the four channels, we obtain the Stokes I brightness for each of the satellites. This is plotted below. The dynamic range in the plot is only 5 dB, so that we can see the weak signals of HADES. The signals from DELFI-PQ and EASAT-2 saturate this scale.
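The waterfall computation is simple enough to sketch with NumPy (function names are assumed; at 40 ksps a 4096-point FFT gives the ~9.8 Hz resolution used here):

```python
import numpy as np

def waterfall(x, nfft=4096):
    """Power waterfall; at 40 ksps, 4096 bins give ~9.8 Hz resolution."""
    x = x[:x.size // nfft * nfft].reshape(-1, nfft)
    return np.abs(np.fft.fftshift(np.fft.fft(x, axis=1), axes=1))**2

def stokes_I(channels):
    """Add the calibrated waterfalls of the four channels
    (two antennas times two polarizations)."""
    return sum(waterfall(ch) for ch in channels)
```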

DELFI-PQ is sending a beacon packet every minute, as it usually does. EASAT-2 follows its transmission schedule, which is described on page 13 of this document. At the start of every minute (according to the on-board time, which is way off from UTC), a “fast” (short) 50 baud FSK telemetry packet is sent. Then, at 30 seconds into the minute, some kind of data is transmitted. The nature of this data changes depending on the minute in question, following a cycle of 14 minutes. The transmissions we can see are the FM transmission of a voice beacon (1st, 5th, 8th, and 12th transmission), an FSK-CW beacon (3rd transmission), and the 50 baud FSK “slow” (long) telemetry packet (10th transmission). The remaining transmissions we see are “fast” telemetry packets. The deviation of the FSK modulation is around 600 Hz. There is some frequency instability in the transmitter, which is best seen in the zoomed-in waterfall below (which now uses a dynamic range of 20 dB). Note some rather fast occasional frequency “glitches”, besides the overall drift.

The signal of HADES is much weaker. The following zoomed in waterfall enhances it slightly (4 dB dynamic range). We can see that it follows the correct transmission schedule, albeit with a different time offset than EASAT-2 (this is just due to the on-board times of the satellites). In the waterfall we can see the FSK tones and also the carrier of the FM voice transmissions. The deviation is around 4.5 kHz, much larger than that of EASAT-2. This is by design, so it is to be expected.

By zooming in to one of the FSK tones, we can see qualitatively the same kind of frequency drift as in EASAT-2. However, we don’t see the fast frequency glitches (though maybe this is just because the signal is too weak to see them should they happen).

Studying the signal power is of great interest given the hypothesis that the antennas haven’t managed to deploy. Besides the fact that the gain of an undeployed antenna is much less than that of a properly deployed antenna, its impedance will also be quite far from the expected value. Therefore, the power amplifier might not be able to deliver its full power to the antenna due to impedance mismatch. Moreover, this impedance mismatch might even make the amplifier fail (either instantly or eventually). Many more things can also go wrong in the transmitter during launch, causing decreased signal levels: for instance, anything involving broken components or solder joints. Even then, if enough of the transmit signal leaks out, a sensitive instrument such as the ATA or another larger telescope might still be able to detect the downlink signals.

All this makes it difficult to be sure about the exact nature of the failure just from observing the RF signal. Anechoic chamber measurements of the satellite with the antennas folded can be useful as a guide to compare with what is seen in orbit. However, these are not available for EASAT-2 and HADES.

When measuring the power of the signals of EASAT-2 and HADES, we can use the signal of DELFI-PQ as a reference. This satellite is working well and its signals are as strong as expected. We need to keep in mind that DELFI-PQ has a transmit power of 24 dBm EIRP (taking into account the gain of its dipole antenna). The transmitter of EASAT-2 and HADES only has 16 dBm of power (not taking into account the gain of the antenna).

To measure the signal power of DELFI-PQ and EASAT-2, we measure the signal plus noise power in some frequency sub-bands that contain the signals of interest. These have been marked with gray lines in the waterfall shown above. Additionally, we measure the noise spectral density in the sub-band marked with dashed lines in the EASAT-2 channel. This measurement is used to subtract the noise from the signal plus noise measurement, giving us only signal power. All this is done in the Stokes \(I\) data formed by averaging the data from the two antennas and polarizations.

These three measurements are shown below. Note that the amplitude has been scaled so that the noise in an FFT bin (~9.8 Hz bandwidth) averaged over the recording duration is one. Therefore, the noise spectral density (per FFT bin) we measure is close to one, except for some increases due to interference.

Now we can turn this data into an estimate for the CN0 of the signals. For the noise spectral density N0 here we use the average over all the recording, rather than the time varying measurement done above. This is better suited to our purpose of studying the signal power, since changes in the noise floor due to interference won’t impact our measurement. In this respect, N0 is just used as a constant for power scaling, which can be related to the noise temperature of the instrument.
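A sketch of the C/N0 computation from the calibrated waterfall (the function name and the particular bin slices are illustrative):

```python
import numpy as np

def cn0_timeseries(wf, signal_bins, noise_bins, bin_hz=40e3/4096):
    """C/N0 in dB-Hz from a calibrated waterfall (rows are time).

    signal_bins: slice of bins containing the signal of interest
    noise_bins: slice of bins free of signals, used to estimate N0
    bin_hz: FFT bin bandwidth in Hz (~9.8 Hz here)
    """
    n_per_bin = wf[:, noise_bins].mean()     # N0 averaged over the recording
    nbins = signal_bins.stop - signal_bins.start
    # subtract the noise contribution from the signal+noise measurement
    c = wf[:, signal_bins].sum(axis=1) - n_per_bin * nbins
    n0 = n_per_bin / bin_hz                  # noise spectral density per Hz
    return 10 * np.log10(np.maximum(c, 1e-12) / n0)
```

Using the time-averaged noise density as a constant here means that interference raising the noise floor at some instant does not bias the signal power estimate.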

We see that the signals of DELFI-PQ are between 70 and 80 dB·Hz, while the signals of EASAT-2 are between 40 and 45 dB·Hz. Additionally, EASAT-2 has between 30 and 35 dB less power than DELFI-PQ. If we assume that EASAT-2 is transmitting at its full 16 dBm power, the conclusion here is that its antenna has a gain on the order of -20 to -25 dBi, because DELFI-PQ has an EIRP of 24 dBm. Note that this EIRP figure for DELFI-PQ, which has a dipole antenna, probably refers to the peak antenna gain (which should be around 2 dBi), so in reality the EIRP we’re seeing from DELFI-PQ could be a few dB less than 24 dBm.

It’s worth computing the link budget for DELFI-PQ to check that our measurements are in the right ballpark. However, we do not have any data for the performance of the ATA antennas at 436 MHz, since the feeds are only specified down to 500 MHz. Assuming a system temperature of 100 K (the system temperature at higher frequencies is maybe 30-50 K) and an antenna gain of 26.7 dBi (which gives a G/T of 6.7 dB K⁻¹), for a distance of 700 km (corresponding to the closest approach) and a transmitter of 26 dBm EIRP, we would see an SNR of 82.2 dB·Hz. Here we have taken into account that we are using Stokes \(I\), so the noise power is double what we would have in a single-polarization system. The maximum power we’re seeing in our measurements of DELFI-PQ is slightly less than 80 dB·Hz, so everything seems reasonable, give or take a few dB.
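The simplified free-space version of this calculation is easy to reproduce. This sketch uses only the assumptions stated above and ignores pointing, feed and other losses, so it comes out a few dB above the 82.2 dB·Hz figure.

```python
import numpy as np

k = 1.380649e-23            # Boltzmann constant, J/K
c = 299792458.0
f = 436.65e6                # DELFI-PQ downlink frequency
d = 700e3                   # slant range at closest approach
eirp_dbw = 26 - 30          # 26 dBm EIRP, converted to dBW
g_rx = 26.7                 # assumed ATA antenna gain at 436 MHz, dBi
t_sys = 100.0               # assumed system temperature, K

fspl = 20 * np.log10(4 * np.pi * d * f / c)      # free-space path loss, dB
# Stokes I doubles the noise power relative to a single polarization channel
n0_dbw_hz = 10 * np.log10(k * t_sys * 2)
cn0 = eirp_dbw - fspl + g_rx - n0_dbw_hz
print(f'{cn0:.1f} dB-Hz')   # ~86 dB-Hz for this loss-free version
```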

The measurement of the signal power of HADES can’t be done in the same way, because the signal is much weaker. We do a single measurement by averaging over the duration of one of the packets that seems strongest in the waterfall. The figure below shows the PSD of this packet.

Measuring the signal plus noise around the two FSK tones, and taking some free portion of the spectrum to measure the noise spectral density, we get a CN0 of 22.9 dB·Hz. This is 20 dB weaker than EASAT-2, and all in all, between 50 and 55 dB less than DELFI-PQ. These losses might seem too high to be caused only by the geometry of the undeployed antennas. It might be the case that the power amplifier is not delivering the power it should, or perhaps it has failed completely. AMSAT-EA thinks that the reason for the signal power difference between EASAT-2 and HADES is that the antennas of the two satellites were folded in a somewhat different manner, but there are no on-ground measurements of the effects of this difference.

One of the advantages of using the Allen Telescope Array is that it is a dual polarization instrument, and can be used to measure the polarization of the RF signal. This can give us interesting information about the antennas and the attitude of the satellite. Many small satellites have dipole or monopole antennas, which are linearly polarized. The polarization angle of the RF signal will depend on the angle with which the antenna is seen from ground, and will change as the satellite rotates.

Other satellites have turnstiles or other kinds of circularly polarized antennas. Typically, these only have their intended circular polarization when seen from boresight. Seen from the side they often have linear polarization, and from the back they will typically have the opposite circular polarization. This topic has already appeared in my studies about the polarization of Chang’e 5, although in that case we had the additional difficulty that we didn’t know the types of antennas used nor how they were placed in the spacecraft body.

EASAT-2 and HADES were launched with their antennas wrapped around the spacecraft body, as seen in this image. The antenna for the 436 MHz downlink is a monopole built from measuring tape. If the antennas were correctly deployed, we would expect to see linear polarization. However, with the antennas folded in this way, the polarization will be some kind of elliptical polarization that depends on the angle with which we view the spacecraft.

To measure the polarization correctly we need to do some calibrations. Gains were calibrated as indicated above. For polarimetry it is important that the gain calibration is accurate, since any gain errors will make Stokes \(I\) leak into Stokes \(Q\). Another parameter that needs to be determined is the phase offset between the X and Y channels of each antenna. This is a constant for each antenna that is relatively stable over frequency. Without knowing this offset, it is impossible to separate the cross-correlation of the X and Y channels into Stokes \(U\) and \(V\).
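In terms of the calibrated X and Y voltage samples, the Stokes parameters are computed as follows. This is a sketch with one common sign convention; the signs of \(U\) and \(V\) depend on the calibration choices discussed below.

```python
import numpy as np

def stokes(x, y):
    """Stokes parameters from calibrated dual linear polarization samples."""
    I = np.abs(x)**2 + np.abs(y)**2
    Q = np.abs(x)**2 - np.abs(y)**2
    xy = x * np.conj(y)            # cross-correlation of the two channels
    U = 2 * xy.real
    V = 2 * xy.imag
    pol_angle = 0.5 * np.arctan2(U, Q)   # polarization angle, radians
    return I, Q, U, V, pol_angle
```

Note that without the X-Y phase offset calibration, the split of the cross-correlation into \(U\) (real part) and \(V\) (imaginary part) is meaningless, which is why that calibration matters.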

There are several methods of calibrating the X-Y phase offset. The most straightforward is to observe a linearly polarized source, for which we know that \(V = 0\). Note that this only solves the phase offset up to a 180º ambiguity, since we don’t know whether \(U > 0\) or \(U < 0\) unless we know the polarization angle of our calibrator source. Additionally, the polarization angle of the calibrator shouldn’t be 0 or 90 degrees, since in this case we wouldn’t see anything in the correlation of X and Y.

In radio astronomy other approaches are followed, since sources are not strongly polarized (typically 10% or less polarization degree), so any polarization leakage can swamp the source polarization. Therefore, polarization leakage is typically determined beforehand or together with the X-Y phase offset. Since we’re dealing with fully polarized signals, we do not need these additional complications, and in fact we will not calibrate and remove the polarization leakage.

Here DELFI-PQ is again useful as a reference, since its downlink antenna is a dipole, with linear polarization. We will use its signal for X-Y phase offset calibration, by taking a strong packet, and determining the phase offset that makes \(U > 0\) and \(V = 0\). Note that the choice of \(U > 0\) is arbitrary, so the sign of \(U\) and \(V\) in our results could be wrong, as well as the sign of the polarization angle that we measure.
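A sketch of this calibration step (the function name is mine): the estimated phase, applied to Y, makes the calibrator's cross-correlation real and positive, i.e. \(U > 0\) and \(V = 0\) up to the 180-degree ambiguity.

```python
import numpy as np

def xy_phase_offset(x, y):
    """Estimate the X-Y phase offset from a linearly polarized calibrator.

    Returns the phase to apply to the Y channel so that the
    cross-correlation of X and Y becomes real and positive.
    """
    return np.angle(np.mean(x * np.conj(y)))

# usage: y_calibrated = y * np.exp(1j * xy_phase_offset(x, y))
```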

Another effect that involves the polarization is the Faraday rotation in the ionosphere, which at 436 MHz is significant. We will ignore this, which means that we will not be able to determine absolute polarization angles of the transmitted signal (astronomers like to think that the ionosphere rotates their telescopes). The change in Faraday rotation as the satellite crosses the sky is also important, but this happens relatively slowly, so it is of no concern for the interpretation we will do of our results.

Finally, some care needs to be taken with these results, since the instrumental polarization of the ATA antennas depends on the position of the objects within the antenna beam. More information can be found in ATA memo 88 about wide-field polarization calibration. In our case, we do not know if the satellites were exactly in the centre of the beam, due to possible errors in the TLEs and the fact that the tracking of fast LEO passes may not be as accurate as the tracking of objects moving at sidereal rate (although the ATA antennas can track most LEO passes). Still, judging by the results we get from DELFI-PQ, our measurements seem of reasonable quality.

Besides calibrating the X-Y phase offset, we consider the frequency sub-band in the EASAT-2 channel that we used to measure noise. Taking this sub-band we can measure the Stokes parameters of the noise spectral density. The \(I\) accounts for the noise temperature of the system, but \(Q\), \(U\), \(V\) are non-zero because of instrumental polarization leakage and perhaps polarized interference. This is then subtracted from the Stokes parameters of the signal plus noise measurements of DELFI-PQ and EASAT-2 in order to cancel out the contribution from the noise. This correction is only significant in the case of EASAT-2, since the signal of DELFI-PQ is much stronger than the noise.

The figure below shows the Stokes parameters and polarization angle of the DELFI-PQ packets. In order to show each packet in detail, only time segments around each of the seven packets in the recording have been plotted. The legend for the plot is as follows: Stokes parameters are referred to the left axis, using blue for \(I\), orange for \(Q\), green for \(U\), and red for \(V\); the polarization angle is referred to the right axis and plotted in grey. The upper and lower rows show the data for each of the two antennas. We can see that they are quite consistent.

What we see here is reasonable for a linearly polarized satellite that is tumbling. Stokes \(V\) is very small, indicating little circular polarization. Inside each packet, which lasts for about 4 seconds, we can see the polarization angle changing roughly as a sinusoid. This is to be expected. For a short timescale, we can assume that the satellite is rotating with constant frequency about a constant axis. Then the angle with which we would see the antenna on ground follows a sinusoid whose frequency equals the rotation frequency and whose amplitude and offset can be determined from the relative angles between the antenna axis, the rotation axis, and the line of sight.

Glancing at the plots, we can see that the rotation period is on the order of perhaps 6 to 10 seconds. Moreover, the amplitude and offset of the sinusoid change in each packet, indicating a changing geometry for the tumbling of the satellite. Keep in mind that part of this change is due to the rotation of the line of sight vector with respect to the inertial frame as the satellite crosses the sky.

Now, for EASAT-2 we get the following plot, which shows all the packets present in the recording, using the same kind of legend. Again, we see good consistency between each of the two antennas.

The results are now much harder to interpret. We see that \(V\) is now non-zero and changes sign depending on the packet. This shows the presence of circular polarization, which supports the idea that the antennas are not deployed correctly. The curves traced by each of the Stokes parameters are much less regular than for DELFI-PQ. Without knowledge of the polarization radiation pattern of the folded antennas (which could be measured in an anechoic chamber), it isn’t easy to understand what is going on here. Still, we can say something very approximate about the tumbling rate of the satellite by looking at the time scale of the polarization variations within each packet. These seem to indicate a tumbling rate similar to that of DELFI-PQ, with a period of perhaps 5 to 10 seconds.

Since the signal of HADES is very weak, I have not attempted to study its polarization, but it would be interesting to check if there is a significant circular polarization.

Here I comment on some new signal aspects that appeared in the recording from 2022-01-16. The waterfalls for the three satellites corresponding to this recording are shown here. We can see that DELFI-PQ and HADES look roughly the same as the previous day. However, EASAT-2 has a very large frequency instability.

This is seen better in the plot below, which shows only the time and frequency range over which there are packets from EASAT-2.

Zooming in to the long transmission around 17:59 UTC, we see the following. This is a 50 baud FSK packet. The drift is as high as 3 kHz. Additionally, we can see that at times the FSK tones are much wider than expected, and indeed with this FFT resolution their power seems to split into two sidebands. Compare the tones around 17:58:51, which are normal, with those at 17:58:56, which are anomalous.

The reason for this can be seen better in inspectrum, where we can play interactively with the FFT resolution. The figure below shows the FSK tones for one of the good parts of this packet. Note that the tones are clean and they show up as straight lines.

Now, this is what one of the bad parts of the packet looks like (click on the image to see it in full). The FSK tones have a superimposed sinusoidal frequency ripple. The frequency of this ripple is approximately 75 Hz. The cause of this ripple is not known, but this is something that didn’t happen during the previous observation.

This ripple comes and goes in a smooth fashion, slowly increasing in amplitude over the course of 5 seconds or so, and then decreasing again until it disappears. This happens throughout all packets in the recording. In the FM voice beacons it can be seen in the carrier.

It is not known if the much larger frequency instability and the presence of this ripple are related or if they are two independent problems. Additionally, this ripple hasn’t been observed in HADES, although if it were present it would be harder to see it, due to the much weaker signal.

Since the recordings have two antennas, interferometry could be used to try to distinguish the spacecraft for orbit determination. The signals are only a few kHz wide, so we would need to do narrowband interferometry (see Section 13.7.1 in the book by Moyer). This is equivalent to measuring the Doppler difference between the two antennas. The baseline formed by antennas 1a and 1f is only 79 m long, and is roughly oriented north-south, the same as the direction of the pass of these satellites.

I haven’t thought carefully about whether this narrowband interferometry gives results of any interest. I have done some simplified calculations for a satellite with orbital velocity \(v\) and height \(h\) passing directly overhead a baseline of length \(l\), with the orbital plane containing the baseline. Around the closest approach, to first order the narrowband interferometry measurement gives \(vl/h\). This means that it is not sensitive to along-track errors, and is only able to estimate the orbital velocity and height, which are often better constrained in terms of the timings of successive passes (since errors in the orbital period accumulate over time). Higher order terms might have more interesting information, however.
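For reference, the first-order calculation goes as follows. Put the two antennas at \(x = \pm l/2\) along the ground track and the satellite at \((vt, h)\), so the range to each antenna is \(r_\pm(t) = \sqrt{h^2 + (vt \mp l/2)^2}\) and the range rates are \(\dot{r}_\pm(t) = v(vt \mp l/2)/r_\pm(t)\). At closest approach (\(t = 0\)) the range-rate difference measured by narrowband interferometry is
\[\dot{r}_+(0) - \dot{r}_-(0) = -\frac{vl}{\sqrt{h^2 + l^2/4}} \approx -\frac{vl}{h}.\]
Plugging in \(v \approx 7.6\) km/s, \(l = 79\) m and \(h = 700\) km gives about 0.86 m/s, or a Doppler difference of roughly 1.2 Hz at 436 MHz.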

The recordings used here have been linked at the beginning of the post. The GNU Radio flowgraph and Jupyter notebooks can be found here.

I have also published an excerpt of the recording of James Webb Space Telescope that I did on December 26. This is just the first 25 minutes of the recording, so that both polarizations fit into the 50 GB maximum size of a Zenodo dataset. The sample rate is still 3.84 Msps, so the sequential ranging tones are present in these files. The dataset is called “James Webb Space Telescope S-band recording with Allen Telescope Array (wideband excerpt)“. In some days I will also publish a decimated version (containing the telemetry but not the ranging tones) of the full recording.

**Update 2022-01-03:** I have now published the full recording decimated to 320 ksps. This is available in the dataset “James Webb Space Telescope S-band recording with Allen Telescope Array (320 kHz bandwidth)“.

After looking at the waterfall of the recording carefully, I saw that there are sequential ranging signals present almost all the time. This is expected. Since the recording was done 7 hours after the first correction manoeuvre, the DSN would be doing ranging to compute accurate ephemerides. Often, ranging signals are not used every time that a spacecraft is tracked, but only when the ephemerides need to be refined, such as when planning a manoeuvre or shortly after executing one.

In this post I analyse these sequential ranging signals. I still haven’t had time to publish the recordings in Zenodo. After seeing that the wideband recording is of interest, due to the presence of these signals, I’m planning to publish a shorter segment of the wideband recording (the full recording is 241 GB per polarization) and publish a decimated version of the full recording where only around 100 kHz of spectrum are present (which is enough for the telemetry signal).

Sequential ranging, as performed by the DSN, is described in this module of the Telecommunications Link Design Handbook. Briefly speaking, it involves including a tone (either a sinusoid or square wave) in the phase-modulated uplink signal, which then gets remodulated into the downlink by the spacecraft transponder, and finally its phase is measured when the downlink arrives back at a groundstation. This serves to measure two-way (or three-way) light-time delay through the spacecraft transponder.

However, since phase measurements are ambiguous, there is an ambiguity of an integer number of wavelengths in the measurement. To solve this ambiguity, rather than using a tone of a single frequency, a sequence of tones of different frequencies is transmitted successively. The frequency of each tone in the sequence is half that of the previous tone, so the ambiguity (in distance units) is twice as large. The sequence contains as many tones as needed so that the frequency of the last tone is low enough that the ambiguity it gives is larger than the a-priori uncertainty on the light-time delay. This allows solving the ambiguities of all the tones and producing an unambiguous measurement. Only the first tone, which has the highest frequency, and hence the highest “resolution”, is used to produce the final measurement. The remaining tones are only used to solve the ambiguities.
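The ambiguity resolution logic can be sketched as follows. This is a toy version with noiseless phases; the function and argument names are mine.

```python
def resolve_delay(phases, freqs, prior_delay):
    """Resolve sequential-ranging phase ambiguities (simplified sketch).

    phases: measured tone phases in cycles, ordered from the lowest
      frequency tone up to the range clock
    freqs: the corresponding tone frequencies in Hz (each twice the previous)
    prior_delay: a-priori delay estimate in seconds, accurate to better
      than half a cycle of the lowest frequency tone
    """
    delay = prior_delay
    for phase, f in zip(phases, freqs):
        # pick the integer number of cycles best matching the current
        # estimate, then refine the estimate with this tone's phase
        k = round(delay * f - phase)
        delay = (phase + k) / f
    return delay   # final precision set by the range clock (highest frequency)
```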

Another important detail is that if we were to produce the sequential ranging signal exactly as described, at some point the frequency of the tones would get too low and they would interfere with the telecommand and telemetry signals. To avoid this, chopping is used after some point in the sequence. This consists of sending the product of a fixed tone in the sequence, called the chop component, and the tone that should be sent. This produces tones at the sum and difference frequencies. If the frequency of the chop component is high enough, then the resulting frequencies are also high and do not interfere with the telecommand and telemetry. Often, the first tone in the sequence, which is called the range clock, is used as the chop component.

The frequency of the range clock, and hence of all the other tones, is locked coherently to the uplink carrier frequency. This is done so that the tones on the downlink are coherent with the downlink carrier (since the spacecraft uses a coherent transponder). On the groundstation receiver, the phase information from the PLL that locks to the downlink carrier can be used to derive an accurate frequency reference for the ranging tones and integrate them coherently for several seconds.

This implies that the frequencies of the tones are not nice round values. In fact, for an S-band uplink, as is the case of JWST, their frequencies are of the form \(f_n = 2^{-7-n} f_U\), where \(n \geq 0\) and \(f_U\) is the uplink frequency. Often, \(n = 4\) is chosen for the range clock to give a range clock frequency of approximately 1 MHz. The next tones in the sequence are then approximately 500 kHz, 250 kHz, etc.
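For example, taking a hypothetical S-band uplink near 2090 MHz (an assumed round value for illustration, not the actual JWST frequency assignment):

```python
f_U = 2090.0e6                      # assumed S-band uplink frequency, Hz
# f_n = 2^(-7-n) * f_U for the tones used later in the post (n = 4 .. 17)
tones = {n: 2**(-7 - n) * f_U for n in range(4, 18)}
print(f'{tones[4]/1e6:.3f} MHz')    # range clock: ~1.02 MHz
print(f'{tones[17]:.1f} Hz')        # last tone in the sequence: ~125 Hz
```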

The sequential ranging tones in the recording of JWST are rather weak. The waterfall below, done with inspectrum, shows the spectrum around the ranging clock frequency in the phase-demodulated downlink. We can see the ranging clock towards the right side of the plot. It lasts for ~8 seconds. Before it, we see the last components of the previous sequence. These are chopped with the ranging clock, so they appear as two tones symmetric above and below the ranging clock frequency. They are harder to see because the power is split between the two tones. These components last for ~4 seconds.

The duration of the ranging clock is chosen to give the desired ranging precision, according to the ranging clock frequency and the SNR of the ranging tone. The duration of the remaining components is chosen long enough to give a high probability of solving the ambiguities correctly. Usually, the ranging clock is longer than the other components.

Since it is hard to see the ranging tones in the waterfall, I have made a GNU Radio flowgraph to make them easier to process and see. This flowgraph extracts a 100 Hz spectrum slice around each of the tone frequencies. The output of the flowgraph can then be processed with the appropriate FFT parameters to make the tones more visible.

The flowgraph is a bit cumbersome, because there are many tone frequencies. JWST uses \(n = 4\) as ranging clock (~1 MHz) and the ranging sequence goes up to \(n = 17\) (~125 Hz). All the components starting with the second (\(n=5\)) are chopped with the ranging clock, so they produce two tones. Thus, there is a total of 27 tone frequencies.
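To make the bookkeeping concrete, here is a small sketch (plain Python, not part of the actual flowgraph) that enumerates the 27 tone frequencies. The uplink frequency below is just a placeholder value, since the exact JWST uplink frequency is not needed to count the tones.

```python
# Sketch of the sequential ranging tone frequencies, f_n = 2**(-7-n) * f_U.
f_U = 2090e6  # hypothetical S-band uplink frequency, Hz (placeholder)

n_RC = 4   # ranging clock component (~1 MHz)
n_L = 17   # last component (~125 Hz)

f = {n: f_U / 2**(7 + n) for n in range(n_RC, n_L + 1)}

# The ranging clock appears at its own frequency; each later component is
# chopped with the ranging clock, producing two tones at f_RC -/+ f_n.
tones = [f[n_RC]]
for n in range(n_RC + 1, n_L + 1):
    tones += [f[n_RC] - f[n], f[n_RC] + f[n]]

print(len(tones))  # 27 tone frequencies in total
```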

To generate the local oscillators for the tone frequencies, I start with the lowest frequency one (~125 Hz) and successively square it to double its frequency until I arrive at the ranging clock. This ensures that all the local oscillators are coherent. If, for instance, different Signal Source blocks were used for each of the frequencies, then rounding errors could make them non-coherent. Although in this post we will not use this coherence, because we will not measure the phase of the tones, this property could be interesting for a future study.
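The squaring trick relies on the identity \((e^{j\theta})^2 = e^{2j\theta}\): squaring a complex exponential doubles its frequency without introducing any new rounding errors. A quick numpy demonstration (frequencies chosen just for illustration):

```python
import numpy as np

fs = 3.84e6
n = np.arange(48000)
x = np.exp(2j * np.pi * 125 * n / fs)  # LO for the lowest tone, ~125 Hz

# (e^{j theta})^2 = e^{j 2 theta}: squaring doubles the frequency
y = x**2
expected = np.exp(2j * np.pi * 250 * n / fs)
print(np.allclose(y, expected))  # True
```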

After mixing with the appropriate local oscillator, each of the tones is decimated from 3.84 Msps to 100 sps by using two FIR filters, performing decimation by 384 and 100.
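A rough SciPy equivalent of this decimation chain (not the actual GNU Radio blocks, just a sketch using the polyphase resampler) would be:

```python
import numpy as np
from scipy import signal

fs_in = 3.84e6

# Two-stage decimation sketch: 3.84 Msps -> 10 ksps -> 100 sps,
# mirroring the decimate-by-384 and decimate-by-100 FIR filters.
def decimate_two_stage(x):
    y = signal.resample_poly(x, up=1, down=384)
    return signal.resample_poly(y, up=1, down=100)

# a 10 Hz complex tone survives the chain: one second in, 100 samples out
t = np.arange(int(fs_in)) / fs_in
y = decimate_two_stage(np.exp(2j * np.pi * 10 * t))
print(len(y))  # 100
```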

The output of the flowgraph is processed in this Jupyter notebook. To show the ranging tones we use a 32 point FFT, giving a frequency resolution of ~3 Hz. A waterfall of each of the tones near the start of the recording is shown here.

Note that all the tones are shown with the same vertical spacing in this plot, but this is just for practicality. In reality the frequency separation between each successive tone is halved, so they follow exponential curves in the waterfall, as shown above in the inspectrum plot.

We see that the full ranging sequence lasts 60 seconds. The ranging clock lasts ~8 seconds, and the remaining 13 components last ~4 seconds each. We will look at the timing more precisely below. Another thing to notice is that the upper chop tone of the second component, \(f_4 + f_5\), which is approximately 1.5 MHz, is quite weak. We will look at the power of each of the tones later, and see that this is probably due to a low-pass filter in the transmitter and/or spacecraft transponder.

The plot shown above is not appropriate to show the data for the full recording, since the tones would be so short that they won’t be visible. To be able to show the full recording in one plot, we take advantage of the 60 second measurement cycle. First we use a 10 point FFT and use the 3 central bins to measure the power of each of the tones. Then we can plot this power in a 2D plot where each of the lines represents one minute of data and the x axis corresponds to the second within the minute. The plot for the ranging clock is shown here.
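A minimal numpy version of this measurement (my notebook may differ in details) could look like the following:

```python
import numpy as np

fs = 100   # sps at the output of the GNU Radio flowgraph
nfft = 10  # 10-point FFT: 10 Hz bins, 0.1 second time resolution

def tone_power(x):
    """Tone power per 0.1 s interval, summing the 3 central FFT bins."""
    n = len(x) // nfft
    X = np.fft.fftshift(np.fft.fft(x[:n*nfft].reshape(n, nfft), axis=1),
                        axes=1)
    return np.sum(np.abs(X[:, nfft//2 - 1 : nfft//2 + 2])**2, axis=1)

def by_minute(p):
    """One row per minute of data; 600 columns of 0.1 s each."""
    rows = len(p) // 600
    return p[:rows*600].reshape(rows, 600)
```

Plotting `by_minute(tone_power(x))` as an image gives the kind of 2D plot described above, with the x axis showing the second within the minute.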

We see that the ranging clock happens at the start of the minute until approximately 10:25 UTC. Then there are no ranging clocks for some minutes, until at around 10:35 the ranging clock appears again, but now starting at ~:45 seconds in the minute. The measurement sequence is still 60 seconds long. These tones are rather weak, but then there are some very strong tones. Next, at 10:45 the position of the ranging clock shifts to :30 seconds in the minute. There are also a few very strong tones around 10:55, and the ranging continues in this manner until the spacecraft sets for ATA. The plots for the remaining tones in the sequence can be seen in the Jupyter notebook. They are similar to this one, except that each tone starts in its corresponding position in the sequence.

I think that the stop at 10:30 is due to the handoff between Goldstone and Canberra. However, it is quite interesting that the ranging timing has moved from the start of the minute.

The interesting part between approximately 10:35 UTC and 11:00 is shown below. It is apparent that there are large changes in the power of the ranging tones and that at 10:47 the sequence changes timing.

Using the data from these plots, we can measure the power of each of the tones quite accurately, using the first hour of data to average. The results are shown below. The power of all the tones except for the ranging clock has been multiplied by two to account for the fact that the total power is split in two due to chopping. We see that the power of the tones decreases with frequency, so apparently there is a low-pass filter, which is quite common.

Now let us look at the timing of the ranging tones in more detail. Keen readers might have observed in the plot of the ranging clock that its start time is moving slightly to the right. In fact, if we zoom in to the first second in the minute of this plot, this is clear, even though we only have a resolution of 100 ms in this plot.

According to the ephemerides from NASA HORIZONS, at 07:30 UTC, JWST was at 182831 km from Goldstone and 183168 km from the ATA, giving a two-way light-time of 1.221 seconds (here we are ignoring the relative motion of the spacecraft and groundstations during the signal travel time for simplicity).

In Section 2.2.4.1 in the sequential ranging documentation, the timing of the ranging signal is described. The ranging clock always starts at an integer second. Actually it starts one second before the intended transmit time \(\mathrm{XMIT}\), so that the ranging clock is already present when its reception starts (which is also done at an integer second). This matches what we see, because we are seeing the ranging clock arriving at ~0.2 seconds in the minute, so it must have been set to an intended transmission time of 0 seconds in the minute, and actually start at 59 seconds in the minute.

Four hours later, at 10:30 UTC, the distance between JWST and Goldstone was 204223 km, and the distance between JWST and ATA was 204150 km. This gives a light-time delay of 1.362 seconds, which represents an increase of 141 ms in the light-time delay. Even though we only have 100 ms resolution in the plot above, we can see that the plot is in-line with what the ephemerides describe.
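These figures are easy to check (ignoring, as above, the motion of the spacecraft and groundstations during the signal travel time):

```python
c = 299792.458  # speed of light, km/s

def two_way_light_time(d_uplink_km, d_downlink_km):
    # one-way up to the spacecraft plus one-way down to the receiver
    return (d_uplink_km + d_downlink_km) / c

t0 = two_way_light_time(182831, 183168)  # 07:30 UTC: Goldstone -> JWST -> ATA
t1 = two_way_light_time(204223, 204150)  # 10:30 UTC
print(round(t0, 3), round(t1, 3), round((t1 - t0) * 1e3))  # 1.221 1.362 141
```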

Sequential ranging measures the light-time delay not by detecting when the tones start (as we are seeing here, it is not possible to do this with good resolution when the signal is weak), but rather by measuring their phase. Still, I think that the plot above is a nice demonstration of how the light-time delay increases as the spacecraft travels away from Earth.

Regarding the timing of the remaining tones, the documentation states that the ranging clock is present from \(\mathrm{XMIT} - 1\) to \(\mathrm{XMIT} + T_1 + 1\), and that then at some point within that second the next component starts. As we have seen, \(\mathrm{XMIT}\) must be 0 seconds in the minute (in the first part of the recording, when Goldstone is transmitting). Since the ranging clock duration is somewhat short of 8 seconds, we see that \(T_1 = 5\) seconds. The second component lasts until at least \(\mathrm{XMIT} + T_1 + T_2 + 2\), and then changes to the next component at some point within the following second. Since this component lasts approximately 4 seconds, we see that \(T_2 = 3\) seconds. The formula for the cycle time\[T_1 + 3 + (n_L-n_{RC})\cdot(T_2+1)\]checks out, because with these values and \(n_L = 17\), \(n_{RC} = 4\) we get 60 seconds.
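The arithmetic can be summarized in a couple of lines:

```python
def cycle_time(T1, T2, n_RC, n_L):
    # T_1 + 3 + (n_L - n_RC) * (T_2 + 1), from the sequential ranging docs
    return T1 + 3 + (n_L - n_RC) * (T2 + 1)

print(cycle_time(T1=5, T2=3, n_RC=4, n_L=17))  # 60 seconds
```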

To summarize, the parameters used by the JWST sequential ranging in this recording are the following.

| Parameter | Value |
| --- | --- |
| Ranging clock component \(n_{RC}\) | 4 |
| Last component \(n_L\) | 17 |
| Ranging clock transmit time \(\mathrm{XMIT}\) | :00, :45 and :30 |
| Ranging clock duration \(T_1\) | 5 seconds |
| Other components duration \(T_2\) | 3 seconds |
| Cycle time | 60 seconds |
| Chop component | 4 |
| First chopped component | 5 |

After launch, the first groundstation to pick up the S-band signal from JWST was the 10 m antenna of the Italian Space Agency in Malindi, Kenya. This groundstation commanded the telemetry rate to increase from 1 kbps to 4 kbps. After this, the spacecraft’s footprint continued moving east, and the spacecraft was tracked for a few hours by the DSN in Canberra. One of the things that Canberra did was to increase the telemetry rate to 40 kbps, which apparently is the maximum to be used in the mission.

As JWST moved away from Earth, its footprint started moving west. After Canberra, the spacecraft was tracked by Madrid. Edgar Kaiser DF2MZ, Iban Cardona EB3FRN and other amateur observers in Europe received the S-band telemetry signal. When Iban started receiving the signal, it was again using 4 kbps, but some time after, Madrid switched it to 40 kbps.

At 00:50 UTC on December 26, the spacecraft made its first correction burn, which lasted an impressive 65 minutes. Edgar caught this manoeuvre in the Doppler track.

Later on, between 7:30 and 11:30 UTC, I received the signal with one of the 6.1 metre dishes of the Allen Telescope Array. The telemetry rate was 40 kbps, and the spacecraft was presumably in lock with Goldstone, though it didn’t appear in DSN Now. I will publish the recording in Zenodo as usual, but since the files are rather large I will probably reduce the sample rate, so publishing the files will take some time.

In the rest of this post I give a description of the telemetry of JWST and do a first look at the telemetry data.

The lower rate configurations of the telemetry signal of JWST use PCM/PSK/PM with a 40 kHz subcarrier, while the 40 kbps configuration uses PCM/PM/NRZ. The spacecraft uses CCSDS concatenated coding, so the 4 kbps configuration actually corresponds to exactly 8 kbaud, while 40 kbps is 80 kbaud.

According to the data recorded by Iban, the 4 kbps telemetry uses a single (252, 220) Reed-Solomon codeword. This choice is interesting, because it gives a frame size of 2048 bits at the input of the convolutional encoder, taking into account the ASM. Several Chinese missions such as Tianwen-1 and Chang’e 5 have used this codeword size, because the baudrates they use are powers of two, such as 2048 or 16384 baud. By having 2048 bit frames, they get a nice round value for the time it takes to transmit a frame, such as 2 seconds or 0.25 seconds. However, in the case of JWST the baudrate is not a power of 2, but rather a “nice round number in base 10”, such as 4000 or 40000. Therefore, they don’t get these nice round durations for the frames. Thus, it is curious that they have chosen the shortened size of (252, 220) rather than the full size of (255, 223).
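The frame duration arithmetic goes like this (two channel symbols per information bit, due to the rate-1/2 convolutional code):

```python
# (252, 220) Reed-Solomon codeword plus the 4 byte ASM: 256 bytes,
# i.e. 2048 bits at the input of the convolutional encoder
frame_bits = (252 + 4) * 8
symbols_per_frame = 2 * frame_bits  # rate-1/2 convolutional code

# power-of-two baudrates give round frame durations; JWST's do not
for baud in [2048, 16384, 8000, 80000]:
    print(baud, symbols_per_frame / baud)  # 2.0, 0.25, 0.512, 0.0512 s
```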

The 40 kbps telemetry uses 5 interleaved (252, 220) Reed-Solomon codewords, so the total frame size is 1100 information bytes.

The GNU Radio decoder flowgraph that I used with Iban’s recording, which contains 4 kbps telemetry, can be downloaded here. It is shown in the figure below.

Unfortunately, the SNR in Iban’s recordings is slightly less than needed for decoding, and I haven’t been able to get a single Reed-Solomon frame decoded correctly. Still, taking into account that the data has errors, one can look at the frames and learn some things, since the Reed-Solomon code is systematic. This is what r00t.cz has been doing with the recordings.

For the 40 kbps telemetry in the ATA recordings I have used this flowgraph, which is shown here.

The antenna feeds in the ATA use dual linear polarization (X and Y), so the Auto-polarization block combines the two polarizations to maximize the SNR. The signal from JWST is nominally circularly polarized (RHCP, I think), but since the low gain antenna is a patch antenna and we don’t see it directly from its boresight direction, in general we will see some elliptical polarization (see my post about the Chang’e 5 polarization). I observed that at the beginning of the recording there was much more signal power in the X polarization than in the Y polarization. I will have to check how this evolves throughout the recording. The figure below shows the spectrum of the signal using only the X polarization.
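I won’t reproduce the internals of the Auto-polarization block here, but the basic idea of combining two polarization channels coherently can be sketched as follows (phase-align Y to X using their cross-correlation, then add; a simplified sketch, not the actual block):

```python
import numpy as np

def combine_polarizations(x, y):
    """Phase-align the Y channel to the X channel using their
    cross-correlation and add them, so that common signal components
    combine coherently. Simplified sketch of polarization combining."""
    r = np.vdot(y, x)          # estimates the X-Y relative phase
    return x + (r / np.abs(r)) * y
```

With equal signal power and independent noise in the two channels, this kind of coherent sum gains up to 3 dB of SNR over using a single channel.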

The SNR in the X polarization is barely enough to decode, and some of the frames can be decoded, but others not. By combining both polarizations we gain some SNR and it is possible to decode a large fraction of the frames.

The figure below shows the GUI of the GNU Radio decoder running with the beginning of the ATA recording. We can see the difference of SNRs between the X and Y polarizations in the upper left spectrum plot. We see that the symbols are very noisy, so it almost seems magic that the Viterbi and Reed-Solomon decoder are able to decode so many correct frames.

The frames transmitted by JWST are CCSDS AOS frames. The spacecraft ID is 170 (0xaa), which matches the ID used in NASA HORIZONS and the SANA registry. There are two virtual channels in use: virtual channel 0, which carries the telemetry, and virtual channel 63, which is Only Idle Data. The Only Idle Data frames have all the payload (everything besides the AOS primary header) filled with `0x78` bytes, which is ASCII for `x`.

Approximately 95% of the frames belong to virtual channel 0. The frames in virtual channel 0 contain CCSDS Space Packets using the M_PDU protocol. The last 4 bytes of the frame are a trailer. It seems that the contents of the trailer cycle through the values `0x010001eb`, `0x0904019e`, and `0x09080100` every three frames (at least near the beginning of the recording). I am not sure what this trailer represents. I don’t think it is a Communications Link Control Word (as described in the CCSDS TC Space Data Link Protocol), because one of the reserved bits is set to one instead of zero as it should be. However, I can’t rule out the possibility completely, since the values of the rest of the fields could make sense.

The figure below shows the number of frames lost in virtual channel 0 according to the jumps in the virtual channel frame counter. We can see that towards the end of the recording, as the spacecraft elevation decreases and eventually goes under the elevation mask, the error rate increases. Overall, 76% of the frames in virtual channel 0 have been decoded correctly.

Many Space Packet APIDs are active. As usual, I have done raster plots of each of them in a Jupyter notebook. Glancing through these raster plots, my impression is that many of them have complex data structures, though there are also many zones padded with zeros. The two figures below show some examples of what the APIDs look like. The full list of plots can be seen in the Jupyter notebook.

I have also seen that there are many fields with floating point numbers. These usually have a distinct “texture”, so they are not so difficult to spot in these raster plots.

I have tried to go through all the APIDs, plotting the values of all the floating point fields, though I haven’t tried to be exhaustive and might have missed some. I haven’t seen any that look as interesting as the state vector data of Tianwen-1. Most of the floating point fields are 32-bit wide (using IEEE 754 big-endian representation), but there are a few that are 64-bit wide.

Perhaps the floating point channels that I have found most interesting are these three adjacent ones, which appear in APID 1201, and also in APIDs 1404 and 1727 (in several cases it seems that the same or very similar data appears in some fields of several different APIDs).

Another set of floating point channels that looks interesting is the following 6 adjacent channels in APID 1755.

The full list of these plots is also in the Jupyter notebook. At the moment I have no idea of what kind of data any of them are showing. One should be careful when interpreting the data, because there is even the chance that some of the fields are not really floating point, but integers, even though they have reasonable values when interpreted as floating point.

Probably I’ll come back to these recordings in the next few days, but for now I wanted to publish what I have so far. The decoded frames are available in the GitHub repository and can be obtained using git-annex as described in the README.

I participated in this activity with my HF station, which consists of a Hermes-Lite 2 beta2 DDC/DUC SDR transceiver and an end-fed random wire antenna about 17 metres long. I used a 10 MHz reference from a GPSDO as described in this post to lock the Hermes-Lite 2 sampling clock. Instead of measuring frequency in real time, I recorded IQ data at 200 sps for the WWV carrier at 5000, 10000 and 15000 kHz and for the RWM carrier at 4996, 9996 and 14996 kHz, so that the data could be post-processed later with any kind of algorithms. I have published my recordings in the “December 2021 Eclipse Festival of Frequency Measurment IQ recording by station EA4GPZ” dataset in Zenodo.

In this post I process the IQ recordings to produce waterfalls that give us an overview of the data. The frequency measurement will be done in a later post.

The data was recorded using the `hl2_freq_hf.grc` flowgraph, which is shown below.

Three FPGA receivers are used in the Hermes-Lite 2, tuned to 5, 10 and 15 MHz. The receivers use 48 ksps IQ. The data for WWV is obtained directly by lowpass filtering. The data for RWM is obtained by mixing with a vector source that contains a periodic complex exponential. Using a vector source instead of a signal source to generate this exponential avoids any frequency errors due to rounding. The IQ data is recorded using 16 bit integers. After recording, the files were converted to SigMF format.
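The trick with the vector source works because the offset frequency divides the sample rate exactly. For example, with a 4 kHz offset (receiver at 5000 kHz, RWM at 4996 kHz) and 48 ksps, the complex exponential has an exact period of 12 samples, so tiling one period gives a perfectly periodic LO with no accumulated phase rounding errors. A numpy sketch of the idea:

```python
import numpy as np

fs = 48_000  # sample rate of the FPGA receivers
f = 4_000    # offset of RWM from the receiver center frequency

# one exact period of the exponential (12 samples here)
period = fs // np.gcd(fs, f)
lo = np.exp(2j * np.pi * f * np.arange(period) / fs)

def mix(x):
    """Frequency-shift x by f, by multiplying with the tiled periodic LO."""
    reps = -(-len(x) // period)  # ceil division
    return x * np.tile(lo, reps)[:len(x)]
```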

The waterfalls are computed with this `make_waterfalls.py` Python script. It uses an 8192 point FFT with a Blackman window to compute the spectra, saving only the bins corresponding to the central +/-5 Hz, which is enough for the analysis of the carrier. The time resolution is therefore 40.96 seconds, and the frequency resolution is 24.4 mHz. The mHz level frequency errors due to the FPGA NCO resolution (see this post) have not been corrected, since they are an order of magnitude below the FFT resolution. The waterfalls are saved to npy files.

The waterfall files are read and plotted in this Jupyter notebook. The plots are organised by rows, with each row corresponding to 24 hours aligned to the UTC days, since the propagation follows a daily pattern. There is one plot for each of the 6 signals (3 WWV frequencies and 3 RWM frequencies) that have been recorded. The power scale in all the plots is the same, and encompasses a total range of 80 dB. This allows us to see how the noise floor power decreases when going to the higher frequencies.

The first plot corresponds to RWM at 4996 kHz. The plots for RWM don’t look good because the station only transmits a continuous carrier for 7 minutes 55 seconds starting at 0 and 30 minutes in the hour. The remaining time is spent sending either 1 Hz pulses or 10 Hz pulses, which cause sidebands in the spectrum.

The 0 Hz reference has been marked with a faint white line to make it easier to see the Doppler of the carrier. The start and end of the eclipse on December 4 have been marked with white vertical lines. There are some gray blocks in the waterfall that correspond to gaps in the data.

Additionally, the last segment of data, starting at 2021-12-08 12:57 UTC, should be taken with a grain of salt, because due to a problem (described in more detail in the Zenodo dataset) the Hermes-Lite 2 was using its internal TCXO rather than the external 10 MHz reference. I have corrected by hand the frequency offset of the TCXO, but there is still some frequency drift in the TCXO. This will be seen better in the plot corresponding to WWV at 10000 kHz.

We see that the propagation at 5 MHz is nocturnal. Additionally, in the morning we see a positive Doppler drift as the signal disappears, and in the evening we see a negative Doppler drift as the signal reappears. This makes sense because the virtual ionospheric layer height decreases at dawn, causing a positive Doppler shift, and increases at dusk, causing a negative Doppler shift.

The next plot shows WWV at 5000 kHz. This is the signal that has worked best for my station, because the signal is relatively strong, and the continuous carrier makes it much easier to see what is going on, in contrast with RWM.

As in the case of RWM, propagation is nocturnal, and we also see the characteristic Doppler shifts in the morning and evening. I am amazed by how much richness there is in the data. Besides the Doppler frequency and signal power, we have the Doppler spread, which causes really complex and interesting patterns. We can see that not only the total spread is sometimes larger and sometimes smaller (and this is typically correlated with the signal strength), but also at times the spread is noticeably asymmetric. It is not so clear how to process this data to try to extract some characteristics of the Doppler spread, but I am thinking of an approach based on estimating the central moments of the power spectral density.

For RWM at 9996 kHz we see that the propagation is strong during daytime, while sometimes there is also weaker propagation at night. In the morning, the daytime propagation at 9996 kHz starts more or less at the same time that the 4996 kHz signal disappears, and the same happens in reverse during the evening.

The results for WWV at 10000 kHz are bad because the 10 MHz reference from the GPSDO leaks in with noticeable power (it can be seen as a thin steady line here) and the signal from WWV is not particularly strong and has a lot of spread. I’m not even certain that I’m only receiving WWV, since there are several stations in the world using 10000 kHz.

We can use this plot to assess the quality of the last segment of data, where the TCXO of the Hermes-Lite 2 was used instead of the GPSDO. Since we have the 10 MHz reference leaking in, any frequency drift we see in this reference is due to a drift in the TCXO. On December 8 the 10 MHz reference is very close to 0 Hz because I have corrected the frequency offset by hand. However, as time advances we can see the 10 MHz getting slightly higher in frequency. This is caused by the TCXO drifting down in frequency, probably due to ageing.

At 14996 kHz, RWM only shows diurnal propagation. In fact we see that the propagation in this band starts about 1 hour later than on 9996 kHz, and often ends 2 hours before. Also notice how the noise floor power has decreased in comparison with the lower frequency bands.

The recording of WWV at 15000 kHz is interesting because we see a signal between approximately 6:00 UTC and 9:00 UTC. Then we see a signal again between 14:00 and 18:00 UTC. The first signal stops abruptly, so that means that the transmitter shuts down at 9:00 UTC. I don’t think this first signal is WWV. It might be the Chinese BPM time signal, which also transmits at 15000 kHz. I haven’t found any source mentioning a time signal that shuts down at 9:00 UTC. The 200 Hz span used in the IQ recordings is too narrow to identify the stations (they often have time codes at 1 kHz offsets from the carrier), so I will need to conduct additional observations to identify the stations and also to verify that the signal we see at 5000 kHz is WWV rather than BPM.

I think that the second signal, which is visible between 14:00 and 18:00 UTC, is indeed WWV, since this period corresponds to the time when the propagation path between WWV and my station (which spans half of the USA and Canada, plus the North Atlantic) is in daytime.

The technique I have used to study the data has been basically the same as in this post. GMAT is called from a Jupyter notebook using its Python API to propagate the time gaps between the state vectors that Bochum has decoded, thus obtaining a continuous trajectory that follows the state vectors. The figure below shows the ground track, with the state vector data marked in blue and the propagated trajectory marked in grey. It is no coincidence that most of the received state vector data is in the southern hemisphere. Since the periapsis is on the northern hemisphere, Tianwen-1 spends most of the time over the southern hemisphere.

The next figure shows the orbit radius versus time. The periapsis and apoapsis radiuses have stayed almost constant.

Recall that the remote sensing orbit has a period which is approximately 140 seconds longer than 2/7 Mars sidereal days. This means that the ground track almost repeats every 2 Mars sidereal days, after 7 revolutions, but shifted a few degrees to the west.
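A back-of-the-envelope check of this drift, using a standard value for the Mars sidereal day (an assumption on my part) and the ~140 second figure quoted above:

```python
mars_sidereal_day = 88642.66  # seconds (standard value, assumed here)
period = 2 / 7 * mars_sidereal_day + 140  # orbital period per the text

# after 7 revolutions we are 7*140 = 980 s past 2 sidereal days, so the
# ground track falls short (drifts west) by the angle Mars rotates in 980 s
extra = 7 * period - 2 * mars_sidereal_day
drift = extra / mars_sidereal_day * 360

print(round(drift, 1))    # ~4.0 degrees of westward drift per 7 revolutions
print(round(360 / 7, 2))  # 51.43 degrees between the 7 node groups
```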

The figure below tries to show this behaviour. It depicts the longitudes of the equator crossings (nodes) of the orbit. The nodes of each revolution are given a colour according to the orbit number modulo 7, so there are a total of 7 colours. The ascending node and descending node of the same revolution are marked with the same colour. As usual, we consider that the revolution starts at the periapsis. Since the periapsis is above the equator, the descending node happens before the ascending node of the same orbit.

Ascending nodes are marked with a triangle pointing up, and descending nodes are marked with a triangle pointing down. Triangles of the same shape and colour are joined with a line to show clearly the drift to the west of the nodes. Besides this, there is a dotted line that joins all the nodes in chronological order, to make it possible to see the order in which they happen.

In the plot we can see that the longitude of the descending node of one revolution is quite close to the longitude of the ascending node of the next revolution. Grouping the nodes in pairs like so, we get 7 groups which are equispaced by 51.43 degrees. The groups drift slowly to the west. In slightly over 20 days, the drift has accumulated to 51.43 degrees and so the groups have shifted places. The consequence of this is that the whole surface of Mars is scanned every 20 days.

The next figure shows the ground track according to the elevation at which Tianwen-1 can be seen from the rover Zhurong. Points where the spacecraft is above 30 degrees elevation are shown in green. These are good for communications between the rover and orbiter. Points where the spacecraft is above the horizon but below 30 degrees are shown in orange. For these, communication might be possible, but perhaps not recommended. Finally, points for which the spacecraft is below the horizon are shown in black. Additionally, the nodes of the orbit are marked as in the previous plot.

Since the plot is rather busy, it is better to show only the first 10 days of data, which is done below.

We see that if only the passes that have a maximum elevation over 30 degrees are used for communications, then typically only 2 communications passes will be possible every 2 Mars sidereal days. In the plot shown above, on the first day we have the ascending orange node and the descending purple node as possible communications passes on the first few days. Eventually the purple node drifts too far west, but more or less at the same time the blue descending node drifts in and starts being useful for communications. If the lower elevation passes are also used for communications, then up to 4 communications passes will be possible every 2 Mars sidereal days (blue, orange, purple and brown in the figure).

In the previous post I commented about a news article that explains that the latitude of the periapsis keeps changing due to orbital perturbations. First it moves from north to south, and then it will move from south to north again. This allows the orbiter to eventually scan all the surface of Mars from a low altitude, which is necessary for high detail.

The figure below shows how the periapsis latitude changes with time. We see that in some 36 days it has decreased by about 18 degrees, which gives a rate of 0.5 degrees per day. Therefore, in order to cover all the surface of Mars, around a year will be needed for the periapsis to move from the north pole to the south pole.

The calculations and plots for this post have been done in this Jupyter notebook. As shown by the additional plots there, the periapsis and apoapsis radiuses, and hence the orbital period, have stayed quite constant for all this time.

My source for stating that Reed-Solomon wasn’t used was some private communication with DSN operators. Since the XML files describing the configuration of the DSN receivers for Voyager 1 didn’t mention Reed-Solomon either, I had no reason to question this. However, the DSN only processes the spacecraft data up to some point (which usually includes all FEC decoding), and then passes the spacecraft frames to the mission project team without really looking at their contents. Therefore, it might be the case that it’s the project team who handles the Reed-Solomon code for the Voyagers. This would make sense especially if the code was something custom, rather than the CCSDS code (recall that Voyager predates the CCSDS standards). If this were true, the DSN wouldn’t really care if there is Reed-Solomon or not, and they might have just forgotten about it.

After looking at the frames I had decoded from Voyager 1 in more detail, I realized that Brett might be right. Doing some more analysis, I have managed to check that in fact the Voyager 1 frames used Reed-Solomon as described in the references that Brett mentioned. In this post I give a detailed look at the Reed-Solomon code used by the Voyager probes, compare it with the CCSDS code, and show how to perform Reed-Solomon decoding in the frames I decoded in the last post. The middle section of this post is rather math heavy, so readers might want to skip it and go directly to the section where Reed-Solomon codewords in the Voyager 1 frames are decoded.

One of the references that Brett gave is the paper Reed-Solomon Codes and the Exploration of the Solar System, by McEliece and Swanson. This is a rather interesting paper from 1993 that gives a historic overview of the use of Reed-Solomon codes in deep space missions. In Section 3.1 it speaks about the Voyager mission. The summary is that the convolutional code was a fine solution for the transmission of uncompressed digital images, as these could tolerate a relatively high BER. However, some advances in compression theory made it possible to implement compression on-board and transmit compressed images. These required a much lower BER, so using concatenated coding with a Reed-Solomon outer code was an attractive solution. The image compression and concatenated coding were deemed too risky for the main mission to Jupiter and Saturn, but they were used for the extended mission after the Saturn fly-bys.

The paper gives a detailed description of the Reed-Solomon code used by the Voyagers. As we will see, I may describe it as “the first Reed-Solomon code that one could think of”. This predates the CCSDS Reed-Solomon code, and it is interesting to see how the Voyager Reed-Solomon code later evolved into the CCSDS code.

The Voyager Reed-Solomon code uses the field \(GF(2^8)\), so that the codewords have \(n = 255\) bytes. The number of parity check symbols was 32, so that up to \(E=16\) byte errors could be corrected, and the code was a (255, 223) linear code, just like the current CCSDS code. These code parameters seem natural nowadays, but perhaps Voyager was one of the first missions to use them. For instance, the same paper mentions that Mariner used a (6, 4) code over \(GF(2^6)\). The paper doesn’t explain exactly why these parameters were chosen. Working with \(GF(2^8)\) might seem reasonable in byte-oriented computers. Usually a Reed-Solomon code over \(GF(2^8)\) will have \(n=255\). However, the choice to correct up to \(E=16\) byte errors might have been somewhat arbitrary.

The field \(GF(2^8)\) is constructed by using the primitive polynomial\[p(x) = x^8+x^4+x^3+x^2+1,\]which the paper mentions to be the first one of degree 8 in the tables of irreducible polynomials given in Appendix C in the book Error-Correcting Codes, by Peterson and Weldon. The generator polynomial of the Reed-Solomon code should have as roots 32 consecutive powers of a primitive element. The chosen generator is the obvious one,\[g(x) = (x-\alpha)(x-\alpha^2)\cdots(x-\alpha^{32}),\]where \(\alpha\) is a root of \(p(x)\). This is a valid choice because since \(p(x)\) is a primitive polynomial, then \(\alpha\) is a primitive element in \(GF(2^8)\).
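As a check, we can construct this code in a few lines of Python: arithmetic in \(GF(2^8)\) modulo \(p(x)\), and the generator polynomial as the product of the 32 linear factors (note that \(-1 = 1\) in characteristic 2, so \(x - \alpha^i = x + \alpha^i\)):

```python
PRIM = 0x11d  # p(x) = x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo p(x), by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return r

# powers of the primitive element alpha = x (encoded as 0x02)
alpha_pow = [1]
for _ in range(254):
    alpha_pow.append(gf_mul(alpha_pow[-1], 2))

def poly_mul(p, q):
    """Product of polynomials over GF(2^8), coefficients high to low."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= gf_mul(a, b)
    return r

# g(x) = (x - alpha)(x - alpha^2)...(x - alpha^32)
g = [1]
for i in range(1, 33):
    g = poly_mul(g, [1, alpha_pow[i]])

print(len(g) - 1)  # degree 32: 32 parity check symbols, so E = 16
```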

It is interesting to see how this code is still in use nowadays, since it has probably ended up in many popular Reed-Solomon libraries. For instance, a quick search in gr-satellites shows that exactly the same Voyager code is used in the cubesat 3CAT-1, and variations with a different error correction capability are used in ESEO and TT-64 (\(E=8\)), Swiatowid (\(E=5\)), and ÑuSat (\(E = 2\)). All the other satellites supported in gr-satellites that use Reed-Solomon use the (255, 223) CCSDS code, either with the conventional or dual basis.

Even though it is somewhat of a diversion from the main topic of this post, it is interesting to see what changes in the Voyager Reed-Solomon code lead to the current CCSDS Reed-Solomon code, which is defined in Section 4 of the TM Synchronization and Channel Coding Blue Book. The CCSDS code was the result of a series of optimizations done by Berlekamp with the goal of obtaining a code that had the same error correction properties as the Voyager code, but for which much simpler encoders could be built in hardware. The JPL report Reed-Solomon Encoders – Conventional vs Berlekamp’s Architecture gives a detailed explanation of these changes.

The CCSDS Reed-Solomon code is also a (255, 223) code over \(GF(2^8)\). The first difference with respect to the Voyager code is merely a matter of convention. In the CCSDS code, the field \(GF(2^8)\) is defined using the primitive polynomial\[q(x) = x^8 + x^7 + x^2 + x + 1\]instead of the polynomial \(p(x)\) given above. However, the root \(\beta\) of \(q(x)\) is only used indirectly to define other elements of \(GF(2^8)\), which are really the ones that determine the code. Therefore, the same code could have been defined using any other irreducible polynomial of degree 8 to define \(GF(2^8)\). Note two caveats: First, this remark is only true when using the dual basis (which is one of Berlekamp’s central ideas); when using the conventional basis, the element \(\beta\) is used to encode the field elements as 8 bit numbers, so a different choice of \(q(x)\) would give a different code. Second, here we are using \(\beta\) in order to distinguish the root of \(q(x)\) from \(\alpha\), the root of \(p(x)\) in the Voyager code. In the CCSDS documentation \(\beta\) is used to denote another field element (we will use \(\delta\) below to denote that field element).

The first main idea of Berlekamp is the following: instead of constructing the generator polynomial of the code using the powers \(\gamma\), …, \(\gamma^{32}\) of a field element \(\gamma\) (as done in the Voyager code), we can use\[h(x) = (x-\gamma^{112})(x-\gamma^{113})\cdots(x-\gamma^{143}).\]Then \(h(x)\) is a palindromic polynomial, meaning that \(x^{32}h(1/x) = h(x)\). This happens because the roots of \(h(x)\) are pairwise inverses: \(\gamma^{112} = \gamma^{-143}\), \(\gamma^{113} = \gamma^{-142}\), etc., since the exponents in each pair sum to 255 and \(\gamma^{255} = 1\).

Having a palindromic polynomial as the code generator is helpful because the encoder works by multiplying field elements by each of the coefficients of the code generator polynomial. Since palindromic polynomials have pairwise equal coefficients, we save half of the multiplications. Therefore, making the 32 consecutive powers of \(\gamma\) be symmetric about 127.5, rather than about 16.5 (which is what happens if we use \(\gamma\), \(\gamma^2\), …, \(\gamma^{32}\)) is advantageous to simplify the encoder.
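We can verify the palindromic property numerically with a short sketch (again my own code). Here I take \(\gamma = \beta^{11}\), which, as we will see below, is the choice made in the CCSDS code, although any primitive \(\gamma\) works:

```python
# Numerical check (my own sketch) that h(x) = (x - gamma^112)...(x - gamma^143)
# is palindromic. Field polynomial: q(x) = x^8 + x^7 + x^2 + x + 1 (0x187);
# beta = 0x02 is a root of q(x), and gamma = beta^11 (the CCSDS choice).
Q_CCSDS = 0x187

def gf_mul(a, b, poly=Q_CCSDS):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

gamma = gf_pow(2, 11)

# Multiply out the 32 linear factors; coefficients lowest degree first.
h = [1]
for j in range(112, 144):
    root = gf_pow(gamma, j)
    new = [0] * (len(h) + 1)
    for k, c in enumerate(h):
        new[k + 1] ^= c
        new[k] ^= gf_mul(c, root)
    h = new

# Palindromic: coefficient of x^k equals coefficient of x^(32-k).
assert all(h[k] == h[32 - k] for k in range(33))
```

Note also that \(h(0) = \gamma^{112+113+\cdots+143} = \gamma^{4080} = 1\), so the polynomial is monic at both ends.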

Another remark that we have used tacitly is that it is not necessary to use a root of the polynomial that generates the field (\(\beta\) in the case of \(q(x)\)) to construct the roots of \(h(x)\) as consecutive powers. In fact, it is possible to choose any \(\gamma\) that is a primitive field element, as we have done above. Berlekamp uses this freedom to his advantage.

The second key idea of Berlekamp, and what is probably the most ingenious aspect of his Reed-Solomon code optimization, is the use of the dual basis. If \(\delta\) is a field element that doesn’t belong to \(GF(2^4)\), then the elements \(1\), \(\delta\), \(\delta^2\), …, \(\delta^7\) form a basis for \(GF(2^8)\) as a linear space over the prime field \(GF(2)\). There is a dual basis \(\ell_0\), \(\ell_1\), …, \(\ell_7\) defined by the property that \(\operatorname{Tr}(\ell_j \delta^k)\) equals 1 when \(j = k\) and 0 otherwise. Here \(\operatorname{Tr}\) denotes the field trace defined by\[\operatorname{Tr}(x) = \sum_{j=0}^7 x^{2^j} \in GF(2)\]for \(x \in GF(2^8)\).

We can use the dual basis to represent an element \(x \in GF(2^8)\) as a linear combination\[x = z_0\ell_0 + z_1 \ell_1 + \cdots + z_7\ell_7,\]with \(z_j\in GF(2)\). The coefficients \(z_j\) can be computed as \(z_j = \operatorname{Tr}(x\delta^j)\).

An interesting property of the dual basis is that the multiplication by \(\delta\), which defines a \(GF(2)\)-linear operator on \(GF(2^8)\), has a simple expression in the coordinates given by the dual basis. The element \(\delta^8\) can be written in terms of \(1\), \(\delta\), …, \(\delta^7\) as\[\delta^8 = c_0 + c_1\delta + c_2\delta^2 + \cdots + c_7\delta^7.\]This implies that\[x\delta = z_1\ell_0 + z_2\ell_1 + \cdots + z_7\ell_6 + (c_0z_0 + c_1z_1 + \cdots c_7z_7)\ell_7.\]Thus, given the dual basis coordinates of \(x\), to compute the dual basis coordinates of \(x\delta\) we just need to left shift the coordinates and use a linear map to compute the new component that is shifted in.
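Both the dual basis and this shift formula can be checked numerically. In the following sketch (my own code) I take \(\delta = \beta^{117}\), the choice that the CCSDS code ends up making, and find the dual basis by brute force over the 256 field elements:

```python
# Numerical check (my own sketch) of the dual basis and of the left-shift
# formula for multiplication by delta. Field: GF(2^8) with q(x) = 0x187;
# delta = beta^117 (the CCSDS choice; any delta outside GF(2^4) works).
Q_CCSDS = 0x187

def gf_mul(a, b, poly=Q_CCSDS):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def trace(x):
    # Tr(x) = x + x^2 + x^4 + ... + x^128; always 0 or 1
    t, y = 0, x
    for _ in range(8):
        t ^= y
        y = gf_mul(y, y)
    return t

delta = gf_pow(2, 117)
dpow = [gf_pow(delta, k) for k in range(9)]

# Brute-force the dual basis: l_j is the unique element with
# Tr(l_j * delta^k) = 1 if j == k else 0.
dual = []
for j in range(8):
    matches = [l for l in range(256)
               if all(trace(gf_mul(l, dpow[k])) == (1 if k == j else 0)
                      for k in range(8))]
    assert len(matches) == 1
    dual.append(matches[0])

def dual_coords(x):
    # z_j = Tr(x * delta^j)
    return [trace(gf_mul(x, dpow[j])) for j in range(8)]

# Reconstruction: x = z_0 l_0 + ... + z_7 l_7.
x = 0x5A
z = dual_coords(x)
recon = 0
for j in range(8):
    if z[j]:
        recon ^= dual[j]
assert recon == x

# c_j are the coordinates of delta^8 in the basis 1, delta, ..., delta^7,
# which are c_j = Tr(delta^8 * l_j). The shift formula says that the dual
# coordinates of x*delta are the left-shifted z's plus one new component.
c = [trace(gf_mul(dpow[8], dual[j])) for j in range(8)]
shifted = z[1:] + [sum(c[j] & z[j] for j in range(8)) % 2]
assert shifted == dual_coords(gf_mul(x, delta))
```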

Compare this with what happens if we instead use coordinates with respect to the basis \(1\), \(\delta\), …, \(\delta^7\). Then, if\[x = u_0 + u_1 \delta + u_2 \delta^2 + \cdots + u_7 \delta^7,\]we have\[x\delta = u_7c_0 + (u_0 + u_7c_1)\delta + (u_1 + u_7c_2)\delta^2 + \cdots + (u_6 + u_7c_7)\delta^7.\]Thus we see that to compute the coordinates of \(x\delta\) from the coordinates of \(x\) we need to perform a right shift of the coordinates followed by adding \(u_7\) multiplied by a vector that depends only on \(\delta\).

There is a nice way to relate the expressions for the operator of multiplication by \(\delta\) in the dual and conventional bases: since this operator is self-adjoint, its matrix with respect to the dual basis is the transpose of its matrix with respect to the conventional basis (we say that a \(GF(2)\)-linear map \(A\) on \(GF(2^8)\) is self-adjoint if \(\operatorname{Tr}(A(x)y) = \operatorname{Tr}(xA(y))\) for all \(x\) and \(y\) in \(GF(2^8)\)).

Note that the operator of multiplication by \(\delta\) in these two bases is related to the theory of linear feedback shift registers. In fact, the multiplication by \(\delta\) in the dual basis gives the transition map for the Fibonacci form of an LFSR, while the multiplication in the conventional basis gives the transition map for the Galois form of an LFSR. Using this, we see that the output of the Fibonacci LFSR at step \(n\) is \(\operatorname{Tr}(x_0\delta^n)\), where \(x_0\) is the reset state (encoded in dual basis coordinates in the register), while the output of the Galois LFSR at step \(n\) is \(\operatorname{Tr}(y_0\delta^n\ell_7)\), where \(y_0\) is the reset state (encoded in the conventional basis in the register). This gives us that the relation required to generate the same sequence with the Fibonacci and Galois forms of the LFSR is \(x_0 = y_0 \ell_7\) (taking into account that each element should be written in the correct basis in the register).

Now let us see what happens when we consider the operator of multiplication by a fixed element \(g\in GF(2^8)\). This is the basic operation that the Reed-Solomon encoder performs, taking as \(g\) each of the coefficients of the code generator polynomial. As above, we consider the multiplication by \(g\) as a \(GF(2)\)-linear map on \(GF(2^8)\). Working in dual basis coordinates, we know that\[xg = \operatorname{Tr}(xg)\ell_0 + \operatorname{Tr}(xg\delta)\ell_1 + \operatorname{Tr}(xg\delta^2)\ell_2 + \cdots + \operatorname{Tr}(xg\delta^7)\ell_7.\]Now, the functional that sends \(x\) into \(\operatorname{Tr}(xg)\) can be written in coordinates with respect to the dual basis as\[\operatorname{Tr}(xg) = a_0z_0 + a_1z_1 + \cdots + a_7z_7,\]where\[x = z_0\ell_0 + z_1 \ell_1 + \cdots + z_7\ell_7,\]and \(a_0,\ldots,a_7\in GF(2)\) depend only on \(g\) (in fact \(a_j = \operatorname{Tr}(g\ell_j)\) are the coordinates of \(g\) with respect to the conventional basis). This expression for \(\operatorname{Tr}(xg)\) is simple to compute in digital logic, provided that the input element \(x\) is already given to us in dual basis coordinates.

To compute \(\operatorname{Tr}(xg\delta)\), we replace \(x\) by \(x\delta\), using the formula given above to compute the dual basis coordinates of \(x\delta\) from the dual basis coordinates of \(x\), and then apply the formula for \(\operatorname{Tr}(xg)\). Proceeding iteratively in the same manner, we can compute \(\operatorname{Tr}(xg\delta^2)\), …, \(\operatorname{Tr}(xg\delta^7)\). These are the coordinates for \(xg\) in the dual basis, which is what we wanted to compute.

Thus, we have seen that when we have the dual basis coordinates for \(x\), there is a simple way to compute the dual basis coordinates for \(xg\). This is Berlekamp’s key idea for reducing the digital logic usage of a Reed-Solomon encoder. Note that in a Reed-Solomon encoder we need to compute \(xg_0\), …, \(xg_{32}\), where \(g_j\) are all the coefficients of the generator polynomial. Each of these will have its own dedicated expression to compute \(\operatorname{Tr}(xg_j)\), but then the passage from \(x\) to \(x\delta\) can be shared between all the computations, since it does not depend on \(g_j\).

It is worth spending a moment to see why this kind of approach needs the dual basis, and doesn’t really work if we use coordinates with respect to the basis \(1\), \(\delta\), …, \(\delta^7\) instead. In this case we have\[xg = \operatorname{Tr}(xg\ell_0) + \operatorname{Tr}(xg\ell_1) \delta + \operatorname{Tr}(xg\ell_2) \delta^2 + \cdots + \operatorname{Tr}(xg\ell_7)\delta^7,\]and\[x = u_0 + u_1 \delta + u_2 \delta^2 + \cdots + u_7 \delta^7,\]where \(u_j = \operatorname{Tr}(x\ell_j)\). As before, we could compute \(\operatorname{Tr}(xg\ell_0)\) as a linear combination of the coefficients \(u_j\). However, to compute the next trace we need, \(\operatorname{Tr}(xg\ell_1)\), if we want to apply the same linear combination we would need to replace \(x\) by \(x\ell_1\ell_0^{-1}\). This not only lacks a simple expression in coordinates, but is also a different calculation from the one needed when we replace \(y = x\ell_1\ell_0^{-1}\) by \(y \ell_2\ell_1^{-1}\) in order to compute \(\operatorname{Tr}(xg\ell_2)\) (and so on for all the other traces we need to compute).

The final optimization done by Berlekamp consists in noticing that \(\gamma\) can be chosen to be any primitive element in \(GF(2^8)\) (there are \(\varphi(255) = 128\) such elements, where \(\varphi\) denotes Euler’s totient function), and \(\delta\) can be chosen to be any element in \(GF(2^8)\setminus GF(2^4)\) (there are 240 such elements). By performing a computer aided search, he selected a \(\gamma\) and \(\delta\) that minimized the digital logic gates required to perform the calculations for the multiplication operators by \(g_0\), …, \(g_{32}\) in the dual basis coordinates. He settled on \(\gamma = \beta^{11}\) and \(\delta = \beta^{117}\). These choices have been maintained in the CCSDS Reed-Solomon code.

The paper by McEliece and Swanson states that Voyager uses an interleaving depth of 4 in its Reed-Solomon frames. However, the paper doesn’t mention how much virtual fill is used to shorten the Reed-Solomon codewords. Observe that some shortening is necessary, because 4 interleaved (255, 223) Reed-Solomon codewords amount to a total of 8160 bits, while the Voyager 1 frames that I decoded in the last post have only 7680 bits (including the ASM). I haven’t found any source mentioning how much virtual fill is used in the Voyager 1 frames. However, it is not too difficult to find the correct virtual fill with some trial and error.

Indeed, if we look at the raster plots of the frames I decoded in the previous post, it is easy to find where the Reed-Solomon parity check bytes are. The parity check bytes look much more random than the rest of the data, and they can be spotted visually. The figure below shows the parity check bytes starting at around bit 6625. Compare the structured data before this point with the rather random data afterwards.

We continue through several figures containing 128 bits of random-looking Reed-Solomon parity check bytes, like this.

Until we arrive at the final figure, where the Reed-Solomon parity check bytes end at around bit 7650 and we have some rather deterministic bits afterwards.

Thus, in total, we see that we have approximately 7650-6625 = 1025 bits worth of Reed-Solomon parity check bytes. This is just what we expected, since 32 bytes times 4 interleaved codewords equals 1024 bits.

The last portion of the frame is approximately 30 bits long (from 7650 to 7680). Therefore, we assume that it is really 32 bits long, and so the last 4 bytes of the frame do not form part of the Reed-Solomon codewords.

The question now is where the Reed-Solomon codewords start. At the beginning of the frame we have two repetitions of the 32-bit ASM. Typically these do not form part of Reed-Solomon codewords, since they are a fixed pattern that doesn’t need to be error corrected (for instance, in CCSDS, the ASM is outside of the Reed-Solomon codeword). Therefore, the question is basically how many bytes we need to throw away from the beginning of the frame in order to be left with just the 4 interleaved Reed-Solomon codewords. The number of bytes that we throw away should be a multiple of 4, so that the 4 interleaved codewords have the same length.

Here is where we use trial and error. If we make a small mistake in the size of the Reed-Solomon codewords, the decoder will correct our mistake as corrected byte errors. For instance, if we include an extra byte, then we are typically supplying a non-zero data byte instead of a zero byte as virtual fill. The decoder will correct this byte to a zero. Conversely, if we miss a byte, we are supplying a zero byte as virtual fill instead of a data byte which is typically non-zero. The decoder will correct this zero to the correct data value. See my post about a Reed-Solomon bug in Queqiao for some related things we can do with Reed-Solomon codes.

Thus, by changing the codeword size to try to minimize the number of Reed-Solomon decoder byte errors (and striving to get zero errors), we can find the correct codeword size. My initial guess was that the first 128 bits of the frame (which amount to 4 bytes for each of the 4 interleaved codewords) didn’t form part of the Reed-Solomon codewords, since these contain two repetitions of the ASM and some counters. However, it turns out that the Reed-Solomon codewords start right at the beginning of the frame. They even include the ASMs.

Therefore, only the last 4 bytes of the frame are not part of the Reed-Solomon codewords. The remaining 7648 bits are 4 interleaved (239, 207) Reed-Solomon codewords.
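In code, the frame layout we have deduced looks as follows. This is only a sketch of mine rather than the notebook code, and the byte-to-codeword assignment (byte \(i\) goes to codeword \(i \bmod 4\)) is the usual interleaving convention, which I am assuming here:

```python
# Sketch (my own) of the deduced frame layout: a 7680-bit frame is 956
# bytes of 4 interleaved, shortened (239, 207) codewords plus 4 trailing
# bytes outside the code. Byte i going to codeword i mod 4 is assumed.
INTERLEAVING = 4
SHORTENED_N = 239                 # (239, 207) = (255, 223) shortened
VIRTUAL_FILL = 255 - SHORTENED_N  # 16 zero bytes of virtual fill

def split_frame(frame):
    """Split a 960-byte frame into 4 zero-padded Reed-Solomon codewords
    plus the 4 trailing bytes that are not part of the code."""
    assert len(frame) == 960      # 7680 bits
    body, tail = frame[:956], frame[956:]
    codewords = []
    for i in range(INTERLEAVING):
        cw = bytes(VIRTUAL_FILL) + body[i::INTERLEAVING]
        assert len(cw) == 255     # ready for a (255, 223) decoder
        codewords.append(cw)
    return codewords, tail

# Example with a dummy all-zeros frame.
cws, tail = split_frame(bytes(960))
```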

I have used a Jupyter notebook and Phil Karn KA9Q’s libfec to find the correct codeword size and decode the Reed-Solomon frames. The number of corrected bytes in each of the frames of each recording is listed below (each row lists the byte errors of each of the 4 interleaved codewords corresponding to a frame).

```
Recording 11
[ 1, 3, 0, 0]
[ 1, 0, 0, 0]
[ 1, 1, 0, 0]
[ 0, 1, 0, 0]
[ 2, 2, 2, 1]

Recording 15
[ 3, 2, 2, 1]
[ 7, 7, 5, 4]
[ 2, 6, 6, 3]
[ 2, 2, 2, 1]
[ 0, 1, 1, 0]
```

We see that in all the frames that appear fully in the recording (some frames are cut short at the beginning and end of each recording) all the byte errors can be corrected and the frames can be successfully decoded. We also note again that the SNR of recording 11 is better, since there are fewer byte errors. Since we get zero byte errors in several of the codewords, we know we are using the correct codeword size.

After successful Reed-Solomon decoding, we are sure of the correct values of each of the bits in the Reed-Solomon codewords, so we can now pass from the soft bits we used in the last post to hard bits that only have the values 0 and 1. The raster plots now look like this one, which shows the first 128 bits of the frames. I have thrown away those frames which couldn’t be decoded by the Reed-Solomon decoder.

Since the last 32 bits of the frame do not form part of the Reed-Solomon codeword, we cannot error correct those, so I have left the soft bits there. The result looks like this.

The Jupyter notebook contains all the plots of the Reed-Solomon corrected frames. Now it should be slightly easier to look carefully through the data to try to find patterns.

Queqiao transmits telemetry in S-band, using the frequency 2234.5 MHz. The modulation and coding is similar to other recent Chinese probes, such as Chang’e 5 and Tianwen-1. Here I report an interesting bug that I found in the Reed-Solomon encoding performed by Queqiao.

Queqiao transmits telemetry using PCM/PM/PSK modulation with a subcarrier frequency of 65536 Hz and a baudrate of 2048 baud. CCSDS concatenated coding is used with a frame size of 220 information bytes, so that each frame takes exactly 2 seconds to transmit. This is very similar to the modulation and coding used by Chang’e 5 and Tianwen-1. In particular, Chang’e 5 was using exactly this modulation and coding during the lunar mission. Interestingly, Queqiao doesn’t seem to transmit CCSDS TM Space Data Link or AOS frames, but rather uses a custom framing.
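We can check the figure of 2 seconds per frame from these parameters. In CCSDS concatenated coding the ASM is attached before the convolutional encoder, so it must be included in the count (this little accounting is mine, not taken from any documentation about Queqiao):

```python
# Frame timing check (my own arithmetic). 220 information bytes plus the
# 32 Reed-Solomon parity bytes and the 32-bit ASM, all passed through the
# r = 1/2 convolutional encoder, at 2048 baud.
INFO_BYTES = 220
RS_PARITY_BYTES = 32     # (255, 223) code shortened to (252, 220)
ASM_BITS = 32            # 0x1acffc1d
CONV_RATE_INV = 2        # rate 1/2 convolutional code
BAUDRATE = 2048

channel_bits = ((INFO_BYTES + RS_PARITY_BYTES) * 8 + ASM_BITS) * CONV_RATE_INV
assert channel_bits == 4096
assert channel_bits / BAUDRATE == 2.0  # exactly 2 seconds per frame
```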

In this post we will use a recording done with one of the Allen Telescope Array 6.1 metre dishes on 2021-09-19. The recording is published in this dataset in Zenodo.

The GNU Radio decoder flowgraph that we use is shown below. It can be downloaded here.

This flowgraph doesn’t perform Reed-Solomon decoding, and outputs the Reed-Solomon codewords to a file, since we’re interested in looking at the codewords in detail. This is done using Phil Karn KA9Q’s libfec through ctypes from a Python Jupyter notebook. The flowgraph is run on the recording of the X polarization of the signal only, since the SNR is good enough that we don’t need to combine both polarizations to synthesize circular polarization.

Queqiao uses the dual basis for its Reed-Solomon code, as specified in the CCSDS documents, so we must use the `decode_rs_ccsds()` function from libfec to perform decoding.

As a first step, we perform Reed-Solomon decoding and look at the number of byte errors corrected by the decoder. A value of -1 corresponds to a decode error. What we see is that most of the frames have 4 byte errors, while we would expect 0 byte errors, since the signal is strong enough.

There are a bunch of decode errors that probably correspond to false ASM detections, since 4 bit errors are allowed in the ASM detection. These are harmless, since they don’t interfere with the correct ASM detections.

There are also some parts of the recording where there are more than 4 byte errors, and at the end we see that decoding suddenly stops. Perhaps the satellite went out of the antenna beam, since I was tracking a fixed right ascension and declination during this recording.

The question we are concerned with here is why there are always 4 errors. The reason is interesting, and there are some puzzling things along the way.

The first thing we do is to look at one of the frames and its decoded version, to try to see which 4 bytes the Reed-Solomon decoder has changed. This is the original frame:

03 79 16 0a 32 98 02 81 27 55 9a aa bb 02 c5 00 00 f9 6b 00 00 00 00 a4 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 21 00 28 00 3e a7 02 20 ee 58 01 04 08 24 00 00 63 00 00 00 62 00 00 00 07 92 92 00 00 db cf 25 53 0f 00 00 08 c6 02 3b 12 95 0c 0c 30 23 fe f0 58 28 ca 53 1a 10 1f 35 ac 4f da f8 00 00 00 00 00 00 00 00 00 00 00 00 0f 38 2a 29 d9 49 07 07 f9 0d 00 00 00 00 00 00 80 49 04 43 00 00 00 00 ae 7f af ae ac af 00 00 00 00 04 04 ff c1 00 ff 11 00 96 55 ff 01 01 01 01 01 01 01 01 01 01 76 8f 80 02 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ee ee ee ee ee ee ee ee ee ee d2 1e ec e9 a3 5c ae bf 0f 1e 66 47 e3 46 82 ff bf de d4 a9 87 1e b9 26 67 58 f2 e9 4c f9 62 ab

And this is the decoder output, including the parity check bytes:

03 79 16 0a 32 98 02 81 27 55 9a aa bb 02 c5 00 00 f9 6b 00 00 00 00 a4 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 21 00 28 00 3e a7 02 20 ee 58 01 04 08 24 00 00 63 00 00 00 62 00 00 00 07 92 92 00 00 db cf 25 53 0f 00 00 08 c6 02 3b 12 95 0c 0c 30 23 fe f0 58 28 ca 53 1a 10 1f 35 ac 4f da f8 00 00 00 00 00 00 00 00 00 00 00 00 0f 38 2a 29 d9 49 07 07 f9 0d 00 00 00 00 00 00 80 49 04 43 00 00 00 00 ae 7f af ae ac af 00 00 00 00 04 04 ff c1 00 ff 11 00 96 55 ff 01 01 01 01 01 01 01 01 01 01 76 8f 80 02 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ee ee ee ee ee ee ee ee ee ee d2 1e ec e9 a3 5c ae bf 0f 1e 66 47 e3 46 82 ff bf de d4 a9 87 1e b9 26 67 58 f2 e9 4c f9 62 b1

If we look closely, we see that only the last byte has changed, from `0xab` to `0xb1`. That’s somewhat weird, since the decoder tells us it has corrected 4 bytes rather than 1.

Here I’m using just the first decoded frame as a concrete example, but in the Jupyter notebook we can see that this happens with all the frames for which the decoder says that it has corrected 4 bytes.

Since we have seen that none of the information bytes have been changed by the decoder, something else we might try is to take these and run them through the encoder. We would expect to obtain the same parity check bytes that we’re seeing here. This is not what happens, though. Below we show the output of the encoder. Note that all the 32 parity check bytes are completely different from those that we have seen above.

03 79 16 0a 32 98 02 81 27 55 9a aa bb 02 c5 00 00 f9 6b 00 00 00 00 a4 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 21 00 28 00 3e a7 02 20 ee 58 01 04 08 24 00 00 63 00 00 00 62 00 00 00 07 92 92 00 00 db cf 25 53 0f 00 00 08 c6 02 3b 12 95 0c 0c 30 23 fe f0 58 28 ca 53 1a 10 1f 35 ac 4f da f8 00 00 00 00 00 00 00 00 00 00 00 00 0f 38 2a 29 d9 49 07 07 f9 0d 00 00 00 00 00 00 80 49 04 43 00 00 00 00 ae 7f af ae ac af 00 00 00 00 04 04 ff c1 00 ff 11 00 96 55 ff 01 01 01 01 01 01 01 01 01 01 76 8f 80 02 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ee ee ee ee ee ee ee ee ee ee a1 e7 33 a8 2a cb ce 5e 76 b4 10 5c 6e e4 84 07 6b f6 50 ac c3 d5 f4 18 52 3a e4 18 cc 7f 4b ec

At first I found this very confusing. How could it be that we have two different versions of a Reed-Solomon codeword, having the same information bytes but totally different parity check bytes, and that the decoder is able to work with both of them? (In one of the versions it mysteriously corrects 4 errors, while in the other one it corrects 0 errors).

The explanation has to do with how virtual fill of Reed-Solomon codewords works. The CCSDS Reed-Solomon code is a (255, 223) code, which means that we take 223 information bytes, compute 32 parity check bytes, and concatenate them to obtain a 255 byte codeword. An important property of the Reed-Solomon code is that it is cyclic, meaning that every circular shift of a valid codeword is also a valid codeword.

The way to use shorter frame sizes with this Reed-Solomon code is through virtual fill, which is just zero-padding. Basically, the transmitter zero pads the information bytes it wants to send in order to obtain 223 information bytes and then encodes using the (255, 223) code. The padding bytes are not sent over the air. Only the original information bytes plus 32 parity check bytes are sent. The receiver knows the size of the frame, so it knows how many padding bytes were added, and can add the padding back before decoding.

Now, the convention about where to add this padding is that in the full 255 byte encoded codeword the original information bytes and the 32 parity check bytes should be adjacent, with the parity check bytes following the information bytes. It doesn’t really matter if the padding is at the beginning or at the end of the 255 byte codeword. Since the code is cyclic, we can move from one arrangement to the other by using a circular shift, and once we strip the padding the result is the same.

Typically, the transmitter will add the padding before the original information bytes (and this is usually the way that virtual fill is explained), since by doing this and running the encoder it obtains a codeword that has the structure padding-information-checkbytes. The information and check bytes are adjacent and it can just strip the padding. If it added the padding after the information bytes, it would obtain a codeword with the structure information-padding-checkbytes. This doesn’t really work, because the information and check bytes are not adjacent, and there is not a way to fix this using circular shifts. On the other hand, the receiver receives something that has the structure information-checkbytes. It can add the padding either at the beginning (before information) or at the end (after checkbytes), and then decode normally, since both options are a circular shift of each other.
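The cyclic shift argument can be verified numerically. The following pure-Python sketch (my own code, not the libfec-based notebook used in this post) builds the conventional-basis CCSDS generator polynomial, performs systematic encoding on a zero-padded message, and checks that circular shifts of the codeword still evaluate to zero at all 32 roots of the generator:

```python
# Sketch (mine): circular shifts of a Reed-Solomon codeword are still
# codewords. Field: GF(2^8) with q(x) = 0x187; generator roots
# gamma^112 ... gamma^143 with gamma = beta^11 (conventional-basis CCSDS).
Q_CCSDS = 0x187

def gf_mul(a, b, poly=Q_CCSDS):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return r

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

gamma = gf_pow(2, 11)
roots = [gf_pow(gamma, j) for j in range(112, 144)]

# Generator polynomial, highest-degree coefficient first (monic).
g = [1]
for root in roots:
    new = [0] * (len(g) + 1)
    for k, c in enumerate(g):
        new[k] ^= c
        new[k + 1] ^= gf_mul(c, root)
    g = new

def rs_encode(msg):
    """Systematic encoding: parity = remainder of msg(x) * x^32 mod g(x).
    msg is 223 bytes, with the virtual fill included as leading zeros."""
    work = list(msg) + [0] * 32
    for i in range(len(msg)):
        c = work[i]
        if c:
            for j, gj in enumerate(g):
                work[i + j] ^= gf_mul(c, gj)
    return bytes(msg) + bytes(work[len(msg):])

def is_codeword(cw):
    # Evaluate cw (highest-degree coefficient first) at all generator roots.
    def horner(r):
        acc = 0
        for b in cw:
            acc = gf_mul(acc, r) ^ b
        return acc
    return all(horner(r) == 0 for r in roots)

# Encode 3 bytes of (correct, zero) virtual fill plus 220 information bytes.
msg = bytes(3) + bytes((i * 7) & 0xFF for i in range(220))
cw = rs_encode(msg)
assert len(cw) == 255 and is_codeword(cw)

# A circular shift corresponds to multiplying by x modulo x^255 - 1, which
# maps each root r to r * c(r) = 0, so shifts remain valid codewords. This
# is why the receiver may put the padding back at either end.
shifted = cw[3:] + cw[:3]
assert is_codeword(shifted)
```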

Coming back to Queqiao, we see that the Reed-Solomon decoder says it is correcting 4 bytes, but we only see that one byte has been changed. Moreover, the number of padding bytes used by Queqiao is 3 (since the frames have 220 information bytes instead of 223). So we suspect something strange is going on with these padding bytes.

To investigate what happens with the padding, we take the 252 byte codewords that our flowgraph has obtained, add 3 `0x00` bytes before them to obtain 255 byte codewords, and run these through the decoder. This is what we get for the first codeword:

cf fc 1d 03 79 16 0a 32 98 02 81 27 55 9a aa bb 02 c5 00 00 f9 6b 00 00 00 00 a4 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 21 00 28 00 3e a7 02 20 ee 58 01 04 08 24 00 00 63 00 00 00 62 00 00 00 07 92 92 00 00 db cf 25 53 0f 00 00 08 c6 02 3b 12 95 0c 0c 30 23 fe f0 58 28 ca 53 1a 10 1f 35 ac 4f da f8 00 00 00 00 00 00 00 00 00 00 00 00 0f 38 2a 29 d9 49 07 07 f9 0d 00 00 00 00 00 00 80 49 04 43 00 00 00 00 ae 7f af ae ac af 00 00 00 00 04 04 ff c1 00 ff 11 00 96 55 ff 01 01 01 01 01 01 01 01 01 01 76 8f 80 02 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ee ee ee ee ee ee ee ee ee ee d2 1e ec e9 a3 5c ae bf 0f 1e 66 47 e3 46 82 ff bf de d4 a9 87 1e b9 26 67 58 f2 e9 4c f9 62 b1

Here we readily see the 4 bytes that the Reed-Solomon decoder has corrected: the last one (which we already knew about), and the three `0x00` padding bytes at the beginning, which have been corrected to `0xcffc1d`.

This is quite weird and interesting. When there is virtual fill being used, the decoder shouldn’t need to change the padding bytes. These are known to be zeros, and are not transmitted over the air. So this indicates that there is a bug in the encoder, and it isn’t really using zeros as padding, so the receiver’s decoder must correct the padding bytes to account for that.

This also explains why when we re-encoded the information bytes we obtained completely different parity check bytes. Our encoder really uses zeros as padding, and this completely changes the parity check bytes, since a small change in the 223 information bytes will completely change all 32 parity check bytes.

The three bytes we have obtained, `0xcffc1d`, are the end of the 32-bit ASM `0x1acffc1d` that is used with the CCSDS concatenated frames (and that Queqiao is transmitting right before the Reed-Solomon codeword). This doesn’t seem to be a coincidence. It looks like a buffer handling problem where instead of having the three `0x00` bytes before the information bytes in the Reed-Solomon encoder, we have `0xcffc1d`, which are the three bytes that should be sent immediately before the codeword.

However, this doesn’t explain why the last byte in the codeword is also wrong. Perhaps the last byte doesn’t really belong to the 255 byte codeword. If so, the codeword would start one byte earlier, and we would have 4 instead of 3 virtual fill bytes. Perhaps by doing so we will see the first virtual fill byte being corrected to `0x1a` (the first byte of the ASM).

To test this idea, we drop the last byte of each 252 byte codeword and add 4 `0x00` bytes to the beginning of the codeword. This is what we get for the first codeword:

b1 cf fc 1d 03 79 16 0a 32 98 02 81 27 55 9a aa bb 02 c5 00 00 f9 6b 00 00 00 00 a4 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 21 00 28 00 3e a7 02 20 ee 58 01 04 08 24 00 00 63 00 00 00 62 00 00 00 07 92 92 00 00 db cf 25 53 0f 00 00 08 c6 02 3b 12 95 0c 0c 30 23 fe f0 58 28 ca 53 1a 10 1f 35 ac 4f da f8 00 00 00 00 00 00 00 00 00 00 00 00 0f 38 2a 29 d9 49 07 07 f9 0d 00 00 00 00 00 00 80 49 04 43 00 00 00 00 ae 7f af ae ac af 00 00 00 00 04 04 ff c1 00 ff 11 00 96 55 ff 01 01 01 01 01 01 01 01 01 01 76 8f 80 02 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ee ee ee ee ee ee ee ee ee ee d2 1e ec e9 a3 5c ae bf 0f 1e 66 47 e3 46 82 ff bf de d4 a9 87 1e b9 26 67 58 f2 e9 4c f9 62

We see that the first byte we get in the decoder output is not `0x1a`, but rather `0xb1`. The byte we get in this position changes in every frame and it seems quite random. It appears to be uniformly distributed and I haven’t been able to find any relationship to the bytes in adjacent frames.

In summary, it seems that there is a bug in the Reed-Solomon encoder of Queqiao that causes the encoder to use `0xcffc1d` instead of zeros for three virtual fill bytes, and additionally there is another wrong byte in the codewords for which we don’t have a good explanation.

The material for this post is available in this repository, including the Jupyter notebook and the file with the Reed-Solomon codewords.

Some time after writing that post, Steve Croft, from BSRC, pointed me to another set of recordings of Voyager 1 from 16 July 2020 (MJD 59046.8). They were also made by Breakthrough Listen with the Green Bank Telescope, but they are longer. This post is an analysis of this set of recordings.

The recordings follow the usual observing cadence of Breakthrough Listen, described in Section 2.1 in this paper. Six scans of 5 minutes each are done. The primary target (in this case Voyager 1) is observed in three of the scans, called ON scans. In the three other scans, called OFF scans, other targets or the empty sky are observed. The ON and OFF scans alternate, starting with an ON scan. The goal of this schedule is to discard, as local interference, signals that are present in both an ON and an OFF scan.

I think that these recordings have not been published yet in the Breakthrough Listen open data archive. I guess they will be published at some point when the data is curated.

The files I used are from compute node BLC23, which processed the data in a 187.5 MHz window around the frequency 8345.21484375 MHz. A total of 24 compute nodes were used in this observation to cover the span between 7501.5 and 11251.5 MHz approximately (a few of the 187.5 MHz windows were duplicated into two nodes).

The files are in GUPPI format, which I described in my previous post. The header of the first file in the dataset is as follows:

```
BACKEND = 'GUPPI '
DAQCTRL = 'start '
TELESCOP= 'GBT '
OBSERVER= 'Steve Croft'
PROJID = 'AGBT20A_999_53'
FRONTEND= 'Rcvr8_10'
NRCVR = 2
FD_POLN = 'CIRC '
BMAJ = 0.02198635849376383
BMIN = 0.02198635849376383
SRC_NAME= 'VOYAGER-1'
TRK_MODE= 'UNKNOWN '
RA_STR = '17:12:40.4400'
RA = 258.1685
DEC_STR = '+12:24:14.7600'
DEC = 12.4041
LST = 45322
AZ = 92.7495
ZA = 66.3982
DAQPULSE= 'Thu Jul 16 18:13:55 2020'
DAQSTATE= 'record '
NBITS = 8
OFFSET0 = 0.0
OFFSET1 = 0.0
OFFSET2 = 0.0
OFFSET3 = 0.0
BANKNAM = 'BLP23 '
TFOLD = 0
DS_FREQ = 1
DS_TIME = 1
FFTLEN = 512
CHAN_BW = -2.9296875
BANDNUM = 2
NBIN = 0
OBSNCHAN= 64
SCALE0 = 1.0
SCALE1 = 1.0
DATAHOST= 'blr2-3-10-3.gb.nrao.edu'
SCALE3 = 1.0
NPOL = 4
POL_TYPE= 'AABBCRCI'
BANKNUM = 3
DATAPORT= 60000
ONLY_I = 0
CAL_DCYC= 0.5
DIRECTIO= 1
BLOCSIZE= 134217728
ACC_LEN = 1
CAL_MODE= 'OFF '
OVERLAP = 0
OBS_MODE= 'RAW '
CAL_FREQ= 0.0
DATADIR = '/datax/dibas'
OBSFREQ = 8345.21484375
PFB_OVER= 12
SCANLEN = 300.0
PARFILE = '/opt/dibas/etc/config/example.par'
OBSBW = -187.5
SCALE2 = 1.0
BINDHOST= 'eth4 '
PKTFMT = '1SFA '
TBIN = 3.41333333333333E-07
BASE_BW = 1450.0
CHAN_DM = 0.0
SCAN = 11
STT_SMJD= 80036
STT_IMJD= 59046
STTVALID= 1
DISKSTAT= 'waiting '
NETSTAT = 'receiving'
PKTIDX = 0
DROPAVG = 1.27322e-06
DROPTOT = 0.719169
DROPBLK = 0
PKTSTOP = 27459584
NETBUFST= '1/24 '
STT_OFFS= 0
SCANREM = 0.0
PKTSIZE = 8192
NPKT = 16383
NDROP = 0
END
```

Each 5 minute scan is divided in time into 14 GUPPI files. There are no gaps in the data between these 14 files, so an IQ file that is continuous in time can be obtained by concatenating the IQ samples extracted from these 14 files.

To extract the IQ data in the polyphase filterbank channel that contains the Voyager 1 signal, I have used blimpy and the following Python script:
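That script is not shown here. A rough sketch of the kind of extraction it performs, in plain NumPy, could look like this (an illustration of my own that assumes the standard GUPPI raw block layout of [channel][time][polarization][I/Q] with 8-bit samples; blimpy's `GuppiRaw` class handles this parsing in practice):

```python
import numpy as np

def read_guppi_header(f):
    """Read one GUPPI header: 80-character ASCII records until 'END'."""
    header = {}
    while True:
        record = f.read(80).decode("ascii")
        if record.startswith("END"):
            return header
        key, _, value = record.partition("=")
        header[key.strip()] = value.strip().strip("' ")

def extract_channel(f, channel):
    """Return the 8-bit IQ data of one PFB channel from one GUPPI block
    as a complex64 array of shape (ntime, npol)."""
    header = read_guppi_header(f)
    if int(header.get("DIRECTIO", "0")):
        # with DIRECTIO = 1 the header is padded to a multiple of 512 bytes
        f.seek(-(-f.tell() // 512) * 512)
    nchan = int(header["OBSNCHAN"])
    blocsize = int(header["BLOCSIZE"])
    npol = int(header["NPOL"]) // 2  # NPOL = 4 means 2 complex polarizations
    raw = np.frombuffer(f.read(blocsize), dtype=np.int8)
    raw = raw.reshape(nchan, -1, npol, 2)  # [channel][time][pol][I/Q]
    iq = raw[channel].astype(np.float32)
    return iq[..., 0] + 1j * iq[..., 1]
```

A full script would loop over all the blocks in each file and write the selected channel to the output file.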

This can be run as

```
for file in *.raw; do ~/extract_voyager.py $file ~/vgr1_$file; done
```

The scans can be concatenated by using `cat` on the resulting files to obtain a single IQ file per scan. The resulting files are signed 8-bit IQ, since the data in the GUPPI files uses 8-bit sampling. The frequency axis in the files is inverted, so when reading the files, the complex conjugate of the data needs to be computed (or alternatively, I and Q can be swapped).
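For instance, one of these concatenated scan files could be loaded into NumPy as follows (a sketch; the helper name is mine):

```python
import numpy as np

def load_iq(path):
    """Load signed 8-bit interleaved IQ samples, conjugating the data
    to undo the inverted frequency axis."""
    raw = np.fromfile(path, dtype=np.int8).astype(np.float32)
    return np.conj(raw[0::2] + 1j * raw[1::2])
```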

To assess the quality of the recordings we compute a waterfall (time-frequency representation) using GNU Radio. This is done with the `voyager1_spectrum.grc` flowgraph.

This flowgraph uses an FFT size of \(2^{20}\) points, which gives a frequency resolution of 2.79 Hz. The PSDs are integrated 30 times, so the resulting time resolution is 10.74 seconds. The waterfall data is written to a file, which is then plotted with this Jupyter notebook.
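These resolution figures follow directly from the 2.9296875 MHz channel sample rate (the magnitude of CHAN_BW in the GUPPI header):

```python
samp_rate = 2.9296875e6  # PFB channel sample rate in Hz
fft_size = 2**20
n_integrations = 30

freq_resolution = samp_rate / fft_size                   # ~2.79 Hz
time_resolution = n_integrations * fft_size / samp_rate  # ~10.74 s
```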

The following figures correspond to the first scan in the dataset, which has the filename `blc23_guppi_59046_80036_DIAG_VOYAGER-1_0011.raw`. Note that scans are numbered starting at 11.

The first figure is the spectrum of the polyphase filterbank channel that we have extracted. The residual carrier of Voyager 1, as well as the data subcarrier, can be seen well above the noise (note that the scale is in linear units of power).

The next figure shows the waterfall of the carrier. The Doppler drift of the signal is apparent.

A measurement of the signal power of the residual carrier and the noise is done in order to estimate the C/N0 of the carrier. The measurement is done independently for each spectrum (time slice) in the waterfall. The bin where the carrier has maximum power is selected as centre and a few bins about this centre are used to measure the signal power (plus the noise in those bins). The noise power per bin is measured by averaging some 200 bins of noise to each side of the carrier (leaving some small guard band so as not to measure any power leaked from the carrier). The appropriate noise power is subtracted from the measurement of the signal power to obtain the signal power alone.
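A sketch of this per-slice measurement (the function and its parameters are my own illustration, not the actual notebook code):

```python
import numpy as np

def carrier_c_n0(psd, samp_rate, nsig=2, nnoise=200, guard=10):
    """Estimate carrier C/N0 (dB-Hz) from one linear-power PSD slice.
    Bins around the peak measure S+N; bins to each side, past a guard
    band, measure the noise floor, which is then subtracted."""
    k = int(np.argmax(psd))
    sig_plus_noise = psd[k - nsig:k + nsig + 1].sum()
    noise_bins = np.concatenate((psd[k - guard - nnoise:k - guard],
                                 psd[k + guard + 1:k + guard + nnoise + 1]))
    noise_per_bin = noise_bins.mean()
    c = sig_plus_noise - (2 * nsig + 1) * noise_per_bin
    n0 = noise_per_bin / (samp_rate / len(psd))  # noise power per Hz
    return 10 * np.log10(c / n0)
```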

In the plot we see that there are large variations in the power of the carrier, of up to 1.5 dB. The SNR of the carrier will be measured later in a different way, confirming these variations.

Finally, the next figure shows the waterfall of one of the sidebands. As we remarked in the previous post, the spectral lines are due to some long runs of zeros in the data, which is sent unscrambled. When these zeros pass through the convolutional encoder, an alternating 0101 sequence is produced due to the inverter in one of the branches of the encoder.

The power of each of the two sidebands is also measured in order to estimate the Eb/N0 that is present in the data modulation.

As expected, the three ON scans show the Voyager 1 signal, while the three OFF scans show only noise. The estimates of the C/N0 of the carrier and the Eb/N0 of the data subcarrier (adding up the power of both sidebands) are shown in the table below.

Scan | Carrier C/N0 (dB) | Data Eb/N0 (dB)
---|---|---
11 | 23.74 | 6.33
13 | 20.49 | 3.13
15 | 23.21 | 5.90

It is noteworthy that the middle ON scan has much worse SNR, approximately 3 dB less than the other scans. The figure below shows the signal and noise power measurement. The gain normalization is the same in all the recordings, so we see that the reduction of SNR is mainly due to an increase in the noise, which is around 3 dB stronger than in scan 11, and shows a large variation during the 5 minutes that the scan lasts. I do not know why the noise in this scan has increased so much.

The last ON scan shows a behaviour that is similar to the first scan, although the SNR is some 0.5 dB worse. Again, there are large variations in the estimate of signal power.

The SNR of the middle ON scan (number 13) is in fact too low to decode the telemetry signal, so we will only use the first and last ON scans (numbers 11 and 15).

The GNU Radio decoder flowgraph is based on the flowgraph that I used in the previous post. It can be seen in the figure below (click on the figure to view it in full size). The flowgraph can be downloaded here.

In comparison to the recording from 2015, which was easy to decode, these recordings are more difficult to decode, since the SNR is closer to the decoding threshold. The figure below, which is taken from the CCSDS TM Synchronization and Channel Coding Green Book, shows the BER of the rate 1/2 convolutional code used by Voyager 1. Frames are 7680 bits long, so for a FER of 1% we need a BER of \(10^{-6}\). This is achieved at 4.8 dB Eb/N0. We see that we have somewhat more than 1 dB of margin, so in principle things look good.

However, in practice the decoder doesn’t work so well. First, there are always going to be some implementation losses in the decoder. Also, as we will see below, the large variations in signal power (of around 1 dB) that we have noticed above will give problems, causing burst errors in the Viterbi decoder output.

To try to understand the performance of the decoder better and see if it could be improved, I have tried to monitor the SNR of the data subcarrier at different points in the decoding chain. The GNU Radio flowgraph outputs to a File Sink at several intermediate steps. The signal is then analyzed in a Jupyter notebook to estimate its Eb/N0. These points are described in the sections below.

The first intermediate step where the SNR is measured is the output of the PLL. Assuming an ideal carrier phase recovery, the two sidebands of the data subcarrier should add coherently, yielding the Eb/N0 that has been estimated above by adding the power of the two sidebands. However, the PLL does not track the carrier signal perfectly, so some power will be lost in the coherent combination of the data sidebands.

The calculations and plots are done in this Jupyter notebook. The figure below shows the data subcarrier at the output of the PLL. The FFT frequency resolution is 1.28 Hz. This makes visible not only the two tones due to the 010101 sequences in the symbols, but also their odd harmonics. These are caused by the square pulse shape filter.

In fact, the power decay of these harmonics matches that of a square wave, which is \(20 \log_{10} k\) for the \(k\)-th harmonic, so the 3rd harmonic of a square wave is 9.54 dB below the fundamental, the 5th is 13.98 dB down, the 7th is 16.90 dB down, and the 9th is 19.08 dB down. We need to take into account that the PSD shows (S+N)/N rather than S/N. Assuming an (S+N)/N of the fundamental of 13 dB (which is more or less what we see in the plot), the (S+N)/N of the square wave odd harmonics would be 4.92, 2.45, 1.42 and 0.91 dB, which agrees with what we see here.
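These numbers are easy to check numerically (the only assumption is the 13 dB (S+N)/N of the fundamental):

```python
import numpy as np

k = np.array([3, 5, 7, 9])          # odd harmonics of the square wave
s_over_n = 10**(13 / 10) - 1        # fundamental (S+N)/N of 13 dB -> S/N
harmonic_s = s_over_n / k**2        # each harmonic is 20*log10(k) dB down
harmonic_s_plus_n = 10 * np.log10(1 + harmonic_s)
# harmonic_s_plus_n is approximately [4.92, 2.45, 1.42, 0.91] dB
```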

The area marked in dark blue in the plot is used to measure the signal plus noise power, and the area marked in light blue is used to measure the noise power. This gives the following Eb/N0 estimates for the data subcarrier after the PLL. The estimates done in the waterfall are shown for comparison.

Scan | PLL output Eb/N0 (dB) | Waterfall Eb/N0 (dB) | Loss (dB)
---|---|---|---
11 | 5.97 | 6.33 | 0.36
15 | 5.32 | 5.90 | 0.58

The cause of the losses in the PLL is the phase errors in the carrier phase recovery. To put things in perspective, it is good to see what constant phase error would give the losses we observe. Since the loss in dB units is \(20 \log_{10}(\cos \theta)\), where \(\theta\) is the phase error, we see that a loss of 0.36 dB corresponds to a phase error of 16.4 degrees, and a loss of 0.58 dB corresponds to a phase error of 20.7 degrees.
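Inverting the loss formula gives these equivalent phase errors directly:

```python
import numpy as np

def loss_to_phase_error_deg(loss_db):
    """Constant phase error (degrees) that produces a given
    20*log10(cos(theta)) combining loss."""
    return np.degrees(np.arccos(10**(-loss_db / 20)))
```

For the two losses above, `loss_to_phase_error_deg(0.36)` gives about 16.4 degrees and `loss_to_phase_error_deg(0.58)` about 20.7 degrees.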

The variance for the PLL phase error is approximately \(\sigma^2 = 1/\rho\), where \(\rho\) denotes the loop SNR, which can be computed as \(\rho = C/(N_0B_L)\), with \(B_L\) the loop bandwidth. More precisely, the PLL phase error distribution is a Tikhonov distribution with parameter \(\kappa = \rho\). For large \(\rho\), the Tikhonov distribution can be approximated by a normal distribution with variance \(1/\rho\). For more details about the PLL error, see for instance this paper. The corresponding carrier C/N0’s, loop bandwidths and phase error standard deviations are shown in this table.

Scan | Carrier C/N0 (dB) | Loop bandwidth (Hz) | Phase error \(\sigma\) (deg)
---|---|---|---
11 | 23.74 | 2.5 | 6.26
15 | 23.21 | 5 | 8.85

Note that a loop bandwidth of 5 Hz was used in scan 15 because the loop bandwidth of 2.5 Hz didn’t lock the loop properly.

These phase errors due to AWGN are significantly smaller than the phase errors of 16.4 and 20.7 degrees that we have mentioned above.

We can be more precise and compute the average power loss using the phase error Tikhonov distribution\[f(\theta) = \frac{\exp(\rho \cos \theta)}{2\pi I_0(\rho)}.\]Assuming that the subcarrier has power one at the input of the PLL, the average power of the subcarrier at the PLL output is\[\int_{-\pi}^\pi \cos^2(\theta) f(\theta)\, d\theta.\]This integral can be evaluated numerically for the values of \(\rho\) corresponding to each of the scans we are studying. We obtain losses of 0.05 dB and 0.10 dB for scans 11 and 15 respectively. These are much smaller than the losses we are observing.
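The numerical evaluation can be sketched with a simple Riemann sum (illustrative code; the sigma values come from the table above, and `np.i0` is the modified Bessel function \(I_0\)):

```python
import numpy as np

def pll_tikhonov_loss_db(rho):
    """Average subcarrier power loss (dB) for a PLL phase error with a
    Tikhonov distribution of parameter rho (the loop SNR)."""
    theta = np.linspace(-np.pi, np.pi, 200001)
    f = np.exp(rho * np.cos(theta)) / (2 * np.pi * np.i0(rho))
    avg_power = np.sum(np.cos(theta)**2 * f) * (theta[1] - theta[0])
    return -10 * np.log10(avg_power)
```

With \(\rho = 1/\sigma^2\) for the phase error sigmas of 6.26 and 8.85 degrees, this gives losses of about 0.05 and 0.10 dB.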

Another source of error in PLLs is the steady state error due to higher order dynamics that are not modelled by the loop filter. For a loop filter of order 2, such as the one used by the PLL blocks in GNU Radio, the steady state error is\[\widetilde{\varphi}_{ss} \approx \frac{1}{2B_L^2}\varphi''.\]The exact value depends on the placement of the loop poles, but it is proportional to \(\varphi''/B_L^2\).

In our case, the Doppler drift rate is approximately 110 Hz over 300 seconds. This gives 2.3 rad/s². With a loop bandwidth of 2.5 Hz, we get a steady state error of 10.5 degrees, and with a loop bandwidth of 5 Hz we get a steady state error of 2.6 degrees. In the case of scan 11 this might account for the losses we see, since, roughly speaking, 11 degrees of steady state error plus 6 degrees of error due to noise would give the 17 degrees of error that correspond to the 0.36 dB loss we are seeing. In the case of scan 15 this doesn’t account for the losses, since we only have 3 degrees of steady state error plus 9 degrees of error due to noise, but we need a total of 21 degrees of error to explain the 0.58 dB loss.

It might be a good idea to remove the Doppler drift (which is almost a constant) to reduce the stress on the PLL and check if this reduces the PLL losses.

Something else that can be measured at the output of the PLL is the subcarrier frequency. By using the two strongest spectral lines, we can measure a frequency of 22497.3 Hz. This is an error of -117.9 ppm with respect to the nominal 22.5 kHz subcarrier. According to NASA HORIZONS, the range-rate at the time when the recording was made was 31.599 km/s. This would give a Doppler of -105.4 ppm, which agrees with the observed value. However, we need to take into account that the FFT resolution is not good enough to measure this accurately, as one FFT bin corresponds to 56.9 ppm of 22.5 kHz.

The data subcarrier is processed with the Symbol Sync GNU Radio block to perform pulse shape filtering and symbol clock recovery. The maximum likelihood time error detector is used. This forces us to use a pulse shape with a continuous derivative, since the maximum likelihood TED needs the derivative filter. Thus, instead of using a square pulse shape we use a root-raised cosine pulse shape. The excess bandwidth is set to 1.0, since that seems to work best regarding output SNR. The loop bandwidth is set to a low value. After symbol synchronization, a Costas loop is used to recover the residual subcarrier phase and frequency. It also uses a low loop bandwidth.

At the output of the Costas loop we should have the optimally sampled and filtered symbols in the I component, and noise in the Q component. The figure below shows the symbols obtained from scan 11. The first 5000 symbols have been thrown away, since the loops haven’t locked yet.

To estimate the SNR of the signal at this point we use the \(M_2M_4\) estimator described in the paper “A Comparison of SNR estimation techniques for the AWGN channel”. For a complex signal, this estimator is\[\frac{\sqrt{2M_2^2-M_4}}{M_2 - \sqrt{2M_2^2-M_4}},\]where \(M_2\) and \(M_4\) denote the second and fourth order moments respectively:\[M_2 = E[|x_n|^2],\quad M_4 = E[|x_n|^4].\]
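As a sketch, the estimator is only a few lines of NumPy (illustrative code, not the notebook itself):

```python
import numpy as np

def m2m4_snr_estimate(x):
    """M2M4 SNR estimator for a complex constant-modulus signal in AWGN."""
    m2 = np.mean(np.abs(x)**2)
    m4 = np.mean(np.abs(x)**4)
    s = np.sqrt(2 * m2**2 - m4)  # estimated signal power
    return s / (m2 - s)          # noise power is estimated as m2 - s
```

For a constant-modulus signal in AWGN the term \(\sqrt{2M_2^2-M_4}\) estimates the signal power, and \(M_2\) minus this term estimates the noise power.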

In the table below we compare the results of this SNR estimator with the SNR estimates done before symbol synchronization.

Scan | Costas output Eb/N0 (dB) | PLL output Eb/N0 (dB) | Loss (dB)
---|---|---|---
11 | 5.40 | 5.97 | 0.57
15 | 4.77 | 5.32 | 0.55

Again, we have noticeable losses in this step of the decoder chain. Some of the losses can be explained by the mismatch between the transmit pulse shape filter (which is a square shape) and the receiver filter (for which we are using an RRC filter). Here there is an opportunity for improvement by trying to use a square pulse shape in the receiver. Loop jitter can also explain some of the losses; however, the loop bandwidths are already set quite narrow to minimize jitter.

Usually, we would send the soft symbols to a Viterbi decoder, since Voyager 1 uses the typical CCSDS \(k=7, r=1/2\) convolutional code. However, the SNR is not good enough for error-free decoding. When trying to study the contents of the frames to do reverse engineering, it can be rather hard to do so when there are bit errors, because some of the patterns we spot may be caused by bit errors rather than by the actual contents of the frames. It is much more helpful to have a soft output FEC decoder, since that gives us a level of confidence in the decoded data that we can use as a guide when interpreting the output.

Instead of using a soft output Viterbi algorithm, I have decided to use the BCJR algorithm. The whole output from each scan is processed at once by the BCJR decoder. The implementation of the decoder follows Algorithm 4.2 in the book “Iterative Error Correction” by Sarah Johnson. As in this book, the implementation I have done favours readability rather than efficiency. The BCJR decoder and the related plots are implemented in this Jupyter notebook.

To try to achieve the best performance with the BCJR decoder, we attempt to give it the best possible estimate of the noise variance \(\sigma^2\). To do so, we run the decoder with an initial estimate. We take the hard decision of the BCJR output and encode the result with the convolutional code. We use this to wipe off the data in our symbols. There are still some symbol errors, but most of the symbols are correct. Then we can measure the mean and variance of the wiped-off symbols, and use that to supply the BCJR decoder with an improved estimate of \(\sigma^2\). We repeat this process a few times to improve the results.

The figure below shows the soft output of the BCJR decoder, which is the log-likelihood ratio for the bits. At the beginning the output is around zero because the loops haven’t locked yet. Then we see that most of the time the decoder is able to correct all the errors, but there are several times when the log-likelihood ratio drops close to zero. As we will see later, these seem to correspond to drops in the C/N0 of the carrier.

The next figure shows the wiped-off symbols, using the re-encoded hard decision on the output from the BCJR decoder. We see that most of the symbols have been wiped off correctly. This is used to estimate the amplitude and noise variance of the signal.

We can use the amplitude and noise variance of this wiped-off signal to estimate the Eb/N0. If we denote by \(y_n\) the wiped-off symbols, an estimate of the SNR is given by \(E[y_n]^2/(2\operatorname{Var}(y_n))\). Alternatively, we can use equation (31) in the paper “A Comparison of SNR estimation techniques for the AWGN channel”, which is known as the SNV TxDA estimate for a real signal. This gives a very similar result.

The table below compares the results of this estimate with the \(M_2M_4\) estimate obtained at the output of the Costas loop. We see that the results are slightly lower. Perhaps this is due to the occasional symbol errors that remain after wipe-off, or just to the different estimators used.

Scan | Wipe-off Eb/N0 (dB) | Costas output Eb/N0 (dB) | Difference (dB)
---|---|---|---
11 | 5.26 | 5.40 | 0.14
15 | 4.61 | 4.77 | 0.16

To assess the performance of the BCJR decoder at this SNR, we can build a simulated signal on AWGN and run it through the decoder. These are the results of decoding a simulated signal at the same SNR as scan 11. We see that the decoder works very well, and there should be no errors at its output. This matches the fact that a convolutionally encoded signal can be decoded without problems at an Eb/N0 of 5 dB.

Therefore, we see that the problem with our recordings is that the SNR is not stationary. There are some sudden SNR drops that cause errors in the BCJR decoder (or in the Viterbi decoder).

These SNR drops might be caused by several factors. The most obvious cause is a drop in the SNR of the phase-modulated signal itself. There are other possible explanations, such as instability in the phase/clock of the signal, which would cause larger errors in the loops.

To try to understand the cause of these drops, we perform an SNR estimate on the residual carrier. We already have such an estimate from the waterfall, but its time resolution was not high enough to see the drops clearly.

To perform the SNR estimate, we take the output of the PLL and run it through a low pass filter with a noise bandwidth of 10 kHz. The output of this filter is integrated coherently down to a 10 Hz rate. Additionally, the complex magnitude squared of the filter output is taken and integrated down to a 10 Hz rate. These two outputs are processed in a Jupyter notebook.

The next figure shows the 10 Hz coherent integrations of the PLL output corresponding to scan 11. As expected, the residual carrier appears in the I component, while the Q component contains noise and leakage from the carrier due to phase noise. In fact we see that the noise variance in the Q component is much larger than in the I component, so the extra noise variance must be due to phase noise (see above for some notes on the PLL phase jitter).

We also see some sudden drops in the amplitude of the I component. There is an AGC in the flowgraph, but since the AGC acts on the full 83.7 kHz bandwidth, the noise power dominates the AGC input, so the changes in amplitude we see here correspond to changes in SNR (which actually are caused by changes in the signal power).

The plot below shows an estimate of the C/N0 of the residual carrier with a resolution of one second. This has been obtained by averaging the 10 Hz measurements in groups of 10 and then noting that the coherent integrations measure signal power plus noise in 10 Hz, while the power at the filter output measures signal power plus noise in 10 kHz.
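Solving for C and N0 from these two bandwidths amounts to a small linear system (a sketch with hypothetical names, in linear power units):

```python
import numpy as np

def c_n0_estimate(p_coh, p_tot, b_coh=10.0, b_tot=10e3):
    """Solve for C/N0 (dB-Hz) given signal-plus-noise power in two
    bandwidths: p_coh = C + N0*b_coh (coherent integrations) and
    p_tot = C + N0*b_tot (filter output power)."""
    n0 = (p_tot - p_coh) / (b_tot - b_coh)
    c = p_coh - n0 * b_coh
    return 10 * np.log10(c / n0)
```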

The changes in the C/N0 show the same behaviour as in the plot with lower temporal resolution obtained from the waterfall. The C/N0 has large variations of up to 2.5 dB. It seems that the drops in the log-likelihood ratio of the BCJR decoder are somewhat related to these drops in C/N0, but the match is not completely perfect.

The two figures below show the corresponding plots for scan 15.

We can take the log-likelihood ratios produced by the BCJR decoder and detect the ASM, which is `0x03915ED3`, in order to correct for the possible 180º phase ambiguity in the Costas loop and to align with the telemetry frames. Then we can do a raster plot of the log-likelihoods of the bits in each of the frames.

The raster plot is done in sections of 128 bits. In the raster plots each frame is shown as a row, while each column corresponds to the bit position in the frame. The red colours correspond to ones (positive log-likelihood ratios), and the blue colours correspond to zeros (negative log-likelihood ratios). Lighter colours correspond to log-likelihood ratios with smaller magnitude, which means that the BCJR decoder is not so sure about the correct output. The first plot is shown below.

First we can see the 32 bit ASM. Then we see an interesting pattern around bit 40 that changes every other frame. Around bit 50 we see a counter which is at least 6 bits wide. After the first 64 bits we have almost a repetition of the same data. The same 32 bit ASM appears at position 64, we have the changing pattern around bit 105, and also the counter around bit 115. We note that some of the bits with a lighter colour have decoding errors, but the log-likelihood information from the BCJR decoder helps us spot and discard these.

The Voyager 1 frames are 7680 bits long, so there are a total of 60 plots of segments of 128 bits. All these are shown in this Jupyter notebook. Other than finding a few binary counters and observing the general structure of the frames, I haven’t been able to figure out the meaning of any of this data.

Most of the data used in this post (excepting large files), as well as the code used for the calculations and figures can be found in this repository.

Here I show an approach that I first learned from Wei Mingchuan BG2BHC two years ago during the Longjiang-2 lunar orbiter mission. While writing our paper about the mission, we wanted to compute a closed expression for the BER of the LRTC modulation used in the uplink (which is related to \(m\)-FSK). Using a clever idea, Wei was able to find a formula that involved an integral of CDFs and PDFs of chi-squared distributions. Even though this wasn’t really a closed formula, evaluating the integral numerically was much faster than doing simulations, especially for high \(E_b/N_0\).

Recently, I arrived at the same idea again independently. I was trying to compute the symbol error rate of \(m\)-FSK and, even though I remembered that the problem about LRTC was related, I had forgotten about Wei’s formula and the trick used to obtain it. So I thought of something on my own. Later, digging through my emails, I found the messages Wei and I exchanged about this and saw that I had arrived at the same idea and formula. Maybe the trick was in the back of my mind all the time.

Due to space constraints, the BER formula for LRTC and its mathematical derivation didn’t make it into the Longjiang-2 paper. Therefore, I include a small section below with the details.

We will assume that the symbols sent by the transmitter are a set of orthonormal vectors \(\{v_1,\ldots,v_m\}\) in a space which is either \(\mathbb{C}^N\) in the discrete time case, or \(\mathbb{C}^{[0,T]}\), the set of functions from the interval \([0,T]\) to \(\mathbb{C}\), in the continuous time case (though the space where these vectors live is not so important, as long as one is able to define white Gaussian noise on it, in the precise sense given below).

Note that this is the case in \(m\)-FSK whenever the tones are orthogonal, meaning that the frequency spacing between each pair of tones is an integer multiple of the symbol rate, and there is no pulse shape filtering. In particular this setting does not cover GFSK.

The transmitter sends a vector \(v_k\) with \(1 \leq k \leq m\) and the receiver receives that vector plus noise, which we denote by \(w = v_k + n\). Without loss of generality, we will assume below that \(k = 1\).

The noise \(n\) will be assumed to be Gaussian and white, so that in particular, the scalar products \(\langle n, v_1\rangle\), …, \(\langle n, v_m\rangle\) are independent complex normal random variables with mean zero and variance \(\sigma^2 = (E_s/N_0)^{-1}\).

In the case of coherent detection, the receiver performs the maximum-likelihood detection of the symbols. It computes \(\text{Re} \langle w, v_j\rangle\) for \(j = 1,…,m\), and chooses as output the symbol \(v_j\) that maximizes this expression. Since \(w = v_1 + n\), we have\[\text{Re} \langle w, v_1\rangle = 1 + \text{Re}\langle n, v_1\rangle.\]For \(j \geq 2\) we have\[\text{Re}\langle w, v_j\rangle = \text{Re}\langle n, v_j\rangle.\]The real parts of the scalar products involving the noise are distributed as independent real normal variables with mean zero and variance \(\sigma^2/2\). Dividing by \(\sigma/\sqrt{2}\) and putting \(\alpha = \sqrt{2E_s/N_0}\), we see that the probability of correct detection (which is \(1-\mathrm{SER}\), where \(\mathrm{SER}\) denotes the symbol error rate) is equal to the probability that \(\alpha + N_1\) is greater than the maximum of \(N_2, \ldots, N_m\), where \(N_1,\ldots, N_m\) are independent real normal variables with zero mean and variance one. This can be computed using conditional expectation as we indicate now.

By conditioning on \(N_1 = t - \alpha\), we have to compute\[\begin{split}1-\mathrm{SER} &= E[\mathbb{P}(N_2 \leq t, \ldots, N_m \leq t\, |\, N_1 = t - \alpha)] \\ &= E[\mathbb{P}(N_2 \leq t\, |\, N_1 = t-\alpha)^{m-1}] \\ &= \int_{-\infty}^{+\infty} F(t)^{m-1} f(t-\alpha)\, dt,\end{split}\]where \(F\) denotes the cumulative distribution function of the normal distribution and \(f(t) = e^{-t^2/2}/\sqrt{2\pi}\) denotes the probability density function of the normal distribution. The second equality above is due to the fact that \(N_2,\ldots,N_m\) are independent and equidistributed.

The integral above cannot be evaluated in closed form, so we can write the symbol error probability for the coherent detection case as\[\mathrm{SER} = 1 – \int_{-\infty}^{+\infty} F(t)^{m-1} f\left(t-\sqrt{2E_s/N_0}\right)\, dt.\]
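The integral is straightforward to evaluate numerically. As an illustrative sketch (for \(m=2\) the formula reduces to the textbook result \(\mathrm{SER} = Q(\sqrt{E_s/N_0})\) for binary coherent orthogonal signalling, which serves as a sanity check):

```python
import numpy as np
from math import erf, sqrt, pi

def ser_mfsk_coherent(m, es_n0):
    """Numerically evaluate SER = 1 - int F(t)^(m-1) f(t - alpha) dt,
    with alpha = sqrt(2 Es/N0)."""
    alpha = sqrt(2 * es_n0)
    t = np.linspace(-10, 10 + alpha, 200001)
    F = np.array([0.5 * (1 + erf(x / sqrt(2))) for x in t])  # normal CDF
    f = np.exp(-(t - alpha)**2 / 2) / sqrt(2 * pi)           # normal PDF
    return 1 - np.sum(F**(m - 1) * f) * (t[1] - t[0])
```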

In the non-coherent detection case, the receiver computes the powers \(|\langle w, v_j\rangle|^2\) for \(j = 1,…,m\) and chooses as output the vector \(v_j\) that maximizes this expression. The calculations for this case are similar to those for the coherent detection, but now we have chi-squared variables instead of normals.

Indeed, the power\[|\langle w, v_1\rangle|^2 = |1 + \langle n, v_1\rangle|^2\]equals \(\sigma^2 X_1\), where \(X_1\) is a non-central chi-squared variable with two degrees of freedom and non-centrality parameter \(\lambda = 2E_s/N_0\). The remaining powers\[|\langle w, v_j\rangle|^2 = |\langle n, v_j\rangle |^2\]are equal to \(\sigma^2 X_j\), where \(X_j\) are central chi-squared variables with two degrees of freedom. Moreover, the variables \(X_1, X_2,\ldots, X_m\) are independent.

By conditioning on \(X_1 = t\), we have that the probability of correct detection equals\[\begin{split}1-\mathrm{SER}&= E[\mathbb{P}(X_2 \leq t, \ldots, X_m \leq t\, |\, X_1 = t)] \\ &= E[\mathbb{P}(X_2 \leq t\, |\, X_1 = t)^{m-1}] \\ &= \int_{-\infty}^{+\infty} G(t)^{m-1} g_\lambda(t)\, dt.\end{split}\]Here \(G(t)\) denotes the cumulative distribution function of a central chi-squared variable with two degrees of freedom and \(g_\lambda(t)\) denotes the probability density function of a non-central chi-squared variable with two degrees of freedom and non-centrality parameter \(\lambda = 2E_s/N_0\).

Thus, the symbol error probability can be computed as\[\mathrm{SER}=1 – \int_{-\infty}^{+\infty} G(t)^{m-1} g_{2E_s/N_0}(t)\, dt.\]
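Again this is easy to evaluate numerically; for \(m=2\) the result should match the well-known non-coherent binary FSK error rate \(\frac{1}{2}e^{-E_s/(2N_0)}\) (an illustrative sketch; `np.i0` is the modified Bessel function \(I_0\)):

```python
import numpy as np

def ser_mfsk_noncoherent(m, es_n0):
    """Numerically evaluate SER = 1 - int G(t)^(m-1) g_lambda(t) dt."""
    lam = 2 * es_n0
    t = np.linspace(0, 60 + 4 * lam, 400001)
    G = 1 - np.exp(-t / 2)  # CDF of central chi-squared, 2 dof
    # PDF of non-central chi-squared, 2 dof, non-centrality lam
    g = 0.5 * np.exp(-(t + lam) / 2) * np.i0(np.sqrt(lam * t))
    return 1 - np.sum(G**(m - 1) * g) * (t[1] - t[0])
```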

The LRTC (low-rate telecommand) modulation used in the Longjiang-1/2 mission could be described as a 2-FSK modulation where the tones are spread with a GMSK PN code. Alternatively, it can be described as transmitting a GMSK PN code either on one of two frequencies, with the symbol encoded by the choice of frequency. After de-spreading the GMSK PN code, what remains is a 2-FSK modulation. The tone separation is much larger than the symbol rate. In order to handle frequency offsets due to Doppler, an FFT whose bin width equals the symbol rate is done, and the symbol is decided according to which FFT bin has maximum power. Half of the FFT bins (the ones corresponding to negative frequencies) correspond to symbol 0, while the remaining half correspond to symbol 1.

Here the vectors \(v_j\) are the complex exponentials corresponding to the tones used by the FFT,\[v_j[l] = \frac{1}{\sqrt{m}}e^{2\pi i jl/m},\quad l=1,\ldots,m.\]We assume that the received tone lies exactly in one FFT bin, so the transmitted vector is one of these tones, and we can assume again that the transmitted vector is \(v_1\) without loss of generality.

The statistics are exactly as in the \(m\)-FSK non-coherent detection case, but now we consider that a bit error happens whenever the \(v_j\) that maximizes \(|\langle w, v_j\rangle|^2\) has \(j > m/2\) (whereas in the \(m\)-FSK non-coherent detection case we had \(j \geq 2\)).

The probability that there is no bit error can be computed in the following way. We denote by \(Z\) the event corresponding to \(X_1 \geq X_2\), \(X_1 \geq X_3\), …, \(X_1 \geq X_m\). There are two possible ways in which we can decode successfully. Either \(Z\) happens, in which case we will choose the correct bin that corresponds to the transmitted tone, and hence obtain the correct bit, or else \(Z\) does not happen, but among the variables \(X_2,\ldots,X_m\) the variable \(X_j\) with maximum value has \(j \leq m/2\). In this latter case we do not choose the correct bin but still choose a bin that decodes to the correct bit.

The probability that \(Z\) happens is just what we have computed in the non-coherent detection case, so\[\mathbb{P}(Z) = \int_{-\infty}^{+\infty} G(t)^{m-1} g_{2E_s/N_0}(t)\, dt.\] If we condition on \(\neg Z\), the probability that among \(X_2,\ldots,X_m\) the variable with maximum value has \(j \leq m/2\) is just \((m/2-1)/(m-1)\), since we have \(m-1\) independent and identically distributed variables which are also independent of \(\neg Z\), so the probability that the maximum of them is one of the first \(m/2-1\) is simply \((m/2-1)/(m-1)\).

Therefore, the probability of decoding correctly is\[\begin{split}1-\mathrm{BER}&=\mathbb{P}(Z) + \frac{m/2-1}{m-1}\mathbb{P}(\neg Z) = \mathbb{P}(Z) + \frac{m/2-1}{m-1}(1-\mathbb{P}(Z))\\ &= \frac{m/2-1}{m-1} + \frac{m}{2m-2}\mathbb{P}(Z).\end{split}\]Hence, the bit error rate is\[\begin{split}\mathrm{BER} &= 1-\frac{m/2-1}{m-1} – \frac{m}{2m-2}\mathbb{P}(Z) = \frac{m}{2m-2}[1 – \mathbb{P}(Z)] \\ &= \frac{m}{2m-2}\left[1 – \int_{-\infty}^{+\infty} G(t)^{m-1} g_{2E_s/N_0}(t)\, dt\right].\end{split}\]

In the above we have concerned ourselves with the symbol error rate. When \(m = 2^b\), so that \(b\) bits are encoded in each symbol, we can easily calculate the bit error rate from the knowledge of the symbol error rate. In fact, if we denote by \(B\) the number of bit errors that occur in a symbol (so that \(B\) is a random variable), we have\[\mathrm{BER} = \frac{1}{b}E[B].\]Now, conditioned on \(B \geq 1\), i.e., on there being a symbol error, we see that all the wrong symbols are equiprobable. Since for \(l=1,\ldots,b\) there are \({b \choose l}\) symbols with exactly \(l\) bit errors, we have\[\begin{split}E[B] &= \mathbb{P}(B \geq 1)E[B | B\geq 1] = \mathrm{SER}\frac{1}{2^b -1}\sum_{l=1}^b l {b \choose l} \\ &= \mathrm{SER}\frac{1}{2^b-1}\frac{b}{2}2^b = \mathrm{SER}\frac{b 2^{b-1}}{2^b-1}.\end{split}\]Here we have used\[\sum_{l=0}^b l {b \choose l} = \frac{b}{2}2^b,\]which is a well-known identity related to the expectation of a binomial distribution.

We conclude that\[\mathrm{BER} = \frac{2^{b-1}}{2^b-1}\mathrm{SER} = \frac{m}{2m-2}\mathrm{SER}.\]

We see that we have arrived at the same expression as for the BER of LRTC, which equals the SER of non-coherent detection of \(m\)-FSK multiplied by the factor \(\frac{m}{2m-2}\). In fact, there is a nice relation between the BER of \(m\)-FSK and the BER of LRTC. If we encode the words of \(b\) bits onto the set of symbols in such a way that the symbols \(j=1,\ldots,m/2\) have 0 as their MSB and the symbols \(j=m/2+1,\ldots,m\) have 1 as their MSB, we see that LRTC is equivalent to \(m\)-FSK but only using the MSB of each symbol. Thus, the BER of LRTC is equal to the BER of the MSB of \(m\)-FSK. But the BER of the MSB is equal to the overall BER of \(m\)-FSK, since as remarked above, when there is a symbol error, all the wrong symbols are equiprobable.

It is worth doing some simulations to verify that the formulas given above are correct. These are done in this Jupyter notebook. The resulting plot is shown here. It compares the curves obtained by evaluating the formulas with simulations done in steps of 0.5 dB of \(E_s/N_0\). The formulas and simulations agree very well, as expected.
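A minimal Monte Carlo along these lines (my own sketch, not the notebook itself; for \(m=2\) LRTC reduces to binary non-coherent FSK, whose BER is \(\frac{1}{2}e^{-E_s/(2N_0)}\)):

```python
import numpy as np

def lrtc_ber_sim(m, es_n0, ntrials=200000, seed=2022):
    """Monte Carlo BER of LRTC: tone 1 (a 'zero' bit) is sent; a bit
    error occurs when the maximum-power bin falls in the upper half."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(1 / (2 * es_n0))  # per-real-dimension noise sigma
    z = sigma * (rng.normal(size=(ntrials, m))
                 + 1j * rng.normal(size=(ntrials, m)))
    z[:, 0] += 1  # matched filter output for the transmitted tone
    winner = np.argmax(np.abs(z)**2, axis=1)
    return np.mean(winner >= m // 2)
```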

Note that the LRTC BER and the non-coherent \(m\)-FSK BER coincide, as we have explained above.
