dsp – Daniel Estévez

LTE uplink: PUSCH

This post belongs to my series about LTE. In the LTE uplink, the PUSCH (physical uplink shared channel) is the channel used to transmit data from the UEs (phones) to the eNB (base station). It plays a role analogous to the PDSCH (physical downlink shared channel), which is used to transmit data in the downlink. In this post I will decode the PUSCH in a recording that I made of my phone’s uplink a couple of years ago.

The PUSCH uses the same kind of techniques as the PDSCH for transport block coding, so all the Turbo code implementation and related algorithms from my post about the PDSCH will be re-used here. However, there is an important difference between the PDSCH and the PUSCH that makes decoding the PUSCH much harder. The LTE downlink is, in a certain sense, a self-descriptive signal. The UEs don’t know in advance the configuration that will be used to transmit each transport block in the PDSCH, because the eNB decides it on the fly. Therefore, the eNB announces PDSCH transmissions in the PDCCH (physical downlink control channel).

When I decoded the PDCCH and PDSCH, the only slightly clever thing that I had to do was to find the RNTIs (radio network temporary identifiers). These are 16-bit numbers that are used to address each PDSCH transmission. Some of them are statically allocated to broadcast purposes (SI-RNTI, P-RNTI, RA-RNTI), while the C-RNTIs are individually assigned to each UE. The CRC-16 of the PDCCH DCIs is XORed with the RNTI to which the transmission is addressed. At any time, a UE knows the set of RNTIs that it is monitoring, so it calculates the CRC-16 of the received DCI, computes its XOR with each of its assigned RNTIs, and compares the result with the CRC-16 in the DCI. If there is a match, the DCI is accepted. This is a way of filtering out messages without spending additional bits to put the RNTI in a field in the DCI.

When we are monitoring an LTE downlink, we don’t know which RNTIs are being used. With some cleverness, if the SNR is good enough, we can detect and select each PDCCH transmission by hand (it is necessary to guess the REGs that it occupies and the DCI length) and then, assuming that we have decoded the DCI with no bit errors, obtain the RNTI as the XOR of the calculated CRC and the received CRC. This is what I did in the post about the PDCCH. If we were monitoring the LTE downlink for a longer time, this trick wouldn’t even be necessary. The C-RNTIs assigned to the UEs are communicated to them in a RAR transmitted with the RA-RNTI, as a response to their PRACH (see the post where I analyze this in Wireshark). So a downlink monitor application can simply watch the SI-RNTI, P-RNTI and RA-RNTI, and add any C-RNTIs to a list of known connected UEs when it sees a RAR. The C-RNTIs can be removed from this list after a period of inactivity, because the UE would have been sent to the idle state by the network. This idea really shows that it is possible to decode everything in the LTE downlink without doing clever blind decoding tricks.
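The RNTI masking and the recovery trick described above can be sketched in a few lines of Python. This is only an illustration, not the code from my notebooks: the function names, the 27-bit payload and the C-RNTI value are made up, and I assume the LTE 16-bit CRC polynomial \(D^{16} + D^{12} + D^5 + 1\) with a zero initial state.

```python
# Sketch of RNTI masking of the DCI CRC (all names are illustrative).
G_CRC16 = 0x1021  # D^16 + D^12 + D^5 + 1, with the D^16 term implicit
SI_RNTI = 0xFFFF
P_RNTI = 0xFFFE


def crc16(bits):
    """CRC-16 of a sequence of bits (first bit first, zero initial state)."""
    reg = 0
    for b in bits:
        feedback = ((reg >> 15) & 1) ^ b
        reg = (reg << 1) & 0xFFFF
        if feedback:
            reg ^= G_CRC16
    return reg


def mask_crc(payload_bits, rnti):
    """eNB side: the attached CRC is the calculated CRC XORed with the RNTI."""
    return crc16(payload_bits) ^ rnti


def ue_accepts(payload_bits, received_crc, monitored_rntis):
    """UE side: return the monitored RNTI that matches, or None."""
    crc = crc16(payload_bits)
    for rnti in monitored_rntis:
        if crc ^ rnti == received_crc:
            return rnti
    return None


def recover_rnti(payload_bits, received_crc):
    """Passive monitor: valid only if the DCI was decoded with no bit errors."""
    return crc16(payload_bits) ^ received_crc


# Hypothetical DCI payload and C-RNTI, for illustration only.
payload = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0,
           0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
c_rnti = 0x4A3B
rx_crc = mask_crc(payload, c_rnti)

print(ue_accepts(payload, rx_crc, [SI_RNTI, P_RNTI, c_rnti]) == c_rnti)
print(hex(recover_rnti(payload, rx_crc)))
```

A UE that is not monitoring this C-RNTI gets no match and discards the DCI, while a monitor that trusts the decoded payload reads the RNTI directly from the XOR.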

In contrast, the LTE uplink is not self-descriptive. The eNB defines the configuration of each PUSCH transmission when it sends the uplink grant to the UE. So the UE doesn’t need to communicate this configuration again to the eNB when it transmits in the PUSCH. The information that describes the PUSCH transmissions is effectively in the PDCCH in the downlink, and in this case I don’t have a recording of the downlink that matches my uplink recording. This makes decoding the PUSCH much more difficult, but nevertheless not impossible. With some clever ideas and blind decoding tricks we can usually find all the information we’re missing. In the rest of this post, I describe how to do this in detail. It will be long and quite technical.

Computing PLL coefficients

Whenever I implement a PLL or a similar control loop, I invariably consult the formulas in the paper Controlled-Root Formulation for Digital Phase-Locked Loops, by Stephens and Thomas. Other sources that give formulas for the loop coefficients in terms of the loop bandwidth perform a continuous time analysis and then use a bilinear transform or a similar kind of transform to translate results between continuous time and discrete time. The appeal of the paper by Stephens and Thomas is that they work directly in discrete time, using a beautiful complex contour integral argument to calculate the loop bandwidth in terms of the loop coefficients for a loop of any order. Unfortunately, their method doesn’t give a closed-form formula for the loop coefficients in terms of the loop bandwidth. The loop coefficients can be obtained numerically, and the paper gives tables for common loop bandwidths and orders.

In most of my designs I use a second order loop with supercritical damping, which means that the two loop roots in the z-plane are equal (and hence real). As I was doing a design the other day, I wondered whether in this specific situation, which is much simpler than the general case, a closed-form solution could be obtained. It turns out that this is the case, so I’ll be using this formula from now on. In this short post I explain how this is done and give the formula.

Maia SDR DDC

I have implemented an FPGA DDC (digital downconverter) in Maia SDR. Intuitively speaking, a DDC is used to select a slice of the input spectrum. It works by using an NCO and mixer to move the centre of the slice to baseband, and then applying low-pass filtering and decimation to reduce the sample rate as desired (according to the bandwidth of the slice that is selected).
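As a rough illustration of these three steps (NCO and mixer, low-pass filter, decimation), here is a floating-point NumPy sketch. It is not the FPGA design: the `ddc` function, the single windowed-sinc filter and all the parameter values are assumptions chosen only to keep the example short.

```python
import numpy as np


def ddc(x, fs, f_center, decim, ntaps=129):
    """Toy DDC: NCO + mixer, FIR low-pass, then decimation by `decim`."""
    n = np.arange(len(x))
    # NCO and mixer: move f_center down to 0 Hz
    nco = np.exp(-2j * np.pi * f_center / fs * n)
    baseband = x * nco
    # Windowed-sinc low-pass; cutoff (in cycles/sample) is placed
    # somewhat below the output Nyquist frequency 0.5 / decim
    cutoff = 0.4 / decim
    k = np.arange(ntaps) - (ntaps - 1) / 2
    taps = 2 * cutoff * np.sinc(2 * cutoff * k) * np.hamming(ntaps)
    taps /= taps.sum()  # unity gain at DC
    filtered = np.convolve(baseband, taps, mode="same")
    # Decimation: keep one sample out of every `decim`
    return filtered[::decim], fs / decim


# Test tone 3 kHz above the centre of the selected slice
fs = 1e6
n = np.arange(20000)
x = np.exp(2j * np.pi * 203e3 / fs * n)

y, fs_out = ddc(x, fs, f_center=200e3, decim=20)
spectrum = np.abs(np.fft.fft(y))
f_peak = np.fft.fftfreq(len(y), d=1 / fs_out)[np.argmax(spectrum)]
print(fs_out, f_peak)
```

After the DDC, the tone appears 3 kHz away from the centre of a 50 ksps output, which is the "slice of the spectrum" idea in miniature.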

At the moment, the output of the Maia SDR DDC can be used as input for the waterfall display (which uses a spectrometer that runs in the FPGA) and the IQ recorder. Using the DDC allows reaching sample rates below 2083.333 ksps, which is the minimum sample rate that can be used with the AD936x RFIC in the ADALM Pluto (at least according to the ad9361 Linux kernel module). Therefore, the DDC is useful to monitor or record narrowband signals. For instance, using a sample rate of 48 ksps, the 400 MiB RAM buffer used by the IQ recorder can be used to make a recording as long as 36 minutes in 16-bit integer mode, or 48 minutes in 12-bit integer mode. With such a sample rate, the 4096-point FFT used in the waterfall has a resolution of 11.7 Hz.
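The recording lengths and the waterfall resolution quoted above follow from simple arithmetic, sketched below. I am assuming that the 16-bit mode takes 4 bytes per IQ sample and that the 12-bit mode packs each IQ sample into 3 bytes; the function name is only illustrative.

```python
BUFFER_BYTES = 400 * 2**20  # 400 MiB IQ recorder buffer
FFT_SIZE = 4096             # waterfall FFT size


def recording_minutes(samp_rate, bytes_per_iq_sample):
    """Maximum recording length in minutes for a given sample rate."""
    return BUFFER_BYTES / (samp_rate * bytes_per_iq_sample) / 60


samp_rate = 48e3
minutes_16bit = recording_minutes(samp_rate, 4)  # assumed 2 x 16 bits
minutes_12bit = recording_minutes(samp_rate, 3)  # assumed 2 x 12 bits, packed
resolution_hz = samp_rate / FFT_SIZE

print(round(minutes_16bit, 1), round(minutes_12bit, 1), round(resolution_hz, 1))
```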

In the future, the DDC will be used by receivers implemented on the FPGA, both for analogue voice signals (SSB, AM, FM), and for digital signals. Additionally, I also have plans to allow streaming the DDC IQ output over the network, so that Maia SDR can be used with an SDR application running on a host computer. It is possible to fit several DDCs in the Pluto FPGA, so this would allow tuning independently several receivers within the same window of 61.44 MHz of spectrum. In the rest of this post I describe some technical details of the DDC.

Decoding LTE MIMO with a single antenna

In my previous post I decoded LTE PDSCH (physical downlink shared channel) transmissions from an IQ recording of an eNB that I had made using a USRP B205mini and a single antenna. The eNB has two antenna ports, and it uses TM4 (closed-loop spatial multiplexing) to transmit the PDSCH to each individual UE. In the post, I repeated several times that two-codeword TM4 is intended for 2×2 MIMO and relies on the receiver having at least 2 antennas in order to separate the two transmitted codewords, so I couldn’t decode these transmissions with my recording.

In this post I will show that in some cases this is not true, and these two-codeword TM4 transmissions can be decoded with just one receive antenna. I will decode some of these two-codeword transmissions from my IQ recording by using the ideas I introduce below.

LTE downlink: PDSCH

This post is a continuation of my series about LTE, where I decode a recording of the downlink signal of an eNB using Jupyter notebooks written from scratch. Here I will decode the PDSCH (physical downlink shared channel), which contains the data transmitted by the eNB to the UEs, including PDUs from the MAC layer, and some broadcast information, such as the SIB (system information block) and paging. At first I planned this post to be about decoding the SIB1. This is the first block of system information, and it is the next thing that a UE must decode after decoding the MIB (located in the PBCH) to find the configuration of the cell. The SIB1 is always transmitted periodically, and its contents and format are relatively well known a priori (as opposed to a user data transmission, which could happen at any time and contain almost anything), so it is a good example to try to decode PDSCH transmissions.

After writing and testing all the code to decode the SIB1, it was too tempting to decode everything else. Even though at first I wrote my code thinking only about the SIB1, with a few modifications I could decode all the PDSCH transmissions (except those using two-codeword spatial multiplexing, since my recording was done with a single antenna). I will still use the SIB1 as an example to show how to decode the PDSCH step by step, but I will also show the rest of the data.

The post is rather long, but we will get from IQ samples to looking at packets in Wireshark using only Python, so I think it’s worth its length.

A modern implementation of the Parks-McClellan FIR design algorithm

The Parks-McClellan FIR filter design algorithm is used to design optimal FIR filters according to a minimax criterion: it tries to find the FIR filter with a given number of coefficients whose frequency response minimizes the maximum weighted error with respect to a desired response over a finite set of closed sub-intervals of the frequency domain. It is based on the Remez exchange algorithm, which is an algorithm to find uniform approximations by polynomials using the equioscillation theorem. In signal processing, the Parks-McClellan algorithm is often called Remez. This algorithm is a very popular FIR design algorithm. Compared to the windowing method, which is another commonly used algorithm, it is able to obtain better filters (for instance, meeting design constraints with fewer coefficients), in part because it allows the designer to control the passband ripple and stopband attenuation independently by means of the weight function.
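To make the role of the weight function concrete, here is a small sketch using SciPy’s `remez` function. The number of taps, band edges and weights are arbitrary values chosen for illustration. With a stopband weight of 10, the equiripple solution should have a stopband error roughly 10 times smaller than the passband error.

```python
import numpy as np
from scipy.signal import remez

# Low-pass design: passband 0 to 0.1, stopband 0.15 to 0.5 (cycles/sample).
# The stopband error is weighted 10 times more than the passband error.
taps = remez(41, [0, 0.1, 0.15, 0.5], [1, 0], weight=[1, 10], fs=1.0)

# Evaluate the frequency response on a dense grid
H = np.fft.rfft(taps, 8192)
freqs = np.fft.rfftfreq(8192)
passband = freqs <= 0.1
stopband = freqs >= 0.15

pass_err = np.max(np.abs(np.abs(H[passband]) - 1))
stop_err = np.max(np.abs(H[stopband]))
print(pass_err, stop_err, pass_err / stop_err)
```

Changing the weights trades passband ripple against stopband attenuation without changing the number of coefficients, which is something the windowing method cannot do directly.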

I have been laying some groundwork for Maia SDR, and for this I will need to run the Parks-McClellan algorithm in maia-httpd, the piece of software that runs in the Pluto ARM CPU. To evaluate what implementation of this algorithm to use, I have first gone to the implementations that I normally use: the SciPy remez function, and GNU Radio’s pm_remez function. I read these implementations, but I didn’t like them much.

The SciPy implementation is a direct C translation of the original Fortran implementation by McClellan, Parks and Rabiner from 1973. This C translation was probably written decades ago and never updated. The code is very hard to read. The GNU Radio implementation looks somewhat better. It is a C implementation that was extracted from Octave and dates from the 90s. The code is much easier to follow, but there are some comments saying “There appear to be some problems with the routine search. See comments therein [search for PAK:]. I haven’t looked closely at the rest of the code—it may also have some problems.” that have seemingly been left unattended.

Because of this and since I want to keep all the Maia SDR software under permissive open source licenses (the GNU Radio / Octave implementation is GPL), I decided to write from scratch an implementation of the Parks-McClellan algorithm in Rust. The result of this has been the pm-remez crate, which I have released recently. It uses modern coding style and is inspired by recent papers about how to improve the numerical robustness of the Parks-McClellan algorithm. Since I figured that this implementation would also be useful outside of Maia SDR, I have written Python bindings and published a pm-remez Python package. This has a few neat features that SciPy’s remez function doesn’t have. The Python documentation gives a walkthrough of these by showing how to design several types of filters that are commonly used. This documentation is the best place to see what pm-remez is capable of.

The rest of this post has some comments about the implementation and the things I’ve learned while working on this.

LTE Transmission Mode 4 (closed-loop spatial multiplexing)

This is a long overdue post. In 2022, I wrote a series of posts about LTE as I studied its physical layer to understand it better. In the last post, I decoded the PDCCH (physical downlink control channel), which contains control information about each PDSCH (physical downlink shared channel) transmission. I found that, in the recording that I was using, some PDSCH transmissions used Transmission Mode 4 (TM4), which stands for closed-loop spatial multiplexing. For an eNB with two antenna ports (which is what I recorded), this transmission mode sends either one or two codewords simultaneously over the two ports by using a precoding matrix that is chosen from a list that contains a few options. The choice is made by means of channel-state information from the UE (hence the “closed-loop” in the name).

In the post I found a transmission where only one codeword was transmitted. It used the precoding matrix \([1, i]^T/\sqrt{2}\). This basically means that a 90° phase offset is applied between the two antenna ports as they simultaneously transmit the same data. I mentioned that this was the reason why I obtained bad results when I tried to equalize this PDSCH transmission using transmit diversity in another previous post, and that in a future post I would show how to equalize this transmission correctly. I have realized that I never wrote this post, so now is as good a time as any.
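The effect of this precoding matrix on a single-antenna receiver can be sketched as follows. This is a noiseless toy model and not the processing applied to the recording: I assume a flat channel with made-up coefficients for each antenna port (in practice these would be estimated from the cell-specific reference signals), and all the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Single-layer precoding vector [1, i]^T / sqrt(2)
W = np.array([1, 1j]) / np.sqrt(2)

# Random QPSK symbols for one codeword
bits = rng.integers(0, 2, size=(2, 100))
s = ((1 - 2 * bits[0]) + 1j * (1 - 2 * bits[1])) / np.sqrt(2)

# Both ports transmit the same data, port 1 with a 90 degree phase offset
ports = np.outer(W, s)  # shape (2, number of symbols)
phase_offset_deg = np.degrees(np.angle(ports[1] / ports[0]))

# Hypothetical flat channel from each port to the single receive antenna
h = np.array([1.0 + 0.2j, 0.3 - 0.5j])
rx = h @ ports

# The receiver sees a single effective channel h0 * W0 + h1 * W1
h_eff = h @ W
s_hat = rx / h_eff

print(np.allclose(phase_offset_deg, 90), np.allclose(s_hat, s))
```

Equalizing with the per-port channels combined in any other way, as transmit diversity equalization does, does not recover the symbols, which matches the bad results mentioned above.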

Demodulation of the 5G NR downlink

At the end of July, Benjamin Menkuec was commenting on Twitter about some issues he had while demodulating a 5G NR downlink recording. There was a lively discussion in which other people and I participated. I had never touched 5G, but had done some work with LTE, so I decided to take the chance to learn more about the 5G physical layer. Compared to LTE, the changes are more evolutionary than revolutionary, but understanding what has changed, and how and why, is part of the puzzle.

I had to take an 11.5 hour flight in a few days, so I thought it would be a nice challenge to give this a shot, take with me the recordings that Benjamin was using and all the 3GPP documents, and write a demodulator in a Jupyter notebook from scratch during the flight, as I had done in the past with my LTE recordings. This turned out to be an enjoyable and interesting experience, quite different from working at home in shorter intervals scattered over multiple days or weeks, and with internet access. At the end of the flight I had a mostly working demodulation, but it had a few weird problems that turned out to be caused by some mistakes and misconceptions. I worked on cleaning this up and solving the problems over the next few days, and also now preparing this post.

Rather than trying to give an account of all my mistakes and dead ends (I spoke a little about these on Twitter), in this post I will show the final clean solution. There are some tricky new aspects in 5G NR (phase compensation, as we will see below) which can be quite confusing, so hopefully this post will do a good job of explaining them in a simple way.

The Jupyter notebook used in this post is here, and the recording in SigMF format can be found in this folder. Here I will only be using the first of Benjamin’s two recordings, since they are quite similar. It was done with an ADALM Pluto at 7.86 Msps and has a duration of 143 ms. The transmitter is an srsRAN 5 MHz cell. The recording was done off-the-air, or maybe with a cabled setup, but there are some other signals leaking in. The SNR is very good, so this is not a problem.

The first signal we find is at 9 ms. There is a transmission like this every 10 ms. As we will see, this is an SS/PBCH block. Something that stands out to those familiar with the LTE downlink spectrum is that the 5G NR spectrum is almost empty. In LTE, the cell-specific reference signals are transmitted all the time. In 5G this is not the case. Downlink signals are transmitted only when there is traffic. There is always a burst of one or several SS/PBCH blocks transmitted periodically (usually every 20 ms, but in this recording every 10 ms), as well as other signals that are always sent periodically (such as the SIB1 in the PDSCH), but this may be all if there is no traffic in the cell.

Maia SDR

I’m happy to announce the release of Maia SDR, an open-source FPGA-based SDR project focusing on the ADALM Pluto. The first release provides a firmware image for the Pluto with the following functionality:

Web-based interface that can be accessed from a smartphone, PC or other device.
Real-time waterfall display supporting up to 61.44 Msps (limit given by the AD936x RFIC of the Pluto).
IQ recording in SigMF format, at up to 61.44 Msps and with a 400 MiB maximum data size (limit given by the Pluto RAM size). Recordings can be downloaded to a smartphone or other device.

A note about non-matched pulse filtering

This is a short note about the losses caused by non-matched pulse filtering in the demodulation of a PAM waveform. Recently I’ve needed to come back to these calculations several times, and I’ve found that even though the calculations are simple, sometimes I make silly mistakes on my first try. This post will serve me as a reference in the future to save some time. I have also been slightly surprised when I noticed that if we have two pulse shapes, let’s call them A and B, the losses of demodulating waveform A using pulse shape B are the same as the losses of demodulating waveform B using pulse shape A. I wanted to understand better why this happens.

Recall that if \(p(t)\) denotes the pulse shape of a PAM waveform and \(h(t)\) is a filter function, then in AWGN the SNR at the output of the demodulator is equal to the input SNR (with an appropriate normalization factor) times the factor\[\begin{equation}\tag{1}\frac{\left|\int_{-\infty}^{+\infty} p(t) \overline{h(t)}\, dt\right|^2}{\int_{-\infty}^{+\infty} |h(t)|^2\, dt}.\end{equation}\]This factor describes the losses caused by filtering. As a consequence of the Cauchy-Schwarz inequality, we see that the output SNR is maximized when a matched filter \(h = p\) is used.

To derive this expression, we assume that we receive the waveform\[y(t) = ap(t) + n(t)\]with \(a \in \mathbb{C}\) and \(n(t)\) a circularly symmetric stationary Gaussian process with covariance \(\mathbb{E}[n(t)\overline{n(s)}] = \delta(t-s)\). The demodulator output is\[T(y) = \int_{-\infty}^{+\infty} y(t) \overline{h(t)}\, dt.\]The output SNR is defined as \(|\mathbb{E}[T(y)]|^2/V(T(y))\). Since \(\mathbb{E}[n(t)] = 0\) due to the circular symmetry, we have\[\mathbb{E}[T(y)] = a\int_{-\infty}^{+\infty} p(t)\overline{h(t)}\,dt.\]Additionally,\[\begin{split}V(T(y)) &= \mathbb{E}[|T(y) - \mathbb{E}[T(y)]|^2] = \mathbb{E}\left[\left|\int_{-\infty}^{+\infty} n(t)\overline{h(t)}\,dt\right|^2\right] \\ &= \mathbb{E}\left[\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} n(t)\overline{n(s)}\overline{h(t)}h(s)\,dt\,ds\right] \\ &= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} \mathbb{E}\left[n(t)\overline{n(s)}\right]\overline{h(t)}h(s)\,dt\,ds \\ &= \int_{-\infty}^{+\infty} |h(t)|^2\, dt. \end{split}\]Therefore, we see that the output SNR equals\[\frac{|a|^2\left|\int_{-\infty}^{+\infty} p(t) \overline{h(t)}\, dt\right|^2}{\int_{-\infty}^{+\infty} |h(t)|^2\, dt}.\]

The losses caused by using a non-matched filter \(h\), in comparison to using a matched filter, can be computed as the quotient of the quantity (1) divided by the same quantity where \(h\) is replaced by \(p\). This gives\[\frac{\frac{\left|\int_{-\infty}^{+\infty} p(t) \overline{h(t)}\, dt\right|^2}{\int_{-\infty}^{+\infty} |h(t)|^2\, dt}}{\frac{\left|\int_{-\infty}^{+\infty} |p(t)|^2\, dt\right|^2}{\int_{-\infty}^{+\infty} |p(t)|^2\, dt}}=\frac{\left|\int_{-\infty}^{+\infty} p(t) \overline{h(t)}\, dt\right|^2}{\int_{-\infty}^{+\infty} |p(t)|^2\, dt\cdot \int_{-\infty}^{+\infty} |h(t)|^2\, dt}.\]

We notice that this expression is symmetric in \(p\) and \(h\), in the sense that if we interchange \(p\) and \(h\) we obtain the same quantity. This shows that, as I mentioned above, the losses obtained when filtering waveform A with pulse B coincide with the losses obtained when filtering waveform B with pulse A. This is a clear consequence of these calculations, but I haven’t found a way to understand this more intuitively. We can say that the losses are equal to the cosine squared of the angle between the pulse shape vectors in \(L^2(\mathbb{R})\). This remark makes the symmetry clear, but I’m not sure if I’m satisfied by this as an intuitive explanation.

As an example, let us compute the losses caused by receiving a square pulse shape, defined by \(p(t) = 1\) for \(0 \leq t \leq \pi\) and \(p(t) = 0\) elsewhere, with a half-sine pulse shape filter, defined by \(h(t) = \sin t\) for \(0 \leq t \leq \pi\) and \(h(t) = 0\) elsewhere. This case shows up in many different situations. We can compute the losses as indicated above, obtaining\[\frac{\left(\int_0^\pi \sin t \, dt\right)^2}{\int_0^\pi \sin^2t\,dt\cdot \int_0^\pi dt} = \frac{2^2}{\frac{\pi}{2}\cdot\pi}= \frac{8}{\pi^2}\approx -0.91\,\mathrm{dB}.\]
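This example, and the symmetry between \(p\) and \(h\), can be checked numerically with a short NumPy sketch. The integrals are approximated by midpoint Riemann sums on \([0, \pi]\); the function names and the number of points are arbitrary choices.

```python
import numpy as np

N = 100_000
dt = np.pi / N
t = (np.arange(N) + 0.5) * dt  # midpoints of the integration intervals


def snr_factor(p, h):
    """Factor (1): |<p, h>|^2 / ||h||^2, with integrals as Riemann sums."""
    inner = np.sum(p * np.conj(h)) * dt
    return np.abs(inner) ** 2 / (np.sum(np.abs(h) ** 2) * dt)


def mismatch_loss(p, h):
    """Loss of filtering pulse p with h, relative to the matched filter."""
    return snr_factor(p, h) / snr_factor(p, p)


square = np.ones(N)
half_sine = np.sin(t)

loss = mismatch_loss(square, half_sine)
loss_swapped = mismatch_loss(half_sine, square)
loss_db = 10 * np.log10(loss)

print(loss, 8 / np.pi**2, loss_db)
print(np.isclose(loss, loss_swapped))
```

Both orderings give the same value, which agrees with \(8/\pi^2\) up to the accuracy of the numerical integration.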