Description of the half-rate QPSK code proposed for the QPSK/FEC extension
to PSK31


Encoding
The basic idea is that the binary datastream which normally goes to the
BPSK modulator of PSK31, goes instead to a convolutional encoder, where
the binary bits (0,1) are replaced by quaternary symbols, which I shall label
A, B, C, and D in this write-up so as not to confuse them with binary values.
This means that the raw information on the channel has been doubled (two bits
per symbol instead of one). This redundancy is used at the receiving end to
correct errors. The quaternary data is
modulated onto the RF signal by assigning each of the values A, B, C, D to
one of four possible phase-shifts, in multiples of 90 degrees. The emitted
signal is therefore differential quaternary phase-shift keying, or DQPSK.


Note that the bandwidth of the signal remains the same as BPSK, but because
four states are squeezed into the phase-shift circle instead of two, QPSK
is more susceptible to noise than BPSK. In fact QPSK will start to give
errors with 3dB less noise than BPSK does. However, because of the
error-correction applied at the receiving end, the overall result may be
an improvement, especially if the error correction is designed to handle
bursts of errors such as are often found on real radio circuits. The purpose
of this proposal is to build the QPSK/FEC system and see if there is a
useful advantage.


In the simplest form of a convolutional code, the transmitted symbols are
formed from a run of three transmitted data-bits. That is, if we think of
the data we want to transmit as being shifted into the lsb of a register,
with previous bits shifted left, the symbols are determined by the three
least-significant bits of the shift-register. Such a code is said to have a
constraint length of 3 (k=3). The transmitted symbol is chosen from a lookup
table as follows:

    3 lsb of transmit register     Transmitted symbol
                       0  0  0     A
                       0  0  1     D
                       0  1  0     C
                       0  1  1     B
                       1  0  0     D
                       1  0  1     A
                       1  1  0     B
                       1  1  1     C

This encode table is constructed by adding twice the parity function of all
three bits to the parity function of the end bits and using the resultant
number (between 0 and 3) to select one of the four transmitted symbols, thus:

           Shiftregister bit       2      1       0
                                 -------------------
                                   X      X       X ----> *2
               parity positions
                                   X              X ---->  +
                                                         -----
                                                         total
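As a concrete check, the k=3 encode table can be generated from the two parity patterns in a few lines. The sketch below uses Python purely for illustration (the document's own code examples further down are in Pascal), with A, B, C, D represented by the numbers 0 to 3:

```python
def parity(x: int) -> int:
    """Parity (XOR of all bits) of x."""
    return bin(x).count("1") & 1

# k=3 parity patterns: 111 covers all three bits, 101 covers the end bits
POLY_ALL, POLY_ENDS = 0b111, 0b101

# symbol = 2*parity(all three bits) + parity(end bits), giving 0..3 for A..D
encode_table = [2 * parity(i & POLY_ALL) + parity(i & POLY_ENDS) for i in range(8)]
letters = ["ABCD"[s] for s in encode_table]   # A D C B D A B C, as in the table
```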

The parity patterns (polynomials) are carefully chosen to maximise the
performance of the error-correction. The reasons for choosing these patterns
and not others are outside the scope of this write-up. These polynomials
clearly must have a 1 at each end (otherwise the effective constraint length
would be less than k), and they must be different. From this it can be seen
that the two polynomials
used for k=3 are the only ones possible. For longer constraint lengths, there
is some choice available, and some choices are better than others.


Decoding
The decoding uses the Viterbi algorithm. What follows is a description of
one way of doing this process. For the full reasoning, consult an
appropriate textbook, but basically the idea is to try and work out
what the complete transmit shiftregister must have looked like by trying all
the combinations, working out what the transmitted symbol should have been
for each combination, comparing it with the received symbol, and keeping a
running total of how good or bad the match is at each step.

The measure of 'how good the match is' is the magnitude of the
difference between the received and supposed symbols, sometimes called the
'distance' between the two symbols. If we were using the Viterbi algorithm on
a channel with binary symbols, we could simply say that the distance was 1 if
the symbols were different and 0 if they were the same. In the case of a
channel with quaternary symbols we need to extend this, but the basic idea
is that we are looking for a measure of the 'amount of disturbance' that
would have been experienced if one symbol had been corrupted to the other.
In the case of a QPSK channel where our four transmitted symbols A,B,C,D are
represented by vectors at the corners of a square, we can choose as a
distance measurement the actual geometric distances from each of the four
corners of a square to the others. To keep the arithmetic simple, we
approximate these, taking the distance between adjacent corners as 2 and
the distance across a diagonal as 3:

                     Received symbol
                      A   B   C   D
                    |--------------
                  A | 0   2   3   2
     supposed     B | 2   0   2   3
      symbol      C | 3   2   0   2
                  D | 2   3   2   0
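This chart translates directly into a lookup table. Again a Python sketch for illustration, with A to D as 0 to 3:

```python
# DISTANCE[supposed][received]; symbols A,B,C,D are numbered 0,1,2,3
DISTANCE = [
    [0, 2, 3, 2],  # A
    [2, 0, 2, 3],  # B
    [3, 2, 0, 2],  # C
    [2, 3, 2, 0],  # D
]

def distance(supposed: int, received: int) -> int:
    """Approximate geometric distance between two QPSK symbols."""
    return DISTANCE[supposed][received]
```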


Suppose that the transmit shift register started off empty, and we then
want the transmitter to send a '1' bit. At this point the transmit
shiftregister will contain a single 1, and from the encode table, the
transmitted symbol will be D (assume empty positions are 0's). At the receiver
we have two combinations to test (two states). Either the transmitter shift
register contained a 0 or it contained a 1 in the most-recent position. If it
had been a 0 we would expect to have received the symbol A and if it had been
a 1 we would expect to have received the symbol D. Of course, if we DID
receive a D we could correctly assume that the transmitted bit was indeed 1,
but suppose for the moment that the symbol we received was actually a C, then
we cannot decide immediately what was actually transmitted. We just keep a
note that shift register state 0 now has a distance of 3 (the distance between
the C we received and the A we should have got), and state 1 now has distance
2 (the distance between the C we received and the D we should have got), and
we proceed to receive the next symbol.

Each time we do this, we double the number of states, that is, the number of
shiftregister combinations doubles. At each step we make two copies of each
state, add a 0 to one and a 1 to the other, look up the symbol for each of
the two new states (using the 3 lsb's), and add the distance between that
symbol and the received symbol to the distance accumulated from the previous
step. Using the example, if the next transmitted bit was a 0, the transmit
shiftregister would then contain 10, the next transmitted symbol would be C,
and if the received symbol was also C, the receiver would then have 4 states
to keep track of:

previous distance   state    new distance      new total
        3            00           3                6
        3            01           2                5
        2            10           0                2
        2            11           2                4

Already we can see that state 10 is looking promising as the most
likely transmitted sequence, in spite of the error in the first symbol.
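These running totals can be reproduced mechanically. The Python sketch below (for illustration only, with A to D as 0 to 3 and the k=3 encode and distance tables given earlier) walks the two received C's through the doubling of states:

```python
def parity(x: int) -> int:
    return bin(x).count("1") & 1

ENCODE = [2 * parity(i & 0b111) + parity(i & 0b101) for i in range(8)]  # k=3 table
DIST = [[0, 2, 3, 2], [2, 0, 2, 3], [3, 2, 0, 2], [2, 3, 2, 0]]

received = [2, 2]        # the example: we receive C, then C
totals = {0: 0}          # register -> accumulated distance; known empty at start
for rcvd in received:
    # each state spawns two new states (append a 0 or a 1), and the distance
    # between the expected and received symbol is added to the running total
    totals = {
        (state << 1) | bit:
            d + DIST[ENCODE[((state << 1) | bit) & 7]][rcvd]
        for state, d in totals.items()
        for bit in (0, 1)
    }
# states 00,01,10,11 now have totals 6,5,2,4 as in the table above
```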

Viterbi was able to show that when we have got to the point where the number
of significant bits in the states becomes equal to the constraint length, we
can start to eliminate some of the states with high total distances, without
ever wrongly eliminating any state (i.e. any estimate of the transmitted
sequence) that might, in the end, have turned out to be more likely than any
of those states retained. We do this by comparing the distances of all the
pairs of states which differ only in the 'oldest' bit, and discarding the
one with the highest distance. In our k=3 example, this process can be done
on the next step, choosing the 4 best states out of the 8 combinations of
the 3 lsb's of each state, leaving four 3-bit states to survive to the next
step. At this next step, we again double the number of states from four to
eight by adding one more bit to the state, computing the new total distances
for all eight states, choosing the smaller of the pairs of states X000/X100,
X001/X101, X010/X110, and X011/X111 keeping the four best 4-bit states, and
so on. X here will be the decoder's estimate of the first bit of the
transmitted sequence. In our example, if there were no more errors after
the first symbol, then all four X's would be 1's and this first message bit
would have been correctly decoded.

In this way the decoder proceeds, keeping track of four total distances, and
building up four estimates of the transmitted message. At any point in the
message, the most-likely transmitted message is one of these, but in the
worst-case we have to keep all four 'live' right to the end of the message,
that is, we might need to build four possible complete message estimates and
only decide between them right at the end.

We can, in practice, shorten the size of the message-estimate store, and also
make the decoder output the message bit by bit rather than all at once at the
end of the whole message, by noting that when there are no errors all four
estimates are the same, and when there are correctable errors, the
message-estimates only differ from each other for a finite length back from
the most-recent bit. If we look at the M'th bit back from the most-recent
bit, where M is about 3-4 times the constraint length, and we find that all
four estimates are the same in this position, we can be pretty sure that the
decoder has corrected all the errors up to the bit M symbols ago, and we can
output that bit as our estimate of the transmitted bit M symbols ago. In
this way the decoder can be made to output one bit for each received symbol,
albeit after a delay of M, and the storage for the message estimates can be
limited to M bits per estimate.

With the constraint length 3 code above, the decoder works like this:

A data structure which I will call the state array is maintained, each
element of which represents one of the possible estimates of the distant
transmit shift-register state that survive after each symbol-time. In the
case of the constraint-length 3 code, there are 4 surviving states, so the
state array has 4 elements. Each element of the state array consists of
(a) a record of the total accumulated distance for this state and (b) a
record of the estimate of the transmitted shift register, going back at
least 12 bits.

The process that is run each time a symbol is received is as follows:

Two temporary arrays of size 8 are generated. Each element of these arrays
represents one of the 8 combinations of the last 3 bits of the estimates of
the transmit shift register. One array is called the distance array and the
other is called the estimates array.

The total accumulated distances of the four survivors in the state array are
copied into the distance array in pairs, the first distance going into the
first two locations of the distance array, the second distance going into
the third and fourth locations of the distance array, and so on.

The transmit shiftregister estimates of the four survivors in the state
array are now copied into the estimates array, again in pairs as before,
but as they are copied they are shifted left one bit. A 1 is added to the
lsb for the copies that go into odd locations in the estimates array.

Next, for each of the 8 locations, the 3 lsb of the entry in the estimates
array is used to look up a supposed transmit symbol, and the distance lookup
chart is used to find the distance of that symbol from the received symbol.
This distance is added to the distance in the corresponding position of the
distance array. While this is going on, a record is kept of the smallest of
these new distances. We shall use this later.

Next, the distance array is considered as being split into a top half
and a bottom half. Each half-array thus contains 4 elements. A comparison is
made between the distances in the top half and the corresponding distances
in the bottom half. In each case the smaller distance (called the survivor)
is copied back into the corresponding distance box in the states array, and
at the same time, the corresponding estimate in the estimates array is
copied back into the corresponding estimate box in the states array. For
example, the distance[0] is compared with distance[4]. If distance[4] is
the smaller, then it is copied into states[0].distance and estimates[4]
is copied into states[0].estimate. This is done for all 4 pairs of distances
in the top and bottom halves of the distance array.

If the new distance update was actually done as described in the previous
paragraph, the accumulated distances would, in an infinitely long message,
increase towards infinity. Since we are only interested in comparing these
distances with each other, we can prevent arithmetic overflow by 'normalising'
the distances as they are copied back into the states array. This is done by
subtracting the smallest distance from all of them, so that one of the
distances in the next column is always zero. It can be shown that the
largest distance will never become infinite if this is done. In theory we
only need to do this whenever it becomes necessary to prevent arithmetic
overflow, but if the channel speed and constraint length are low enough that
we are not in difficulties with the computation load, it is expedient to
normalise at each symbol, by subtracting the smallest accumulated distance
which was recorded in the distance array calculation above.

It remains to generate the actual decoded bit. This is done by looking at
bit M of each of the four estimates in the states.estimate[] array, where we
typically choose M to be 3-4 times the constraint length: the bigger we
make M the better the decoder will be at cleaning-up the occasional tricky
error, but it will have a longer decoding delay. If the decoder has
succeeded, they will all be identical, and this bit value can be fed to the
output. If they are not identical, we could emit an error symbol, but it is
also reasonable to take a vote of the 4 survivors, only emitting an error
symbol (or a random bit) if the vote comes out equal.

The only thing we haven't covered is how to handle the start of the message.
We should really treat this as a special case, since the number of
combinations to deal with is only two after the first received symbol, and
4 after the second, rather than 8 subsequently. We can save ourselves the
problem by preloading dummy distances and estimates into the states array
to represent the distances and estimates for a dummy known-correct message
that has been running since time = -infinity. Although we could calculate
what these should be and wire them in as constants, it is easy enough to
'prime' the decoder by feeding it a string of known-correct symbols (for
example a run of A's) before we receive the first symbol. If we do this for
M symbols, the distances and estimates converge to a
stable pattern, and when we start to feed real received symbols into it,
the decoded bits that start to emerge will start with a pre-amble of M
dummy bits which can be discarded.


The example so far has been of a constraint-length 3 code. There is some
advantage in increasing this, that is, to use a longer run of previous
transmitted bits in forming the transmitted symbol. The resulting code has
a better capability for correcting bursts of errors. However, the
computation load increases rapidly, and the resulting code needs a longer
decoding delay time. In the code proposed for PSK31, a constraint length of
5 is chosen. The modifications from the above k=3 example to achieve k=5 are:


a) There are 16 states in the states array instead of 4.
b) The parity polynomials to generate symbols are 10111 and 11001 instead
   of 101 and 111.
c) The temporary distance and estimates arrays are size 32 instead of 8.
d) The decode delay is chosen as 20, i.e. 4 times 5 instead of 4 times 3.
   At 31.25 baud this means a decoding delay time of 640 ms.


Pascal code examples.

Encoding:
This function takes as its argument the last 5 bits of the transmit
shiftregister. The lsb of this register is the currently transmitted bit.
The function returns a number between 0 and 3, (i.e. A,B,C,D) which will be
mapped to one of 4 phase-shifts to be transmitted.

function encode(data:byte):byte;  { 5 lsb's of data are used }
begin
 encode:=symbols[data and 31]
end;


The 'symbols' array is the encode table, which is generated in the
following way:

for I:=0 to 31 do symbols[I]:=2*parity(I and poly1)+parity(I and poly2)

where poly1=$19 and poly2=$17


Decoding:
The 'decodeV' function takes as its argument the received symbol, as a
number between 0 and 3 mapped from the received phase-shift, and returns
a decoded bit, as a number between 0 and 1. This bit is, of course, the
decoded bit corresponding to the symbol received 20 symbols ago.

The decode procedure calculates new distances and estimates in the states
from the old distances/estimates. The 'dists' and 'ests' arrays are the
temporary arrays referred to in the preceding discussion. The 'distance(A,B)'
function used is not shown, but it is basically a lookup table giving the
distances between the 4 pairs of points of the QPSK phase-plane, the same
as used for the k=3 example above.
 
type tstate=record                {the type declaration of a state}
             distance:byte;       {accumulated distance}
             estimate:longint     {enough for 20 bits of data}
            end;
var states:array[0..15] of tstate;   {the static states array}

function decodeV(rcvd:byte):byte;    {input a symbol, returns a bit}
var I:integer;
    dists:array[0..31] of byte;
    ests:array[0..31] of longint;
    select,min,vote:byte;
begin
 min:=255;              {preset minimum:=huge}
 for I:=0 to 31 do      {calc distances for all states and both data values}
 begin
  dists[I]:=states[I div 2].distance+distance(rcvd,symbols[I]);
                              {copy old distances and add current distances}
  if dists[I]<min then min:=dists[I];    {keep track of the smallest distance}
  ests[I]:=((states[I div 2].estimate) shl 1)+(I and 1)  {build new estimate}
 end;
 for I:=0 to 15 do         {for each new state..}
 begin
  if dists[I]<dists[16+I] then select:=0 else select:=16; {select=top/bottom}
  states[I].distance:=dists[select+I]-min;   {update excess distances}
  states[I].estimate:=ests[select+I]         {copy the selected estimate}
 end;
 vote:=0;                      {take a vote of the bit-20's}
 for I:=0 to 15 do
  if (states[I].estimate and (longint(1) shl 20))>0 then inc(vote);
 if vote=8 then decodeV:=random(2) else decodeV:=ord(vote>8)
end;

To prime the trellis, just call decodeV(0) 20 times at the start of the
program and discard the returned values. Remember that the decoder has a
'pipeline' of 20 symbols.
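For readers who want to experiment outside Pascal, the whole k=5 chain (encoder, priming, and Viterbi decoder) can be sketched in Python. This is a straightforward port of the logic in the Pascal examples, not part of the proposal itself; the polynomials, distance table and decode delay are those given above, and a tied vote is resolved to 0 here rather than to a random bit:

```python
POLY1, POLY2 = 0b11001, 0b10111     # the k=5 parity polynomials
DIST = [[0, 2, 3, 2], [2, 0, 2, 3], [3, 2, 0, 2], [2, 3, 2, 0]]
DELAY = 20                          # decode delay: 4 times the constraint length

def parity(x: int) -> int:
    return bin(x).count("1") & 1

SYMBOLS = [2 * parity(i & POLY1) + parity(i & POLY2) for i in range(32)]

def encode_bits(bits):
    """Shift each data bit into the lsb of a 5-bit register; one symbol per bit."""
    sr, out = 0, []
    for b in bits:
        sr = ((sr << 1) | b) & 31
        out.append(SYMBOLS[sr])
    return out

class Viterbi:
    def __init__(self):
        self.dist = [0] * 16        # accumulated distance of each surviving state
        self.est = [0] * 16         # shift-register estimate of each state

    def step(self, rcvd):
        # double up: 32 candidates = 16 old states x 2 possible new bits;
        # the 5 lsb's of each candidate estimate equal its index i
        dists = [self.dist[i >> 1] + DIST[SYMBOLS[i]][rcvd] for i in range(32)]
        ests = [(self.est[i >> 1] << 1) | (i & 1) for i in range(32)]
        mn = min(dists)
        # compare-select: keep the better of each pair differing in the oldest
        # bit, normalising so the best survivor's distance is zero
        for i in range(16):
            sel = i if dists[i] < dists[16 + i] else 16 + i
            self.dist[i] = dists[sel] - mn
            self.est[i] = ests[sel]
        # majority vote on the bit DELAY symbols back
        vote = sum((e >> DELAY) & 1 for e in self.est)
        return 1 if vote > 8 else 0     # a tie (vote == 8) resolved to 0 here

def decode_symbols(symbols):
    """Prime the trellis with A's, then decode; output lags input by DELAY bits."""
    dec = Viterbi()
    for _ in range(DELAY):
        dec.step(0)
    return [dec.step(s) for s in symbols]
```

A quick check: encode a message padded with DELAY zero bits (to flush the pipeline), corrupt one symbol, and the decoded output, after discarding the DELAY-bit preamble, should still match the original message.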


This description has not yet defined the mapping between symbols and QPSK
phase-shifts. This needs to be determined in conjunction with the
requirements of compatibility between the QPSK and BPSK modes of the PSK31
system, so that the idle sequence is the same. A transmitted string of zeros
(in varicode) maps in BPSK to a string of reversals. With the encode
polynomials chosen as above, a string of zero bits fed to the encoder causes
it to transmit continuous A's and a string of 1's fed to the encoder causes
it to transmit continuous C's. It is therefore convenient to map A to
give 0 degrees phaseshift (plain carrier) and C to give 180 degrees
phaseshift (reversals), with B and D to represent +/-90 degrees shift.
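This behaviour follows directly from the encode table: a string of 0 bits holds the register at 00000 (entry 0) and a string of 1 bits drives it to 11111 (entry 31). A two-line check (a Python sketch, with A to D as 0 to 3):

```python
def parity(x: int) -> int:
    return bin(x).count("1") & 1

# the k=5 encode table, from the polynomials 11001 and 10111
symbols = [2 * parity(i & 0b11001) + parity(i & 0b10111) for i in range(32)]

all_zeros = symbols[0]    # a string of 0 bits holds the register at 00000
all_ones = symbols[31]    # a string of 1 bits holds the register at 11111
# all_zeros is 0 (A) and all_ones is 2 (C), matching the mapping chosen above
```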

We can then emit continuous reversals by inverting the bit from the varicode
encoder as it goes into the convolutional encoder, and we can also transmit
BPSK-encoded Varicode by replacing the convolutional encoder by a simple
process which converts a 0 bit into a C and a 1 bit to an A. 

The receive side of the PSK31 process will need to be modified to use
the top two significant bits of the differential phase for QPSK instead of
just the top bit for BPSK. Note that we need to agree on the sense of
rotation for the +/-90 degree shifts. Let's define that a quaternary B gives
a 90-degree advance to the transmitted RF signal. In the transmit
modulator, an A does nothing, B advances by 90 degrees, C inverts, and D
retards by 90 degrees, and the resulting 'square' waves are then treated as
I and Q components which are raised-cosine shaped as for BPSK. To help
to check the software, we can note that a stream of continuous B's will be
continually advancing by 90 degrees every symbol, i.e. will appear shifted
HF by one quarter of the baudrate. This should be easy enough to check on
the bench. Depending on the implementation, the sense of the receive
differential phase signal may need to be reversed, as this would not have
been important in BPSK.
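The bench check can be expressed numerically. Assuming the A-to-D = 0-to-3 numbering used in the code examples and the phase steps just defined, a continuous stream of B's advances the phase by a quarter of a cycle per symbol (a Python sketch):

```python
BAUD = 31.25
PHASE_STEP = {0: 0, 1: 90, 2: 180, 3: -90}   # degrees per symbol for A,B,C,D

# a continuous stream of B's advances the carrier 90 degrees per symbol:
# (90/360) cycle per symbol at 31.25 symbols/s is a shift of baudrate/4
shift_hz = (PHASE_STEP[1] / 360.0) * BAUD
```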

Note that we will need to specify that both stations use USB for the QPSK
mode (or at least use the same sideband), whereas we did not need to do so
with BPSK.

Bitsync
With BPSK, about half the symbols gave rise to a reversal, and the varicode
alphabet was carefully chosen to make sure that there were always enough
reversals to make it possible to use their positions to correct any drift
in the receiver bitclock.

In QPSK, only 1 in 4 symbols will give rise to a reversal, and when varicode
is put through the convolutional encoder, there is no guarantee that there
will be a basic minimum number of reversals to keep the bits in sync. Some
runs of repeated characters give no reversals at all. The two 0's gap
between characters in varicode that guarantees that there will always
be reversals between characters, doesn't guarantee that we will have any C's
at the output of the convolutional encoder. We clearly need to rethink the
bitsync.

The method suggested, and which is currently under test, involves using the
envelope amplitude of the signal. All data patterns except continuous A's
give rise to a dip in the amplitude. Even a hard-limited transmitter, when
received on a narrowband receiver, will exhibit these dips. The idea
therefore is to derive, after the main receiver filter, a signal
proportional to the logarithm of the signal amplitude, by using the sum of
the squares of I and Q followed by a simple logarithm function, and to save
these levels in an array which is 32 ms long. Subtracting the sum of the
values in the first half of the array from the sum of the values in the
second half gives a quantity which measures the extent to which the level
in the second half of the 32 ms cycle is greater than that in the first
half. This is fed back to correct the timing, with a time-constant long
enough to survive short-term interference but quick enough to allow the
clock to pull in on a new signal. By using the logarithm of the signal
level, the gain and time-constant of this loop are made independent of the
mean signal level. Note that a separate 32 ms array for storing the
amplitude samples seems to be necessary, since the feedback process is
unstable if the timing is changed during the process of sampling in such
a way that the two 'halves' of the 32 ms cycle are of unequal duration.
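A minimal sketch of this early/late measurement, in Python for illustration: the sample rate here (500 samples per second, i.e. 16 samples per 32 ms symbol) is an assumption for the example, not part of the proposal:

```python
import math

BAUD = 31.25
FS = 500.0                  # ASSUMED sample rate: 16 samples per 32 ms symbol
N = int(FS / BAUD)          # samples per symbol

def timing_error(i_samples, q_samples):
    """Early/late timing error over one symbol's worth of I/Q samples.

    Positive means the envelope in the second half of the 32 ms cycle is
    larger than in the first half, so the bit clock should be nudged.
    """
    # log of the envelope from I^2 + Q^2; the log makes the loop gain
    # independent of mean signal level (tiny constant avoids log(0))
    levels = [math.log(i * i + q * q + 1e-12)
              for i, q in zip(i_samples, q_samples)]
    half = len(levels) // 2
    return sum(levels[half:]) - sum(levels[:half])
```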
