VARICODE.DOC
Description of the variable-length coding used in the 31.25 baud
BPSK/QPSK system PSK31.
by Peter Martinez G3PLX
peter.martinez@btinternet.com

The normal asynchronous ASCII coding used on the original version of
this system by SP9VRC, and indeed the asynchronous system used for
transmission of RTTY for the last 50 years, uses one start-bit, a fixed
number of data-bits, and one or more stop-bits. The start-bit is always
the opposite polarity to that of the stop-bit. When no traffic is being
sent the signal sits in stop polarity. This enables the receiver to
start decoding as it receives the edge between the stop-signal and the
start-bit, and drop back to idle when the stop bit arrives.

One disadvantage of this process is that if, during a long run of traffic,
an error occurs in either a stop-bit or a start-bit, the receiver will
lose synchronisation, and may take some time to get back into sync,
depending on the pattern of following characters. In some situations of
repeated characters the receiver can even stay in false sync for as
long as the repeated character pattern persists.

The code used in PSK31 overcomes this problem by signalling the gap between
one character and the next, not by means of the stop-start sequence which
can occur in the middle of a character, but by designing the code carefully
so that the sequence which marks the boundary between characters can never
be mimicked inside a character. There can therefore never be a cascade of
errors if the code loses synchronisation. This idea also has another
advantage, in that the character-codes no longer need to be a fixed length.
If, as in normal amateur radio contacts, the traffic being sent consists of
plain language, there are some characters which occur more often than others
and there are some which may hardly ever be used. In morse code this is used
to advantage by using short codes for the common letters and longer codes
for less-common ones. In such a variable-length code, the average character
rate is faster than in a code where all the characters are the same length.
Or, stated in a different way, a variable-length code can be transmitted at
a lower bit-rate, and therefore in a lower bandwidth, and hence suffer fewer
errors. The code used in PSK31, called Varicode, works like this:


1. All characters are separated from each other by two consecutive 0 bits.
2. No character contains two or more consecutive 0 bits.

In the same way that it is obvious that all morse-code characters begin and
end with a "keydown" element, all characters in Varicode must begin and end
with a 1, and the "00" between characters is equivalent to the letter space
in morse code.
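
As an illustration only, the two rules can be written as a short check in
Python. The function names is_valid_varicode and frame are made up for this
sketch and are not part of any PSK31 program. Patterns are held as strings
of '0' and '1' characters purely for readability.

```python
def is_valid_varicode(pattern: str) -> bool:
    """Return True if a '0'/'1' string obeys the two rules in the text."""
    if not pattern or set(pattern) - {"0", "1"}:
        return False
    # Must begin and end with a 1, and must never contain "00".
    return pattern[0] == "1" and pattern[-1] == "1" and "00" not in pattern


def frame(patterns):
    """Join character patterns, sending the "00" gap after each one."""
    return "".join(p + "00" for p in patterns)


if __name__ == "__main__":
    print(is_valid_varicode("1011"))   # obeys both rules
    print(is_valid_varicode("1001"))   # contains "00" inside the character
    print(frame(["1", "101"]))
```

A pattern such as 1001 is rejected because the 00 inside it would be taken
for a character boundary.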

With such a code, the receiver detects the end of one code and the
beginning of the next by detecting the occurrence of a 00 pattern, and
since this pattern never occurs inside a character, the "loss of sync"
problem that gives trouble with asynchronous systems can never occur.
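
The receiving side can be sketched in the same way. The function below is a
minimal sketch, not the code of any actual PSK31 receiver: it assumes the
demodulator has already delivered the bits as a string of '0' and '1', and
the name split_characters is hypothetical. Extra zeros between characters
are treated as idle and skipped.

```python
def split_characters(bits: str):
    """Split a received bit string into character patterns at each "00"."""
    found = []
    buf = ""
    for b in bits:
        if b == "0" and not buf:
            continue              # idle zeros before a character starts
        buf += b
        if buf.endswith("00"):    # boundary reached
            found.append(buf[:-2])
            buf = ""
    return found


if __name__ == "__main__":
    sent = "1011" + "00" + "11" + "00" + "101" + "00"
    print(split_characters(sent))
    # Flip one bit inside the first character and decode again.
    damaged = sent[:2] + "0" + sent[3:]
    print(split_characters(damaged))
```

In the damaged case the first character is broken into two wrong patterns,
but the characters that follow are still found at the right boundaries,
because the next genuine 00 puts the receiver back in step.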

The variable-length coding used in PSK31 was chosen by collecting
a large volume of English language ASCII text files and analysing them to
establish the occurrence-frequency of each of the 128 ASCII characters.
Next a list was made of all the binary patterns that meet the above rules,
namely that each pattern must start and end with a 1, and must not contain
more than 1 zero in a row. This list was generated by computer, starting
at the shortest. The list was stopped when 128 patterns had been found.
Next the list of ASCII codes, in occurrence-frequency order, was matched to
the list of binary patterns, in length order, so that the most
frequently-occurring ASCII codes were matched to the shortest patterns. To see how
well this would perform, a simple calculation was made to predict the
average number of bits in typical plain language text transmitted by this
code, taking into account the 00 gap between characters. The result was
between 6 and 7 bits per character. This compares very favourably with 9
bits per character for the asynchronous system. The shortest character is
the "space code", transmitted as a single 1. The longest is 10 bits long,
or rather 12 bits since we must include the 00 separator.
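
The procedure just described can be sketched as follows. This is not the
program that was actually used. The order of patterns within one length
(ascending binary value here) is an assumption, since the text says only
that the list starts at the shortest. The function names and the tiny
frequency table in the example are invented for illustration and are not
measured data.

```python
from itertools import count


def generate_patterns(n=128):
    """List the first n legal patterns, shortest first."""
    patterns = []
    for length in count(1):
        # Every value in this range is a bit string of exactly `length`
        # bits that begins with a 1.
        for value in range(1 << (length - 1), 1 << length):
            p = format(value, "b")
            if p.endswith("1") and "00" not in p:
                patterns.append(p)
                if len(patterns) == n:
                    return patterns


def assign(freq, patterns):
    """Give the shortest patterns to the most frequent characters."""
    ranked = sorted(freq, key=freq.get, reverse=True)
    return dict(zip(ranked, patterns))


def average_bits(freq, table):
    """Mean bits per character, counting the 00 gap after each one."""
    total = sum(freq.values())
    return sum(n * (len(table[ch]) + 2) for ch, n in freq.items()) / total


if __name__ == "__main__":
    pats = generate_patterns()
    print(len(pats), max(len(p) for p in pats))
    toy = {" ": 3, "e": 2, "t": 1}    # invented counts, not real statistics
    print(average_bits(toy, assign(toy, pats)))
```

There are 88 legal patterns of 9 bits or fewer, so the 128th pattern falls
among the 10-bit ones, which agrees with the longest code being 10 bits.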

In order to make sure that the receiver can regenerate the symbol timing,
the logic zero state in Varicode is mapped to the "polarity reversal" state
in the BPSK and QPSK modulation. In this way, when idling, there is a
continuous modulation of the carrier and this amplitude modulation at the
bit-rate is used in the receiver to keep in sync. The worst case is
transmission of a repeated "!" character in BPSK mode, where there will be a
9-bit period of unmodulated carrier followed by two reversals. This is
enough to keep the receiver in sync, and in any case this is not a common
character! In BPSK, every reversal gives a little boost to the bit-sync
process. In QPSK, even the +90 and -90 degree phase-shifts contain some
amplitude modulation and there are no characters with long runs of 1's.
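
For BPSK the mapping can be sketched like this. It is a simplified model
that tracks only the carrier polarity as +1 or -1 at each bit period and
ignores pulse shaping. The function names are hypothetical, and the "!"
pattern is taken to be nine consecutive 1s, as the 9-bit figure above
implies.

```python
def bpsk_phases(bits: str, start: int = 1):
    """Map bits to carrier polarity: 0 reverses it, 1 leaves it alone."""
    phase = start
    phases = []
    for b in bits:
        if b == "0":
            phase = -phase
        phases.append(phase)
    return phases


def longest_steady_run(bits: str) -> int:
    """Longest run of 1 bits, i.e. bit periods with no reversal."""
    return max(len(run) for run in bits.split("0"))


if __name__ == "__main__":
    print(bpsk_phases("0000"))          # idle: a reversal on every bit
    bang = "111111111" + "00"           # assumed "!" pattern plus the gap
    print(longest_steady_run(bang * 3))
```

The idle signal reverses on every bit, while the repeated "!" gives nine
bit periods without a reversal followed by two reversals.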

The actual alphabet is given in the file ALPHABET.DAT, which is a plain
text file giving the varicode pattern as 0's and 1's, one per line,
in ascending order of ASCII code.
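
A file in that layout could be read into lookup tables along these lines.
This is a sketch only: it assumes the file holds nothing but the patterns,
with no header or comment lines, and the function name load_alphabet is
made up for the example.

```python
def load_alphabet(lines):
    """Build encode/decode tables from lines of 0s and 1s.

    The first non-blank line is taken as ASCII code 0, the next as
    code 1, and so on.
    """
    patterns = [ln.strip() for ln in lines if ln.strip()]
    encode = {}
    decode = {}
    for code, pattern in enumerate(patterns):
        if set(pattern) - {"0", "1"}:
            raise ValueError("not a bit pattern: %r" % pattern)
        encode[chr(code)] = pattern
        decode[pattern] = chr(code)
    return encode, decode


# Typical use, if the file is in the current directory:
#     with open("ALPHABET.DAT") as f:
#         encode, decode = load_alphabet(f)
```

Taking any iterable of lines, rather than a file name, lets the same
function be tried on a short list of strings without the file present.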

