Intro to Computer Systems

Error Detection and Correction

When transferring data within a computer, for example when writing data to disk, or from one computer to another using a communication line, errors can occur due to noisy lines or simply malfunction. These errors may cause some 1s to become 0s and some 0s to become 1s, as in:

"ABC" = 1000001 1000010 1000011

when transmitted to another computer or stored may become:

1000001 1000010 1001011 = "ABK"

In data communications if an error is detected it may be xed by requesting retransmission, but when writing data to disk that this may be more difficult.

To solve that problem, extra redundant bits (also called check bits) are added to the original information before transmission. As a rst example, let's add an extra parity bit to a 7-bit ASCII character to detect an error. Depending on the number of 1s that we choose, even or odd parity may be used; there are no advantages of one over the
other, so they are used indistinctly.

Even Parity

For ASCII characters, bit 7 (that is, the eight-bit, the most significant bit) is set so that the total number of 1s in a character is even, as in:

'A' = 100 0001 Parity = 0 Code = 0100 0001
Parity Bit = 0

'C' = 100 0011 Parity = 1 Code = 1100 0011
Parity Bit = 1

Odd Parity

The parity bit is set so that the total number of 1s is odd, as in:

'A' = 100 0001 Parity = 1 Code = 1100 0001
Parity bit = 1

If we consider for example 'A' with even parity 0100 0001, any 1-bit error will change the number of 1s and 0s in the symbol, and thus, the parity of the symbol. Say the error is on bit 3, so the symbol received is 0100 1001; this error is easily detected because the number of 1s is 3 (odd) instead of an even number.

A single parity bit can only detect an error, it cannot be used to correct errors; we need more redundant bits if we want to also correct errors.

Error Correcting Codes

Simple parity is a particular example of an error detecting code. We have just mentioned that although using simple parity it is possible to detect errors, it is not possible to correct them.

Say we want to send a message: Yes = 1, No = 0. If there is an error in transmission, the receiver wouldn't know because they would not be able to detect a 0 changed into a 1, or a 1 into a 0. If instead we use an extra parity bit, and we agree that the number of 0s must be even, for example, to send a 0 we send 00 and to send a 1 we send 11. If there is a one-bit error, the receiver will get 01 or 10, and they will know that there has been an error. However, they will not be able to find out what was the original message.

If instead we use 3 bits to send our message, our messages are always either No = 000 and Yes = 111. If we assume that there is only a one-bit error and we receive 001, what symbol was originally sent? Obviously 000, because 111 has more than one bit difference with 001, so that cannot be the source of a one-bit error. Thus, since there are 3 bits difference between 000 and 111, we can nd the mistake and correct it.

An error correcting code is able to detect and correct bit errors in a (binary) code. In 7-bit ASCII code, a 1-bit change to a valid symbol gives another valid symbol. It is then not possible to detect or correct a one-bit error, because a 1-bit error is still a legal symbol. Adding a parity bit made it possible to detect, but not correct, an error.
To detect or correct errors, codes use extra bits to increase the distance between valid symbols, as in:

1111 1000 -> 111 1001 -> 1111 1011 -> 1111 1111
valid....... invalid.... invalid..... valid

Hamming Distance

In a code, the number of bit changes between two valid symbols of the code is called Hamming Distance. In the example, the Hamming Distance has been made constantly 3, and therefore 3 bits must change to arrive at the next valid symbol. Thus, if one bit error is detected, it may be corrected by selecting the closest valid symbol.

To detect up to K bit errors, the (minimum) Hamming distance between two valid code words must be K+1. For example, simple parity makes 2 the distance between symbols and we have seen that it detects a one-bit error in ASCII. To correct up to K bit errors, the Hamming distance between two valid code words must be greater than or equal to 2K+1.

Hence, to detect and correct 1 error requires a Hamming distance of 3.