Parity and Error Correction

In this video from ITFreeTraining, I will be looking at parity and error-correcting RAM. Parity RAM can detect corruption in RAM, while error-correcting RAM can detect and correct some errors. I will look at both RAM types and the advantages and disadvantages of each.

Effects of RAM Errors
Before looking at how error detection and correction work, I will first look at what happens when there is corruption in memory. Memory corruption generally shows up either as random errors, which have little to no effect, or as stop errors, otherwise known as the blue screen of death. Let’s have a look at these in more detail.

Random Errors
A random error may not even be noticeable or may appear as a random glitch. For example, some data on the computer may change, but that data is never used, perhaps because it is part of some code that the computer never calls. When this occurs, you will not even know a problem has occurred.

In some cases, you may have a more noticeable glitch. For example, in this video the playback distorts and changes color. A small change in data would generally not have such a big effect, but we exaggerated the effect so you could see it better. In some cases, a small change in data could cause a program to crash or return an error.

Stop Error
In some cases, the computer may not be able to recover from the data change and a stop error will occur. For example, a data change in some critical piece of code may cause a stop error. It can be hard to determine whether a stop error was caused by a memory error, because the error reported can be completely random. Thus, I consider a stop error once or twice a year or so to be normal; if they happen more often, I would start looking into what is causing them. The next question is: how often do these memory errors occur?

How Reliable is RAM?
It is difficult to obtain a single figure, since there are so many different RAM manufacturers and different conditions that affect how reliable RAM is. One study, “DRAM Errors in the Wild”, quotes a figure that about 8% of memory modules are affected by errors each year.

Even with a figure as low as that, once you start adding memory modules, for example if you are running a small data center, you are going to have at least a few memory errors a year. If you are running critical applications like banking operations, this can have a big impact. For example, a single bit changed in RAM can change a person’s bank account from a positive to a negative value. In business, this would not be acceptable and, thus, a number of systems are put in place to prevent this from occurring. In this video, I will look at how parity and error-correcting memory address these problems.
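To put rough numbers on this, here is a small Python sketch. It assumes each module independently has an 8% chance per year of being affected, which is a simplification of the study's figure, and it assumes the bank balance is stored as a 32-bit two's complement integer. The function names are my own and are only for illustration.

```python
# Assumption: every module independently has the same 8% yearly chance
# of being affected by an error. Real-world rates vary by hardware and age.
ANNUAL_MODULE_ERROR_RATE = 0.08


def expected_affected(modules, rate=ANNUAL_MODULE_ERROR_RATE):
    """Average number of modules affected per year."""
    return modules * rate


def chance_of_any(modules, rate=ANNUAL_MODULE_ERROR_RATE):
    """Probability that at least one module is affected in a year."""
    return 1 - (1 - rate) ** modules


def flip_bit_int32(value, bit):
    """Flip one bit of a value stored as a 32-bit two's complement integer."""
    raw = value & 0xFFFFFFFF          # view the value as 32 raw bits
    raw ^= 1 << bit                   # flip the chosen bit
    # Reinterpret the raw bits as a signed number again.
    return raw - (1 << 32) if raw >= (1 << 31) else raw


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, "modules:", expected_affected(n), "expected,",
              round(chance_of_any(n), 4), "chance of at least one")
    # Flipping the top (sign) bit turns a positive balance negative.
    print(flip_bit_int32(1000, 31))
```

With 100 modules the average works out to about eight affected modules a year, and flipping only the sign bit of a balance of 1000 produces a large negative number.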

Parity RAM
Parity RAM works by adding an extra bit for every byte of data. The parity bit records whether the sum of all the bits in the byte is odd or even. If there is an even number of “on” bits in the byte, the parity bit is zero. If there is an odd number of “on” bits in the byte, the parity bit is one.

Let’s consider an example. In this example there are four one bits, which is an even number of bits. This makes the parity bit zero. Now consider a second example, where the number of one bits is three. This is an odd number of bits and thus makes the parity bit one.

Thus, the parity bit is easy to work out: one just needs to count the number of one bits. Next, I will have a look at how the parity bit is used to detect errors.
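The counting rule above can be sketched in a few lines of Python. The two byte values are my own stand-ins for the examples shown in the video, chosen so that one has four one bits and the other has three.

```python
def parity_bit(byte):
    """Return 0 if the byte has an even number of one bits, otherwise 1."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value that fits in one byte")
    ones = bin(byte).count("1")   # count the "on" bits
    return ones % 2               # even count -> 0, odd count -> 1


if __name__ == "__main__":
    print(parity_bit(0b01101001))  # four one bits
    print(parity_bit(0b00101001))  # three one bits
```

In real hardware this is done with a chain of XOR gates rather than by counting, but the result is the same.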

Error Detection
When the computer accesses a byte in memory, the parity bit is checked. If the parity calculated from the bits in the byte does not match the stored parity bit, the computer knows the memory has changed and thus is invalid. Consider the following example, where the first bit has changed from a zero to a one. Now when the parity bit is checked, the computer will know the data has become corrupt.

Parity RAM has its limitations and will not detect all errors. Consider a second example, in which the first two bits have changed from zeros to ones. When the one bits are added up, the parity bit is still correct. Thus, the change in data is not detected. In general, parity catches a change in an odd number of bits but misses a change in an even number of bits.
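Both cases can be tried out with a short Python sketch. The starting byte is a hypothetical value of my own with zeros in its first two bits, and the helper names are only for illustration.

```python
def parity_bit(byte):
    """Return 0 for an even number of one bits, 1 for an odd number."""
    return bin(byte).count("1") % 2


def parity_ok(byte, stored_parity):
    """Recompute the parity on read and compare it with the stored bit."""
    return parity_bit(byte) == stored_parity


original = 0b00101001             # hypothetical byte written to memory
stored = parity_bit(original)     # parity bit saved alongside it

one_flip = original ^ 0b10000000  # first bit changes from 0 to 1
two_flips = original ^ 0b11000000 # first two bits change from 0 to 1

if __name__ == "__main__":
    print(parity_ok(original, stored))   # unchanged data passes the check
    print(parity_ok(one_flip, stored))   # single-bit change is caught
    print(parity_ok(two_flips, stored))  # two-bit change slips through
```

The last check passes even though the data is wrong, which is exactly the limitation described above.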

When a parity error is detected, this will cause a stop error and thus the computer will restart. This may sound extreme, but if the data in memory has become corrupt, the computer can no longer consider it reliable. If you continue to use corrupt data, this can lead to more corruption and unpredictable results. In a banking application, for example, corrupt data can lead to invalid amounts being written to banking records. Therefore, you can understand why it is considered better to reboot the computer when this happens rather than continuing with corrupt data.

It would be preferable if, when memory corruption occurs, the computer could fix it and keep going, so that no reboot is needed. This leads us on to our next topic, error-correcting RAM.

Error-Correcting Code Memory (ECC)
Error-correcting code memory, otherwise known as ECC, has the ability to detect and correct memory errors. The standard used in most ECC RAM is Hamming code. There is another standard, but this only tends to be used in high-tech devices like those used in satellites, so it is unlikely you will come across it.

When one bit has changed, ECC has the ability to detect and correct it. If two bits have changed, ECC can detect the change but cannot correct it. If more bits have changed, ECC may not detect it and will also not be able to correct it. You can see that it is not perfect, but is better than parity checking.
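To make this behavior concrete, here is a minimal Python sketch of a Hamming code with one extra overall parity bit, applied to a 4-bit value. This is a toy version that I have chosen to keep the example short. Real ECC RAM works on much wider words, and the function names and bit layout here are my own assumptions rather than how any particular memory controller does it.

```python
def encode(nibble):
    """Encode 4 data bits into 8 bits: index 0 is overall parity,
    indexes 1, 2 and 4 are Hamming check bits, 3, 5, 6 and 7 hold data."""
    code = [0] * 8
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # parity over everything else
    return code


def decode(code):
    """Return (nibble, status). Status is 'ok', 'corrected' or
    'uncorrectable' (in which case nibble is None)."""
    code = list(code)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position            # non-zero points at the bad bit
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # An odd number of bits changed; assume it was exactly one and fix it.
        # A syndrome of 0 means the overall parity bit itself was hit.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Check bits disagree but overall parity holds: two bits changed.
        return None, "uncorrectable"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[6] ^= 1                 # one bit changes in memory
    print(decode(word))          # the original value is recovered
    word[2] ^= 1                 # a second bit changes
    print(decode(word))          # detected, but cannot be corrected
```

If three or more bits change, this sketch may "correct" the wrong bit and return bad data without noticing, which matches the limitation described above.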

In the real world, memory errors are rare and most of the time only a single bit is changed, so ECC can detect and correct it. ECC does come at a cost, since it stores extra check bits alongside the data (commonly 8 check bits for every 64 data bits), and this means more transistors. More transistors mean higher cost. The other disadvantage is that ECC is slower than non-ECC RAM. Thus, ECC is a trade-off between speed and reliability.
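As a rough sketch of where the extra bits come from, the Python below counts the check bits needed for a given number of data bits. It assumes a Hamming code with one added overall parity bit, which gives the correct-one, detect-two behavior described earlier; the function name is my own.

```python
def secded_check_bits(data_bits):
    """Check bits needed to correct one error and detect two errors."""
    r = 0
    # A Hamming code needs r check bits such that 2**r >= data_bits + r + 1,
    # so every single-bit position (plus "no error") gets its own pattern.
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1   # one more bit for the overall parity


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        extra = secded_check_bits(width)
        print(width, "data bits need", extra, "check bits")
```

Protecting each byte on its own would need 5 check bits per 8 data bits, while protecting 64 bits together needs only 8. That is one extra bit per byte on average, the same storage cost as parity.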

Check Motherboard is Compatible
Before deciding which RAM you are going to use, first check whether your motherboard supports it. Server motherboards tend to support ECC or parity RAM. However, you will find that a given motherboard will generally only support one or the other. Of the two, ECC is more commonly supported.

Desktop motherboards tend not to support ECC or parity. A home user generally wants to pay less for RAM, get more of it and have it run faster. Thus, ECC or parity RAM is not a good choice for a home PC. However, motherboards designed for high-end workstations or desktops may support ECC or parity RAM. The important point to remember before buying RAM is to ensure that the motherboard you are putting it in supports it.

This concludes this video. I hope this video has helped you understand the differences between RAM types and when they might get used. Until the next video, I would like to thank you for watching.

“RAM parity” https://en.wikipedia.org/wiki/RAM_parity
“ECC memory” https://en.wikipedia.org/wiki/ECC_memory
“DRAM Errors in the Wild: A Large-Scale Field Study” http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf

Trainer: Austin Mason http://ITFreeTraining.com
Voice Talent: HP Lewis http://hplewis.com
Quality Assurance: Brett Batson http://www.pbb-proofreading.uk
