What is RAID?
RAID is an acronym for either ‘Redundant Array of Inexpensive Disks’ or ‘Redundant Array of Independent Disks,’ and describes a configuration where multiple storage devices are combined to function collectively as a single unit. This setup can enhance storage capacity, reliability, and performance.
RAID configurations are commonly employed in server environments and cloud storage solutions due to their reliability and enhanced performance. While the advent of high-performance solid-state drives has reduced its prevalence in typical workstations, RAID is still used in some professional workstation setups.
When working with traditional RAID, your storage devices need to be identical: the same size, the same speed and preferably from the same manufacturer. Since RAID uses all the storage devices in the array at the same time, if one is different from the others, it will slow the RAID down to the speed of the slowest device. This is a bit like walking with a group of people: you can only move as fast as the slowest person.
RAID works at the block level, so the data is split across the devices in blocks. If one storage device is larger than the others, the RAID can’t split the data evenly across all the devices, and any space beyond the capacity of the smallest device goes unused. So, even though some RAID levels allow you to mix different storage devices, it is recommended that you don’t.
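If it helps to see the arithmetic, here is a small Python sketch of that limitation. It is only an illustration, not how a real RAID controller calculates capacity, and the function name and drive sizes are made up for the example.

```python
# A minimal illustration of why mismatched drive sizes waste space: a
# traditional RAID can only use as much space per drive as its smallest
# member provides.
def usable_capacity_gb(drive_sizes_gb):
    """Capacity the array can actually use, before any redundancy overhead."""
    smallest = min(drive_sizes_gb)
    return smallest * len(drive_sizes_gb)

print(usable_capacity_gb([1000, 1000, 1000]))  # 3000 - all space is usable
print(usable_capacity_gb([1000, 1000, 2000]))  # 3000 - the extra 1000 GB is wasted
```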
When multiple storage devices are configured in a RAID setup, performance can potentially be enhanced. However, as we will explore, the degree of performance improvement depends on the specific RAID configuration employed. Certain RAID levels may lead to slower performance; however, the benefits of using RAID, such as improved data redundancy and reliability, often outweigh these limitations.
RAID can also offer redundancy; that is, the array can keep operating even if a storage device fails. Some RAID levels provide redundancy while others do not.
To identify and differentiate the various features of RAID configurations, they are categorized by levels, such as RAID 0, RAID 1 and RAID 5. While a wide array of RAID levels exists, this video will focus on the more popular ones, which are also the ones CompTIA requires you to know.
RAID 0 (Striping)
The first RAID that I will look at is RAID 0, also known as striping. This storage technique divides data into blocks and spreads them equally over a number of drives. Thus, in order to use RAID 0, at least two storage devices are required.
This configuration can enhance performance because multiple storage devices are able to service requests simultaneously. However, it’s important to note that this setup does not offer any redundancy. This means that if a single hard disk in the RAID array fails, it could result in the loss of all the data stored in the array.
RAID 0 is typically reserved for specialized applications where performance takes precedence over data integrity. An ideal use case is a non-critical cache store, such as a web server caching web traffic. In this scenario, delivering content rapidly over the network is crucial. However, if the cached data is lost, it’s not a significant issue as the web server can simply retrieve it again from the source.
To help remember what RAID 0 supports, I use the memory jogger, “RAID 0 offers zero redundancy”. RAID 0 is the only RAID that we will look at that does not support redundancy. Although there are other RAID levels like this, they are not commonly used and not on the exam objectives.
To understand better how it works, let’s consider an example. In this example I have a single file which has been broken into six blocks.
Let’s consider that we are writing the file to the RAID. The file is being copied from another location to the RAID and currently only the first block has been sent. When this occurs, the block will be written to one of the storage devices.
So far, this is the same as we would expect for a non-RAID storage device. Now let’s consider that the next block is ready to be written. This time, the block will be written to the next storage device.
Up until now, the data blocks have been written sequentially, as that is all the data the operating system has supplied so far. Let’s assume that the remaining four blocks are provided simultaneously. Many high-end RAID controllers possess the capability to access every storage device in the array concurrently. Therefore, in this scenario, the remaining four blocks would be written across all storage devices simultaneously, optimizing efficiency and speed.
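If you find a code model easier to follow, the short Python sketch below mimics this striping behaviour. It is only an illustration of the round-robin idea, not how a real RAID controller works, and the function and block names are made up for the example.

```python
# A simplified model of RAID 0 striping: blocks are distributed round-robin
# across the drives, so consecutive blocks land on different devices and can
# be read or written in parallel by a capable controller.
def stripe_blocks(blocks, drive_count):
    """Return, for each drive, the list of blocks it would store."""
    drives = [[] for _ in range(drive_count)]
    for index, block in enumerate(blocks):
        drives[index % drive_count].append(block)
    return drives

file_blocks = ["block 1", "block 2", "block 3", "block 4", "block 5", "block 6"]
for number, contents in enumerate(stripe_blocks(file_blocks, 2), start=1):
    print(f"Drive {number}: {contents}")
# Drive 1: ['block 1', 'block 3', 'block 5']
# Drive 2: ['block 2', 'block 4', 'block 6']
```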
This illustrates how RAID 0 achieves its high performance. The greater the number of storage devices in the RAID, the more can be accessed in parallel, thereby boosting performance. Additionally, a RAID controller might simultaneously read from and write to different storage devices, depending on the current access requirements. However, it’s important to remember that the overall performance is largely dependent upon the quality of the RAID controller. With less expensive controllers, you might not experience significant performance gains. In fact, some budget RAID controllers may only access one storage device at a time, offering no performance improvement. As is often the case in computing, the value you receive tends to correspond with your investment. Now, let’s proceed to examine the next level of RAID.
RAID 1 (Mirror)
The next RAID level I will look at is RAID 1, commonly referred to as mirroring. RAID 1 maintains two identical copies of the data using two storage devices. While this approach halves the usable storage space, effectively doubling the cost, it provides redundancy by keeping a second copy of everything.
Having two storage devices may improve read performance, but it does not improve write performance, because every write must be applied to both storage devices.
The potential improvement in read performance with RAID 1 depends on the nature of the I/O operations directed at the RAID controller, as well as the type of storage being used. In the case of hard disk drives, the read head must physically move to the data’s location. If the data is all in one spot, the hard disk can perform a sequential read, which is quite efficient. However, the presence of a second hard disk doesn’t speed up this process due to the delays in repositioning the read head. On the other hand, random access performance might see an improvement, as each hard disk is capable of independently accessing different locations, thereby enhancing the overall access speed for non-sequential data.
Random read performance with RAID 1 depends on the RAID controller. Having a second device means that, under the right circumstances and with the right RAID controller, both devices can independently read different data. However, cheaper RAID controllers may only be able to access one device at a time.
When it comes to solid-state drives, the absence of a mechanical read head eliminates the need for physical movement to access data. As a result, even sequential reads can be faster when the two drives operate independently. Again, any speed improvement hinges on how well the RAID controller manages these simultaneous operations.
Let’s consider our file example again to see how mirrored drives manage data. When writing data to the RAID array, the data is written to both storage devices at the same time; the RAID must always keep the two copies in sync.
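As a rough code model of the same idea, the Python sketch below writes every block to both drives. Again, this is just an illustration under simplified assumptions, not a real RAID 1 implementation.

```python
# A simplified model of RAID 1 mirroring: every block written to the array is
# stored on both drives, so either drive alone holds a complete copy.
def mirror_write(blocks):
    """Return the contents of the two drives after writing the blocks."""
    drive_a = list(blocks)
    drive_b = list(blocks)  # the mirror receives an identical copy
    return drive_a, drive_b

drive_a, drive_b = mirror_write(["block 1", "block 2", "block 3"])
print(drive_a == drive_b)  # True - the copies always match
# If drive A fails, drive B still contains every block, so no data is lost.
```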
RAID 1 supports one storage device failure. If one of the storage devices does fail, it is important to replace it as soon as possible. Once replaced, the data will need to be copied from the existing storage device. While this is occurring, your data is still at risk.
In a RAID 1 setup, a second storage device failure will result in total data loss. Therefore, the moment one storage device fails, your data becomes vulnerable until the defective device is replaced and the data is reconstructed. RAID 1 configurations are frequently utilized for operating system drives, where redundancy is prioritized over performance. This preference is due to the mostly static nature of many operating system files, which undergo minimal changes. Once the operating system is loaded into memory, the reliance on storage access decreases. Dynamic files, such as page files, can be allocated to other, faster storage mediums, while programs can also be stored separately if needed. The primary advantage of using RAID 1 for operating systems is to ensure consistent bootability; it offers robust redundancy but limited performance enhancements.
So far, we have looked at performance with RAID 0 and redundancy with RAID 1. I will next look at a RAID solution that has both.
RAID 5 (Striped With Distributed Parity)
RAID 5, also known as Striped with Distributed Parity, combines redundancy and performance by striping data across multiple drives and adding a calculated parity block for fault tolerance. This setup requires at least three drives, with one drive’s worth of capacity “lost” to parity for redundancy. RAID 5 can survive a single drive failure, allowing the lost data to be reconstructed from the remaining drives. However, a second failure occurring before the rebuild is completed results in complete data loss. While popular in the past, RAID 5, despite its fast read performance, has become less common due to its slow write speeds and the lengthy rebuild times of large drives, which increase the risk of a second failure causing data loss. We’ll explore alternative solutions for data protection later in the video.
In RAID 5, as the equivalent of one storage device is allocated to parity, adding more drives increases the proportion of usable space relative to that consumed by parity. However, this comes with a trade-off: more drives in the array increase the likelihood of a drive failure. Therefore, there’s a balancing act between optimizing the ratio of usable space to parity and managing the heightened risk of failure. For instance, using three one-terabyte drives in RAID 5 leaves about 67% of the total capacity usable for data storage. In contrast, a setup with ten one-terabyte drives would yield 90% usable space, demonstrating the efficiency gains of larger arrays.
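To check those figures, here is a quick Python calculation of RAID 5 space efficiency. The formula simply reflects that one drive’s worth of capacity goes to parity; the function name is made up for the example.

```python
# RAID 5 usable space: with n equal-sized drives, one drive's worth of
# capacity is consumed by parity, so (n - 1) / n of the total is usable.
def raid5_usable_fraction(drive_count):
    return (drive_count - 1) / drive_count

for n in (3, 5, 10):
    print(f"{n} drives: {raid5_usable_fraction(n):.0%} usable")
# 3 drives: 67% usable
# 5 drives: 80% usable
# 10 drives: 90% usable
```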
As RAID 5 uses parity, all storage devices are updated when data is written. However, when data is read, only the storage devices holding that data need to be accessed. Thus, RAID 5 gives fast read speeds but slow write speeds unless you are writing a lot of data at once. If you are writing large files, all the storage devices can work together to write the data at the same time. However, if you are writing small files or writing randomly, performance is not as good, since all storage devices need to be accessed even when writing the smallest unit of data, such as a single sector. Let’s consider our file example to get a better idea of how it works.
Our file has six blocks of data. So far, the operating system has only sent one block to the RAID controller. Maybe the file is delayed because it is being downloaded from another location and the download has stalled.
The RAID will write this data to the array, but you will notice that all the storage devices need to be accessed in order to write it. Even though two of the storage devices currently hold no data for this file, they still need to be read so the parity can be calculated correctly.
The operating system has now sent the next two blocks to the RAID controller. Thus, they can now be written to the RAID. Notice, once again all the storage devices need to be accessed in order to write the data.
You might question why the first storage device is accessed when its data hasn’t changed. The reason lies in the necessity of updating parity, which requires accessing all storage devices in the array. Consequently, the first storage device is read, while the others are being written to. However, if the RAID controller already has the relevant data in its cache, it can bypass this read operation. We’ll delve more into this aspect shortly.
Now let’s consider that the rest of the blocks for the file have been sent to the RAID controller and are ready to be written to the array. Notice once again, all storage devices need to be accessed in order to write the data. Since they were accessed all at once, the write is very efficient. When writing large files, you will probably get some good write performance, but in all other circumstances, the write performance won’t be as good.
You will notice that the parity has moved to another storage device. Although the manufacturer is free to lay out the data on the storage devices in any way they want, most will distribute the parity across all the storage devices. Remember how I said that if data is already present in the cache, it doesn’t need to be re-read during a write operation? Distributing the parity across the RAID in this manner spreads the read and write operations throughout the array, enhancing overall efficiency.
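For those curious how the parity itself is calculated, RAID 5 parity is typically an XOR of the data blocks in a stripe. The short Python sketch below shows the idea on two data blocks; it is a simplified illustration, not a real controller implementation, and the block contents are made up.

```python
# A simplified model of RAID 5 parity: the parity block is the XOR of the data
# blocks in the same stripe, so any single lost block can be rebuilt by
# XOR-ing everything that survives.
def xor_blocks(block_a, block_b):
    return bytes(a ^ b for a, b in zip(block_a, block_b))

data_1 = b"block 1!"
data_2 = b"block 2!"
parity = xor_blocks(data_1, data_2)    # stored on a third drive

# Simulate losing the drive holding data_2 and rebuilding it from the rest.
rebuilt = xor_blocks(data_1, parity)
print(rebuilt == data_2)  # True - the missing block is recovered
```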
RAID 5 was a popular choice in the past, but its usage has diminished in recent times. The primary reasons for this shift are its relatively slow write performance and the prolonged rebuild times associated with large capacity hard disks. These factors contribute to an increased risk of RAID failure due to a single disk malfunction. Next, let’s explore the alternatives that are increasingly being adopted in place of RAID 5.
RAID 10 (Stripe of Mirrors)
Numerous organizations, especially those requiring large-scale storage and cloud-based storage solutions, are now opting for RAID 10. Also known as RAID 1+0 or a stripe of mirrors, RAID 10 is a hybrid RAID which combines RAID 1 and RAID 0. This configuration effectively halves the usable storage space, or alternatively, it can be viewed as doubling the cost. Let’s delve into the workings of RAID 10 to understand it better.
To help you understand how it works, I will first look at the RAID 0 part and consider writing my example file to the RAID. I will look at the RAID 1 part in a moment.
Let’s consider that two blocks are ready to be written to the RAID. Since we are using RAID 0, each storage device can work independently of the other. Thus, both blocks can be written at the same time. Since there is no parity information, the writes are very fast; however, so far, we don’t have any redundancy.
To add redundancy, we double the number of storage devices and thus double the cost. These extra devices are a mirror of the originals; this is the RAID 1 part. So, when our two blocks are written to the storage devices, they are also written to the second copy in the mirror.
Now let’s have a look at what happens when we write the rest of the file to the RAID. You will notice that all the storage devices are working at the same time. Each mirrored pair can work independently of the other pairs, and each pair needs to make sure that the data is mirrored across both of its storage devices.
So far, all of the file has been written except the last block, because those storage devices were busy writing block 3. Once block 3 has been written, the pair can write block 6. While this block is being written, the other storage devices are free to read or write data.
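To summarize the layout in code, the Python sketch below stripes the example blocks across mirrored pairs. As with the earlier sketches, the names are made up and this is only a simplified model of RAID 10.

```python
# A simplified model of RAID 10: blocks are striped across mirrored pairs, and
# each pair stores two identical copies of its share of the blocks.
def raid10_layout(blocks, pair_count):
    """Return a list of (drive_a, drive_b) tuples, one per mirrored pair."""
    stripes = [[] for _ in range(pair_count)]
    for index, block in enumerate(blocks):
        stripes[index % pair_count].append(block)
    return [(list(stripe), list(stripe)) for stripe in stripes]

blocks = ["block 1", "block 2", "block 3", "block 4", "block 5", "block 6"]
for number, (drive_a, drive_b) in enumerate(raid10_layout(blocks, 2), start=1):
    print(f"Pair {number}: {drive_a} mirrored as {drive_b}")
# Pair 1: ['block 1', 'block 3', 'block 5'] mirrored as ['block 1', 'block 3', 'block 5']
# Pair 2: ['block 2', 'block 4', 'block 6'] mirrored as ['block 2', 'block 4', 'block 6']
```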
RAID 10 is designed to tolerate multiple failures within the array. Data loss only occurs if both storage devices in the same mirrored pair fail before a rebuild is completed. Unlike RAID 5, adding more storage devices to a RAID 10 setup actually decreases the likelihood of losing the entire array to a second failure. For instance, in an array of 20 storage devices forming ten pairs, the failure of one device leaves only a 1-in-19 chance that the next device to fail is its mirror partner, which is the only second failure that would cause data loss. Therefore, while adding more devices does increase the overall chance of a failure occurring, the probability of a second failure landing in the same pair remains relatively low in large RAID configurations. This resilience is a key reason why RAID 10 is favored in extensive storage environments and cloud-based systems.
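The 1-in-19 figure comes from a simple count, shown below as a quick back-of-the-envelope Python calculation. It assumes failures are independent and equally likely on every drive, which is a simplification.

```python
# With 20 drives in ten mirrored pairs and one drive already failed, only one
# of the remaining 19 drives (the failed drive's partner) would cause data
# loss if it failed next.
total_drives = 20
remaining = total_drives - 1          # drives still running after one failure
fatal_drives = 1                      # only the failed drive's mirror partner
print(f"{fatal_drives}/{remaining} = {fatal_drives / remaining:.1%}")
# 1/19 = 5.3%
```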
Now that we have had a look at the common RAID levels used, let’s have a look at how they are implemented.
Hardware and Software RAID
RAID can be implemented in hardware or software. Hardware RAID is generally implemented using specialized RAID controllers, but nowadays most motherboards also include RAID, which is considered to be hardware RAID. More on that later in the video.
Hardware RAID works transparently to the operating system, so the operating system will see the RAID as one storage device, just as it would a physically connected storage device. In contrast, software RAID requires the operating system to have booted in order to access data on the RAID; the exception to this is mirrored drives or RAID 1. Since mirroring is an exact copy of the second storage device, the operating system can access the storage device to boot. Once it has completed initial startup, it will have loaded enough of the operating system to enable mirroring.
For striping, as well as other RAID configurations, a software RAID needs to load enough of the operating system in order to read the RAID. However, this scenario presents a paradoxical challenge: the computer is unable to access the RAID array until enough of the operating system has successfully booted. This creates a circular dependency, meaning certain software RAID configurations can’t be used to boot the computer.
Hardware RAID is generally faster than software RAID. When it comes to RAID, you get what you pay for. A decent, dedicated RAID card will generally perform faster than the RAID included with your motherboard.
Hardware RAID typically incurs a higher cost, as software RAID is often included for free with the operating system. However, it is worth noting that many modern motherboards now come equipped with RAID capabilities at no extra cost. Despite the cost difference, hardware RAID usually offers a broader range of features. This can include a greater variety of RAID levels than those available to the operating system. Additionally, hardware RAID can provide advanced functions, for example, having a standby storage device that can automatically be used to replace a failed one.
The disadvantage of hardware RAID is that it may need extra device drivers to function properly. These drivers must be present during the operating system installation. While some operating systems come with pre-included third-party drivers, there are instances where the RAID storage device remains invisible to the OS setup until these drivers are installed.
Software RAID uses the operating system device drivers, so it does not require additional device drivers to be installed. However, you are limited to which RAID configurations you can use to boot the operating system.
Hardware RAID generally offers better hot-swap support. If you have a failed storage device, it is generally a simple matter to pull it out and put in a new one. Nowadays software RAID also has hot-swapping using SATA. SATA hot-swapping will need to be enabled in the computer’s setup and must also be supported by the motherboard.
Now let’s have a closer look at how RAID works.
Real Hardware RAID vs Fake RAID
For the exam, you won’t need to know any of this, but I present this information to help you understand what you are buying when you purchase a RAID card. A true hardware RAID uses hardware to improve performance. Motherboard RAID often runs in software instead. When the computer boots up, RAID functions are handled by software loaded into memory, rather than by any additional hardware components. This is still called hardware RAID although it is software based and does not use extra hardware to improve performance.
Some people will refer to this as fake RAID since no extra hardware is used. I personally believe it is still technically hardware RAID since the operating system sees it as physical storage and treats it the same as any other physically connected storage device. However, it is not true hardware RAID since no additional hardware is used.
In the real world, hardware RAID is used to describe any RAID solution that is transparent to the operating system. That is, it appears as physical storage to the operating system. Software RAID is any solution that is implemented by the operating system or third-party software running on the operating system. When deciding on which hardware RAID to use, just be aware that some hardware solutions are actually using software and thus you won’t get the same performance as when using dedicated hardware.
Dedicated hardware RAID is found on the RAID card. On high-performance RAID controllers, there will generally be a heatsink, which gives away that the card is using dedicated hardware for RAID functions. Companies like LSI and Adaptec generally only make RAID cards that use hardware.
Not all RAID solutions advertised as hardware RAID use hardware. RAID cards like the one shown are in fact software RAID. It is hard to tell sometimes, but when I looked up the specifications for the processing chip on this expansion card, it is software RAID. If you are unsure, generally the cheap price gives it away or it is made by a company that is not well known for making RAID cards. There are not that many companies that make good quality RAID cards.
Intel Rapid Storage Technology is considered to be software RAID. This is because the CPU does calculations for the RAID, not dedicated hardware. Although it could be argued that certain CPUs have features to assist with processing, it is still putting extra load on the CPU and no dedicated hardware is being used.
In the real world, when it comes to RAID, you get what you pay for. For some applications the motherboard RAID may meet your needs, although for a server or professional workstation, you should look at true hardware RAID.
Let’s have a look at how to set up RAID.
Demonstration
In this demonstration I have a RAID controller card installed in the computer. This is a hardware RAID connected to four hard disks. As an add-on component, a hardware RAID card comes with its own configuration tool, independent of the operating system. This allows for the RAID setup to be completed prior to the installation of the operating system. Once the RAID configuration is finalized, you have the option to install your operating system directly onto it.
For the A+ exam, you only really need to have an understanding of how RAID works. You won’t be expected to configure one; the demonstration is just to give you an idea of how to configure one if you ever need to.
To start using the RAID controller on this computer, there is a setting that I need to change in the computer’s setup. To access it, I will start the computer and press the delete key. To enter the setup on your computer you may need to press a different key. Usually when the computer starts up it will tell you the key that you need to press. If it does not, you may need to consult the manual.
If you are lucky, the card will work without any changes in the computer’s setup. This computer uses UEFI and the RAID controller card uses BIOS. To allow them to work together, I need to select “Settings” and then the option “Advanced”.
Depending on your computer, you might need to modify various settings, which could be located in different sections of the system. For this computer, I need to navigate to and select the “Windows OS Configuration” option to make the necessary adjustments.
This screen allows me to change the setting “BIOS CSM/UEFI Mode”. This is currently set to UEFI, so it won’t allow the RAID controller expansion card to operate; I need to change it to CSM. CSM stands for Compatibility Support Module. Essentially, this allows UEFI to work with older features designed for BIOS. Keep in mind that enabling CSM may disable some of the newer features of UEFI. For Windows 11, these newer UEFI features are required, so I would need to purchase a new expansion card if I wanted to use Windows 11. Let’s now consider how the computer communicates with the RAID controller card.
On this computer, the motherboard has a UEFI chip which provides the basic software that runs the low-level functions of the computer. The RAID controller has a BIOS chip on it. Effectively it has its own BIOS. In order to use both, the UEFI daisy chains to the RAID BIOS.
If the computer was running BIOS, the process would be the same, and likewise if both the computer and the RAID controller used UEFI. In order for this process to work, the computer’s UEFI or BIOS needs to have a link to the BIOS on the RAID controller. Since it is only a link, there won’t be many settings you can configure in the computer’s setup; you are only able to enable it and decide where it appears in the computer’s boot order.
I will now save the settings. Once the settings have been saved the computer will reboot. When the computer starts up, you will notice the RAID controller screen will be shown first. When working with devices that have their own BIOS or UEFI, if the controller shows a screen at startup, it may appear before or after the computer’s boot up screen.
You will notice the reference to the BIOS. Just like a regular BIOS, the BIOS on the RAID controller card can be upgraded. The RAID controller will now start initializing; essentially, it is starting up and checking that all the devices are attached. You will notice there is a message saying, “Press control C to start the configuration utility”. Different RAID controllers will have a different keystroke sequence to enter the configuration utility.
I will now press control plus C to enter the configuration utility. If there are multiple identical RAID controllers installed in the same computer, this tool will allow you to select the one you want to configure. There is only one RAID controller in this computer, so I will select it.
On this screen, you can configure the RAID controller. I will only be looking at how to create a new RAID. Different RAID controllers will have different configuration tools. A lot of them will also have software that can be installed on the operating system to manage and monitor the RAID controller.
To create a RAID, I will select the option “RAID Properties”. Once selected, this will show the four hard disks that are connected to the RAID controller. I will now select the first two hard disks to make them part of a RAID.
Once the hard disks are selected, I will press C to create a volume. To create the volume, I just need to save the changes. It is important to understand that the hard disks are under the control of the RAID controller. Thus, an operating system running on the computer won’t see them. It will instead see a single volume. This volume will appear to the operating system as a physical storage device just as a hard disk would. It will not be able to tell the difference.
Once the volume has been created, I will be taken back to the main menu. From here, I will exit out and restart the computer. The computer will once again initialize the RAID controller. Once this is complete, notice that the volume is currently resyncing. Although you will be able to use a volume while it is resyncing, it is important to understand that until the resyncing is complete, the volume won’t have any redundancy.
You will notice that the current volumes and drives available are shown. This will also show any failed devices or volumes. It is important to check this or have software that checks it. If you have a hard disk fail and don’t replace it, another failure will result in loss of data. Thus, when a hard disk fails, you want to replace it as soon as you can.
I will now enter the setup so I can have a look at how to configure the computer to boot from this volume. To do this, I will select “Settings” and then select the menu option “Boot”. This will allow me to change which storage devices the computer will boot from.
To set the first boot device, normally I would select the first boot option. However, the volume I created on the RAID card is not shown. To configure it as a boot volume, I need to exit out of here and go down to the bottom option, “Hard Disk Drive BBS Priorities”.
Once I select this option, I can next select the top boot option. You will notice that the solid-state drives I have installed in the computer and the volume I created are shown. Since the remaining hard disks are not allocated to a RAID, the RAID controller will allow direct access to them. Some RAID controllers allow this; others will not.
I can now select the volume, and the computer will attempt to boot from this volume first. When using RAID devices like this one, you may have to hunt around to find the right settings. Since it is an add-on card, the computer setup will configure this differently from the storage that is directly attached to the computer.
RAID has changed a lot since it was first created, so let’s have a look at how it is implemented in the real world.
In The Real World
In the real world, RAID configurations are predominantly utilized in servers and professional workstations, which typically come with hardware RAID support. However, it is important to note that even though a server or workstation may support it, additional hardware like expansion cards may need to be purchased in order to enable the functionality. Given that servers and professional workstations are higher-priced, they are more likely to offer RAID support. After all, you get what you pay for.
When purchasing standalone RAID devices, don’t assume they use hardware RAID. It is not uncommon for these devices to be running a Linux operating system and thus using software RAID.
If you see that the device has a lot of features above and beyond standard RAID features, such as changing the level of redundancy of the data or allowing the RAID to expand in size as data is added to it, it is most likely running an operating system rather than implementing RAID in hardware. Features like that are generally not implemented in hardware RAID.
Lastly, motherboard RAID is probably software RAID. Why? Well, you get what you pay for. The important takeaway here is: don’t assume that if you set up RAID, you are going to get excellent performance. A good hardware RAID will give you good performance; software RAID generally won’t perform as well.
End Screen
That concludes this video on RAID. I hope you have found this video informative. Until the next video from us, I would like to thank you for watching.
References
“The Official CompTIA A+ Core Study Guide (Exam 220-1101)” pages 56 to 60
“Mike Myers All in One A+ Certification Exam Guide 220-1101 & 220-1102” pages 287 to 301
“Picture: Kittens” https://unsplash.com/photos/uePn9YCTCY0
Credits
Trainer: Austin Mason http://ITFreeTraining.com
Voice Talent: HP Lewis http://hplewis.com
Quality Assurance: Brett Batson http://www.pbb-proofreading.uk