Formatting
To prepare a hard disk for use, it needs to be formatted. Although I will focus on hard disks for this video, the same process applies to other storage devices including solid-state drives. Formatting is the process of configuring and preparing the hard disk for data.
There are essentially two categories of formatting hard disks. The first is low-level formatting. In the old days it was possible to low-level format a hard disk. This was sometimes required if the motors controlling the hard disk were not moving to the same location as they did previously, or the platters on the hard disk changed shape. In old hard disks, the platters were more susceptible to changing size due to factors such as heat.
Nowadays, hard disks are more reliable, so they don’t have these problems and the data is very tightly packed. So packed in fact, there is no room for the data to move around once it has been put in place. For these reasons, low-level formatting is done in the factory using specialized equipment. You may see software that is advertised as a low-level formatting tool; however, this software essentially just erases all the data on the drive. It doesn’t change the underlying structure of the hard disk, which is what low-level formatting does.
To understand a bit better what low-level formatting does, consider that you have a hard disk platter. In order to be able to store data on the drive, you first need to organize how you are going to lay the data out on the platter. At the factory, the platter will be divided up into sectors. Data is written on the platter so the hard disk can tell where these sectors start and where they stop. Once these sectors are written in the factory, they are fixed and can’t be changed.
These sectors form the basic building blocks for storing data on the hard disk. A single sector is the smallest unit or block of data that can be stored on the hard disk. If the hard disk wants to access the data, it must read the whole block. It is possible for an operating system to divide the sectors into smaller parts, but if it does this, it will reduce the performance of the hard disk. This is because, if the sectors are subdivided, the hard disk will still be required to access the whole sector, even if only a segment of the sector is required. This results in more reads and writes to the hard disk.
In order for an operating system to store data on a hard disk, it needs some way to keep track of how data is stored using these sectors. To do this, a high-level format is needed. The high-level format will determine the allocation unit that will be used for that hard disk. In older operating systems this allocation unit may be referred to as a cluster.
The allocation unit size determines how many sectors or sub-sectors form one allocation unit. In the case of modern hard disks, the default sector size is 4k. In a lot of cases, the operating system will use an allocation unit of 4k. It makes sense to keep these the same size for performance reasons. There are times when you may want to change the default. If you have a lot of small files, reducing the allocation unit will reduce the amount of wasted space per file. However, a smaller allocation unit does increase the amount of data required to keep track of the allocation units, since there are more of them. Later in the video I will look into this in more detail.
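To put some rough numbers on that trade-off, here is a minimal Python sketch that estimates the wasted space for a handful of files at different allocation unit sizes. The file sizes are made-up sample values and the function names are only for illustration; real file systems have extra details (for example, storing very small files inside their own metadata) that this ignores.

```python
def allocated_bytes(file_size: int, allocation_unit: int) -> int:
    """Space a file occupies once rounded up to whole allocation units."""
    units = (file_size + allocation_unit - 1) // allocation_unit
    return units * allocation_unit


def wasted_bytes(file_sizes, allocation_unit: int) -> int:
    """Total space lost to partially filled allocation units."""
    return sum(allocated_bytes(size, allocation_unit) - size for size in file_sizes)


# Hypothetical file sizes in bytes, chosen only for the example.
sample_files = [100, 700, 1500, 5000]

for unit in (512, 4096, 65536):
    print(f"allocation unit {unit}: {wasted_bytes(sample_files, unit)} bytes wasted")
```

With small files like these, the larger the allocation unit, the more space is lost per file.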
When looking at the data on a hard disk, there is also what is referred to as a track. In the old days, specifications like tracks were important in understanding where data was physically being stored on the hard disk. Nowadays, hard disks are accessed by a single number referred to as Logical Block Addressing or LBA. Essentially the physical sectors on the hard disk are allocated a number.
The operating system accesses blocks on the hard disk using this number. The hard disk uses a table to translate the logical block address to the physical sector. Since it is called Logical Block Addressing, you may hear this referred to as the block number. Since both are referring to the smallest unit of data on the hard disk, you may hear the terms sectors and blocks used interchangeably.
If a sector goes bad, the hard disk moves the sector to a backup sector on the hard disk. All hard disks, nowadays, have a small amount of backup sectors for the purpose of replacing bad sectors on the hard disk. The process is transparent to the operating system, so it will still use the same block number to access the data for that sector, even though the sector has moved.
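The translation table and the backup sectors can be pictured with a small toy model. This is only a sketch of the idea, not how any particular hard disk firmware works; the class name, the method names and the sizes are all made up for the example.

```python
class ToyDisk:
    """Toy model of a drive that remaps bad sectors to backup sectors."""

    def __init__(self, user_sectors: int, spare_sectors: int):
        # To begin with, block number n maps straight to physical sector n.
        self.mapping = {lba: lba for lba in range(user_sectors)}
        # In this model the backup sectors sit after the user area.
        self.spares = list(range(user_sectors, user_sectors + spare_sectors))

    def physical_sector(self, lba: int) -> int:
        """Translate a logical block address to a physical sector."""
        return self.mapping[lba]

    def mark_bad(self, lba: int) -> bool:
        """Remap a block to a backup sector; False means no backups are left."""
        if not self.spares:
            return False
        self.mapping[lba] = self.spares.pop(0)
        return True


disk = ToyDisk(user_sectors=8, spare_sectors=2)
disk.mark_bad(3)
print(disk.physical_sector(3))  # block 3 now lives in a backup sector
```

The operating system keeps asking for block 3 throughout; only the table inside the toy disk changes.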
In the old days, the tracks were important because there was no Logical Block Addressing. The operating system had to keep a record of where data was physically stored on the hard disk. Nowadays, it does not need to worry about that; it simply needs to remember the block number. Thus, you won’t find the term track referred to that much nowadays.
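For comparison, the older scheme addressed a sector by its cylinder, head and sector number. The Python sketch below uses the conventional formula for turning such an address into a single block number. The geometry values (16 heads, 63 sectors per track) are assumptions picked for the example, not values taken from the disk used in this video.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int = 16, sectors_per_track: int = 63) -> int:
    """Convert a cylinder/head/sector address to a logical block address.

    Sector numbers start at 1 in this scheme; cylinders and heads start at 0.
    """
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector number out of range")
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head number out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


print(chs_to_lba(0, 0, 1))  # first sector on the disk
print(chs_to_lba(1, 0, 1))  # first sector of the next cylinder
```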
When performing a format of a hard disk, there are generally two options for the format. The first is a quick format. This format, as the name suggests, is quite fast. It’s fast because it writes only configuration information that tells the hard disk where data is stored. Any existing data on the hard disk will be left untouched. Since the data is still on the hard disk, it is possible, using data recovery tools, to get this data back.
Another thing to consider with a quick format is, since it does not write to all the sectors that are in use on the hard drive, these sectors are not tested to see if they are working correctly. If there is a problem with a sector, you won’t know about it until you attempt to use one of the damaged sectors. The hard disk will hopefully move the data from there to a backup sector, but you may lose data in the process. If you have too many bad sectors, the hard disk will run out of backup sectors to move data to.
If you want to ensure all the data on the hard disk is removed, a full format is recommended. A full format will erase all the data; however, it does take a lot longer to complete. Since it is erasing all the data on the hard disk, it is also testing all the sectors currently in use on the hard disk to make sure that they are working correctly.
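The difference between the two formats can be shown with a very small toy model in Python. The volume here is just a dictionary holding a file table and some raw allocation units; the names and the tiny 8-byte unit size are assumptions made to keep the example readable.

```python
UNIT_BYTES = 8  # tiny allocation unit, chosen only to keep the example readable


def make_volume() -> dict:
    """A hypothetical volume: a file table plus the raw allocation units."""
    return {
        "table": {"notes.txt": [0]},                # file name -> units it uses
        "units": [b"secret!!", bytes(UNIT_BYTES)],  # raw contents of each unit
    }


def quick_format(volume: dict) -> None:
    # Only the bookkeeping is rewritten; the old contents stay on the disk.
    volume["table"] = {}


def full_format(volume: dict) -> None:
    # The bookkeeping is rewritten and every unit is overwritten as well.
    volume["table"] = {}
    volume["units"] = [bytes(UNIT_BYTES) for _ in volume["units"]]


quick = make_volume()
quick_format(quick)
print(quick["units"][0])  # the old data is still sitting in the unit

full = make_volume()
full_format(full)
print(full["units"][0])   # the unit has been overwritten
```

After the quick format the file table is empty, but a recovery tool that reads the raw units could still find the old contents.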
There is one other thing that we will need to consider before we can format the drive.
Partition Table
Before you can format a storage device, it needs to be divided up into partitions. In order to do this, a partition table needs to be created on the drive. There are two different partition types that are available.
To understand how partition tables work, I will first consider that we have an empty hard disk with nothing on it. A hard disk can be used as one large block or it can be divided up into different independent areas. When the computer starts, it first needs to read the configuration information stored on the hard disk to work out how many of these areas there are and where they are located on the hard disk. These areas are referred to as partitions as they essentially divide up, or partition, the hard disk into different areas.
There are two different methods to achieve this with the oldest being Master Boot Record or MBR. Master Boot Record works by writing some boot code and configuration information on the first sector of the hard disk.
The boot code is the first data on the drive. This small amount of code essentially provides a boot strap to read a boot loader. Also in the first sector is configuration information for four different partitions.
Partition information defines what part of the hard disk will be used. For example, let’s say we wanted to allocate some space from the hard disk for the C: drive. Where this partition starts and stops is stored in the Master Boot Record.
If a second partition was required, the start and stop information for it would also be stored in the Master Boot Record. This would also apply if a third or even a fourth partition was created. These four partitions are called primary partitions. Essentially, a primary partition is when the configuration data for that partition is stored in the Master Boot Record.
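To make the layout of that first sector a bit more concrete, here is a minimal Python sketch that reads the four primary partition entries out of a 512-byte Master Boot Record. It follows the classic layout (boot code first, four 16-byte entries starting at byte 446, and a two-byte signature at the end). The sample sector is built in memory with made-up values, so nothing here touches a real disk.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446        # the boot code occupies the bytes before this
ENTRY_SIZE = 16
SIGNATURE = b"\x55\xaa"   # marks the end of a valid MBR sector
ENTRY_FORMAT = "<B3sB3sII"  # status, CHS start, type, CHS end, first LBA, sector count


def parse_mbr(sector: bytes) -> list:
    """Return the primary partition entries that are in use."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != SIGNATURE:
        raise ValueError("not a valid MBR sector")
    partitions = []
    for index in range(4):
        start = TABLE_OFFSET + index * ENTRY_SIZE
        status, _chs_start, ptype, _chs_end, first_lba, count = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE])
        if ptype != 0:  # a type of zero means the slot is empty
            partitions.append({
                "index": index + 1,
                "bootable": status == 0x80,
                "type": ptype,
                "first_lba": first_lba,
                "sector_count": count,
            })
    return partitions


def build_sample_mbr() -> bytes:
    """Build an example sector holding one made-up partition entry."""
    sector = bytearray(SECTOR_SIZE)
    entry = struct.pack(ENTRY_FORMAT, 0x80, bytes(3), 0x07, bytes(3), 2048, 204800)
    sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = entry
    sector[510:512] = SIGNATURE
    return bytes(sector)


print(parse_mbr(build_sample_mbr()))
```

Notice that the start and the length of each partition are stored as 32-bit numbers, which becomes important later when looking at the two terabyte limit.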
Four partitions per hard disk should be enough in most circumstances, but in some cases you may want to create more than four. When this occurs, one of the primary partitions gets changed to an extended partition.
When this occurs, the configuration for that partition is no longer stored in the Master Boot Record. What occurs is an extended partition is created. We still need some configuration information to know where the partition starts and stops, and this is stored at the start of the extended partition. Thus, the configuration information that was previously stored in the Master Boot Record, now points us to this location to read the partition information. Essentially, the primary partition information is stored in the Master Boot Record. If a primary partition is changed to an extended partition, this information now points to a location on the hard disk where the partition information is stored. Thus, extended partition information is stored outside of the Master Boot Record.
The next question would be, why would we want to do this? To understand this, consider that we add another partition to the hard disk. As before, the partition information for the new extended partition is stored at the start of the partition. Now what occurs is, the previous extended partition points to the new extended partition. These extended partitions allow us to link partitions together into a list. This allows us to exceed the four primary partition limit.
To demonstrate this a bit better, I will add two additional extended partitions. Again, the partition information is stored at the start of the partition and the previous extended partition will link to these partitions.
By linking extended partitions together like this, extended partitions can keep being created until you run out of storage space on the hard disk. In reality, which operating system you are running will determine the limit of how many extended partitions you can create.
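The linking described above behaves like a linked list, which can be sketched in a few lines of Python. This is a simplified model rather than the real on-disk format: each record is stored in a dictionary keyed by a made-up location, and simply holds a label plus the location of the next record.

```python
def walk_linked_partitions(records: dict, first_location: int) -> list:
    """Follow the chain of partition records starting at first_location."""
    found = []
    seen = set()
    location = first_location
    while location is not None:
        if location in seen:
            raise ValueError("loop detected in the partition chain")
        seen.add(location)
        record = records[location]
        found.append(record["label"])
        location = record["next"]  # None marks the end of the chain
    return found


# Hypothetical locations and labels, chosen only for the example.
sample_records = {
    1000: {"label": "partition 5", "next": 5000},
    5000: {"label": "partition 6", "next": 9000},
    9000: {"label": "partition 7", "next": None},
}

print(walk_linked_partitions(sample_records, 1000))
```

Adding another partition is just a matter of writing a new record and pointing the last one at it, which is why the chain can keep growing.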
Master Boot Record was first created back in the 1980s. With the use of extended partitions, it provides good functionality and is still used today. However, it does have some limitations, with the main one being that it is limited to hard disks of two terabytes in size. For this reason, a replacement partition table called GUID Partition Table or GPT was created. To understand how GPT works, let’s consider a hard disk with no data on it.
In order to stay compatible with Master Boot Record, the first sector of the hard disk has what is referred to as the protective MBR written to it. This is written to the disk so that disk utility tools will pick it up. Older disk utilities did not understand what a GPT partition was and, thus, they may have overwritten the GPT partition. Thus, the protective MBR was originally designed so that older tools would pick up this protective MBR and not write over the GPT data. Modern tools understand what a GPT partition is, but the protective MBR partition still remains for backwards compatibility.
The actual data for the GUID Partition Table follows on from the protective MBR. At the start is the GPT header which contains configuration information. This configuration information includes information on the maximum number of partitions that are supported. Unlike MBR, it is not limited by design to four primary partitions. The default number of partitions for GPT is 128. It is unlikely that anyone would need more than 128 partitions; however, the specification would allow it to support more if required. Keep in mind that once the GPT partition is written, the number of partitions is set. In order to change it, you would need to use disk tools that support it and may require a hard disk reformat.
To create a partition on the hard disk, information about the partition is simply stored in the partition table. For example, if I were to create a C: drive, the configuration information for this partition would be stored in the Partition 1 portion of the GPT header.
This also occurs for any additional partition that is created on the hard disk. So, if I were to do a basic comparison of the MBR to GPT partition tables, MBR has four fixed primary partitions which can be increased by changing one to an extended partition. GPT does not use extended partitions; however, the number of partitions it supports is variable. Generally, operating systems will use the default of 128 partitions which should be plenty.
Unlike MBR, GPT keeps a backup copy of the partition data. This backup is stored at the end of the hard disk. The idea behind this is, if the platter of the hard disk is damaged, it is unlikely the damage will be both at the start and end of the hard disk.
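For those who want to see what is inside the GPT header, the Python sketch below unpacks the fixed 92-byte portion of it, including the number of partition entries and the location of the backup copy. The sample header is built in memory for a made-up disk of one million 512-byte sectors, and its checksum fields are left at zero and not checked, so treat this as an illustration of the layout rather than a validating parser.

```python
import struct

# Signature, revision, header size, header CRC, reserved, current LBA,
# backup LBA, first usable LBA, last usable LBA, disk GUID, entries LBA,
# number of entries, size of each entry, CRC of the entries.
GPT_HEADER_FORMAT = "<8sIIIIQQQQ16sQIII"


def parse_gpt_header(data: bytes) -> dict:
    """Unpack the fields of a GPT header (checksums are not verified here)."""
    (signature, _revision, header_size, _header_crc, _reserved,
     current_lba, backup_lba, first_usable, last_usable,
     _disk_guid, entries_lba, entry_count, entry_size,
     _entries_crc) = struct.unpack_from(GPT_HEADER_FORMAT, data)
    if signature != b"EFI PART":
        raise ValueError("GPT signature not found")
    return {
        "header_size": header_size,
        "current_lba": current_lba,
        "backup_lba": backup_lba,
        "first_usable_lba": first_usable,
        "last_usable_lba": last_usable,
        "entries_lba": entries_lba,
        "partition_count": entry_count,
        "table_bytes": entry_count * entry_size,
    }


def build_sample_header(total_sectors: int = 1_000_000) -> bytes:
    """Build an example primary header for a hypothetical disk."""
    return struct.pack(
        GPT_HEADER_FORMAT, b"EFI PART", 0x00010000, 92, 0, 0,
        1,                   # the primary header sits just after the protective MBR
        total_sectors - 1,   # the backup header sits in the last sector
        34, total_sectors - 34,
        bytes(16), 2, 128, 128, 0)


print(parse_gpt_header(build_sample_header()))
```

With the default of 128 entries of 128 bytes each, the partition table itself takes up 16,384 bytes.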
GPT was designed to replace MBR; however, MBR is still used today in certain situations. To understand why, I will compare the two.
MBR vs GPT
The main benefit with MBR is that it has good support for older hardware. Modern hardware will be supported as well, but the main limitation is that it will only support storage devices up to two terabytes. This is because MBR only supports 32-bit values. MBR was designed with 512-byte blocks in mind, so when you multiply this by the maximum number of blocks a 32-bit value can address, this will give you two terabytes. If you use a storage device that is larger than two terabytes, MBR will still work; however, you won’t be able to access any data after two terabytes.
GPT, by comparison, uses 64-bit values and thus does not have a limitation of two terabytes. This is the main reason why you would use GPT. With large storage devices, disk tools will automatically use GPT if it is supported.
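The arithmetic behind these two limits is easy to check. Here is a short Python sketch, assuming 512-byte blocks as described above; the function name is just for illustration.

```python
SECTOR_BYTES = 512


def max_addressable_bytes(address_bits: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest capacity reachable with block numbers of the given width."""
    return (2 ** address_bits) * sector_bytes


mbr_limit = max_addressable_bytes(32)
gpt_limit = max_addressable_bytes(64)

print(f"MBR: {mbr_limit:,} bytes = {mbr_limit / 2**40} TiB")
print(f"GPT: {gpt_limit:,} bytes = {gpt_limit / 2**70} ZiB")
```

The 32-bit case works out to exactly two terabytes when counted in binary units, while the 64-bit case is far beyond any storage device you are likely to come across.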
Because GPT uses 64-bit values, it may require 64-bit hardware. I say may, because it depends on the operating system. Operating systems like Windows will require 64-bit hardware to boot from GPT. However, newer versions of Windows can use GPT for data drives on 32-bit hardware. Some operating systems, for example Linux systems, may be able to boot a GPT drive using 32-bit hardware but not all Linux systems will use it. GPT generally has good support on newer computers, but may not be supported on older computers.
So, I think that we have covered enough theory. I will now change to my computer running Windows 10 and have a look at how to go about formatting a hard disk.
Demonstration
For this demonstration, I have connected a three terabyte hard disk to the computer. Due to the size of the hard disk, it won’t be possible to access it all using MBR; however, it will be possible to access all the data using GPT. Let’s have a look.
To start using the hard disk, I will first need to partition and format it. To do this, I will first open Windows Explorer. Once Windows Explorer is open, I will next right click on “This PC” and select “Manage”.
Once “Computer Management” has opened, I will next select the option “Disk Management”. When Disk Management opens, notice that a dialog box will appear showing that Windows has detected a hard disk that has not been initialized. Initializing a hard disk in Windows will create a partition table and will also create disk signatures on the hard disk, so that if the hard disk connections are changed inside the computer, Windows will be able to determine if it is the same hard disk on a different connection.
Notice that GPT will be selected by default. The version of Windows you are using will determine which option will be selected. On a modern computer running Windows 64-bit operating system with large hard disks, it should select GPT by default.
In this case, I will select the option MBR for Master Boot Record, so we can see what happens when a hard disk is used that is over two terabytes. Once I press O.K., notice that the hard disk will now appear in Windows as divided into two parts.
The hard disk is now using the Master Boot Record for the partition table. When I right click on the hard disk, notice that there is an option “Convert to GPT Disk”. This option will only be available if there are no partitions that have been created on the hard disk. In other words, in order to use this feature in Windows, all the data on the hard disk needs to be deleted.
Since the hard disk is using Master Boot Record, only the first two terabytes will be accessible. You will notice that the first part of the hard disk is displayed as 2048 gigabytes unallocated. The second part of the hard disk is also listed as unallocated. You will notice that if I right click on the second unallocated space, the options to create volumes are grayed out. A volume is essentially part of a hard disk or multiple hard disks that is used as a single storage area. Since MBR is being used, it is not possible to use the space beyond two terabytes.
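The split that Disk Management shows can be worked out with a few lines of Python. This sketch assumes the three terabyte hard disk holds exactly three trillion bytes, which is how manufacturers usually count; the exact figure for a real drive will differ slightly.

```python
MBR_LIMIT_BYTES = (2 ** 32) * 512   # 32-bit block numbers with 512-byte blocks
GIB = 2 ** 30                       # Windows reports "GB" in binary units


def split_for_mbr(disk_bytes: int) -> tuple:
    """Return (usable bytes, unusable bytes) for a disk that is using MBR."""
    usable = min(disk_bytes, MBR_LIMIT_BYTES)
    return usable, disk_bytes - usable


disk_bytes = 3 * 10 ** 12           # assumed size of a "3 TB" hard disk
usable, unusable = split_for_mbr(disk_bytes)

print(f"usable:   {usable / GIB:.2f} GB")
print(f"unusable: {unusable / GIB:.2f} GB")
```

The first figure matches the 2048 gigabytes shown for the first part of the hard disk.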
Try not to get a partition confused with a volume. A partition is a low-level structure that divides up the hard disk into areas. A simple volume will match the partition in this case. More complicated volumes may span over multiple disks and use multiple partitions on different hard disks. In Windows, you don’t need to worry about creating partitions, because when you create a volume, Windows will automatically create the partitions for you. Think of partitions as low-level building blocks. A simple volume will use one partition of the same size. More complex volumes will use more than one partition which may span multiple hard disks.
Notice that when I right click on the first part of the hard disk, this time the option to create a simple volume is available. The other options are not available because they require additional hard disks. I will cover these options in another video.
To create a new volume, I will select the option “New Simple Volume Wizard”. Once I am past the welcome screen, I will need to select the size of the volume. In this case, I will enter in one million megabytes which is roughly 1000 gigabytes.
On the next screen of the wizard, Windows will automatically select a drive letter for the new volume, but you are free to change it if you wish. You also have an option to mount the volume to an existing folder or not assign a drive letter to the volume and do it later if you wish.
On the next screen of the wizard, you have the option to format the volume. You don’t have to format the volume when you create it, but in most cases you will.
The first option will allow you to select what type of file system to format the volume with. The size of the volume will determine what options you have available. For storage that is being used in the computer, you will generally use NTFS. For external storage you will generally use a FAT file system. Depending on the size of the storage, different types of FAT file systems may be available. In this case, due to the size of the volume, the only FAT file system that is available is exFAT.
The next option is the size of the allocation unit. Later in the video I will look at this option in more detail, so in this case, I will leave it on the default. In most cases, you will want to use the default option.
By default, Windows will set the name of the volume to “New Volume”; you are free to use this or rename the volume later on.
Notice that there is an option, ticked by default, to perform a quick format of the drive. When ticked, the format generally only takes a few seconds to complete as it only writes basic file structures to the hard disk. Existing data that was present on the hard disk before the format may still be recoverable using recovery tools. If you deselect this option, a full format will be performed erasing all the data on the volume. This also tests the hard disk to make sure that it is working; however, if the volume is large, it will take a long time to complete.
The next screen of the wizard is the finish screen. Once I press the Finish button, the volume will be created.
Once the volume is created, there will still be space available between this volume and the area beyond two terabytes that cannot be used with MBR. I will next create some more volumes to see how MBR handles multiple partitions. To do this, I will right click on the unallocated space and select “New Simple Volume”.
I will go through the wizard and set the size of this volume to two hundred thousand megabytes. I will go through the rest of the screens, accepting all the defaults and complete the wizard. I will follow the same procedure to create a third volume on the hard disk. I will make the volume the same size as the last one.
MBR has a limit of four primary partitions per hard disk. In order to get more partitions, one of these primary partitions needs to be changed to an extended partition. In order to demonstrate how Windows does this, I will create a fourth volume on the hard disk the same size as the last two.
Notice that this time, when the volume is created, it looks different to the others – the volume and the free space has a green square around it. To understand what this means, have a look at the bottom of “Disk Management”.
Notice the dark blue of “Primary partition”. Primary partitions were used for the first three volumes. Here dark green indicates an “Extended partition”. When I created the fourth volume, Windows automatically created it as an extended partition rather than a primary partition. This allows more partitions to be created if required, so we are not limited to four.
Notice that light blue is for “Logical drive”. Any partition inside an extended partition is referred to as a logical drive rather than a primary partition. The last partition that I created is light blue to indicate that it is a partition inside an extended partition.
I will next create a volume using the rest of the free space. Since I am using all the free space, I just need to accept all the default options in the wizard. Notice that once complete, this volume will appear as a logical drive.
Disk Management does a good job of creating and managing the partitions on the hard disks for the volumes. You generally don’t need to worry too much about it, as Windows does all the hard work for you. In the case of simple volumes, the partition will match the volume, so you may hear the names used interchangeably. If you want to get technical, a volume is what Windows uses to refer to the storage space and a partition refers to the low-level area that has been divided up on the hard disk for use with volumes.
If you want to know if the hard disk is using MBR or GPT, right click on the hard disk and select “Properties”. From properties, select the tab “Volumes”. Under volumes the partition type will be listed; in this case it will be Master Boot Record or MBR.
I will next have a look at how you can convert an MBR partition to a GPT partition. Firstly, I need the disk number. You will notice that in Disk Management the disk is listed as zero. Next, I need to open a command prompt. The command prompt will need to be opened with administrator rights. To do this, I need to right click on “Command Prompt” and select “Run as administrator”.
Once I press Yes to confirm, the command prompt will be opened with administrator rights. The tool provided by Windows to convert an MBR hard disk to GPT is MBR2GPT.
To check if the hard disk can be converted, use slash validate. This won’t attempt to convert the hard disk from MBR to GPT but will test to see if it is possible. Notice that when I run the command, I get an error message saying that it needs to be run from a Windows Preinstallation Environment. Windows Preinstallation Environment, otherwise known as Windows PE, is a minimal Windows bootable environment. It is recommended to use Windows PE to help prevent the hard disk from being accessed while the conversion is in progress.
To override this requirement, add the switch “AllowFullOS”. This will allow the command to run, but you will notice that an error will be reported. MBR2GPT won’t be able to convert this hard disk from MBR to GPT. You may find that MBR2GPT won’t work under some circumstances. If you find that you need to convert a MBR hard disk to GPT without erasing the data on the hard disk, there are also third-party tools available that can do this.
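For reference, the commands used in this part of the demonstration can be put together with a small Python helper. The helper itself is hypothetical and only builds the command line; it does not run anything. The switches it uses (validate, convert, disk and allowFullOS) are the ones provided by the MBR2GPT tool.

```python
def mbr2gpt_command(disk_number: int, action: str = "validate",
                    allow_full_os: bool = False) -> list:
    """Build the argument list for an MBR2GPT call without running it."""
    if action not in ("validate", "convert"):
        raise ValueError("action must be 'validate' or 'convert'")
    args = ["mbr2gpt", f"/{action}", f"/disk:{disk_number}"]
    if allow_full_os:
        # Needed when running from full Windows instead of Windows PE.
        args.append("/allowFullOS")
    return args


print(" ".join(mbr2gpt_command(0)))
print(" ".join(mbr2gpt_command(0, allow_full_os=True)))
```

The resulting text is what would be typed into the command prompt that was opened with administrator rights.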
I will now have a look at converting the hard disk from MBR to GPT by first erasing all the data on the hard disk. To do this, I will first need to delete all the partitions on the hard disk. I will first start with the extended partitions.
To delete a volume, which will also delete the underlying partition, right click the volume and select “Delete Volume”. Windows will prompt you that deleting the volume will erase all the data. It is possible to restore data using recovery tools; however, it is not a guarantee that data will be able to be recovered. Thus, before deleting a volume, you always want to make sure you have a copy of the data if you value it.
Once the volume is deleted, it will appear as free space. I will next delete the other volume that is stored in the extended partition. Once deleted, notice that the dark green box still remains. What has occurred is that the extended partition still exists; it just does not contain any partitions.
To delete the extended partition, as before, it is just a matter of right clicking it and selecting “Delete partition”. Notice that once I do this, the space will now appear as unallocated.
To convert the hard disk from MBR to GPT, I will now delete the remaining volumes on the hard disk which will remove the underlying partitions.
Once there are no more partitions on the hard disk, I will right click on the hard disk. Notice the option “Convert to GPT Disk” is now available. Once selected, the partition table on the hard disk will be changed from MBR to GPT.
As with MBR, the process of creating a volume is the same. You simply need to right click on the unallocated space and complete the wizard. In this case, I will create the volume with a size of one hundred thousand and then finish the wizard.
I will now create a further five volumes, each of the same size, and accept the default options. Again, as each volume is created, the underlying partition will be created, and the volume will be formatted.
You will notice that when the fourth volume and the volumes after this are created, they are all primary partitions. GPT is not limited to four primary partitions, as with MBR, and thus all partitions are created as primary. You will notice at the bottom of “Disk Management”, only “Primary partition” is listed. Since GPT does not use extended partitions, it is simpler than MBR in some ways. Keep in mind that Windows will do the hard work of creating and managing the partitions for you, so you don’t need to worry about it.
Now that I have created some volumes on the hard disk, I next want to have a look at the allocation units that Windows has assigned to a volume. To do this, I will go back to my command prompt. To see the size of the allocation unit I will run the command “chkdsk”.
You will notice that the allocation unit for this volume is 4096 bytes or 4k. This is the allocation unit that Windows will use for the hard disk. Essentially, this is the smallest block that Windows will use to read and write to the hard disk.
Now let’s have a look at the sector size of the hard disk. To do this, I will run “MSInfo32”. MSInfo32 provides information about the operating system and also the hardware. To find out what the sector size of the hard disk is, I will expand down through components, to storage and then select “Disks”. Notice that the sector size is listed on the right as 512 bytes.
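With an allocation unit of 4096 bytes and a sector size of 512 bytes, each allocation unit covers eight sectors. The Python sketch below shows that relationship. It is a simplification that assumes the allocation units start at a given sector and follow one after the other; real file systems reserve some space first, so the starting sector is left as a parameter.

```python
def sectors_per_allocation_unit(allocation_unit: int, sector_size: int) -> int:
    """How many sectors make up one allocation unit."""
    if allocation_unit % sector_size != 0:
        raise ValueError("allocation unit must be a multiple of the sector size")
    return allocation_unit // sector_size


def sectors_for_unit(unit_index: int, allocation_unit: int = 4096,
                     sector_size: int = 512, first_sector: int = 0) -> range:
    """Sector numbers covered by the given allocation unit."""
    count = sectors_per_allocation_unit(allocation_unit, sector_size)
    start = first_sector + unit_index * count
    return range(start, start + count)


print(sectors_per_allocation_unit(4096, 512))
print(list(sectors_for_unit(2)))
```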
It may seem strange that Windows uses a different allocation unit to the sector size. Let’s have a closer look at sector size and allocation units to understand why.
Sector Size/Allocation Unit
To understand how these work, let’s consider how the data on the hard disk is stored. A sector is the smallest writable area on the hard disk. Sectors have been around since storage devices did not have a lot of storage. Back in those days, 512 bytes was a popular sector size. Since it was a popular sector size, a lot of tools and hardware required it to operate. This caused problems when storage devices started increasing in size, as it would lead to a lot of space being wasted in certain situations.
To help reduce the amount of wasted space, larger sectors could be used. If you have a lot of large files on your hard disk, this reduces the amount of wasted space since less data will be used for system structures and error detecting data. The problem with using larger sectors is one of compatibility. This includes problems with older hardware and disk tools.
By contrast, smaller sectors are good for small files, since less space will be lost for each file. However, smaller sector sizes means more space is required for system structures and data error detecting. So, you can see there are tradeoffs depending on what you are doing.
Sector sizes are set by the manufacturer so they cannot be changed. So, in order to get some control over how data is stored on the storage device, the operating system stores data using an allocation unit.
An allocation unit can be bigger than the sector size. Although it is possible to make an allocation unit smaller than the sector size using certain file systems, it is not generally recommended. There are some reasons for this which I will go into later in the video.
Modern Windows operating systems use the term allocation units, while older Windows operating systems and Dos use the term clusters. Unix and Linux based systems will use the term blocks. They all refer to the same thing, so the terms may be used interchangeably.
The main take away you should get from this is that 512-byte sectors are compatible with older systems and disk tools. They worked well when storage devices did not have a lot of space. When space increased, there was a demand for large sector sizes; however, compatibility problems started to occur if larger sector sizes were used. Modern operating systems use allocation units or blocks to allow them to decide how data will be stored on the device, regardless of what the sector size is. Let’s now have a look at how hardware manufacturers get around this 512-byte sector limit but still retain compatibility.
Advanced Format
In order to increase the sector size but remain compatible, in 2010 the Advanced Format standard was created. Advanced Format solves the compatibility problems by grouping eight 512-byte sectors into a 4K sector. Essentially, the eight sectors are stored in the one sector. The hard disk transparently accesses each of these sectors as requested, essentially a sub-sector, by reading and writing the whole 4k sector as required. Thus, as far as the operating system is concerned, it is accessing a 512-byte sector. The hard disk performs the work of either reading the whole 4k sector and disregarding what is not required, or updating part of the 4k sector as required. Keep in mind that in order for the hard disk to write to a sub-sector like this, it will need to read the sector to work out what is there and write it back with the updated information. Thus, although Advanced Format offers compatibility with older systems, it is not an efficient way to access 512-byte sectors.
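The extra work involved in writing a 512-byte sub-sector can be seen in a toy model. The Python sketch below keeps the data in 4k physical sectors and counts how many physical reads and writes are needed to serve 512-byte requests. The class and its counters are made up for the example; it is not a model of any real drive's firmware.

```python
PHYSICAL_BYTES = 4096   # size of a sector on the platter
LOGICAL_BYTES = 512     # size of the sector the operating system sees


class EmulatedDrive:
    """Toy drive that stores 4k sectors but answers 512-byte requests."""

    def __init__(self, physical_sectors: int):
        self.media = [bytearray(PHYSICAL_BYTES) for _ in range(physical_sectors)]
        self.physical_reads = 0
        self.physical_writes = 0

    def _locate(self, lba: int) -> tuple:
        # Which physical sector holds this block, and at what byte offset.
        return divmod(lba * LOGICAL_BYTES, PHYSICAL_BYTES)

    def read_logical(self, lba: int) -> bytes:
        index, offset = self._locate(lba)
        self.physical_reads += 1             # the whole 4k sector is read
        whole = self.media[index]
        return bytes(whole[offset:offset + LOGICAL_BYTES])  # the rest is discarded

    def write_logical(self, lba: int, data: bytes) -> None:
        if len(data) != LOGICAL_BYTES:
            raise ValueError("data must be exactly one logical sector")
        index, offset = self._locate(lba)
        self.physical_reads += 1             # read the existing 4k sector
        whole = bytearray(self.media[index])
        whole[offset:offset + LOGICAL_BYTES] = data   # update one part of it
        self.media[index] = whole
        self.physical_writes += 1            # write the whole 4k sector back


drive = EmulatedDrive(physical_sectors=2)
drive.write_logical(9, b"\xab" * LOGICAL_BYTES)
print(drive.physical_reads, drive.physical_writes)
```

A single 512-byte write costs one physical read and one physical write, which is the inefficiency mentioned above.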
Advanced Format also increases the amount of data that can be stored on the hard disk. To understand why, consider that you have a single 512-byte sector. A sector on a hard disk will start with header information. This information allows the hard disk to determine where the sector starts. Keep in mind that the platters of the hard disk are spinning and header information is required, otherwise the hard disk won’t be able to detect the start of the sector.
At the end of the sector is error correcting data. This data will detect if data in the sector has changed and in some cases be able to correct it. So, you can see that each sector on the hard disk has some overhead. The overhead is required, but the more overhead you have, the less space that will be available for user data.
To understand this better, consider that there are eight 512-byte sectors. You can see that each sector requires space for header and error correcting data.
Advanced Format combines eight 512-byte sectors into a 4k sector. Essentially, the 4k sector contains eight 512-byte sectors, but only space for one header and error correcting data is required to be stored, so you can see how much space is saved. On a large storage device, this space saving starts to add up.
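The space saving can be estimated with some simple arithmetic. The overhead figures in the sketch below are illustrative assumptions, not values given in this video, and real drives will differ. The sketch also assumes the error correcting data for a 4K sector is larger than for a single 512-byte sector, while still being much smaller than eight separate copies.

```python
# Per-sector overhead in bytes. These figures are illustrative assumptions,
# not values from the video; real drives vary.
HEADER = 15          # gap, sync and address mark at the start of a sector
ECC_512 = 50         # error correcting data for a 512-byte sector
ECC_4K = 100         # error correcting data for a 4K sector


def efficiency(user_bytes, header, ecc):
    """Fraction of the space on the platter that holds user data."""
    return user_bytes / (user_bytes + header + ecc)


legacy_total = 8 * (512 + HEADER + ECC_512)     # eight separate sectors
advanced_total = 4096 + HEADER + ECC_4K         # one 4K sector

print(f"eight 512-byte sectors: {legacy_total} bytes on the platter")
print(f"one 4K sector:          {advanced_total} bytes on the platter")
print(f"512-byte efficiency: {efficiency(512, HEADER, ECC_512):.1%}")
print(f"4K efficiency:       {efficiency(4096, HEADER, ECC_4K):.1%}")
```

Under these assumed figures the same 4096 bytes of user data take a few hundred bytes less room on the platter, which is where the extra capacity comes from.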
Modern operating systems support Advanced Format and thus you can understand why the allocation unit is set to 4K in Windows by default. This lines up with the native sector size of the hard disk. However, having Advanced Format means that any existing tools or software that requires 512-byte sectors will still be supported. Basically, what occurs is, the hard disk transparently provides 512-byte sectors even though the data itself is stored in 4K sectors.
In most cases, you would leave the allocation unit on the default. However, in some cases you may want to change the default allocation unit. Let’s have a look at what effects that might have.
Hard Disk Layout
To understand what effects different sector sizes have on a hard disk, I will look at how data is laid out. This is a general look at how data is laid out. Different file systems will lay out data differently. Newer file systems are generally more efficient in laying out data and have more features than older file systems.
To start with, the file system will need some sort of allocation table or master file table. This data structure records where to find the files and folders on the hard disk. The file system will also need some way of determining which areas of the hard disk have been used and which areas are free.
Different file systems will use different methods to determine this. Older file systems will use a file pointer like system which will point to the next used area. Newer file systems will use a bitmap style system where a one or zero will determine if the space has been used or not.
If you decide to use a smaller allocation unit, this will increase the amount of configuration data required to store data in relation to where the file and folders are located on the hard disk. This data is not really that much in the overall scheme of things, so I would not worry too much about it. It was more of a concern in the old days where storage devices could not store that much data. Nowadays, storage devices can hold a lot of data and therefore these structures, when compared to the overall data held on the storage devices, are not that much.
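As a rough illustration of how small these structures are, the following sketch estimates the size of a free-space bitmap that uses one bit per allocation unit. It is a simplified model, not the exact on-disk layout of any particular file system, and the function name is hypothetical.

```python
def bitmap_bytes(capacity_bytes, allocation_unit):
    """Bytes needed for a free-space bitmap with one bit per allocation unit."""
    units = capacity_bytes // allocation_unit
    return (units + 7) // 8          # round up to whole bytes


CAPACITY = 3 * 10**12                # a three-terabyte drive (decimal terabytes)

for unit in (512, 4096):
    size = bitmap_bytes(CAPACITY, unit)
    share = size / CAPACITY
    print(f"{unit:>4}-byte units: {size / 10**6:.0f} MB bitmap "
          f"({share:.4%} of the drive)")
```

Making the allocation unit eight times smaller makes the bitmap eight times larger, but in this model it is still only a tiny fraction of the drive.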
Let’s now consider what data is required for files on the hard disk. When a file is written to the hard disk, allocation units are assigned to the file; however, this file may not completely fill all the allocation units that have been assigned. Unless the file is an exact multiple of the allocation unit, there will be wasted space.
You can see the wasted space at the end of each file. To decrease the amount of wasted space, you can decrease the size of the allocation unit. A smaller allocation unit will reduce the amount of wasted space. However, it will increase the amount of space required for configuration data, such as keeping track of the free space on the hard disk. Generally, this is a minor trade-off, but keep in mind that a smaller allocation unit will also decrease performance.
So essentially, what this means is that a smaller allocation unit means less wasted space but will decrease performance. This is only a real concern if the storage device will store a lot of smaller files. Thus, unless you are storing a lot of small files, I personally would leave the allocation unit on the default setting.
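The wasted space can be worked out by rounding each file up to a whole number of allocation units. The sketch below does this for a few example file sizes. It only models the data area, so it ignores file names and other file system structures, and the helper names are hypothetical.

```python
import math


def units_needed(file_size, allocation_unit):
    """Allocation units a file occupies (an empty file is counted as zero)."""
    return math.ceil(file_size / allocation_unit)


def slack(file_size, allocation_unit):
    """Bytes wasted in the last, partly filled allocation unit."""
    return units_needed(file_size, allocation_unit) * allocation_unit - file_size


FILE_SIZES = (100, 1024, 4097)       # example sizes in bytes

for unit in (512, 4096):
    print(f"allocation unit: {unit} bytes")
    for size in FILE_SIZES:
        print(f"  {size:>5}-byte file -> {units_needed(size, unit)} unit(s), "
              f"{slack(size, unit)} bytes wasted")
```

Notice that a file only one byte over a multiple of the allocation unit wastes almost a whole unit, which is why lots of small files are the worst case.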
To get a better understanding of how the allocation units affect your storage, I will change to my Windows 10 computer and have a look at what happens when different files are transferred.
Allocation Unit Demonstration
For this demonstration, I have added a second hard disk to the computer. This hard disk is 250 gigabytes in size. Thus, there is a three terabyte hard disk and a 250 gigabyte hard disk connected to the same computer.
On the desktop of the computer, I have created three files of different lengths containing random data. I will use these files to get an idea of how much data each allocation unit and file system is using.
To access Disk Management, I will open Windows Explorer, right click on “This PC” and then select “Manage” to open “Computer Management”. Once Computer Management is open, I will select “Disk Management”.
You will notice that once Disk Management is open, a window will appear saying that two hard disks have been found that have not been initialized. This includes the three-terabyte hard disk I used previously, however I wiped it before I started this demonstration, so it appears as uninitialized.
The default option will be to use GPT for the partition table. The type of partition table used does not affect how the allocation units will work on the file system, so I will accept the default option and press O.K.
Once the partition table has been written to the hard disk, I will next create a partition on the first hard disk. To do this, I will right click on the partition and select the option “New Simple Volume” to start the new partition wizard.
For the wizard, I will accept the defaults to use all the space on the hard disk. When I reach the format partition window, I will change the file system to “exFAT”. Later in the video I will look at how allocation unit affects NTFS.
For the allocation unit, I will select the smallest allocation unit which is 512 bytes. Once selected, I will complete the wizard and the partition will be formatted. Once complete, I will right click on the partition and select “Properties”.
You will notice that even though there is nothing on the hard disk, there is space already used. This space is used for data structures on the hard disk for storing information, for example where files and folders are stored.
I will now exit out of here and go to the desktop and copy the three files on the desktop to the hard disk. The files are different lengths. The first 100 bytes, the second 1024 bytes and the last 4097 bytes. These three different files will give us a better understanding of how files are stored with different size allocation units.
Now that some data has been copied, I will open the properties of the hard disk to see how much space has been used.
You can see that the amount of used space has increased. When you do the math, the space used has increased by 7680 bytes. The hard disk is using 512-byte allocation units. This essentially means that 15 allocation units have been used. The files I copied over should be using 12 allocation units: one for the 100-byte file, two for the 1024-byte file and nine for the 4097-byte file. The extra allocation units can be accounted for due to additional structures required for recording other configuration information such as the file names.
Let’s have a closer look at the files I copied over. The first file I copied over is 100 bytes. I will open the properties of the file and have a look at how much space it is using.
You will notice that Windows is reporting the size as 100 bytes, which is correct. However, notice that Windows is reporting the size on the disk as 4096 bytes. The “Size on disk” value is supposed to inform you how much space the file is taking up on the hard disk, accounting for wasted space due to allocation units not being completely filled up.
In this case, it seems Windows is reporting this incorrectly. It appears that it is using a 4096 byte allocation unit size rather than 512 to report this. Keep in mind that the used space was reported as 7680 bytes for all three files and we have already seen more than half of this figure looking at only the first file. For this reason, I believe that Windows, in this case, is reporting the size used on the disk incorrectly. This is why I would personally also look at the used space on the drive to help determine how much real space is being used by files just in case something is not being reported correctly.
I will now move on and look at the properties for the next file which is the one kilobyte file or 1024 bytes. You will notice that the size of the file is reported correctly as one kilobyte. The “Size on disk” is reported to be 4096 which I believe is incorrect. Considering just the first two files, this would already take us over the amount of used space for this drive as reported by Windows.
I will close the properties for this file and open the properties for the third file. This file is 4097 bytes long, so one byte longer than a 4K allocation unit. You will notice the size of the file is reported correctly as 4097. Notice however, the “Size on disk” is reported as 8192 bytes. You can see the “Size on disk” is greater than the amount of used space, so I don’t believe in this case the “Size on disk” is being reported correctly since the allocation unit I selected was 512 bytes.
I will now go back to Disk Management and create a simple volume for the three terabyte hard disk. As before, I will accept all the defaults so the full hard disk is used. For the file system I will select exFat and an allocation unit size of 4096 which is four kilobytes. I will now complete the wizard and the volume will be created.
Again, I will open the properties for the hard disk so I can see how much space is being used when no data is on the drive. You will notice with nothing on this hard disk, about 87 megabytes of space is already used for internal file structures. This hard disk is using a 4K allocation unit size, so when I compare that with the hard disk formatted with 512 bytes, notice that it is almost 30 megabytes smaller.
In the scheme of things, 30 megabytes on a three-terabyte hard disk is nothing really. So, I would not be concerned about setting the allocation unit smaller in relation to the amount of data required for file system structures. Keep in mind that old file systems like FAT32 use a lot more data for file structures. New file systems are generally more efficient at storing data. For example, exFAT stores data using a bitmap, zeros and ones essentially, whereas FAT32 stores data using linked lists which is much less efficient.
I will now go to the desktop and copy the three files to the hard disk. As before, I will open the properties for the 100-byte file. You will notice that I get the same results as before, the file is using 100 bytes and the “Size on disk” is reported to be 4096 bytes. This would be what we expect using a 4k allocation unit.
I won’t worry about looking at the other files as they are exactly the same as the previous hard disk reported. I will now close this window and open the properties for the hard disk.
When I compare this to how much space is used before the files were copied across, you will notice that about 24 kilobytes of space has been used. With the 512-byte allocation unit, the space used was about seven kilobytes. So even with a small number of files, if the files are small, you can start saving space. Let’s summarize what has occurred so far.
I have formatted two different hard disks using different allocation units. The first with a 512-byte allocation unit and the second with 4096 bytes or a 4k allocation unit. I copied three files to both hard disks containing random data, the files being 100 bytes, 1024 bytes and 4097 bytes in length.
Each file will be allocated allocation units. The file must fit within these allocation units, as any unused space in these allocation units will be wasted space.
Having a look at the hard disk with 512-byte allocation units, it appears that Windows has displayed the size on disk incorrectly. In this case it appears the value was calculated using 4096 rather than 512. However, looking at the used space figure we can get an idea of how much space was used and come to the following conclusions. For the first hard disk, 12 allocation units were used for data and three were used for admin data such as file details like the file names.
In the case of the second hard disk, four allocation units were used for data and two for admin data. The admin data can be a little misleading because if additional files are copied to the drive there may be enough space to add the additional filenames and other data in the space previously allocated. Thus, it may not increase straight away, but as more files are added it should start to increase in size.
The main takeaway from this should be that, for the same three files, the 4K allocation unit used roughly three times as much space as the 512-byte allocation unit. If you are copying a lot of small files, this may be a concern for you. However, if you don’t have a lot of small files, it is not worth worrying about.
I think we have a good understanding of how allocation units work, but there is something that I would like to demonstrate that occurs with the NTFS file system, so I will change back to my Windows 10 computer to have a look.
I will now go back to Disk Management and delete the volume on the first hard disk. Once deleted, I will create a new volume using the default options until I get to the format partition screen. On this screen, I will leave the file system on NTFS rather than selecting exFAT and also leave the allocation unit size on the default.
Windows will default to 4096 bytes for the allocation unit except for small drives under 256 megabytes where it will use something smaller. Once I complete the wizard, the volume will be formatted using NTFS. Once complete, I will copy the files from the desktop to the hard disk.
Once copied, I will open the properties for the 4097-byte file. You will notice that for the file size and “Size on disk” it is showing what is expected. That is, two 4k allocation units are being used to store the data.
I will next open the one kilobyte file. As expected, the file is using one allocation unit so the size on disk is reported as 4096 bytes. So, nothing unusual so far. I will next open the properties for the 100-byte file.
You will notice that the size of the file is reported correctly as 100 bytes; however, notice that the “Size on disk” is reported as zero bytes. At first this may seem incorrect, but I can assure you that it is actually correct, so let’s have a look at what has occurred.
Resident Files
NTFS has a feature called resident files. When this is used, the file is stored in the Master File Table or MFT rather than in the data area of the storage. Each file on NTFS requires one record in the MFT, which contains the filename and other properties. This record is one kilobyte in size, so if there is enough free space in the record, Windows will store the file in the MFT rather than in the data area. Let’s consider an example.
Let’s consider that you have a one kilobyte NTFS record. We want to store a file on the hard disk, so the NTFS record is filled with information about the file, such as its filename, attributes, permissions and other information.
The full one kilobyte is not being used so if the file is small, it is simply stored in the MFT rather than in the data area of the storage device. You can now see why Windows reported the file as having zero bytes used on the disk. The file is essentially not taking up any data on the disk because it is stored completely in the MFT.
Now let’s consider a second example. Here, the same file is used; however, this time there are more permissions and other information and thus more data is required to be stored in the MFT. Now when we attempt to store the file in the MFT it won’t fit and, thus, it must be stored in the data area of the storage device.
You can see why I can’t give you an exact answer on how big a file will need to be in order to be stored in the MFT, and this is because it depends on the other data being stored as well. Doing things like using a different file name can make a difference. Thus, if you are storing a lot of small files and using NTFS, you may not get much benefit using a smaller allocation unit if a lot of your files are small enough to fit in the MFT.
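The two examples above can be sketched as a simple comparison. This is only a model of the idea, not the real NTFS record layout, and the metadata sizes are hypothetical numbers chosen to illustrate the point.

```python
MFT_RECORD_SIZE = 1024     # one kilobyte record, as described above


def stored_in_mft(file_size, metadata_size):
    """True if the file data fits in the space left over in its MFT record.

    metadata_size is a hypothetical figure standing in for the filename,
    attributes, permissions and other information held in the record.
    """
    free_space = MFT_RECORD_SIZE - metadata_size
    return file_size <= free_space


# Hypothetical metadata sizes, chosen only to illustrate the two examples.
print(stored_in_mft(100, metadata_size=400))    # small amount of metadata
print(stored_in_mft(100, metadata_size=950))    # more permissions and other data
print(stored_in_mft(4097, metadata_size=400))   # larger than the record itself
```

The same 100-byte file fits in the first case and not in the second, which is why there is no single cut-off size that can be quoted.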
There are a couple more points that I would like to address. To do this, I will go back to my computer running Windows 10.
Demonstration
For this demonstration, I am using the three-terabyte hard disk and also adding a 16 gigabyte USB stick. This will show some of the problems you may come across.
As before, I will open Windows Explorer, right click “This PC”, select “Manage” and then select “Disk Management”. My three-terabyte hard disk is using a GPT partition table.
I will right click on it and select “New Simple Volume”.
For the wizard, I will accept all the defaults. This will create a volume of maximum size which will be two terabytes in this case. When I get to the format partition screen, I will change the allocation unit size to 512 bytes. Once this is done, I will complete the wizard.
You will notice that an error appears saying the format did not complete successfully. As NTFS file systems get larger, the size of the allocation unit needs to increase. If you attempt to use an allocation unit that is too small, the format will fail. In the case of hard disks two terabytes and above, NTFS requires an allocation unit of at least 4k.
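The reason the allocation unit has to grow with the volume is that the file system can only keep track of so many allocation units. The sketch below assumes a limit of just under 2 to the power of 32 allocation units per NTFS volume; treat that figure as an assumption of this example rather than something shown in the demonstration.

```python
MAX_CLUSTERS = 2**32 - 1     # assumed limit on allocation units per NTFS volume
TIB = 2**40                  # one tebibyte in bytes


def max_volume_bytes(allocation_unit):
    """Largest volume that can be addressed with the given allocation unit."""
    return MAX_CLUSTERS * allocation_unit


for unit in (512, 4096, 8192):
    print(f"{unit:>4}-byte allocation unit: about "
          f"{max_volume_bytes(unit) / TIB:.0f} TiB maximum volume")
```

Under this assumption a 512-byte allocation unit runs out just short of two terabytes, which lines up with the format failing on a larger volume.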
I will next go to my USB stick and attempt to create one volume using all the available space. Once I get to the format partition screen, I will select FAT32 with the default allocation unit and complete the wizard.
Once the format is complete, I will attempt to copy a five gigabyte file containing random data from the desktop to the USB stick. You will notice that the copy will fail. This is because FAT32 does not support files larger than four gigabytes. It is not recommended to use NTFS on USB flash drives; however, in this case because the flash media is of such small capacity, this is your only choice. If you have the option, you should use exFAT for USB since that supports large files.
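The four-gigabyte limit comes from FAT32 recording the size of each file in a 32-bit field. A minimal check, with a hypothetical function name, looks like this:

```python
FAT32_MAX_FILE = 2**32 - 1       # largest file size a 32-bit field can hold


def fits_on_fat32(file_size):
    """True if a file of this many bytes can be stored on FAT32."""
    return file_size <= FAT32_MAX_FILE


five_gigabytes = 5 * 10**9
print(fits_on_fat32(five_gigabytes))     # the size used in the demonstration
print(fits_on_fat32(700 * 10**6))        # a 700-megabyte file
```

The amount of free space on the USB stick makes no difference here; the file is rejected because of its own size.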
There is a lot covered in this video, so let’s now have a look at what you really need to know in the real world.
In The Real World
In the real world, MBR is good for older hardware but does have a two-terabyte limit. GPT does not have this limit but requires newer hardware, possibly 64-bit hardware. 64-bit hardware is generally required for booting from GPT but may not be required for data drives.
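The two-terabyte MBR limit follows from the partition table storing sector numbers as 32-bit values. The short calculation below shows where the figure comes from; the constant and function names are just for this example.

```python
SECTOR_COUNT_BITS = 32       # width of the sector fields in an MBR partition entry
TIB = 2**40                  # one tebibyte in bytes


def mbr_limit_bytes(sector_size):
    """Largest size that 32-bit sector numbers can describe."""
    return (2**SECTOR_COUNT_BITS) * sector_size


print(mbr_limit_bytes(512) / TIB)      # legacy 512-byte sectors
print(mbr_limit_bytes(4096) / TIB)     # a drive that reports 4K sectors
```

With the common 512-byte sector size this works out to two terabytes, which is why larger drives are normally set up with GPT.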
Unless you have special requirements, it is best to leave the allocation unit on the default. You will find a common sector size of 512 bytes used with many different storage devices. This is a legacy sector size which a lot of disk tools will expect to see.
To increase the sector size but keep compatibility, Advanced Format groups eight 512-byte sectors into a single 4K sector. The physical sector on the hard disk will be 4K but it will appear as eight 512-byte logical sectors. Operating systems are aware of this and thus will use 4K allocation units, so the allocation unit matches the actual physical sector size.
4096 bytes or 4K has become the default size for hard disks. NTFS compression does not work with allocation units larger than 4K, so if you use that feature you will need to stay at 4K or smaller.
Larger allocation units like 8K are only required by NTFS if the storage device is over 16 terabytes in size. Unless you are working with large RAID systems, you won’t come across storage devices that big. Some specialized software may use 8K sectors on storage devices.
Larger sector sizes may be used for specialized uses such as RAID and databases. Best to check first, because some specialized software may not support larger sector sizes.
Having a smaller allocation unit means less wasted space at the end of each file. This is useful if you have a lot of small files. Using a smaller allocation unit will reduce performance, so it is only worth doing if you do have a lot of small files.
Generally speaking, having a larger allocation size means the storage device will perform faster. However, there are some things to consider if you change the allocation unit from the default and doing it either way, larger or smaller, can reduce performance depending on the circumstances.
Firstly, I will have a look at what happens when you reduce the allocation unit to be smaller than the sector size. When this occurs, you can get what is called fragmentation. For example, consider that you have two physical sectors.
Consider that you want to read two allocation units that are next to each other but are in different sectors. In this example, to read two allocation units, two physical sectors need to be read. In this case, each sector has four allocation units, so eight allocation units are effectively read to read just two allocation units. The extra data is read but will be discarded. You can see that fragmentation results in extra data reads that reduce performance.
The next problem that occurs is with writing. Consider that I have a single physical sector that contains four allocation units. In order to write to one of the allocation units, the whole physical sector needs to be read to work out what is there. Once it has been read, it next needs to write the physical sector with the changed data. Performance is slower since changing data results in one read and one write. Thus, you can understand that reducing the allocation unit can cause a lot of performance issues, so I would only do this if storing a lot of very small files is more important than performance.
If you do the opposite and make the allocation unit larger than the physical sector, this can cause data fragmentation issues. For example, let’s consider that we have a number of sectors. If the allocation unit goes over the physical sector size, in order to read one allocation unit, two physical sectors need to be read. As before, the extra data is discarded. Data fragmentation results in more input and outputs to the storage device. For this reason, I would not recommend having an allocation unit that exceeds the physical sector size of the storage device, but sometimes it needs to be done.
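Both cases come down to counting how many physical sectors a request touches. The sketch below does that for a byte range on a drive with 4K physical sectors. It is a simplified model with a hypothetical function name, and the byte offsets are example values only.

```python
def sectors_touched(start_byte, length, physical_sector=4096):
    """Number of physical sectors covered by a byte range on the device."""
    first = start_byte // physical_sector
    last = (start_byte + length - 1) // physical_sector
    return last - first + 1


# Two neighbouring 1K allocation units that sit either side of a sector
# boundary: 2K is wanted, but two 4K sectors (8K) have to be read.
print(sectors_touched(start_byte=3072, length=2048))

# One 4K allocation unit lined up with a 4K sector: a single read.
print(sectors_touched(start_byte=4096, length=4096))

# One 8K allocation unit on the same drive: two sectors for every access.
print(sectors_touched(start_byte=8192, length=8192))
```

Only the middle case, where the allocation unit matches the physical sector, gets away with one sector per allocation unit, which is the reason the default is usually the best choice.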
For the average user and even the advanced user, it is best to leave the allocation unit on the default. Unless you have special requirements, I would not change the default as it will probably reduce performance in most cases.
I will now have a look at which file system to use.
In The Real World
If you are using removable storage, such as a USB stick, it is recommended to use a FAT file system. In Windows, FAT32 will be used for storage up to 32 gigabytes. FAT32 supports larger storage devices, but Windows will not format them with that file system; it will, however, read FAT32 storage devices that were formatted on other systems.
FAT32 is limited to files of four gigabytes. To overcome the limitations of this, exFAT was released. Later on, Microsoft released this as an open standard and thus anyone can implement it now. This makes it a good choice if you want to transfer files between different systems. If you need to store files larger than four gigabytes, use exFAT if it is available. If not, you may have to use NTFS which is not recommended for the reasons I will go into next.
FAT does not support security or journaling. In order to get security, you need to use NTFS. The security is not worth anything on removable storage because, once you put it in another system you can override the security and access the files. Journaling is the process of keeping logs of changes to the storage media. If the storage device is disconnected or power is lost, the journaling allows the previous state of the storage media to be restored; without this, there is more chance of data loss. Journaling increases the number of writes to the storage media, which slows down accesses and also reduces the lifespan of flash devices.
Thus, FAT is good for removable storage. It has good compatibility with other systems and the extra features provided by file systems such as NTFS are not needed.
In the case of hard disks and other storage devices, if they are not removable storage, NTFS is a better choice of file system. NTFS offers security and journaling. Journaling is useful for your operating system drive. If you have a system crash, the computer is more likely to recover without any data loss.
Since NTFS has security and journaling, it is a good choice for internal storage. So, as a general rule, use FAT for removable storage and NTFS for internal storage. There has been a lot in this video, but if you are not sure which options to use, leave it on the default option.
End Screen
That concludes this video from ITFreeTraining. I hope it helps you make the right decisions for formatting and using your storage devices. Until the next video from us I would like to thank you for watching.
Credits
Trainer: Austin Mason http://ITFreeTraining.com
Voice Talent: HP Lewis http://hplewis.com
Quality Assurance: Brett Batson http://www.pbb-proofreading.uk