To start with, a quick review of a GPU. A GPU is a specialized chip for processing graphics. Well, originally that was all it was used for. Nowadays GPUs have a lot of processing power, and manufacturers have found applications for them other than graphics.
Rather than having a few cores like a CPU, a GPU has a large number of cores. In the case of this GPU, it has 2560 cores. This allows for a lot of parallel processing, but having this many cores in the one chip reduces the size of each core compared with what can be achieved in a CPU. The end result is that a GPU core can achieve less than a CPU core; however, there are so many of them that this is where the processing power comes from. A GPU essentially trades the ability to process complex sequences of instructions for the ability to process a lot of simple instructions quickly in parallel. It is like having the choice between employing a couple of highly skilled workers or employing a large number of unskilled workers. Which is better depends on what you are trying to achieve.
I will now have a look at some of the things that make a video card work.
CPU vs GPU Clock Rate
The first thing that I will look at is the clock rate. The clock is essentially a pulse that is generated and used to synchronize component operations, and the clock rate is how often that pulse occurs. The clock is like a metronome; all the electronics in the device work in sync with it.
I will now have a look at the CPU clock rates compared with GPU clock rates. This chart looks at the fastest Intel CPU clock rate and Nvidia clock rates released each year from 2000 to 2019. You can see that CPU rates have gone a little up and down; however, nowadays they stay around the 5 Gigahertz range. There are some hard engineering limits that have been reached in the design of CPUs, so this is why, to get more performance, you see more cores added and other improvements rather than clock rate improvements.
For GPUs, by contrast, there has been a slow increase over the same time period; even so, you can see that the fastest CPU is still about three times as fast as the fastest GPU. Clock rates, however, can be a little bit misleading.
You can see with CPUs, the speed went a little up and down, but the whole time they were getting faster until in 2011 they started to level out. Even though the clock rate has leveled out, other improvements in the CPU have meant that the CPU is able to process more information. Clock rate is not the only thing to consider when looking at what you may be able to achieve.
To understand what different parts of the GPU may affect performance, I will first have a look at how 3D graphics are created.
Example Render Pipeline
I will first look at an example rendering pipeline. I will only look at a small part of the pipeline and different implementations may do it differently, but this should give you an idea of what may be involved. First, data is inputted into the pipeline. This data is in the form of triangles. When you break down a complex 3D object, it is always broken down to the simplest element which is a triangle. A triangle has the property that, no matter where the three points of the triangle are, it is always a plane. In other words, it is always a flat surface.
The next step is to change the position of the points of the triangle to suit its rotation, its position and the angle that it is being looked at from. Once this is complete, the points of the triangle will most likely have changed.
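As a rough sketch of this step, the Python below rotates and then moves the three points of a triangle. The function names, the choice of rotating around the Z axis and the sample values are my own assumptions for illustration; a real pipeline would normally do this with matrices on the GPU.

```python
import math

def rotate_z(point, angle_deg):
    """Rotate a 3D point around the Z axis by angle_deg degrees."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)

def translate(point, offset):
    """Move a 3D point by a (dx, dy, dz) offset."""
    return tuple(p + o for p, o in zip(point, offset))

def transform_triangle(triangle, angle_deg, offset):
    """Rotate, then translate, every vertex of a triangle."""
    return [translate(rotate_z(v, angle_deg), offset) for v in triangle]

# Hypothetical triangle lying flat in the XY plane.
triangle = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]
moved = transform_triangle(triangle, 90.0, (0.0, 0.0, 5.0))
for vertex in moved:
    print(tuple(round(c, 3) for c in vertex))
```

Every point of the triangle ends up somewhere new, but the three points still describe a flat surface.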
Now that the triangle is orientated according to its position and how it is being viewed in the 3D world, we can apply a shader to the triangle. The shader essentially takes each point of the triangle and gives it a color. This is then passed to the next step in the pipeline.
There are a lot more steps in a pipeline, but this gives you an idea of what goes into processing a 3D world and putting it onto a 2D screen. Let’s now have a look at how this is achieved using a GPU.
In order to achieve this, Nvidia GPUs use CUDA cores and AMD GPUs use Stream cores. Each core supports direct programming access. Older video cards are likely to have fewer cores that support fewer instructions. Older video cards did not have programmable cores and the output was not very customizable compared to what can be achieved with modern cores.
To understand what can be achieved, consider we have a triangle that we want to render. The triangle is mapped to the pixels on the screen. Of course, we would need to do this for everything we could see on the screen; however, for this example, I will just look at a single triangle.
The next step is to put the data through the cores that have been programmed for shading. A shader essentially colors the pixels. There are many different ways this can be achieved. You can see that being able to program the cores with whatever shader you want gives you a lot of flexibility.
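To give an idea of what a simple shader might do, here is a minimal Python sketch that maps one triangle to a grid of pixels and blends the colors of its three points across the pixels it covers. The names `edge` and `shade_triangle`, the 8 by 8 grid and the colors are assumptions made for this example. On a GPU each pixel would be handled by a core in parallel rather than in a loop.

```python
def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def shade_triangle(v0, v1, v2, colors, width, height):
    """Return {(x, y): (r, g, b)} for every pixel centre inside the triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return {}  # the three points are in a line, nothing to draw
    pixels = {}
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the centre of the pixel
            # Weights say how close the pixel is to each point of the triangle.
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                # The "shader": blend the three point colors by weight.
                pixels[(x, y)] = tuple(
                    round(w0 * c0 + w1 * c1 + w2 * c2)
                    for c0, c1, c2 in zip(*colors))
    return pixels

# Hypothetical triangle with a red, a green and a blue point.
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
pixels = shade_triangle((0, 0), (8, 0), (0, 8), colors, 8, 8)
print(len(pixels), pixels[(0, 0)])
```

Swapping the blending line for a different calculation gives a different shader, which is the flexibility that programmable cores provide.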
Once the cores have finished processing, the results are outputted. A full 3D pipeline contains many different steps, and shading is just one of them. Also factors like lighting need to be considered and textures applied. Cuda and Stream cores provide a lot of flexibility, but as flexible as they are, there are also other specialized parts of the video card that improve the performance. Let us have a look.
The next feature I will look at is the Tensor core. Tensor cores were added to Nvidia graphics cards in order to speed up the training of neural networks, which are used in artificial intelligence. To understand how they work, consider a streaming multiprocessor or SM inside the Nvidia chip. An SM is a number of cores that are grouped together; in this case 32 cores are grouped together.
The GPU can process both integer and floating-point math. The far-left side is used when integer math is required, and the middle section is floating-point math. On the far right are two Tensor cores.
The Tensor core multiplies two four-by-four matrices and adds a third four-by-four matrix. To perform this using a standard core would require 64 multiply-add operations to complete. You can see that even only having two Tensor cores makes the process a lot faster. Since this operation is used in training neural networks, using Tensor cores makes the whole process quicker.
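To make the size of this operation concrete, the following Python sketch performs the same multiply-and-add the slow way, one step at a time, and counts the steps. The function name and sample matrices are my own; counting each multiply followed by an add as a single step gives 64 steps for four-by-four matrices.

```python
def matmul_add_4x4(a, b, c):
    """Compute d = a x b + c for 4x4 matrices and count the multiply-adds."""
    ops = 0
    d = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = c[i][j]  # start from the matrix being added
            for k in range(4):
                acc += a[i][k] * b[k][j]  # one multiply-add step
                ops += 1
            d[i][j] = acc
    return d, ops

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[float(4 * i + j) for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]

d, ops = matmul_add_4x4(identity, b, ones)
print(ops)
```

A standard core works through these steps in sequence, whereas the Tensor core is built to do the whole block as one operation.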
At the time this video was made, Tensor cores were only really useful for AI-related activities. Maybe they will figure out how to use them for other things, but they are quite a specialized core. It may seem strange that Nvidia has added these to graphics cards when they are only used for a small market. I can only assume this is for marketing reasons and advancement of the technology. Having just two Tensor cores per SM allows users to essentially get a taste for the technology. If you are running an AI-based application, you can give them a try and see how much faster they make things. We are starting to see AI also used in generating computer graphics. You can see this is probably Nvidia’s way of developing a new market for their processors.
Tensor cores are specialized cores which have only recently been added to Nvidia cards, so now let’s have a look at some of the other parts of the GPU.
The next part of the GPU that I will look at are the texture units. Texture units are the part of the GPU that does texture mapping. Texture mapping is the process of mapping 2D images to a 3D object. To understand how that works, consider this texture of a house. The graphics for the house are essentially flattened out onto a 2D image. This 2D image is then mapped to the 3D points of the 3D object for the house.
Texture mapping is a fundamental part of 3D work. It is what essentially makes a 3D model look realistic. For example, if you have a plain rectangle, you could apply a timber texture map to make it look like timber. You could apply stone to make it look like a stone floor. Essentially you can apply any 2D image to color the pixels on the 3D object.
If I now consider a block diagram for an Nvidia graphics card, you will notice that at the bottom are the texture units. AMD cards will be designed in a similar way. The texture unit’s job is essentially to apply these texture maps to the 3D model.
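As a minimal sketch of the lookup involved, the Python below picks a color out of a tiny 2D image using a pair of coordinates between 0 and 1, which is how a point on the 3D object is usually tied to a spot on the texture. The function name, the 2 by 2 texture and the nearest-pixel lookup are assumptions for illustration; real texture units also filter and blend between neighbouring pixels.

```python
def sample_texture(texture, u, v):
    """Nearest-neighbour lookup: map (u, v) in the range 0..1 to a texel."""
    height = len(texture)
    width = len(texture[0])
    x = min(int(u * width), width - 1)    # clamp so u = 1.0 stays in range
    y = min(int(v * height), height - 1)  # clamp so v = 1.0 stays in range
    return texture[y][x]

# Hypothetical 2x2 "timber" texture made of four named colors.
texture = [["light brown", "dark brown"],
           ["dark brown", "light brown"]]

print(sample_texture(texture, 0.1, 0.1))
print(sample_texture(texture, 0.9, 0.1))
```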
You will notice that there are only four texture units and they are at the bottom. This is because they are towards the end of the 3D pipeline. Also, nowadays, things like shaders have become more important and require more processing power to calculate. I will have a look into that in more detail in a moment. The end result is that texture units are towards the end of the pipeline and you don’t require too many of them.
On older video cards, you will see the specification listed of how much texture mapping a video card can do per second. On more recent video cards, we have seen this specification is no longer included and replaced with other specifications about how much data the graphics card can process at once. Since texture mapping is run in parallel with other processes and is only a small part of the processing, you can see why they no longer include it with the specifications, as it would not give you an accurate measure of how fast the video card is.
One question you may have been asking yourself when I was looking at the cores is whether you could run the integer, floating-point and Tensor cores at the same time. Well, the answer is that you could, by scheduling and formatting the data in a certain way before it reaches the cores. You can see that it is not a simple matter of giving a specification of how fast a core can run, because how much you can achieve is based on how well you can utilize all the cores at once. This is why we are starting to see a move away from quoting specifications of things like the texture unit in favor of an overall specification.
Let’s have a look at why texture mapping is not as process intensive as it used to be.
To understand why the texture unit has become less of a factor in the overall performance of the video card, let us consider an example pipeline. To start with, you have your 3D model. In this case, the 3D model is the planet Mercury. The model will look a bit jagged as it is essentially points connected by lines.
To make it look smoother, the object will have a normal added to each point. The normal determines which way light will bounce off a surface when it hits it. With a spherical surface like this, it can be used to make the sphere look smoother and less jagged.
The surface of Mercury is of course not smooth, so the next step would be applying a bump map. A bump map changes the way light hits a surface. If you consider a smooth surface like a mirror, the light will all bounce in the same direction. If you consider a surface like asphalt, this is very rough, so the light will bounce off in different directions. Bump mapping allows greater control of how the light bounces off the object. You can see that just applying a bump map has made the surface look a lot rougher.
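To illustrate how a normal and a bump map change the lighting, here is a small Python sketch using a common diffuse lighting rule, where brightness depends on the angle between the normal and the direction to the light. The function names and the bump offset are hypothetical, and this is only one of many lighting models.

```python
import math

def normalize(v):
    """Scale a vector so that its length is 1."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def diffuse(normal, light_dir):
    """Brightness from 0 to 1 based on the angle between normal and light."""
    n = normalize(normal)
    l = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

def bumped_normal(normal, bump):
    """Tilt the normal by a small offset, as if read from a bump map."""
    return normalize(tuple(n + b for n, b in zip(normal, bump)))

light = (0.0, 0.0, 1.0)         # light shining straight at the surface
smooth = (0.0, 0.0, 1.0)        # normal of a perfectly smooth surface
rough = bumped_normal(smooth, (0.5, 0.0, 0.0))  # hypothetical bump offset

print(diffuse(smooth, light), round(diffuse(rough, light), 3))
```

The geometry has not changed at all; only the normal has been tilted, yet the point is now lit less brightly, which is what makes the surface look rough.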
The last step would be to apply the texture map. You can see that applying the texture map is kind of like applying the paint to the 3D model. You can start to see why shaders and other processes in the pipeline are becoming a lot more important and thus require a lot more processing. With the newer graphics cards, the texture units only need to be able to perform at a level that will support the highest resolution and frame rate. Any extra processing power is essentially not needed and thus wasted. You can see why texture units are not as important in the pipeline as they used to be. Processes like shaders and lighting essentially do most of the work, and the texture map just provides a little bit of color.
We will probably see a trend where texture mapping specifications are not included with graphics cards because as long as they are fast enough to support the other processes on the card, any extra processing power won’t make a difference.
In newer graphics cards, there is another new function that has been added which improves the output of graphics. Let’s have a look.
The newest feature added to graphics cards is ray tracing. Ray tracing is a way of creating an image by tracing the path of light. In this example you can see there are a lot of objects that are see-through, create shadows and reflect onto other objects. In order to create this image, light rays are tracked as they pass through an object, reflect to other objects, bounce off others and create shadows and reflections. When you think about it, this is quite a complicated image to create. This image was created in a professional render program and you will not be able to achieve something like this using a graphics card. Well, not yet.
If you have a close look at the image, you can see the green ash tray has cast green reflections on other objects. If you look at the glasses, you can see other glasses through them; essentially this is a very simple image of some items you may find in a home, but it presents a lot of rendering issues. To understand how we can use the new ray tracing features in graphics cards, let’s compare it with the traditional rasterization method that is currently being used.
Rasterization vs Ray Tracing
To understand how rasterization and ray tracing are different, consider that rasterization is more like you are painting a picture. The painter attempts to capture the scene by applying different colors to their canvas. The painter considers a small part of the scene and thinks about what color they will apply to that part.
By contrast, in order to determine a single pixel in an image, ray tracing considers all the objects in the scene and how they react to each other. You can see in this image that there are lights coming from the windows that are out of shot – these light up the floor and the seats. The rasterization method simply tries to take guesses at what these should be. You can see that some shadows and lighting effects have been applied, but nowhere as good as the ray tracing example. You can see that ray tracing better represents the effect lights have on other objects.
To better understand how ray tracing works in these new graphics cards, let’s consider a different example.
Ray Tracing Example
Ray tracing works by tracing rays through the scene to create the image. If you were looking at a scene like this and wanted to recreate it using ray tracing, you would do it like this. You would simulate rays of light, the more the better, traveling through the scene.
In this case, the light has hit the side of the building and has bounced off. The light would decrease in intensity when it hit the building and also potentially change color. The light then hits the water and since the water is not completely flat the light scatters. This scattering of the light could potentially hit other objects in the scene and needs to be accounted for. Also, the scattering of the light reduces its intensity. You can see how quickly ray tracing can get very complicated. To do it correctly, all objects in the scene need to be considered and how they react with each other.
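The two ideas here, a ray changing direction when it bounces and losing intensity each time, can be sketched in a few lines of Python. The function names and the reflectivity values for the building, water and rock are made up for illustration, and this sketch ignores scattering and color changes.

```python
def reflect(direction, normal):
    """Mirror a direction about a unit surface normal: r = d - 2(d.n)n."""
    d_dot_n = sum(d * n for d, n in zip(direction, normal))
    return tuple(d - 2.0 * d_dot_n * n for d, n in zip(direction, normal))

def trace_intensity(start_intensity, reflectivities):
    """Multiply the light's intensity by each surface's reflectivity in turn."""
    intensity = start_intensity
    for r in reflectivities:
        intensity *= r
    return intensity

# A ray heading down and to the right bounces off a flat floor.
print(reflect((1.0, -1.0, 0.0), (0.0, 1.0, 0.0)))

# Hypothetical reflectivities for three bounces: building, water, rock.
print(trace_intensity(1.0, [0.5, 0.5, 0.5]))
```

After only three bounces the ray carries an eighth of the light it started with, and a real scene needs this worked out for a very large number of rays.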
Lastly, the light bounces off a very shiny rock and this is where the camera looking at the scene captures the light. The person viewing the scene should see a reflection of the river on the rock. Given how much the light has bounced around, this will probably be very dull.
By contrast, rasterization attempts to simulate how the world would look. For example, with things like shadows, guesses can be made. This can also be done, to some extent, with things like reflections. For example, looking at the water you can see a reflection. However, it is very unlikely that reflections in objects like rocks and in other things in the scene will be accounted for using rasterization.
Rasterization also has problems in that, when creating shadows, you may get artifacts in the shadow – that is extra information in the shadows which makes them look unrealistic. Rasterization is essentially taking a best guess at what the world will look like. Ray tracing, on the other hand, gives a more accurate view of the world; however, to use it effectively, you need a lot of rays, thus it is significantly more computationally expensive to use ray tracing rather than rasterization.
Let us have a look at how the video cards with ray tracing attempt to fix these problems.
Real-time ray tracing is considered the holy grail of computer graphics. It is something we have wished for since real-time graphics started becoming a thing in the 90’s. Nvidia has added the RT cores which are specialized cores dedicated to ray tracing. We will see why this is not quite the holy grail yet, but it is a great start.
To understand how RT cores work, once again let us consider the block diagram for the GPU. The RT cores are located at the bottom. This tends to indicate they are at the end of the pipeline; however, the way it works is that these cores are used to augment rasterization to help give more photo-realistic images. There may also be some back and forth between the cores to get to the end result.
You can see in this example of ray tracing, unless you look twice, you would think it was real. It goes to show what can be achieved. If you wish to watch the whole video, a link is included in the reference part of this video.
In order to achieve the best results, hybrid ray tracing is used. This is essentially a mix between rasterization and ray tracing. To better understand, let’s consider an example. The left image is without ray tracing and the right is with ray tracing. Comparing both, you can see that parts of the image are the same.
For other parts, more detail can be seen that was not visible before. You can see that essentially the 3D engine needs to make a decision between which parts of the image to use ray tracing on and which not. Essentially it is using the processing power available in the RT cores as efficiently as possible.
There are other tricks to getting good results as well. If you had unlimited processing power, you would cast a lot of rays and then average the results. With limited RT cores, you have to pick and choose which rays to use, which can lead to noise in the end image. To get around this, Nvidia uses advanced denoising filtering to get even better results. Essentially this means taking the output from ray tracing and cleaning up that output to get better results rather than casting more rays and averaging the result. Ideally it would be better to cast more rays, but given the technology we currently have, this is the best way to get the best results.
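To show the general idea of cleaning up a noisy result, here is a very simple Python sketch that smooths a row of pixel brightness values by averaging each one with its neighbours. This is only a stand-in that I have chosen for illustration; it is not the filter Nvidia uses, which is far more sophisticated, and the sample values are made up.

```python
def box_denoise(row, radius=1):
    """Replace each value with the mean of its neighbours within radius."""
    out = []
    for i in range(len(row)):
        lo = max(0, i - radius)
        hi = min(len(row), i + radius + 1)
        window = row[lo:hi]
        out.append(sum(window) / len(window))
    return out

# Hypothetical row of pixels that should all be 0.5 but were traced
# with too few rays, so some came out too bright or too dark.
noisy = [0.5, 0.9, 0.1, 0.5, 0.5]
smoothed = box_denoise(noisy)
print([round(v, 2) for v in smoothed])
```

The smoothed values sit closer together than the noisy ones, at the cost of blurring some detail, which is the tradeoff any denoiser has to manage.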
This covers the basics of how the components inside the GPU work, so I will now have a look at the other parts of the graphics card.
All video cards will have memory on them. Even an integrated video card will have a very small amount of memory for buffering. Dedicated video cards will have a lot of dedicated memory for the video card. The memory on dedicated video cards is designed with video in mind. It differs from the memory inside the computer in that it has higher bandwidth and thus is able to transfer more data at once. More on that later in the video.
You can see that there are ten memory chips on the video card itself. There are a few different ways the memory can be used on the computer, and later I will look at some of the different ways memory can be put on a video card.
The video card uses dedicated memory for a lot of different things. These include frame buffers, storing textures, Z-buffer and shadow maps. The frame buffers are used to store the image that is currently being shown or about to be shown. Texture maps need to be installed on the video card so it can access them quickly. The Z-buffer is a buffer that allows the video card to quickly determine if the graphic it is processing is in front of an existing graphic or behind it. Shadow maps are used in the creation of shadows.
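The Z-buffer test is simple enough to sketch in Python. For each pixel the buffer remembers how far away the nearest thing drawn so far is, and a new graphic is only written if it is closer. The function name, the buffer size and the use of color names are assumptions for this example.

```python
def draw_fragment(color_buffer, z_buffer, x, y, depth, color):
    """Keep the fragment only if it is nearer than what is already stored."""
    if depth < z_buffer[y][x]:
        z_buffer[y][x] = depth
        color_buffer[y][x] = color
        return True
    return False

WIDTH, HEIGHT = 4, 4
FAR = float("inf")  # nothing drawn yet, so everything is "in front"
color_buffer = [["black"] * WIDTH for _ in range(HEIGHT)]
z_buffer = [[FAR] * WIDTH for _ in range(HEIGHT)]

draw_fragment(color_buffer, z_buffer, 1, 1, 10.0, "red")   # far object
draw_fragment(color_buffer, z_buffer, 1, 1, 2.0, "blue")   # nearer, replaces red
draw_fragment(color_buffer, z_buffer, 1, 1, 5.0, "green")  # behind blue, rejected
print(color_buffer[1][1])
```

Because each pixel needs its own depth value, the Z-buffer takes up video memory in the same way a frame buffer does.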
When you start using resolutions like 4K, you really start to need a lot of video memory. It is common for video cards to have at least one to four gigabytes of memory. The more expensive video cards can have anywhere from eight to 24 gigabytes of memory. A professional card can range all the way up to 48 Gigabytes of memory. That is a lot of memory, but before I look at why we need that much, I will first look at why we need dedicated video memory on a video card.
In computing, performance problems can occur due to bottlenecks and the video card is no exception. A bottleneck in computing is when the system is limited by a single component. To understand this, think of a computer system like roads. There is no point upgrading a road to a superhighway if it leads to a dirt track. Congestion will form where the superhighway meets the dirt track, slowing down everything on the superhighway. This is why the speeds of a computer system tend to increase together rather than one system component being significantly faster than the others.
In a graphics card, you have a very powerful GPU, and you don’t want it to be bottlenecked by not being able to get enough data quickly enough. This is why you need very fast memory connected directly to the GPU.
This memory is different from traditional system memory. The first difference is that it has a higher data throughput. Throughput is essentially the maximum rate at which data can be transferred. Video memory has a much greater throughput than system memory. Video memory is also able to read and write at the same time.
The next characteristic to consider for video memory is access speed. Access speed is the time delay between when the memory request goes through and when it is completed. Video memory differs from system memory in that it generally has a faster access time than system memory.
The last characteristic to consider for video memory is capacity. Capacity is the maximum amount of memory that the video card can access. If there is too little video memory, the video card needs to swap memory with the computer. So generally, the more the better. More video memory will increase the cost of the video card, so often it will be a tradeoff between the amount of video memory you want and how much it costs.
To get a better understanding of video memory, let’s compare it with DDR memory.
DDR and GDDR Very Different
When comparing DDR and GDDR the first point to make is that they are very different from each other. Before GDDR3, they were similar; however, after that they went in different directions. The main similarity between the two is that they can both transfer two pieces of data per cycle, thus the name double data rate. However, as we will see, that changed later on.
Shown here is a table comparing DDR3 with the current GDDR memory and its competitor HBM. I have mentioned DDR3 because it is found on some budget graphics cards and is a good place to start the comparison.
You will notice that the clock rate and data transfer rate are quite low. Keep in mind that computer memory is sold in modules which generally have eight chips on each side, making a total of 16 chips. These all work together and are connected by a high-speed bus, which is why a memory chip can be much slower than the memory module and transfer at a slower rate. This statistic is only for a single chip, since a graphics card will access all the chips at once, whereas a computer will access the memory module, which will then access all the memory chips at once. So the speed of a whole memory module will be around the same as the speed of a single GDDR chip. Keep in mind that different versions of chips will run at different speeds, so this table only works as a rough guide.
If I now have a look at GDDR3, notice that the clock rate and data transfer rate are much higher. This version of GDDR is where it started changing its design away from that of DDR. This version implemented the ability for the chip to read and write at the same time. In graphics processing you can see that you could be reading a texture map from one location and writing the results of another process somewhere else. This works well in graphics cards since you can divide your data up like that. But, in computing consider you are trying to read a bank account balance and then write an updated amount. It would be very difficult to support reading and writing at the same time. With graphics processing, the problems you are trying to solve are very different. So, you can see why there is a need for this feature in graphics processing, however you probably won’t see that feature for a computer’s main memory.
Following on from this was GDDR4. GDDR4 saw a much lower clock rate; however the prefetch was doubled from four to eight. The prefetch is how much data the memory chip will get ready for transfer when requested. It is like going to a counter and requesting items from the storeroom. The amount of items the staff member can carry is essentially the prefetch. If the staff member can carry twice as much, then twice as much can be transferred per trip. You can see that if you double the amount each chip transfers per request, you don’t need a high clock rate.
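The tradeoff between clock rate and prefetch can be shown with a little arithmetic. The Python below uses a deliberately simplified model, in which the amount delivered per pin is the internal clock multiplied by the prefetch. The function name and the clock figures are illustrative assumptions, not the real figures for any GDDR version.

```python
def per_pin_rate_mbps(array_clock_mhz, prefetch):
    """Simplified model: megabits per second per pin = clock x prefetch."""
    return array_clock_mhz * prefetch

# Hypothetical chips: halve the clock but double the prefetch.
fast_clock = per_pin_rate_mbps(array_clock_mhz=500, prefetch=4)
slow_clock = per_pin_rate_mbps(array_clock_mhz=250, prefetch=8)
print(fast_clock, slow_clock)
```

Under this model both chips deliver the same amount of data, which is why a doubled prefetch can make up for a lower clock rate.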
Next there was GDDR5, which uses a higher base clock rate but also adds additional clock rates. For example, a faster write clock was added. The high clock rate runs in sync with the lower clock rate, which effectively means write speeds are increased over read speeds.
The next addition was GDDR5X which adds a quad data rate. This means that four data bits can be transferred per clock cycle rather than two. You can see that GDDR has significantly branched away from DDR, so you cannot directly compare them anymore. At this stage maybe it should not have DDR in the name anymore, but I guess they leave it that way because we are used to it.
Following this was the release of GDDR6. GDDR6 increased the bus size. A larger bus essentially means more data can be transferred at once. Keep in mind that these figures are for single chips. If you have eight chips, you can effectively multiply the transfer rate by eight.
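To show how the per-chip figures combine, here is a short Python calculation. The function name and the numbers (14 gigabits per second per pin, a 32-bit connection per chip and eight chips) are assumptions picked for illustration rather than values read from the table.

```python
def bandwidth_gb_per_s(gbit_per_pin, bus_width_bits, chips=1):
    """Total bandwidth in gigabytes per second across all chips."""
    per_chip = gbit_per_pin * bus_width_bits / 8.0  # 8 bits in a byte
    return per_chip * chips

one_chip = bandwidth_gb_per_s(14.0, 32)
eight_chips = bandwidth_gb_per_s(14.0, 32, chips=8)
print(one_chip, eight_chips)
```

With these example figures a single chip gives 56 gigabytes per second and eight chips together give 448, because the graphics card reads from all of them at once.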
That covers GDDR memory, however there is also another type of memory called High Bandwidth Memory or HBM. This is a competitor to GDDR but, besides some use in AMD graphics cards, has not been used that much in graphics processing.
HBM applies a totally different way of using memory. Rather than having the memory placed on a PCB alongside the GPU and the rest of the components, it is instead stacked with the GPU. Later in the video I will look more into this process. The advantage of this method is that the memory is much closer to the GPU which, in theory, should give faster performance; however, it is harder to manufacture.
HBM2 improved on HBM allowing up to eight dies per stack. Essentially the memory is stacked on top of each other. It is probably getting confusing talking about HBM without knowing how it works, so let’s have a look at how it is different to GDDR.
High Bandwidth Memory (HBM)
HBM is designed so it is connected to the GPU by an interposer, which is effectively an alternative to using a PCB or using 3D chips. To understand this better, consider the previous example of an Nvidia graphics card. You see the memory chips are placed on the PCB around the GPU. If I now compare this with a graphics chip that uses HBM, notice at the bottom there are two HBM stacks.
You can see that the memory is essentially part of the same package as the GPU. If I look at this from the side, you can see that essentially the memory is side by side with the GPU. The HBM is connected to the GPU by the interposer. Essentially an interposer is a substrate, that is, a rigid or flexible material that can be used to connect electronics together. It performs essentially the same job as the board on the graphics card; however, it is designed for use inside a single package. In this example, the GPU and the memory are included inside the same package.
The advantage of this is that the memory is very close to the GPU, meaning that the distance between the two is a lot less and thus there are fewer communication problems. In theory then you should be able to get much faster communication between the two.
Using this method is not without its engineering problems; however, it is a lot easier and cheaper than other methods such as creating 3D chips. 3D chips would require everything to be built in one package which is quite difficult to do. If one thing goes wrong in the chip, the chip potentially won’t work correctly and can’t be sold. Using a method like this, both parts can be manufactured separately and tested. If both work, they can be combined together. This reduces the fail rate in manufacturing.
You will notice as well that the HBM dies can be stacked on top of each other for a total of eight. When they are stacked together like this, the stack is accessed as a group. You can see there are some advantages to this method; however, in real-world graphics processors, only some AMD graphics cards use this method (Nvidia has stuck with GDDR). This method does require the memory to be added during the manufacturing process, whereas with GDDR this can be done when it is put on the PCB. This may be part of the reason why it has not taken off; maybe it also has something to do with cost. I am not able to predict the future, but if I were to take a guess, I would say video memory will either be GDDR, HBM or 3D chips in the future. Only time will tell us which way it will go.
Now that we understand how video memory works, let’s now look at why we need so much.
Why so Much Video Memory?
To understand why so much memory is required, consider that we have a basic 3D model that we want to display in a 3D world. This basic model has 1448 polygons which nowadays would be considered a low polygon model. When some animations are added, the total is about 2 Megabytes of data. Not too much yet.
If I now add a texture map and a normal map, this will add an additional 24 Megabytes. Still not too much, but keep in mind these are not high-resolution textures.
Next, to display an image, we need a buffer. For a 4K image this will take 32 Megabytes of memory. So, to display one model on the screen we are already up to about 60 Megabytes. Now consider that you have a lot of models on the screen; remember, everything on the screen will most likely have a texture map associated with it: the ground, the sky and everything in between. This quickly starts to add up, and if you start using high-resolution texture maps this will eat memory up even faster. With one Gigabyte of video memory and low-resolution models like this one, you could have around 30 different models in memory at once. If you want higher resolutions, you would only be able to have a couple of 3D models and their textures in memory within the one Gigabyte. If I were, for example, to double the width and height of the texture map used, it would increase the amount of memory required by a factor of four.
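The buffer and texture sizes can be checked with some quick arithmetic in Python. I am assuming four bytes per pixel (red, green, blue and alpha at one byte each) and uncompressed textures; the function names and the 1024 by 1024 texture are my own choices for illustration.

```python
def framebuffer_bytes(width, height, bytes_per_pixel=4):
    """Memory for one uncompressed frame buffer."""
    return width * height * bytes_per_pixel

def texture_bytes(width, height, bytes_per_texel=4):
    """Memory for one uncompressed texture map."""
    return width * height * bytes_per_texel

MIB = 1024 * 1024

fb = framebuffer_bytes(3840, 2160)  # one 4K frame
print(round(fb / MIB, 1))

small = texture_bytes(1024, 1024)
large = texture_bytes(2048, 2048)   # width and height both doubled
print(large // small)
```

One 4K frame comes to a little under 32 Megabytes, and doubling both sides of a texture makes it four times the size.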
The graphics engine will attempt to swap out textures and other assets as it needs them, keeping only what it needs in the graphics memory. So firstly, we need to have enough graphics memory to hold all the images on the screen plus what it needs in the immediate future. If we can’t do that, the video card will do a high volume of swapping between the computer’s memory and video memory, which essentially causes a thrashing effect between the two.
In this example we have not considered how many other buffers you may need. So, when you start adding Z-buffers and shadow buffers, your available video memory is going to go down even further, and we have not even started talking about physics engines or other effects. Nowadays, if you want to run a 3D application, you’re not going to get too far with one Gigabyte of memory. Two Gigabytes should be a base line minimum, but if you want to get some good results start looking towards four Gigabytes and above. More memory also means the developer can use the extra memory for optimizing the performance of their engine. For example, keeping multiple texture maps of different resolutions depending on whether the person viewing them is close to the object or far away.
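Keeping several resolutions of the same texture costs less memory than you might expect. The technique is commonly called mipmapping, and the Python below adds up the memory for a texture plus every half-size copy of it. The function name, the 1024 by 1024 starting size and the four bytes per pixel are assumptions for this sketch.

```python
def mip_chain_bytes(size, bytes_per_texel=4):
    """Total bytes for a square texture plus every half-size copy down to 1x1."""
    total = 0
    while size >= 1:
        total += size * size * bytes_per_texel
        size //= 2  # each level is half the width and half the height
    return total

base = 1024 * 1024 * 4          # the full-resolution texture alone
total = mip_chain_bytes(1024)   # the texture plus all smaller copies
print(round(total / base, 3))
```

All the smaller copies together add only about a third to the memory used by the original texture.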
If you have ever played a computer game and gone into a new area, you may have seen objects appear or the detail on an object suddenly become clearer. This is the 3D engine loading data it needs into the graphics memory. The more memory it has, the more data it can keep in its memory. I will next look at the process a video card uses to get that data into the video memory.
In order to get data into the video card, graphics cards are connected to the computer using lanes. A lane is essentially a full-duplex byte stream. If I consider a PCI Express graphics card, this card is connected to the computer using a series of lanes.
The lowest number of lanes is one. After this the number goes up in powers of two; the common slot widths are four lanes, eight lanes and 16 lanes. You will find that some graphics cards will not work if the required number of lanes is not met.
With PCI Express, the size of the slot will generally indicate how many lanes that slot supports. Shown here are the different slot sizes for one, four, eight and sixteen lanes. In some cases, the larger slots will support fewer lanes than their size indicates. For example, a 16-lane slot may only support eight lanes. If you have multiple PCI Express cards in the same computer, some motherboards reduce the number of lanes. For example, a 16-lane slot will become an eight-lane slot.
Let’s have a look at how the number of lanes you are using will affect the performance of your computer.
Let’s consider that you are playing a game which is displaying a 3D world. In this example, the graphics card has two lanes. This example is only to explain the concept of lanes; your video card should have more lanes than this.
Now let’s consider that you get teleported to a new area. All the assets for that area have to be transferred to the graphics card using the lanes. The game may make you wait while this occurs or transfer the basic assets first followed by others later on. If the engine does the latter, as the assets are transferred you will see them either start appearing in the 3D world or suddenly become rendered.
Lanes send data in the form of packets. This is the same principle that is used in network cards. To transfer all this data through, the computer waits for a lane to be free and sends the data over it. If multiple lanes are free, it can use all the lanes at once.
Notice that the computer only needs to wait for a lane to be free before it starts sending. It does not need to worry about keeping the lanes in sync. If there are more lanes, it can transfer more data at once.
Think of it like you have a number of delivery drivers. Each driver fills their truck up and does the delivery. When they have finished, they come back and grab the next delivery if anything is waiting. If you want to speed the process up, hire more delivery drivers. If the drivers arrive back and pick up a delivery out of order it does not matter.
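The delivery driver picture can be sketched as a small simulation. The Python below models only the simplified idea described here, where each packet is handed to whichever lane becomes free first; it is not a model of the PCI Express specification. The function name, the packet count and the fixed time per packet are all assumptions for illustration.

```python
import heapq

def transfer_time(num_packets, num_lanes, time_per_packet=1.0):
    """Time at which the last packet finishes when each packet takes the next free lane."""
    # Each heap entry is the time at which one lane next becomes free.
    lanes = [0.0] * num_lanes
    heapq.heapify(lanes)
    finish = 0.0
    for _ in range(num_packets):
        free_at = heapq.heappop(lanes)    # the lane that is free soonest
        done = free_at + time_per_packet  # that lane is busy until this time
        finish = max(finish, done)
        heapq.heappush(lanes, done)
    return finish

for lane_count in (1, 2, 4, 8, 16):
    print(f"{lane_count:2d} lanes: {transfer_time(64, lane_count):.0f} time units")
```

With 64 equal packets, the total time halves each time the number of lanes doubles, just as hiring more drivers gets the same deliveries done sooner.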
You can see that having more lanes helps, but in the real world you only need enough lanes to keep up with the demand. If the graphics card does not have a lot of memory, it may have to remove an asset to load another. If it needs this asset later on, it will need to be transferred over the lanes again. You can see that having a lot of graphics memory can potentially reduce the demand on the lanes required.
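To illustrate how memory size affects lane traffic, here is a minimal sketch in Python. It assumes the card throws out the least recently used asset when it runs out of room, which is one common policy but not necessarily what any real driver or engine does. The asset names and capacities are made up for the example.

```python
from collections import OrderedDict

def count_transfers(requests, capacity):
    """Count how many times an asset has to be sent over the lanes.

    `capacity` is how many assets fit in video memory at once. When memory is
    full, the least recently used asset is removed (an assumed policy).
    """
    resident = OrderedDict()  # assets currently in video memory, oldest first
    transfers = 0
    for asset in requests:
        if asset in resident:
            resident.move_to_end(asset)       # already loaded; mark as recently used
        else:
            transfers += 1                    # must be sent over the lanes
            if len(resident) >= capacity:
                resident.popitem(last=False)  # remove the least recently used asset
            resident[asset] = True
    return transfers

# The same four assets are needed twice in a row.
requests = ["ground", "sky", "tree", "house"] * 2

print("Room for 2 assets:", count_transfers(requests, 2), "transfers")
print("Room for 4 assets:", count_transfers(requests, 4), "transfers")
```

In this sketch, with room for only two assets every request has to go over the lanes again, while with room for all four each asset is transferred just once.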
If you want to speed up your graphics, you may want to consider purchasing a second video card. Let’s have a look at this.
Combined Cards (SLI/CrossFire)
The process of combining two or more graphics cards together is called SLI with Nvidia and CrossFire with AMD. Which cards you are using and which motherboard you are using will determine how many can be combined together. Keep in mind that even though you may be able to combine three video cards together, this may be an unsupported configuration, so you may have trouble finding anything that will run correctly on it. Depending on what video card you purchase, combining two video cards together may not be supported.
Adding an extra video card gives a small performance boost. For example, two video cards don’t double the performance, but maybe give you a 30 to 40 percent improvement. If you add a third video card, if supported, this gives you an even smaller performance boost. So personally, I would use the money to buy a newer video card that is faster rather than add a second one. If you do decide to add a second card, make sure they are compatible with each other. It is best to use video cards that are the same; however, if you can’t and one runs slower than the other, the faster video card may slow down to keep pace with the slower video card.
In order to get it all to work, a connector is required to connect the video cards. The video card is also required to support this connector. This connector, in the case of SLI, comes with the motherboard, and in the case of CrossFire, with the graphics card. The thinking was originally to supply the connector with the motherboard so the manufacturer could make their own decision on how far to space the slots on the motherboard. However, for CrossFire, the motherboard will need to be made to ensure it works with the connectors that are supplied with the graphics card. This generally will not be a problem as motherboards are made to a standard so they will fit in any computer case. So, for now I guess it will keep being done this way, since this is the way it has always been done.
Video Card Connectors
The last topic that I will cover in this video is the connectors at the back of the video card. Each video card may have a completely different set of connectors. It is up to the manufacturer of the video card to decide which connectors they want to include.
I will cover these connectors in more detail in other videos, but I will give you a quick rundown of them. The trend with the newer video cards is to have a number of DisplayPorts. DisplayPort does not require a licensing fee to be paid for it to be used on the video card. Therefore, it does not cost a manufacturer any more to add it other than the costs of the ports themselves.
You will generally also find at least one HDMI port. HDMI and DisplayPort offer very similar features. HDMI was originally designed for home use, and DisplayPort more for business computers, so some of the features of each are slightly different. The core features, however, are the same and both support high resolutions such as 4K and beyond. The signaling on both is very similar so compatibility between the two is also quite good. If you find you don’t have the port you want, you may be able to use a cable with a DisplayPort connector on one side and an HDMI connector on the other. If the monitor or the video card is able to detect and change the signal, it will work; otherwise you will need to get an adapter.
We are also starting to see a trend of including a USB-C connector. A USB-C connector is different from the others in that it is up to the manufacturer to decide which signal to send over the cable. Most likely, in the case of video cards, it will support DisplayPort and HDMI. If using an Apple product, most likely the USB-C port will also support Thunderbolt.
On older video cards you may also see DVI connectors. DVI connectors are obsolete technology nowadays, so you most likely won’t see these on newer video cards. They don’t support high resolutions like the others, so if you have the choice I would use the other types.
Even older than DVI is VGA. VGA is analog. Starting with DVI the switch was made to digital. It is unlikely that you will need a VGA connection nowadays. If you do, you can purchase a converter to convert one of the other signals to VGA.
That concludes this video on the different components that make up a graphics card. I hope this video has given you a better understanding of how they work and what you can achieve using them. Until the next video from us, I would like to thank you for watching.