TCP and UDP
TCP stands for Transmission Control Protocol. The protocol guarantees a connection and delivery of packets over the connection. You will often hear this protocol referred to as connection-oriented. For the exam, know that TCP is connection-oriented and that means guaranteed connection and delivery. You may get a question on it.
The other method of transmitting data is UDP, or User Datagram Protocol. This protocol is “fire and forget.” Thus, it is often referred to as connectionless. Make sure you know the difference between the two.
I will take a deep dive into the protocols to help you better understand how they work. For the A+ exam, you only need an understanding of the basic concepts. When I have finished the deep dive, I will give you a summary of the major points. Let’s have a closer look at these protocols.
TCP 3-Way Handshake
TCP uses a 3-way handshake, which is essentially a three-step process used to create a connection. To understand how it works, consider that you have a client computer that wants to make a connection to a server.
In order to do this, the client sends a SYN message to the server, SYN being short for synchronize. The SYN message configures certain values, for example, a random sequence number. More on that later in the video.
The server sends a SYN-ACK message back. This message acknowledges that the SYN message was received, and it also contains the server’s own SYN, which carries the server’s own random sequence number. More on that later in the video, I promise.
The last message is an ACK message. This message is from the client to tell the server that the configuration and sequence numbers have been agreed upon. Essentially, a TCP 3-way handshake has been done so both sides can negotiate parameters before data transmission. Once the final acknowledgment is sent, both sides are now ready to send data.
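To see the handshake from a program's point of view, here is a minimal Python sketch. It assumes the loopback address 127.0.0.1 is available; the function name handshake_demo is made up for illustration. The operating system performs the SYN, SYN-ACK, and ACK exchange itself; the program only sees connect and accept succeed.

```python
import socket


def handshake_demo() -> tuple[str, bytes]:
    """Open a TCP connection over loopback and send one short message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 asks the operating system to pick any free port.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        port = server.getsockname()[1]

        # create_connection() returns once the 3-way handshake has completed.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            conn, peer = server.accept()
            with conn:
                conn.settimeout(5)
                # Data can only flow after the handshake is done.
                client.sendall(b"hello")
                data = conn.recv(1024)
    return peer[0], data
```

Capturing the loopback interface in Wireshark while this runs should show the three handshake messages before the data.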
Now that we understand what the 3-way handshake is, let’s have a look at the sequence number.
Sequence Number
A sequence number is a number representing the number of bytes sent in the data stream. The data is sent in segments, which can vary in size. The sequence number is used to acknowledge to the sender how much data the receiver has received.
The first sequence number is randomized, and I will go into more detail shortly about why this occurs. To make things easy to understand, it is often better to work with a relative sequence number. The relative sequence number is the current sequence number minus the random initial sequence number, so it starts counting from zero. I will go into more detail shortly, so don’t worry if this sounds confusing at first.
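The relative sequence number calculation can be sketched in a few lines of Python. The function name relative_seq is made up for illustration. The modulo step is there because TCP sequence numbers are 32-bit values that wrap around to zero.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def relative_seq(raw_seq: int, initial_seq: int) -> int:
    """Return the raw sequence number minus the initial one, allowing for wrap-around."""
    return (raw_seq - initial_seq) % SEQ_SPACE
```

For example, with an initial sequence number of 5000, a raw sequence number of 5201 gives a relative sequence number of 201.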
The reason that sequence numbers are used is that they allow for the number of bytes received to be acknowledged. This is how TCP achieves reliable communication. Each data segment, when received, needs to have an acknowledgment message sent back. Updates in the TCP protocol allow multiple segments to be acknowledged at once. More on that later in the video.
If a segment is lost during transmission, having a sequence number allows for the missing data segment to be retransmitted. Now that we understand sequence numbers, the next question is why is it initially randomized?
Random Initial Sequence Number
The initial sequence number is random to help prevent impersonation and denial of service attacks. If an attacker is not able to predict what the sequence number is, it makes it harder for them to affect the traffic traveling over that data stream.
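As a rough illustration only, an unpredictable 32-bit starting value could be picked like this in Python. Real operating systems use more elaborate schemes inside the TCP stack, so treat this as a sketch of the idea of unpredictability, not of how any particular system does it. The function name is made up for illustration.

```python
import secrets

SEQ_SPACE = 2 ** 32  # sequence numbers fit in 32 bits


def random_initial_sequence_number() -> int:
    """Pick a starting sequence number that an attacker cannot easily predict."""
    # secrets uses a cryptographically strong random source.
    return secrets.randbelow(SEQ_SPACE)
```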
To understand how making a connection works, consider that you have a client attempting to make a connection with a server. The client will send the SYN message to the server. The SYN message will contain a random sequence number.
The server responds with a SYN-ACK message. I will cover the SYN part in a moment. The acknowledgment number is normally the sequence number received plus the size of the data received, that is, the size of the payload. In this case, the client’s SYN message contains no payload, making the size of the data zero. To confirm the SYN message was received, the server instead increments the sequence number by 1, sometimes referred to as a “phantom byte”. This increment lets the client verify that its initial sequence number has been acknowledged, a crucial step in finalizing the connection setup.
The second piece of information is the server’s random sequence number. Since the connection transfers data in both directions and each direction uses a different random sequence number, it needs to send its random sequence number to the client.
The last message is an acknowledgment message. This acknowledgment message is the client acknowledging that it has received the random sequence number from the server. Once again, the sequence number is incremented even though no data has been sent.
To get a better understanding of how it works, I will open up Wireshark. Wireshark is freely available software used to capture network traffic. I have created a connection from one computer to another using TCP. Using Wireshark, I can have a look at the data traveling to one of the computers.
Looking at the traffic, we can see the first SYN message. You can see that Wireshark decodes the frame and gives you additional information; in this example, it tells you it is a SYN message. In the bottom left-hand corner, notice that you can view more details about the frame. To see the details about TCP, I will expand the section “Transmission Control Protocol”.
Wireshark takes the raw data you can see on the right side of the screen and presents it to us in a way that is easy for us to read. Notice the section “Sequence Number”. This is the random sequence number that I spoke about previously.
Notice that above this is the section “Sequence Number”. In the brackets to the right is “relative sequence number”. When a value is in brackets like this, it means that Wireshark has taken existing data and performed processing on it. In this case, the relative sequence number is the current sequence number minus the initial random sequence number. Don’t worry if you don’t quite understand this yet; I will look at it again later in the video.
Notice the section “Next Sequence Number”. This is a field generated by Wireshark. When you select a section on the left-hand side, the data associated with that section will be highlighted on the right. You will notice that when this section is selected, nothing is highlighted on the right. This is because Wireshark has generated the data in this section rather than displaying existing data or performing just minor processing on existing data.
Since the next sequence number is one, we can expect the relative sequence number in the next message from this side to be one. Keep in mind the text on the right, “Relative sequence number”. This sequence number is relative, meaning it is calculated by subtracting the initial random sequence number from the current sequence number.
Below this is the acknowledgement number. This keeps track of the other party. Keep in mind that the other party has not sent its random initial sequence number, so this value is currently unknown; this is why it is set to zero.
When a section on the left is selected, on the right-hand side, the part of the frame containing that data will be highlighted. Wireshark does an excellent job of taking the frame data and presenting it in a way that is easy for the user to understand. If you see nothing highlighted on the right-hand side, you know the section on the left was generated.
I will now select the next frame down. You will notice that this frame is a SYN-ACK message. Like before, it has a sequence number. You can see the raw sequence number is a random number. This random number will be the initial sequence number for the server-side communication. The client side has already sent its random sequence number; now the server will send its random sequence number.
You will notice the relative sequence number is zero. Since this is the first communication from the server side, it will be zero. However, notice that the acknowledgment number is one. If this is not making sense, I will have another look at this shortly.
The next packet down is the ACK packet. This message is from the client to the server, responding to the server’s message and accepting its random sequence number. You will notice that the acknowledgment number has increased to one. It has increased because the client has received one message, the SYN, from the server. Looking at Wireshark, it is a little difficult to follow what is going on, so let’s look at it another way.
TCP Example
In this example, I will consider that Bob is attempting to transfer data to Alice. In order to do this, he wants to use reliable communication and thus will use TCP, which is a connection-oriented protocol.
To make the connection, Bob sends a SYN message to Alice. This SYN message contains a random sequence number. In this example, I have used 5000 as the random number to make it easier to understand.
Since no traffic has been sent over the connection as yet, Bob’s relative sequence number is zero. The SYN message also contains other configuration information. Bob wants to know if Alice has received the SYN message, so the next step is for Alice to send a message back.
Since the connection is bi-directional, that is, data is sent in both directions, Alice will also send a random sequence number. Since Alice has not sent any data as yet, Alice’s relative sequence number is zero.
Alice wants to let Bob know that she has received the SYN message, so Alice will update her acknowledgment number. The acknowledgment number will be Bob’s initial random sequence number plus one. The sequence number increases according to how many bytes are transferred. In a moment, I will look at this process in more detail. In this case, there is no data being transferred, but we have to increment the acknowledgment number to indicate the message was received.
Since the sequence number has increased, the relative acknowledgment number will also increase by one. Alice has acknowledged Bob’s sequence number, but now Bob needs to acknowledge hers. In order to do this, Bob sends an ACK message.
Bob’s ACK message will contain Bob’s current sequence number. Since Bob’s SYN message has been sent and acknowledged, it will be the random sequence number plus one, or 5001. This means Bob’s relative sequence number will be one.
Bob needs to acknowledge that he has received Alice’s sequence number. In order to do this, Bob sets his acknowledgment number to Alice’s random sequence number plus one. This makes the relative acknowledgment number one.
So far, a 3-way handshake has been completed. The 3-way handshake creates the connection that we will use to transfer data in both directions. Until this 3-way handshake is completed, no data can be sent. Now let’s have a look at how data is sent over the connection.
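The numbers in the handshake can be worked through in a short Python sketch. It uses Bob's starting value of 5000 and Alice's starting value of 6000 from this example. The function name and the dictionary layout are made up for illustration and are not a real packet format.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """Return the three handshake messages with their sequence and acknowledgment numbers."""
    # Step 1: the client's SYN carries its initial sequence number; nothing to acknowledge yet.
    syn = {"from": "Bob", "flags": "SYN", "seq": client_isn, "ack": None}
    # Step 2: the server sends its own initial sequence number and acknowledges the SYN (+1).
    syn_ack = {"from": "Alice", "flags": "SYN-ACK", "seq": server_isn, "ack": client_isn + 1}
    # Step 3: the client's sequence number has moved on by one, and it acknowledges the server's SYN (+1).
    ack = {"from": "Bob", "flags": "ACK", "seq": client_isn + 1, "ack": server_isn + 1}
    return [syn, syn_ack, ack]


messages = three_way_handshake(client_isn=5000, server_isn=6000)
```

After the third message, Bob's sequence number is 5001 and his acknowledgment number is 6001, which is where the data transfer starts.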
TCP Example
Let’s consider that Bob wants to send Alice 200 bytes of data. A packet with the data is sent over the network. The sequence number so far will be the same as before. That is, it will be our random initial value of 5000 plus one for the initial SYN message, giving us a sequence number of 5001 and a relative sequence number of one.
The acknowledgement number will be 6001. 6000 was the original random sequence number created by Alice. One was added because of the SYN message, giving us an acknowledgment number of 6001 and a relative acknowledgment number of one.
Alice now needs to send a message back that she received the data. Since Alice has not yet sent any data, her sequence number will be 6001. If it makes it easier for you, think of it as one side’s sequence number pairing up with the other side’s acknowledgment number.
Alice needs to acknowledge the data has been received from Bob, so Alice will add the size of the data received to the acknowledgment number. This will give us a value of 5201 or a relative value of 201.
In its simplest form, an acknowledgment needs to be sent after every packet containing data is received. With improvements to the TCP protocol, this is no longer the case with delayed acknowledgment. I won’t go into details about it, but just understand that multiple data packets can be acknowledged in one message rather than every packet being acknowledged. So if you are looking at Wireshark and wondering why you have a missing acknowledgment message, don’t worry about it; just look at the acknowledgment number.
In this example, Alice will send 100 bytes of data to Bob. Even though Alice is sending data, the sequence number in this message will remain the same, 6001, because it marks where the data in this message starts. The sequence number gets updated after the data is sent.
Since no data has been received since Alice acknowledged the last data, the acknowledgment number remains the same.
Bob now needs to acknowledge that he has received the data. To do this, Bob will send a message to Alice. Notice that this message has 50 bytes of data in it. The TCP protocol allows you to send data at the same time as sending acknowledgments.
Bob has sent 200 bytes previously. Thus, his sequence number will have 200 added to it, giving 5201. The sequence number marks where this message’s data starts in the stream, so it does not include the 50 bytes being sent now; it only includes the first 200 bytes. Since Bob has received 100 bytes, the acknowledgment number will be increased by 100 to 6101.
Alice needs to acknowledge that she has received 50 more bytes from Bob, so she will send a message back. Alice has sent 100 bytes since her last message, so her sequence number will increase by 100.
Alice has received an additional 50 bytes from Bob, so Alice will add 50 to her acknowledgment number. You can see how TCP keeps track of how much data it receives in both directions. If the sender of the data does not receive an acknowledgment packet in a certain time period, the sender will retransmit the packet. However, there are times when a packet arrives but still has to be sent again.
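The whole exchange between Bob and Alice can be traced with a small Python sketch. It assumes every message arrives intact and in order, and the Endpoint class is made up for illustration; it is not how a real TCP stack is structured.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    seq: int  # sequence number of the next byte this side will send
    ack: int  # next byte this side expects from the other side

    def send(self, peer: "Endpoint", payload_len: int) -> tuple[str, int, int, int]:
        """Record one message as (sender, sequence number, acknowledgment number, bytes)."""
        segment = (self.name, self.seq, self.ack, payload_len)
        self.seq += payload_len  # the sequence number is updated after the data is sent
        peer.ack += payload_len  # the other side has now received these bytes
        return segment


# State straight after the 3-way handshake.
bob = Endpoint("Bob", seq=5001, ack=6001)
alice = Endpoint("Alice", seq=6001, ack=5001)

trace = [
    bob.send(alice, 200),  # Bob sends 200 bytes
    alice.send(bob, 0),    # Alice acknowledges the 200 bytes
    alice.send(bob, 100),  # Alice sends 100 bytes
    bob.send(alice, 50),   # Bob acknowledges and sends 50 bytes in the same message
    alice.send(bob, 0),    # Alice acknowledges the 50 bytes
]
```

Printing trace gives the same sequence and acknowledgment numbers as the walkthrough, ending with Alice's sequence number of 6101 and acknowledgment number of 5251.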
TCP Example
Let’s consider that Bob sends a 50-byte packet to Alice. However, during transit, the packet is damaged. When Alice receives the packet, the TCP checksum does not match, so she detects that it has been damaged and discards it. Standard TCP has no negative acknowledgment message; Alice simply does not acknowledge the damaged data, so her acknowledgment number does not advance.
When Bob does not receive an acknowledgment for the data within the time period, Bob sends the data again. This time, Alice receives the data and, like last time, Alice sends an acknowledgment message back to Bob.
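The idea of re-sending data until it is acknowledged can be sketched in Python as follows. This is a simplified model, not real TCP: the deliver function stands in for the network and simply reports whether an acknowledgment came back in time. All of the names here are made up for illustration.

```python
def send_with_retransmit(payload: bytes, deliver, max_attempts: int = 5) -> int:
    """Send payload until it is acknowledged; return how many attempts were needed."""
    for attempt in range(1, max_attempts + 1):
        # True stands for "an acknowledgment arrived before the timer ran out".
        if deliver(payload):
            return attempt
    raise TimeoutError("no acknowledgment received")


def make_channel_dropping(first_n: int):
    """Build a pretend network that loses or damages the first first_n transmissions."""
    state = {"sent": 0}

    def deliver(payload: bytes) -> bool:
        state["sent"] += 1
        return state["sent"] > first_n

    return deliver


attempts_needed = send_with_retransmit(b"x" * 50, make_channel_dropping(1))
```

Here the first copy of the 50-byte packet is lost or damaged, so the data gets through on the second attempt.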
You can see that using this system, TCP has a way of recording when data is received and is able to re-send data when it is not received. There has been a lot covered so far, so let’s have a look at what you need to know for the exam.
Exam Tip
I doubt that you will get a difficult question on TCP, if you get a question at all. The course material covers what I have gone through, which is the reason I have presented it. For the exam, know that TCP is a connection-oriented protocol that provides reliable communication. That is, data is guaranteed to get to the destination, as lost data is re-sent.
The price of reliable data is additional overhead for transmission. This means a larger header and more processing. Just be aware that TCP adds overhead, but that is the cost you pay for reliable transmission.
Now let’s look at the next protocol which provides less overhead, but does not provide reliable communication.
User Datagram Protocol (UDP)
User Datagram Protocol, or UDP, is connectionless, unlike TCP, which requires a connection. Thus, UDP does not require a 3-way handshake like TCP does. UDP is not reliable communication and thus works like the mail system does. When you send regular mail, you never know if your item will arrive at its destination. It is best-effort delivery, and most of the time it will get there, but there are no guarantees.
Since it is connectionless, it has less configuration information that needs to be recorded. Thus, it has smaller overhead. Smaller overhead means less data is needed for each packet that is sent. With small packets, this becomes significant; with larger packets, this is not so significant.
UDP does not have flow control. This means that it can’t slow down traffic if it is going too fast. In TCP, if the receiving end can’t keep up, it can request the sending side to slow down. If the sender is transmitting too fast, this can result in lost packets. In the case of UDP, since the protocol does not have flow control, it is up to the application to slow down packet transfer if it gets too fast; otherwise, packet loss will occur.
UDP does not guarantee delivery of packets. Thus, if a packet is lost, it is not resent. It also does not guarantee the data will be delivered in order. TCP, in contrast, will re-send lost packets, and if data arrives out of order, will re-order.
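For comparison with TCP, here is a minimal Python sketch of sending one UDP datagram. It assumes the loopback address 127.0.0.1 is available, and the function name udp_demo is made up for illustration. Notice there is no listen, connect, or accept step because there is no connection to set up.

```python
import socket


def udp_demo() -> bytes:
    """Send one datagram over loopback and return what the receiver got."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        # Port 0 asks the operating system to pick any free port.
        receiver.bind(("127.0.0.1", 0))
        # UDP will not report a lost datagram, so give up after 2 seconds.
        receiver.settimeout(2)

        # No handshake: the datagram is simply sent to the address.
        sender.sendto(b"fire and forget", receiver.getsockname())

        data, _sender_address = receiver.recvfrom(1024)
        return data
```

On loopback the datagram will normally arrive, but nothing in UDP itself would tell the sender if it did not.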
UDP’s biggest advantage is that it is good for time-sensitive data, that is, data that becomes worthless very quickly. Data being transferred using TCP may get delayed waiting for packets to be re-transmitted. In the case of some applications, you may not want to wait for re-transmission since the data won’t be worth anything when it arrives.
For example, if you have a voice over IP application, if data is lost, you may end up getting a pause in the audio waiting for a re-transmit. This will mean one party will now effectively be delayed, which can make having a conversation difficult. Some applications, when this occurs, will speed up one party so they can catch up, which may make their audio sound unnatural.
With UDP, voice over IP will simply have some breaks in the audio where the data is missing. Hopefully, these breaks are not too noticeable, and they avoid having to speed up the audio afterward to catch up.
UDP is also used for real-time video. For example, if you are watching a sports game, you don’t want the feed delayed while you are waiting for missing data to be re-transmitted. You want to see the action as soon as it happens.
I will now have a look in Wireshark at some UDP traffic that I captured. In this example, I have captured traffic using Trivial File Transfer Protocol or TFTP. TFTP is used to transfer files just like FTP is. The difference is that TFTP uses UDP and does not have as many features as FTP. It is designed to be implemented in devices where you want to use minimal code size and processing in order to transfer files, for example, in network terminals that download an operating system.
You will notice that the first UDP message is a write request. Since TFTP is a well-known protocol, Wireshark is able to decode the information in the packet and present it to us in a human-friendly way.
Different applications will use UDP differently, and this is just an example of how it may be used. In the packet window, you will notice that “User Datagram Protocol” is listed rather than “Transmission Control Protocol”.
When I expand the UDP section, notice that there is not much data listed. To get an idea of how much smaller it is, I will compare it with TCP. In order to see all the TCP fields, I need to scroll the window, whereas the UDP fields easily fit inside the window. TCP has a much larger header than UDP does.
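To put rough numbers on the difference, the fixed header layouts can be described with Python's struct module. A UDP header is 8 bytes, and a TCP header is at least 20 bytes before any options are added. The helper udp_header and the example port numbers are made up for illustration.

```python
import struct

# UDP header: source port, destination port, length, checksum (four 16-bit fields).
UDP_HEADER = struct.Struct("!HHHH")

# Minimum TCP header: source port, destination port, sequence number,
# acknowledgment number, data offset and flags, window, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")


def udp_header(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Build a UDP header for payload; the checksum is left as 0 in this sketch."""
    length = UDP_HEADER.size + len(payload)  # the length field covers header plus data
    return UDP_HEADER.pack(src_port, dst_port, length, 0)


header = udp_header(50000, 69, b"abc")  # 69 is the well-known TFTP port
```

UDP_HEADER.size is 8 and TCP_HEADER.size is 20, which matches what you see when expanding the two sections in Wireshark.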
Since TFTP is a well-known protocol, notice that Wireshark will have a TFTP section. This section will decode the data and present it in a human-friendly way. When using Wireshark, it often helps when processing the data to have a look for sections like these.
Notice that the second packet captured is an acknowledgment packet. Since TFTP is a file transfer protocol, it needs to confirm that the file has been transferred correctly. Since UDP does not offer reliable communication, TFTP needs to implement it itself. Thus, TFTP will send its own acknowledgment packets so the other side knows the data was received.
So essentially, when using UDP, the developer needs to implement functions like acknowledgments, flow control, and any other features that they require. These functions are all natively part of TCP. So the question is, why not use TCP in this case? The answer is that when TFTP was traditionally used, the developer was trying to keep the code as small as possible, as memory on chips in the old days was expensive. Implementing only UDP, rather than the full TCP code, resulted in a much smaller code size. TCP and UDP both have their advantages and disadvantages, thus you will find some applications will use one or the other, and in some cases, some applications will use both.
Notice that the next packet down is a data packet. This packet is labelled as block 1. Below this is the acknowledgment that block 1 has been received. The next packet down is the data packet for block 2 and the acknowledgment for block 2. The process continues for all the blocks that are transferred.
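The TFTP messages seen in the capture are simple enough to build by hand. The sketch below follows the packet layout in the TFTP specification (RFC 1350): a 2-byte opcode, then either a file name and mode, or a 2-byte block number. The function names and the file name are made up for illustration, and nothing is actually sent over the network.

```python
import struct

OP_WRQ, OP_DATA, OP_ACK = 2, 3, 4  # TFTP opcodes: write request, data, acknowledgment


def wrq(filename: str, mode: str = "octet") -> bytes:
    """Write request: opcode, file name, zero byte, transfer mode, zero byte."""
    return struct.pack("!H", OP_WRQ) + filename.encode("ascii") + b"\0" + mode.encode("ascii") + b"\0"


def data(block: int, payload: bytes) -> bytes:
    """Data packet: opcode, block number, then up to 512 bytes of file data."""
    return struct.pack("!HH", OP_DATA, block) + payload


def ack(block: int) -> bytes:
    """Acknowledgment: opcode and the block number being acknowledged."""
    return struct.pack("!HH", OP_ACK, block)


def parse(packet: bytes) -> tuple[int, int]:
    """Return (opcode, block number) from a data or acknowledgment packet."""
    return struct.unpack("!HH", packet[:4])


# The start of a write: request, acknowledgment, then block 1 and its acknowledgment.
exchange = [wrq("image.bin"), ack(0), data(1, b"first block"), ack(1)]
```

Each acknowledgment here is TFTP's own message carried inside a UDP packet; UDP itself knows nothing about it.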
The main takeaway from this is that UDP uses a smaller header and has fewer features. If the application wants additional features, it will need to implement them itself. In some cases, it may be easier to use TCP; in other cases, UDP is the better fit, since the application is free to implement the features as it sees fit. For example, the application may number its own packets, so it can tell what order they belong in without needing TCP, and it may not require lost packets to be re-transmitted at all. The protocol chosen will be determined by what you are trying to achieve.
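As a sketch of an application doing its own numbering on top of UDP, the Python class below keeps only packets that are newer than anything it has already seen and never asks for a re-send. The class name and the policy of discarding late packets are assumptions chosen for illustration; a real application would pick whatever behavior suits its data.

```python
class LatestOnlyReceiver:
    """Accept numbered packets, discarding any that arrive late or twice."""

    def __init__(self) -> None:
        self.highest_seen = -1

    def accept(self, number: int, payload: bytes):
        """Return the payload if it is new, or None if it should be ignored."""
        if number <= self.highest_seen:
            return None  # late or duplicate packet: too old to be useful
        self.highest_seen = number
        return payload


receiver = LatestOnlyReceiver()
arrivals = [(0, b"a"), (1, b"b"), (3, b"d"), (2, b"c"), (4, b"e")]

# Packet 2 arrives after packet 3, so it is dropped rather than re-ordered.
accepted = [number for number, payload in arrivals
            if receiver.accept(number, payload) is not None]
```

This kind of approach suits data such as live audio, where an old packet is not worth waiting for.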
Let’s look at some examples of when each protocol is used.
Protocol Examples
TCP is used for HTTP and HTTPS. These protocols are used to retrieve text and graphics for web pages. When you retrieve data like this, you need to ensure that it has not been lost or corrupted in transmission. Another example is Secure Shell, or SSH. Secure Shell is used to create a secure terminal session on another computer so you can run commands. In this case, you want to ensure that data is not lost in transmission.
UDP is used by Dynamic Host Configuration Protocol or DHCP. In this case, UDP is the only choice since a node does not have an IP address when it is first being configured. The only option is to use broadcasts, which need a connectionless protocol like UDP. In the case of DHCP, if the node does not receive a response to the broadcast, it simply sends the broadcast again. Eventually, if no response is received, the node will time out and stop sending broadcasts.
Another example of UDP is Simple Network Management Protocol or SNMP. SNMP is a protocol used for sending and collecting information about managed devices on the network. For example, information from a device like CPU and memory usage. Information like this is only useful at the time it was sent. If you don’t receive a CPU usage message, this message becomes obsolete when the next message arrives; thus, there is no need to ask for it to be re-transmitted.
End Screen
That concludes this video on the TCP and UDP protocols. I hope you have found this video useful, and I look forward to seeing you in the next video. Thanks for watching.
Credits
Trainer: Austin Mason http://ITFreeTraining.com
Voice Talent: HP Lewis http://hplewis.com
Quality Assurance: Brett Batson http://www.pbb-proofreading.uk