Troubleshooting Basics

In this video from ITFreeTraining, I will look at some basic troubleshooting techniques. CompTIA has their own troubleshooting model, which I will look at in a later video. For this video, I will just give an introduction to basic troubleshooting techniques.

Basic IT Troubleshooting
To understand the basics of troubleshooting, let’s start with a joke which explains all the basic principles of IT troubleshooting. Imagine a manager, an engineer and an IT technician all travelling in the same car. The car travels down a hill and the brakes fail. It leaves the road at speed and almost falls off a cliff.
Everyone gets out of the car, thankful to still be alive after almost falling off the cliff. They all urgently need to get to a meeting in the city on the future of free training, so they need to get the car fixed and get on their way as soon as possible.
The manager says, “We need to have a series of meetings, work out a plan and a strategy. We next need to implement the strategy and then have further meetings to work out if the strategy was effective.”
The engineer says, “That is going to take too long and that never worked before. What needs to happen is that I need to get some tools and get under the car and fix the problem.”
The IT technician just laughs. “You’re both wrong. What we need to do is push the car back up the hill, roll the car back down the hill and see if it happens again.”
You can start to understand the basics of IT troubleshooting. One of the first questions asked is, can you replicate the problem? If you can’t, there is no problem to fix. Now you can understand why so many IT technicians will say, “have you tried turning it off and on again?” If you turn it off and on again and the problem goes away, there is nothing to solve.
Just for legal reasons, if the brakes on your car ever fail, please get them fixed; don’t push the car up the hill and see if it happens again!
Let’s now have a look at what to do when turning the computer off and on again does not work.

Cause/Symptom/Consequence
There are many different ways to troubleshoot computer problems. One may work better in some cases; others may work better in other cases. Generally speaking, when a problem occurs, there is something that caused it.
In this example, we will consider that the hard disk is failing: it is sometimes not reading and writing data correctly. In computers, there is generally only one thing causing the problem, and finding it is a matter of working through the problem logically. Find the cause, fix it or replace it, and the problem goes away. In some cases there will be multiple causes, but this is rare. For example, I once had a laser printer that was not working correctly. It had two faulty parts, both of which needed to be replaced before it would work again. I replaced one, thinking this would fix it. When it did not, I removed the new part and replaced the other part, which again did not fix the problem. It took a while for me to work out that both parts were faulty. Situations like this are the exception in computers, as generally there is only the one cause.
The next thing to consider is the symptom. In this case, the symptom is that the computer keeps crashing with a blue screen. The blue screen will give you an indication of how the crash occurred, but it may not point to the real cause. A failing hard disk, for example, may not appear as a hard disk problem: corrupt data read from the hard disk may cause a crash in a completely unrelated system. If you find that switching the computer off and on again keeps giving different error messages, the cause may have nothing to do with any of them. Random blue screens can be caused by anything from a failing hard disk or failing memory to a problem with the CPU. If any of these components starts acting in a random way, it can cause other hardware and systems to crash, and thus the error messages given by the blue screen could have nothing to do with the problem.
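As a rough illustration of that last point, the sketch below looks at the stop codes recorded over several crashes and asks whether they agree with each other. It is only a sketch under my own assumptions: the function names, the 0.8 threshold and the two crash lists are made up for the example, and a consistent stop code is still only a hint, not proof of the cause.

```python
from collections import Counter
from typing import List


def most_common_share(stop_codes: List[str]) -> float:
    """Return the fraction of crashes that reported the most frequent stop code."""
    if not stop_codes:
        return 0.0
    counts = Counter(stop_codes)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(stop_codes)


def codes_are_consistent(stop_codes: List[str], threshold: float = 0.8) -> bool:
    """True if one stop code dominates; the threshold is an arbitrary assumption."""
    return most_common_share(stop_codes) >= threshold


# Made-up crash histories, for illustration only.
consistent_crashes = ["KERNEL_DATA_INPAGE_ERROR"] * 5
scattered_crashes = [
    "MEMORY_MANAGEMENT",
    "IRQL_NOT_LESS_OR_EQUAL",
    "KERNEL_DATA_INPAGE_ERROR",
    "PAGE_FAULT_IN_NONPAGED_AREA",
]

print(codes_are_consistent(consistent_crashes))  # True
print(codes_are_consistent(scattered_crashes))   # False
```

When the codes are scattered like the second list, the individual messages are probably not describing the real cause, and it makes more sense to test the hard disk, memory and CPU directly.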
The last thing to consider when fixing problems like these is the consequences. A failing computer will result in lost productivity for the worker. Trying to figure out what is causing the problem will take time. Even more time may be required to fix it. For example, copying the data from one hard disk to another takes time.
For these reasons, many businesses deploy computers which store all of a user’s data on the network. If the computer fails, it is a simple matter of replacing it with a new one. A good IT technician will have a computer ready to go. When you arrive to fix a computer, the clock is essentially ticking. If it starts taking too long, replace the computer. Why? Because the longer the fix takes, the more productivity is lost. Therefore, the fastest fix is often to replace the whole computer. Once it is replaced, you can take the old computer away and work out what the problem is without having to hurry. This will keep the user happy, and take the pressure off you.
When fixing IT problems, always think about the consequences for the business; this will often direct you to what should be fixed first and how you should go about it. Often a business cares more about the consequences than the problem itself. For example, replacing the computer will fix the consequence faster than fixing the cause.
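The "clock is ticking" idea can be written down as a very small decision rule. This is just a sketch of the reasoning, not a policy from any particular business: the function name, the time budget and the wording of the results are all assumptions I have made for the example.

```python
def next_action(minutes_spent: int, time_budget_minutes: int, spare_available: bool) -> str:
    """Decide whether to keep fixing the computer or replace it.

    The time budget is a hypothetical figure; each business would set its own.
    """
    if minutes_spent < time_budget_minutes:
        # Still inside the time budget, so carry on with the repair.
        return "keep troubleshooting"
    if spare_available:
        # Over budget: fix the consequence first, find the cause later.
        return "replace computer, diagnose the old one later"
    # Over budget but nothing to swap in, so the repair has to continue.
    return "keep troubleshooting"


print(next_action(minutes_spent=10, time_budget_minutes=30, spare_available=True))
print(next_action(minutes_spent=45, time_budget_minutes=30, spare_available=True))
```

The second call goes over the time budget with a spare computer on hand, so the sketch recommends the replacement.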
It is also important to consider that a particular cause may be a symptom of a much larger problem. For example, if hard disks keep failing, there may be a manufacturing problem with a particular model. I once worked in a business where the motherboards kept failing. The problem was poor quality capacitors that were used on the motherboard. Once we worked out which model was failing, it was an easy matter to remove those computers from service and get them repaired before they started failing. This is a good example of how you can sometimes proactively fix something before it starts impacting the business.

Process of Elimination
The next troubleshooting process I will look at is the process of elimination. As the name suggests, you find the cause of a problem by ruling out the things that are working. For example, let’s consider a user who cannot access a cloud-based application from their desktop computer.
In order for this to work, the desktop computer needs to connect to the internet to access the cloud which contains the application. The user has reported that the application is not working. Let’s consider how we may troubleshoot it, using the process of elimination.
First, let’s make sure that we can access the local network. The problem could be as simple as a network cable being unplugged or a network device failing. Next, I would try to connect to the internet. It could be a simple matter of the internet connection being down. If this works, next try to access the cloud. For example, maybe accessing the cloud requires the application to have a username and password. Or you may need to check online that the cloud services are available.
The last step is to check that the application is available. If the application is available on the cloud, the problem may be something to do with the way the user is trying to access the application, or a problem with their settings.
Looking at this, you may be thinking, could this be done in reverse? Could you check the application first, then check that the cloud is available, then check the internet, and finally check that the local area network is available?
The answer is, yes you can. In fact, you could start in the middle by checking the internet first. You want to figure out what is working and what is not. Different IT technicians will work in a different order and, depending on the problem, your approach may vary. The point to remember is that there is no right or wrong order, just the process of elimination. Work out what works and what does not. This will lead you to what is causing the problem.
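The process of elimination for this example can be sketched in Python. This is only an illustration under my own assumptions: the names `run_checks`, `first_failure` and `tcp_reachable`, and the host names in the comments, are made up for the sketch. Each layer gets a check that returns True or False, and the results show what is working and what is not.

```python
import socket
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks: List[Check]) -> List[Tuple[str, bool]]:
    """Run every check and record whether each layer is working."""
    return [(name, check()) for name, check in checks]


def first_failure(results: List[Tuple[str, bool]]) -> Optional[str]:
    """Return the first layer that is not working, or None if all are fine."""
    for name, ok in results:
        if not ok:
            return name
    return None


# Real checks might look like this (host names are hypothetical):
#   ("local network", lambda: tcp_reachable("gateway.example.internal", 80)),
#   ("internet",      lambda: tcp_reachable("example.com", 443)),
# Stub checks are used here so the sketch runs without a network.
example_checks: List[Check] = [
    ("local network", lambda: True),
    ("internet", lambda: True),
    ("cloud service", lambda: False),
    ("application", lambda: False),
]

results = run_checks(example_checks)
for name, ok in results:
    print(f"{name}: {'working' if ok else 'NOT working'}")
print("Start looking at:", first_failure(results))
```

The list can be put in any order, which matches the point above: the order does not matter as long as each check rules something in or out.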

Problem Management
In any decent-sized company, there will be some kind of problem management solution in place. The process and procedures will differ, depending on your organization. Essentially the users in your organization will report a problem to your helpdesk.
Depending on your organization and your position, you may be responsible for taking the initial report of the problem, either by telephone or electronically. Once the problem has been reported, the organization will generally follow some sort of process to fix the problem. The process will of course differ from organization to organization but will generally involve attempting to fix the problem initially (if it can be fixed) and confirming that it was fixed. If not, the technician will attempt to gain more information and escalate the problem. Escalating may involve the problem going to a more senior member of staff or to the manufacturer of the software or hardware.
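To make the flow concrete, here is a small sketch of a problem report being worked on and escalated. It does not describe any real helpdesk product: the `Ticket` class, the `handle` function and the three escalation levels are assumptions for the example, and your organization’s process will differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical escalation path; real organizations define their own.
ESCALATION_PATH = ["helpdesk", "senior technician", "vendor"]


@dataclass
class Ticket:
    description: str
    level: int = 0
    resolved: bool = False
    notes: List[str] = field(default_factory=list)


def handle(ticket: Ticket, try_fix: Callable[[str, Ticket], bool]) -> Ticket:
    """Attempt a fix at each level, escalating until fixed or out of levels."""
    while not ticket.resolved:
        owner = ESCALATION_PATH[ticket.level]
        if try_fix(owner, ticket):
            ticket.resolved = True
            ticket.notes.append(f"fixed and confirmed by {owner}")
        elif ticket.level + 1 < len(ESCALATION_PATH):
            ticket.notes.append(f"{owner} could not fix; escalating")
            ticket.level += 1
        else:
            ticket.notes.append(f"{owner} could not fix; no further escalation")
            break
    return ticket


# Example: only the senior technician is able to fix this problem.
ticket = handle(
    Ticket("User cannot print"),
    lambda owner, t: owner == "senior technician",
)
print(ticket.resolved, ticket.notes)
```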
In some cases, the problem may not be able to be fixed on the user’s end. For example, if the user cannot access something on the internet or another site, a change may need to be made on a firewall. This change can potentially affect everyone. A small change on a large network can potentially affect a lot of users, and thus the process needs to be controlled. Many organizations will have a change management process.
Change management will involve, as the name suggests, making a change to something. In the case of a firewall, a rule on the firewall may need to be changed. The change will be reviewed to ensure the change will not cause other problems on the network. Also, a lot of change management will include information about what to do if something goes wrong, and this is called a contingency. In the case of the firewall, if more problems occur, the contingency may be as simple as changing the rule back to what it was. This will fix the additional problems caused by the change but will not fix the original problem reported by the user.
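The idea of a contingency can be sketched as "make the change, check the result, and put it back if something went wrong". The code below is only an illustration: the rules are a plain Python dictionary rather than a real firewall, and the `apply_with_rollback` function, the rule name and the verify step are all assumptions for the example.

```python
from typing import Callable, Dict


def apply_with_rollback(
    rules: Dict[str, str],
    rule_name: str,
    new_value: str,
    verify: Callable[[Dict[str, str]], bool],
) -> bool:
    """Apply a rule change and undo it if verification fails.

    Returns True if the change was kept, False if it was rolled back.
    """
    had_rule = rule_name in rules
    previous = rules.get(rule_name)

    rules[rule_name] = new_value
    if verify(rules):
        return True

    # Contingency: change the rule back to what it was.
    if had_rule:
        rules[rule_name] = previous
    else:
        del rules[rule_name]
    return False


# Hypothetical rule set: the change is rolled back because the check fails.
firewall_rules = {"office-to-app-server": "deny"}
kept = apply_with_rollback(
    firewall_rules, "office-to-app-server", "allow", verify=lambda r: False
)
print(kept, firewall_rules)
```

As in the text, rolling back removes the new problems but leaves the original problem in place, so the user’s report is still open.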
Fixing the problem may take time, and if you are waiting for change management to fix the problem it may take even longer. When this occurs, a good IT technician will think outside the box (so to speak) to get around the problem without fixing it. This is commonly referred to as a workaround.

Workaround
A workaround is essentially something that is used to avoid a problem, but not actually fix it. A workaround is generally used when you can’t directly fix the problem; however, by using a workaround, you can get the functionality to work. The workaround may be temporary or permanent depending on the situation.
Let’s look at an example. Consider that there is an application server on the network. A user has been connecting to the application server without a problem. The problem occurs when the user moves to a different office. Now the user needs to connect to the application server through a firewall. The problem is, the firewall is blocking the connection. This was not a problem before, as the user was on the same network.
Changing a rule on the firewall to allow the connection will take some time. Perhaps the person who looks after the firewall is away from work, perhaps it takes some time to get the request authorized. In some cases, it may not even be possible to get the change done. For example, maybe the user is on a network that you don’t have any control of. Access to the application server is critical to the user performing their job function. This is a classic situation where the consequence of a problem is more important than fixing the problem. The sooner you can get the user access to the server the better.
When in business, it is important to consider what effect not having access to something will have. Telling the user the firewall will be fixed in a week, or saying it can’t be fixed at all and walking away, is not a good idea. The user will most likely talk to their manager, who will shortly call your manager, and you will be sent back to fix the problem.
Often with problems like these, workarounds will help. In this example, the user’s computer could be configured to use a proxy server on the same network as the application server. As the proxy server is on the same network as the application server, its connection to the application server will be allowed. This workaround can be used until the firewall rule is changed.
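As a rough sketch of what sending traffic through a proxy looks like, the Python below builds an opener that sends its web requests through a proxy server. The proxy address and the application URL in the comment are hypothetical, and the `build_proxy_opener` function is my own name for the example; in practice the proxy would normally be set in the operating system or browser settings rather than in code.

```python
import urllib.request

# Hypothetical address of a proxy on the same network as the application server.
PROXY_URL = "http://proxy.example.internal:8080"


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests via the proxy."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(proxy_handler)


opener = build_proxy_opener(PROXY_URL)

# A request would then go through the proxy (hypothetical URL, not run here):
#   response = opener.open("http://appserver.example.internal/", timeout=10)
```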
Using a workaround can also cause problems. In this example, the user’s internet traffic was also redirected to the proxy server and thus their internet speed was slowed down, as they were not accessing the internet directly. In this case, a small inconvenience like this was more than an acceptable trade-off for access to the application server that was critical to the user performing their duties. Often a user will be happier with a workaround than not having anything at all. Always keep this in mind. At the end of the day, if you have made the user happy, you have done your job properly.
I hope this introductory video on IT troubleshooting has been helpful. In later videos I will look into more detail about troubleshooting. Until those videos, I would like to thank you for watching.

References
“The Official CompTIA A+ Core Study Guide (Exam 220-1001)” Chapter 3 Position 14936-15575
“CompTIA A+ Certification exam guide. Tenth edition” Page 18
“Troubleshooting” https://en.wikipedia.org/wiki/Troubleshooting
“Picture: Cliff near body of water” https://www.pexels.com/photo/cliff-near-body-of-water-1715078/
“Picture: Red BMW Coupe” https://www.pexels.com/photo/red-bmw-coupe-parked-on-road-1396015/
“Picture: Silver IPhone Smiley emote” https://www.pexels.com/photo/silver-iphone-6-987585/
“Picture: Symbol for helping people on the reception vector illustration” https://publicdomainvectors.org/en/free-clipart/Symbol-for-helping-people-on-the-reception-vector-illustration/33703.html
“Picture: User 2 avatar vector image” https://publicdomainvectors.org/en/free-clipart/User-2-avatar-vector-image/13633.html

Credits
Trainer: Austin Mason http://ITFreeTraining.com
Voice Talent: HP Lewis http://hplewis.com
Quality Assurance: Brett Batson http://www.pbb-proofreading.uk