Like most IT professionals, I get computer troubleshooting questions all the time from customers, friends, and family. A few are, um, well, memorable. Take the one about email a while ago. The conversation started out something like this:
Friend: My email doesn’t work.
Greg: (Trying to be helpful) OK, what email program do you use?
Friend: Huh?
Greg: Well, you run a program on your computer to get to your email, right?
Friend: No, I just click on “email”. But now it doesn’t work. What’s wrong with it?
I don’t think we ever solved that problem. And most IT people reading this, after they finish laughing at an all-too-familiar story, know why. I didn’t have enough information to begin solving the problem, and my friend was unable or unwilling to provide it.
Every IT person has read articles with advice about communicating with “normal” people. The articles usually scold us for speaking a language most people don’t understand. Fair enough, and guilty as charged. But we have our “IT words” for a good reason, as do all other professions. I’m not sure why we get picked on so mercilessly. For you finance people: why is it OK to say “EBIT-DA”, but not OK for IT people to say “DHCP server”?
This blog entry is a little different. I’m an IT guy, and I’m asking so-called “normal” people who do not speak IT as a natural language to stretch just a little bit. If you can say non-IT words like “EBIT-DA”, you can say some IT words too. It won’t hurt, I promise.
Meet us in the middle for your own benefit. We IT people are pretty good at solving problems (that’s why we’re IT people), but if you want your problem solved, we need more from you than “it doesn’t work”. I’ve learned at the feet of some of the best in the business, and what follows are some great troubleshooting tips.
First, before solving the problem, we have to identify it. We call this characterizing the problem. The process is part science, part art form.
Here are some things you can give me to help you get back up and running again:
What exactly happens when it breaks? What do you do, and how does the computer respond? Give me the sequence of events leading up to the problem. Give me exact error messages and codes, and screenshots if possible. Details are important because at least one of them may be a significant clue.
Has the system ever worked as expected or has it always been broken? If it worked earlier and is broken now, when did it break? What changed between when it worked earlier and now when it’s broken?
“Nothing changed” is always the wrong answer. If nothing had changed, the system would still behave the same as it did earlier. My friend Bruce had a cell phone email problem a while ago. He insisted nothing had changed and his email just stopped working for no reason. We talked about it and ended up removing and re-adding the email account on his smartphone. Email behaved properly after that, and then Bruce said, “Oh yeah, a big update for my phone came out a few days ago, and my email broke right after that!” My other friend Bob was also in the room, and Bob said, “Wow, that’s probably why my cell phone email stopped working too!”
That’s the power of characterizing the problem – sometimes it helps solve multiple problems.
If the system worked before and is broken now, something broke it. That something may be subtle and difficult to find, which is why details are important. So think back to everything that happened with your broken system around the time the problem started. Put together a detailed sequence of events, and write it all down if that helps. If I had known about that phone software update when I was helping Bruce and Bob, we could have saved time and jumped straight to the solution.
Is the problem reproducible at will, or does it only happen sometimes? If it is reproducible at will, what are the steps to reproduce it? And if it happens only sometimes, what is different about when it works versus when it breaks? One time, I had a Dell laptop that sometimes refused to connect to the office wireless network. After hours of trial and error, we finally found a pattern: the problem happened when the laptop was running on battery power, but not on AC power. This turned out to be a (questionable) feature and not a bug. Somebody at Dell thought it was a good idea to conserve power by turning off the wireless adapter by default when running on battery. The cure: press a function key to turn it on.
The solutions to many problems seem obvious, but generally only after going through the exercise to find them.
Perhaps most important: compare and contrast how the system should behave with how it actually behaves. It’s your job to explain this clearly and in detail to an expert who cannot possibly know the history of the problem as well as you do.
Answer these and similar questions, and now we have a well-defined problem.
Next comes finding a solution. The process is also part science, part art form. For the science part, we form a possible solution based on the problem definition, come up with a way to test it, then evaluate the results. The process is usually iterative, sometimes tedious, and always slower than anyone wants. For the art part, sometimes inspiration strikes and sometimes it’s right. Check out this article for a great example of a troubleshooting scenario. And watch this space for more articles about interesting troubleshooting scenarios as they come up.
(Originally posted April 4, 2013 on my old Infrasupport website.)