Someone once told me that to make an engine run you need 3 things: Air, Fuel and Spark. I love the simplification, and it really helped to narrow down what wasn’t working. In reality, though, many more things can go wrong. Network routing troubleshooting can be broken down into similar steps:
- Routing
- Firewall Rules
- Application Settings
After a brief (ahem) review of ping and traceroute, we’ll dig a bit deeper into other network routing troubleshooting tools and techniques that help pinpoint what is wrong with a network. This is by no means an exhaustive list, but these tools are worth knowing.
This article focuses on network routing troubleshooting tools and sets firewall rules and application settings aside. First, we’ll cover the basics of ping, traceroute, and routing.
PING – HUMBLE, YET MIGHTY
ICMP is spelled out in greater detail in http://tools.ietf.org/html/rfc792 if you choose to force it to give up its secrets. For a more pragmatic “how to” approach, keep reading.
What Ping Does and Does Not Do
Ping sends an ICMP Echo Request (Type 8) and reports whether an Echo Reply (Type 0) came back. Ping does not tell you why something failed beyond a few simple messages (listed below), nor whether the path failed milliseconds after your test or between pings. It only tells you whether the ping you just sent got a response, and if it didn’t, you might get one of several error messages or clues (derived from the ICMP message types listed at http://www.iana.org/assignments/icmp-parameters/icmp-parameters.xhtml):
- Network Unreachable
- From Gateway: Type 3, Code 0. Destination Network Unreachable usually means a gateway doesn’t know the route to a network. Adding a route on that gateway that points to an upstream router that does know the way may fix this error. Check your routing tables.
- Host Unreachable
- From Gateway: Type 3, Code 1. Make sure the host is actually on and responding to pings. The router/gateway sending this response believes it is directly connected to the host, but the host isn’t responding.
- Request Timed Out
- No ICMP message at all: ping didn’t get a response within its default timeout, so this message comes from your own machine rather than from a router. At least a route matched locally. If the traffic isn’t blocked by a firewall, the device might be turned off.
- Time Exceeded
- Type 11, Code 0. Every IP packet carries a TTL (Time To Live) value, which ping shows in its replies. Each router that forwards the packet decrements the TTL by 1, and if it reaches 0, that router discards the packet and responds with “Time Exceeded”. You might see this if there is a routing loop.
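To make the type/code mapping concrete, here’s a minimal Python sketch that labels the ICMP messages above. The table only covers the pairs discussed here (values from RFC 792), the function names are purely illustrative, and the (type, code) tuple is assumed to come from whatever tool captured the reply. Note that Request Timed Out is modeled as the absence of any ICMP message, not as a message of its own.

```python
from typing import Optional, Tuple

# ICMP (type, code) pairs mentioned in this section, as defined in RFC 792.
ICMP_MESSAGES = {
    (0, 0): "Echo Reply",
    (3, 0): "Destination Network Unreachable",
    (3, 1): "Destination Host Unreachable",
    (8, 0): "Echo Request",
    (11, 0): "Time Exceeded (TTL exceeded in transit)",
}


def describe_icmp(icmp_type: int, code: int) -> str:
    """Turn an ICMP type/code pair into a readable label."""
    if (icmp_type, code) in ICMP_MESSAGES:
        return ICMP_MESSAGES[(icmp_type, code)]
    if icmp_type == 3:
        # Other Destination Unreachable codes (port unreachable, fragmentation needed, ...).
        return f"Destination Unreachable (code {code})"
    return f"ICMP type {icmp_type}, code {code}"


def describe_ping_result(reply: Optional[Tuple[int, int]]) -> str:
    """`reply` is the (type, code) of whatever came back, or None if nothing did."""
    if reply is None:
        # No ICMP message at all: ping's own timeout expired.
        return "Request Timed Out (no reply before the timeout)"
    return describe_icmp(*reply)
```

For example, describe_ping_result((3, 1)) returns "Destination Host Unreachable", while describe_ping_result(None) reports a timeout.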
There are many other ICMP responses, but the four above are the ones you are most likely to see. ICMP is also rate limited on many devices. Cisco IOS, for example, sends at most one ICMP unreachable message per 500ms by default, and certain VoIP phones may only respond every 10ms.
You may also run into strange responses where every other ping fails. Sometimes this is caused by routing issues, or by a cluster repeatedly failing over. Ping can also simply be blocked on certain networks.
Ping can also be used to launch a denial-of-service attack against a network, which is one reason most modern security devices apply some type of rate limiting or flood protection.
The important thing to realize is that ping is not 100% foolproof, but it can often tell you quickly how a network is functioning.
WAYS TO USE PING:
Test the TCP/IP stack using Ping
Simply pinging 127.0.0.1 will tell you if the TCP/IP stack is properly loaded and installed. The loopback address 127.0.0.1 will respond if the machine has networking installed.
Use Ping to Test NIC Drivers
If you ping your local IP address and get a response, then you at least know the NIC driver is loaded and the address is set up properly.
Using PING for Basic Network Test
If you can ping the gateway, you can stop troubleshooting the host you are sitting at and look elsewhere (except perhaps at its routes).
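The three checks so far work from the inside out, and the first one that fails tells you where to look. Here’s a minimal Python sketch of that ordering; the probe function is hypothetical (in practice you’d wrap the system ping command), and the 192.168.1.x addresses in the usage note are made up.

```python
from typing import Callable, Optional

# Hypothetical probe: returns True if one ping to `address` got an Echo Reply.
# In practice you would wrap the system ping command here.
Probe = Callable[[str], bool]


def first_failing_check(probe: Probe, local_ip: str, gateway: str) -> Optional[str]:
    """Run the local ping checks from the inside out; return the first one that fails."""
    checks = [
        ("TCP/IP stack (loopback)", "127.0.0.1"),
        ("local IP / NIC driver", local_ip),
        ("default gateway", gateway),
    ]
    for name, address in checks:
        if not probe(address):
            return name
    return None  # all three answered: this host is probably fine, look elsewhere
```

For example, with a fake probe that only answers for 127.0.0.1 and 192.168.1.10, first_failing_check(probe, "192.168.1.10", "192.168.1.1") returns "default gateway".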
Test Routing with PING
If you can ping the remote host and it returns a response, routing is working in both directions. Rate limiting can muddy the picture, though. For example, IOS by default sends at most one ICMP unreachable message every 500ms, so if you send pings faster than that toward a destination a router can’t reach, you’ll get a misleading pattern of alternating results. On Cisco it might look like: U.U.U (U = Destination Unreachable received, . = no reply before the timeout). In general, though, if you can ping the remote host, stop troubleshooting routing.
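The U.U.U pattern is easier to believe once you model it. Here’s a simplified Python sketch, assuming the upstream router rate-limits ICMP unreachable messages to one per 500ms (the Cisco IOS default), that IOS sends its default five pings with a two-second timeout, and that unreachables come back instantly. It’s a toy model of the timing, not how IOS is implemented.

```python
from typing import Optional


def simulate_ios_ping(count: int = 5, timeout: float = 2.0,
                      unreachable_interval: float = 0.5) -> str:
    """Toy model of an IOS ping toward a destination the next router can't reach.

    Assumptions: the next router sends at most one ICMP unreachable per
    `unreachable_interval` seconds, unreachables arrive instantly, and the
    pinging device sends its next echo as soon as the previous one resolves.
    """
    now = 0.0
    last_unreachable: Optional[float] = None
    marks = []
    for _ in range(count):
        if last_unreachable is None or now - last_unreachable >= unreachable_interval:
            last_unreachable = now   # the router is allowed to answer
            marks.append("U")        # Destination Unreachable came back at once
        else:
            now += timeout           # reply suppressed: wait out the full timeout
            marks.append(".")        # and print a timeout mark
    return "".join(marks)


print(simulate_ios_ping())  # U.U.U
```

Each U resets the rate limiter, the very next ping lands inside the 500ms window and times out, and the two-second wait is long enough for the following ping to get its U. Setting unreachable_interval to 0 (no rate limiting) gives UUUUU instead.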
Send a “bell” on failure
Maybe you are tracing cables and don’t have a cable tracer, or you just want a small beep when the ping starts failing. Rig it with a Ctrl+G, MacGyver style:
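C:\>for /L %i in (0,0,0) do @(ping -n 1 22.214.171.124 || echo ^G) & ping -n 2 127.0.0.1 > NUL
(Enter the ^G by pressing Ctrl+G. A for /L loop with a step of 0 never ends, and the two pings to 127.0.0.1 add roughly a one-second pause between tests.)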
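If you’d rather have something that runs unchanged on Windows and Linux, here’s a minimal Python sketch of the same idea. It shells out to the system ping (-n 1 on Windows, -c 1 elsewhere) and writes the BEL character, the same ^G, on each failure. It trusts ping’s exit code, which is only a rough signal: Windows ping may exit with 0 even when the reply is a Destination host unreachable message from a gateway. The probe parameter exists so the loop can be tested without a network.

```python
import subprocess
import sys
import time
from typing import Callable, Optional, TextIO


def ping_once(host: str) -> bool:
    """Send a single ping with the OS ping command; True if it exited with status 0."""
    count_flag = "-n" if sys.platform.startswith("win") else "-c"
    result = subprocess.run(["ping", count_flag, "1", host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def beep_on_failure(host: str, probe: Callable[[str], bool] = ping_once,
                    interval: float = 1.0, iterations: Optional[int] = None,
                    out: Optional[TextIO] = None) -> int:
    """Ping `host` repeatedly and ring the terminal bell (BEL, i.e. ^G) on every failure.

    Runs forever when `iterations` is None; otherwise returns the number of failures.
    """
    if out is None:
        out = sys.stdout
    failures = 0
    sent = 0
    while iterations is None or sent < iterations:
        if not probe(host):
            failures += 1
            out.write("\a")  # the terminal beeps when it prints BEL
            out.flush()
        sent += 1
        time.sleep(interval)
    return failures
```

Usage: beep_on_failure("192.0.2.1") loops until you press Ctrl+C, just like the one-liner (192.0.2.1 is only a placeholder address).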
Use Ping to Test Best/Highest MTU
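This PING CLI kung fu sends larger and larger ping payloads. Without the -f (Don’t Fragment) flag, Windows fragments these oversized pings, so this tests how the path copes with big packets rather than pinning down the exact MTU; for that, add -f and try sizes around 1472 bytes (a 1500-byte MTU minus 28 bytes of IP and ICMP headers). Notice that as the pings get bigger, the response time goes up. Some networks will simply fail once packets exceed a certain size; others will just perform poorly: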
C:\>for /L %i in (1000,1000,66000) do @ping -l %i -n 1 126.96.36.199 | find "Reply"
Reply from 126.96.36.199: bytes=1000 time=45ms TTL=56
Reply from 126.96.36.199: bytes=2000 time=56ms TTL=56
Reply from 126.96.36.199: bytes=3000 time=66ms TTL=56
..snip..
Reply from 126.96.36.199: bytes=12000 time=96ms TTL=56
..snip..
Reply from 126.96.36.199: bytes=21000 time=131ms TTL=56
..snip..
Reply from 126.96.36.199: bytes=52000 time=199ms TTL=56
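If you want the exact path MTU rather than eyeballing response times, a binary search over Don’t Fragment pings narrows it down in about 16 probes. Here’s a minimal Python sketch. The probe function is hypothetical (on Windows you’d wrap ping -f -l &lt;size&gt; -n 1 &lt;host&gt;), and it assumes IPv4 without IP options, so the headers add 20 + 8 = 28 bytes.

```python
from typing import Callable

# Hypothetical probe: True if one ping with `payload` bytes of data and the
# Don't Fragment bit set got a reply.
DFProbe = Callable[[int], bool]

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP Echo header


def largest_df_payload(probe: DFProbe, low: int = 0, high: int = 65500) -> int:
    """Binary-search the largest payload that crosses the path without fragmenting.

    Assumes probe(low) succeeds and that every size below a working size also works.
    65500 is the largest size Windows ping accepts for -l.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid                # mid fits: the answer is mid or larger
        else:
            high = mid - 1           # mid is too big: the answer is below it
    return low


def path_mtu(probe: DFProbe) -> int:
    """Largest working payload plus the IPv4 and ICMP headers."""
    return largest_df_payload(probe) + IPV4_HEADER + ICMP_HEADER
```

For example, on a simulated 1500-byte path, largest_df_payload(lambda size: size + 28 <= 1500) returns 1472, and path_mtu adds the 28 header bytes back to get 1500.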
Now that you know a bit more about ping and ICMP messages, let’s review traceroute. Traceroute relies on ICMP much like ping does, but Unix-style traceroute sends UDP probes by default (Windows tracert sends ICMP Echo Requests instead). Most traceroute implementations can also send ICMP or TCP SYN probes if you choose, and UDP and TCP probes can target any port. Traceroute maps the path by sending probes with a TTL of 1, then 2, then 3, and so on: each router that decrements a probe’s TTL to 0 sends back an ICMP Time Exceeded message, revealing its address, until the destination itself answers or the maximum hop count (30 by default on most systems) is reached.
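Here’s a minimal Python sketch of that TTL trick, using a made-up list of router addresses instead of a real network. It sends one probe per TTL (real traceroute sends three), and a None entry stands for a router that silently drops expired probes, which traceroute displays as *.

```python
from typing import List, Optional, Tuple


def send_probe(path: List[Optional[str]], destination: str, ttl: int) -> Tuple[str, Optional[str]]:
    """Forward one probe along `path`; every router decrements its TTL by 1."""
    for router in path:
        ttl -= 1
        if ttl == 0:
            # This router discards the probe and, unless it's silent (None),
            # answers with ICMP Time Exceeded from its own address.
            return "time-exceeded", router
    # The probe survived every router, so the destination answers it.
    return "reached", destination


def traceroute(path: List[Optional[str]], destination: str, max_hops: int = 30) -> List[str]:
    """Send probes with TTL 1, 2, 3, ... and record who answered each one."""
    hops = []
    for ttl in range(1, max_hops + 1):
        kind, responder = send_probe(path, destination, ttl)
        hops.append(responder if responder is not None else "*")
        if kind == "reached":
            break
    return hops


print(traceroute(["10.0.0.1", None, "198.51.100.1"], "203.0.113.5"))
# ['10.0.0.1', '*', '198.51.100.1', '203.0.113.5']
```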
Routing loops happen when routers’ tables point at each other instead of toward the destination, so the packet bounces between them, losing 1 from its TTL at every hop. When the TTL reaches 0, the router holding the packet discards it and sends an ICMP Time Exceeded message back to the source.
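To see a loop burn through the TTL, here’s a minimal Python sketch that follows next-hop entries from a hypothetical set of routing tables (R1, R2 and H are made-up names). It also reports Network Unreachable when a router has no entry at all, tying back to the ping messages earlier.

```python
from typing import Dict


def forward(next_hop: Dict[str, str], first_router: str, destination: str, ttl: int = 64) -> str:
    """Follow next-hop entries until the packet is delivered, dropped, or expires.

    next_hop maps each router to where it sends traffic for `destination`
    (hypothetical tables). The packet arrives at `first_router` with `ttl`.
    """
    node = first_router
    while node != destination:
        if node not in next_hop:
            return f"{node}: Destination Network Unreachable"
        ttl -= 1  # each router decrements the TTL before forwarding
        if ttl == 0:
            return f"{node}: Time Exceeded"
        node = next_hop[node]
    return f"delivered to {destination}"


print(forward({"R1": "R2", "R2": "R1"}, "R1", "H"))  # R2: Time Exceeded
print(forward({"R1": "R2", "R2": "H"}, "R1", "H"))   # delivered to H
```

With a starting TTL of 64, the packet makes 32 round trips between R1 and R2 before R2 decrements it to 0 and sends Time Exceeded.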
Trace From Source and Destination
Tracing routes from each end can confirm whether the paths are what you expect. By default, most firewalls expect traffic to leave and return on the same interface, and they will drop return traffic that arrives on a different interface with a spoof error unless configured to accept it. Running two separate traces, one from each end, will confirm this (packet captures do too, but traces are simpler).
However, traceroute is sometimes blocked too, so it is not a reliable test unless a previous baseline showed that you get traceroute responses along that path. Sometimes you’ll see timeouts at the first few hops and then good responses beyond the “bad” ones. This just means a few routers along the way didn’t respond to the type of probe you sent, and it is not an error.
Traceroute also sends only three probes per hop by default, so if you are chasing an intermittent network fault, it usually won’t show it to you.
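A little arithmetic shows why three probes rarely catch intermittent loss. Assuming each probe is lost independently with the same probability (a simplification, since real loss is often bursty), the chance that all probes to a hop get through is (1 - loss rate) raised to the number of probes:

```python
def chance_loss_goes_unseen(loss_rate: float, probes: int = 3) -> float:
    """Probability that every probe succeeds on a path with `loss_rate` random loss."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    # Independent losses: each probe survives with probability (1 - loss_rate).
    return (1.0 - loss_rate) ** probes


print(f"{chance_loss_goes_unseen(0.05):.1%}")       # 85.7% (three probes, one traceroute hop)
print(f"{chance_loss_goes_unseen(0.05, 100):.1%}")  # 0.6% (a longer run of 100 probes)
```

So with 5% loss, a single traceroute shows a clean hop about 86% of the time, while a run of 100 probes almost always exposes it.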
To keep the bloat down in this article, I’ll only share two commands:
- How to Display Routing Tables
- How to Show Which Interface a Route will take (Linux Only)
Displaying Routing Tables
Showing the routing table is easy, with a command for each operating system. When more than one route matches a destination, the most specific route (the longest prefix) wins; the metric only breaks ties between equally specific routes, and the lower metric is preferred. Metrics can be set on the interface or on an individual route. Both Linux and Microsloth support the same command:
netstat -rn
The Windows version of netstat -rn is messy and shows IPv4 and IPv6 combined by default. That’s too much info for me, so on Windows I prefer route print -4, which is easier to read without the extra IPv6 junk.
However, these routing tables are a bit difficult to interpret without some explanation. Basically they say, if it matches this route, use this interface. If you have hundreds of routes on a firewall, you might not see what you are looking for easily, especially when troubleshooting at 2am during the change window…
You can manually grep/find your way through a routing table, but that’s a mess and on larger, more complicated routing tables you are bound to miss something.
Just make sure you have a default route (notice anything about my routing table?). The 0.0.0.0 entry (network 0.0.0.0, mask 0.0.0.0) is the default route: if no other route matches, the data will travel through that interface. My example works, but it’s actually not a clean routing table and has an error. Can you spot the routing table error?
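Here’s a minimal Python sketch of that matching logic, using the standard ipaddress module and a made-up routing table. It models the usual rule (the longest matching prefix wins, and a lower metric breaks ties between equally specific routes), with 0.0.0.0/0 as the default route that matches everything. Real operating systems layer more on top, such as policy routing, so treat it as a simplified model.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    prefix: str     # destination network, e.g. "10.77.0.0/16"; "0.0.0.0/0" is the default route
    gateway: str
    interface: str
    metric: int


def lookup(routes: List[Route], destination: str) -> Optional[Route]:
    """Return the route `destination` would use, or None if nothing matches."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not matches:
        return None  # no match and no default route: Network Unreachable
    # Longest prefix first; among equal prefixes, the lowest metric.
    return max(matches, key=lambda r: (ipaddress.ip_network(r.prefix).prefixlen, -r.metric))


table = [
    Route("0.0.0.0/0", "192.168.1.1", "eth0", 100),
    Route("10.0.0.0/8", "192.168.2.1", "eth1", 10),
    Route("10.77.0.0/16", "192.168.3.1", "eth2", 10),
    Route("10.77.0.0/16", "192.168.4.1", "eth3", 5),
]
print(lookup(table, "10.77.1.1").interface)    # eth3
print(lookup(table, "10.1.2.3").interface)     # eth1
print(lookup(table, "203.0.113.9").interface)  # eth0
```

Drop the default route from the table and lookup returns None for 203.0.113.9, which is the situation where a gateway answers with Network Unreachable.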
Finding Routes The Easy Way
Here is the command on Linux that shows you which interface a route will likely take:
ip route show match &lt;destination IP&gt;
Show Specific Route on Windows
Windows has really poor route troubleshooting tools. If you were looking for an EXACT match route, you might type something specific like this:
route print -4 10.77*
But that won’t tell you where other 10.0.0.0/8 traffic goes, only the 10.77* entries, and only if the routing table has an entry specifically for them. So in my opinion it’s useless, because when you troubleshoot routes you want to know which interface/route/VLAN a packet is going to take. The route print command only tells us whether a route that matches exactly is already in the table, nothing else. Linux and Cisco have much better tools that show you which interface a packet will take, and you can type any IP, even one that isn’t in the routing table specifically:
ip route show match 188.8.131.52
default via 184.108.40.206 dev eth0
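Each line of output is a route that covers the address you asked about. Here the default route matches, so traffic to 188.8.131.52 will be sent to 184.108.40.206 on eth0. When several routes match, the most specific one (the longest prefix) is the one the kernel uses, and ip route get 188.8.131.52 prints just that route.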
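Some firewalls have 10 or more interfaces and thousands of routes. With this simple command you can verify whether the route for a given IP is correct. I wish I knew of a similar command for Windows! (On Windows 8 and Server 2012 or later, the PowerShell cmdlet Find-NetRoute -RemoteIPAddress comes close.)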
NETWORK TRACEROUTE OVER A PERIOD OF TIME:
Networks often fail sporadically, and these intermittent errors are the hardest to troubleshoot. To help monitor multiple routers along a path over an extended period, several tools work just like ping and traceroute but keep track of many attempts. Routers sometimes give ICMP messages lower priority than normal traffic, so you cannot rely on ICMP to give you 100% accurate information, but tools like these will often help pinpoint an error. Here are three that take traceroute to the next level:
MTR for Linux
mtr is a Linux-based utility that repeatedly traces the route for as long as you specify and shows which router along the path is having problems. Hops, packet loss, and latency can all be recorded this way over an extended period.
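The bookkeeping behind mtr’s report is simple. Here’s a minimal Python sketch of the per-hop statistics, assuming you already have a list of round-trip times for each hop with None for unanswered probes (the sample numbers are made up). It isn’t mtr’s code, just the same idea behind its Loss%, Snt, Best, Avg and Wrst columns.

```python
from statistics import mean
from typing import List, NamedTuple, Optional


class HopStats(NamedTuple):
    sent: int
    loss_pct: float
    best: Optional[float]   # ms; None if the hop never answered
    avg: Optional[float]
    worst: Optional[float]


def hop_stats(samples: List[Optional[float]]) -> HopStats:
    """Summarize one hop's probes; samples are RTTs in ms, None for no reply."""
    replies = [s for s in samples if s is not None]
    sent = len(samples)
    lost = sent - len(replies)
    return HopStats(
        sent=sent,
        loss_pct=100.0 * lost / sent if sent else 0.0,
        best=min(replies) if replies else None,
        avg=mean(replies) if replies else None,
        worst=max(replies) if replies else None,
    )


print(hop_stats([10.0, None, 12.0, 14.0]))
# HopStats(sent=4, loss_pct=25.0, best=10.0, avg=12.0, worst=14.0)
```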
PathPing for Windows
PathPing is a Windows-based utility that works much like mtr, but with an interface that is not quite as friendly. It can send a configurable number of probes and keeps statistics on network activity. In this example, you can see a few packets were lost. TCP/IP is designed to handle loss like this, and this could be considered normal. Any loss is a clue that something isn’t quite right, but unless loss climbs above several percent, you will probably never notice it on a TCP/IP network.
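When reading mtr or PathPing output, a common rule of thumb is that loss only matters if it carries through to the final hop. Loss at one router that disappears further along the path usually means that router is deprioritizing or rate-limiting its own ICMP replies, not dropping your traffic. Here’s a small Python sketch of that heuristic; the loss figures are hypothetical, and the threshold parameter is an assumption you would tune for your network.

```python
from typing import List, Optional


def loss_starts_at(loss_by_hop: List[float], threshold: float = 0.0) -> Optional[int]:
    """Return the 1-based hop where loss begins and persists to the destination.

    loss_by_hop: loss percentage per hop, in path order; the last entry is the
    destination. Returns None when the destination itself shows no loss above
    `threshold`, in which case loss at earlier hops is most likely ICMP
    deprioritization on those routers rather than lost traffic.
    """
    if not loss_by_hop or loss_by_hop[-1] <= threshold:
        return None
    start = len(loss_by_hop) - 1
    # Walk backwards from the destination while each earlier hop still shows loss.
    while start > 0 and loss_by_hop[start - 1] > threshold:
        start -= 1
    return start + 1


print(loss_starts_at([0, 40, 0, 0, 5, 6, 5]))  # 5: the 40% at hop 2 never reaches the end
print(loss_starts_at([0, 40, 0, 0]))           # None: the destination is clean
```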
Ping Plotter Freeware (Windows)
By understanding ICMP messages and common tools such as ping, traceroute, mtr, and pathping, you will be able to uncover more detail about the network routing issues you encounter.
James Fraze is an IT Security Consultant with 20+ years in IT who contracts through Romack Inc and also writes IT related articles. James can be reached at http://digitalcrunch.com/contact.