Catching a Networking Problem
I've been having a lot of problems lately with my Sun box at work. It's been getting a lot of NFS errors, and on Solaris, that means the box locks up. If I unplugged the network cable, waited a minute, and plugged it back in, the box would come back for a while, but then it would lock up again. The Unix guys tried a new NIC in the box: no good. They tried different cables and different ports on the switch: no good. Finally we ordered a new box, thinking that this was a bug we weren't going to find. Thankfully, I was wrong.
Yesterday, I looked at another box under my desk and noticed that it was on the same subnet as the troubled box. I decided to swap the network cables between the two machines, to see whether the problem stayed with the Sun box or followed the network connection. Lo and behold, today my Linux box (the one I swapped cables with) died. Yup, no pings. That has never happened in all the years it's been running for me, but it did today. So I called the network guys and told them about the problem. They found problems in the switch. Maybe those weren't there before; I don't know. But they found problems, and they moved me to another port on the switch. Service was restored.
This was a super simple test that I should have tried many months ago. There was no need to order the new box, but we'll keep it on hand as a spare, just in case. All it took to isolate the problem to the network was putting a known-good machine on the suspect connection. Now we know.
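For what it's worth, the reasoning behind the swap test can be written down in a few lines. This is only a sketch of the logic, not a tool I actually ran: the function name `locate_fault` and the machine labels are made up for illustration, and "network connection" here means everything on the far side of the NIC (cable, switch port, switch).

```python
from typing import Optional


def locate_fault(failed_before: str, failed_after: Optional[str]) -> str:
    """Interpret a connection-swap test between two machines.

    failed_before: the machine that was failing before the swap.
    failed_after:  the machine that fails after the swap,
                   or None if nothing has failed yet.
    """
    if failed_after is None:
        # No failure observed since the swap, so nothing can be concluded yet.
        return "inconclusive"
    if failed_after == failed_before:
        # The trouble stayed with the same machine on a different connection.
        return "machine"
    # The trouble moved to the machine now sitting on the old connection.
    return "network connection"


# Hypothetical labels for the two boxes under my desk.
print(locate_fault("sun", "linux"))
```

In my case the failure moved to the Linux box, which points at the connection rather than the Sun hardware.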
Whew!