Created a Simple Perl Monitoring Chat Bot

The Shop is a heavy user of chat - be it IRC-based (as it was for so many years) or the 'secure' chat that we now use, it's chat. It's very useful as a conduit to the users for all kinds of information - especially the monitoring of servers and processes. This morning I did a little five-minute job to move one of the server monitoring bots from the 'secure' chat back to IRC. We still have an IRC server, and the global messaging group that controls the 'secure' chat is still not allowing bots back on their network after a particularly bad server meltdown caused by a few bad bots.

I wanted to move this one guy back to straight IRC because it's a very nice example of a Perl-based bot that can monitor all kinds of interesting things. I then spent a few hours crafting a monitoring bot out of this starting point to replace the Java-based monitoring tools that had caused me so much trouble yesterday. I asked the guy that created them to turn them off - save for one development box (his choice) - until such time as we can be assured that there's no conflict in the communications.

His response was that these monitors are providing vital data on the status of the ticks flowing from my injectors into the system(s). I told him I totally understood his position, and if he'd just let me know what the processes were monitoring, I'd be glad to give him that same functionality, quickly, in a less intrusive monitoring framework.

I didn't hear from him, but I started one anyway with the likely candidates. I also talked to the head support guy and he had a simple little monitor as well. I included his test in with my code and started banging on it.

The first cut was nice, but I also wanted the bot to chat when a stalled log started updating again, so that the users monitoring the chat channel would know that the problem had corrected itself and they didn't have to restart anything. In this particular system, it's common for an updating process to stall and then restart without any intervention, and I didn't want the operators worrying about a problem that had already gone away.
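To make the stall/restart idea concrete, here's a rough sketch of the logic. It's in Python rather than the Perl of the actual bot, and everything in it is an assumption for illustration: the `StallTracker` name, the five-minute threshold, and the message wording are all made up, and the real bot may well decide "stalled" differently. The point is just that the bot remembers whether it has already complained, so it speaks once when the log stalls and once when it comes back.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StallTracker:
    """Tracks one log and reports only the stall and recovery transitions."""
    name: str
    stall_after: float = 300.0  # seconds of silence before we call it stalled (assumed value)
    stalled: bool = False       # have we already announced a stall?

    def check(self, last_update: float, now: float) -> Optional[str]:
        """Return a chat message on a state change, or None if nothing changed."""
        age = now - last_update
        if age > self.stall_after and not self.stalled:
            # First time we notice the silence: announce it once.
            self.stalled = True
            return f"{self.name}: no updates for {int(age)}s, log looks stalled"
        if age <= self.stall_after and self.stalled:
            # The log moved again on its own: tell the operators to relax.
            self.stalled = False
            return f"{self.name}: log is updating again, no action needed"
        return None
```

In a polling loop you'd call `check(os.path.getmtime(path), time.time())` every so often and push any non-`None` result to the chat channel with whatever send routine the bot already has. Staying quiet between transitions keeps the channel from filling up with repeats of the same complaint.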

I sent out an email explaining it: how to stop, start, and restart it, where it chats, and what it chats about. Basic information. I haven't heard from the group that put together the other monitoring tool, but I'm guessing they are not going to be happy about what I've done. Not in the least. I hope I'm wrong, but in the past there has been more than a little animosity between my group here and the group that did the other tool. Sad, but there's nothing that would have kept them from writing the same thing. Nothing at all.