Trying to Work in a Twisted System
I've had quite a day today... seems when it rains it pours, just like the salt. The latest little nugget was a possible interaction with a Java-based monitoring toolset that was written by another group in the Bank. I'm sure it's got a lot of potential for monitoring Java-based applications, and possibly even system processes and scripts, but today we ran into a really nasty little side effect with a program I wrote that uses a third-party message bus API.
I've had a lot of problems with this particular message bus API - it will seemingly hang on input of a message. I believe that it's related to the socket communications the API is using, but I can't prove that because the vendor won't let me have access to their protocol. But that's not the biggie.
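Since I can't see inside the vendor's protocol, about the best I can do is detect a hang from the outside. Here's a minimal sketch of that idea in Python, assuming the bus call is an ordinary blocking function. The names `call_with_timeout` and `fake_receive` are made up for illustration; `fake_receive` is only a stand-in for the real API call, which isn't shown here.

```python
import queue
import threading
import time


def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) in a daemon thread and wait at most timeout_s seconds.

    Returns ("ok", result), ("error", exception) or ("timeout", None).
    """
    result_box = queue.Queue(maxsize=1)

    def worker():
        try:
            result_box.put(("ok", fn(*args)))
        except Exception as exc:  # report the failure to the caller
            result_box.put(("error", exc))

    # A daemon thread will not keep the process alive if the call never returns.
    threading.Thread(target=worker, daemon=True).start()
    try:
        return result_box.get(timeout=timeout_s)
    except queue.Empty:
        return ("timeout", None)


def fake_receive(delay_s):
    """Hypothetical stand-in for a blocking message bus receive."""
    time.sleep(delay_s)
    return "message"


if __name__ == "__main__":
    print(call_with_timeout(fake_receive, 1.0, 0.01))  # call finishes in time
    print(call_with_timeout(fake_receive, 0.05, 0.5))  # call is treated as hung
```

This only detects the hang; the stuck thread is abandoned, not killed, so it's a way to log and alert on the problem and not a fix for it.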
The biggie was that this new monitoring app was put on the production and UAT machines alongside my software without so much as a "heads up" to us. Things started failing, and while I was trying to understand why they weren't working, I spotted this monitoring package on the box. I killed the monitor and Shazam! everything worked.
While I think monitoring is good, and automated monitoring is really nice, it's got to be zero impact. You can't have a monitoring package that interferes with the normal operation of the application - and given that you can't know how the application works, you need to be very careful about how you communicate with your clients, etc. This package might be really good, but in this case, we need to be a lot more careful about implementing it in our environment.
The twist in the system is that the people who put this monitoring package on the machines didn't bother to tell us what they were doing. They're supposed to be part of the Team working on this system, but they acted more like a bunch of little kids about this issue. They wanted to be able to do something "on their own", and so rather than talk it out and do it safely, they decided to go "joy riding" with the UAT and production systems. If they had done their homework and really checked things out, then this would have been fine with me. But they didn't, so it's just another example of their unwillingness to really follow through and do a good job.