Sometimes Bigger isn’t Necessarily Better

Again today we're having problems with a key infrastructural communication system. Whose fault it is is really unimportant... we're not changing this component out anytime soon - it's a key component in a company that literally spans the globe. It's here.

That being said, it is interesting that today I got a mail from them saying it's going to be running in a restricted mode for the next week. This places a non-trivial squeeze on most of my applications. I use this system in every app to communicate with the app - control it... monitor it... check its health... lots of stuff. So to be without it for another week is non-trivial and more than a little annoying.

So as I was saying, they are asking us if we can migrate to a less-secure version that we all used to be on about a year or so ago. It was a multi-year effort to get everyone migrated to the new secure product, but they want to know if we can move back in a few days. What? Oh... yeah, that's right... when you can't fix your own stuff, the best ploy is to ask the users to change and hope that they can change fast enough not to get sick of the outage and decide to move off your product for good. I mean, can these guys be sincere about this?

For me, it's fewer than a dozen applications. In some cases the changes are simple configuration changes, and in others they are more structural. But if we change to the old system it's going to mean a ton of logistics... new login credentials for each of the 120 connections I have... new names of channels... lots of chances to get something messed up in the change.

That's why it took more than a year to get the move right. We had to be sure not to break anything. I don't know what the problem is with their system - it could be really tough to figure out. But to get hundreds of applications reconfigured, only to reverse it once they get it all cleared up, is a little more than a simple exercise. It's silly.

They need to find and fix the problem. If they have to call in smarter people, then do it. If they have to get in new servers, do it. I can think of any number of ways to hack the system to work in a pinch, but they aren't asking me and I'm not offering unsolicited advice. But the mere idea that we can easily manage the risk of a global change like this in a few days is absurd. It'd take weeks, and they had better be able to fix this problem faster than that.