For the love of the Children... or your favorite religion... don't be an optimistic coder. That's the worst thing you can possibly be. This evening I've been struggling with the rollout of a new application and hitting a problem that defied explanation for at least two hours. This wasn't the only problem I had this evening; it was just the one that delayed me the most.
To start off, I was in the middle of the roll-out (London done, ready to do NYC) when I got a little "prairie dog" from my manager: "Hey Bob, can you hold off on the roll-out for a bit? I just want to check on something." Well... sure... I was 4 mins away from starting the second of three phases of the roll-out, and some things were already done from an infrastructure point of view, but sure, I'd hold off.
So I held.
For 45 mins.
Then he said "OK, go ahead." Nice guy, but really... the time to say "Hold off" is before I start the roll-out, not between phases I and II. It's a little bit of a problem when you do it that way. Since I didn't have control of the DNS entries, I was already at a point that rolling back phase I was going to be hard, so what's up? Never found out, but that's OK. We went ahead with it.
Then I got to the problem that held me up for a few hours.
I'm not one to give up easily. In fact, I can't remember ever backing out a roll-out as opposed to fixing the issues right then and there. So I had to figure out why two of the four boxes in NYC were giving us grief. I was able to skip them and roll out Chicago, but I came back to the problem boxes soon enough and had to face the music. It was nasty.
I looked at the evidence in the logs and it was as if the code simply stopped. It did one request, started another and that was it. Dead. No crash... no core... just stopped.
We checked for network issues, DNS resolution issues... everything that might be a problem. In the end, I was just thinking through all the steps the code was doing when memory popped into my head. I increased the memory on the JVM, and BINGO! It worked.
So here's the thing I can't stand about production coding: Optimistic Coders. The original coder of this little app had used a try/catch block in the code and 'swallowed' the Exceptions. He didn't think they'd ever be needed, I guess. Well... he was wrong. Had I been able to see a Java OutOfMemoryError in the logs, this would have held me up for about 2 mins and I'd never have wondered what was wrong - the app would have been telling me what was wrong.
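To make the difference concrete, here's a minimal sketch of swallowing versus reporting. This is not the app's actual code: the class name RequestHandler and both method names are made up for illustration. One assumption worth calling out: OutOfMemoryError is an Error rather than an Exception, so I'm assuming the original catch was broad enough (something like catch Throwable) to eat it.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical class, for illustration only.
public class RequestHandler {
    private static final Logger LOG = Logger.getLogger(RequestHandler.class.getName());

    // The optimistic version: any failure vanishes without a trace.
    static boolean handleSilently(Runnable work) {
        try {
            work.run();
            return true;
        } catch (Throwable t) {
            return false; // swallowed: nothing ever reaches the logs
        }
    }

    // The defensive version: always log, and let fatal JVM errors propagate.
    static boolean handleLoudly(Runnable work) {
        try {
            work.run();
            return true;
        } catch (Exception e) {
            LOG.log(Level.SEVERE, "Request failed", e);
            return false;
        } catch (Error e) {
            // OutOfMemoryError lands here; record it, then rethrow.
            LOG.log(Level.SEVERE, "Fatal JVM error while handling request", e);
            throw e;
        }
    }

    public static void main(String[] args) {
        // Simulated failure; a real one would come from an actual allocation.
        Runnable outOfMemory = () -> { throw new OutOfMemoryError("simulated"); };

        System.out.println("silent result: " + handleSilently(outOfMemory));
        try {
            handleLoudly(outOfMemory);
        } catch (OutOfMemoryError e) {
            System.out.println("loud version rethrew: " + e.getMessage());
        }
    }
}
```

With the first version the log just stops, which is about what I was staring at. With the second, the stack trace is sitting right there in the log.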
No, by hiding the true cause, this coder has hurt the reputation of Java, the developers in the group, and certainly himself. It's sad that the JVM can't take an arg that says "up to the limit of the box" for memory usage. But it can't. You have to "size" the apps. So be it. But when you assume that everything is going to be OK, and never check return values, never check to see if the thing you asked to create was, in fact, created, then you leave yourself open to all kinds of problems. All kinds.
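The "check that the thing you asked to create was, in fact, created" point can be sketched the same way. This is only an illustration under my own assumptions: the WorkDir class and ensureDirectory method are hypothetical names, and directory creation is just a convenient stand-in because java.io.File.mkdirs() reports failure through a boolean return value that is very easy to ignore.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, for illustration only.
public class WorkDir {

    // Returns the directory if it exists afterwards, otherwise fails loudly.
    static File ensureDirectory(File dir) throws IOException {
        // mkdirs() also returns false when the directory already exists,
        // so confirm with isDirectory() before calling it a failure.
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Could not create directory: " + dir.getAbsolutePath());
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        File base = new File(System.getProperty("java.io.tmpdir"), "workdir-demo");
        File created = ensureDirectory(new File(base, "cache"));
        System.out.println("ready: " + created.getAbsolutePath());
    }
}
```

The optimistic version is a bare dir.mkdirs() call with the result thrown away, and the failure then shows up much later, somewhere far from the cause.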
It's over now, but I've lived through this so many times, I don't even bother trying to educate the unaware. I'll say something, in passing, and if he's interested in really understanding his problems, he'll ask. But I'll bet you he won't. If he was interested in doing a good job, then he'd have thought of it already. But he hasn't. He'll be like this as long as he's coding. Too bad.