Loads of Production Problems with Salesforce
I spent all of this morning struggling with production issues. The runs didn't complete, and I had to dig into the logs to find out why. Here, again, the way a lot of Ruby devs work really hurts maintenance. This optimistic coding is something I've fought against for a great many years, and it seems to be systemic, or maybe endemic, to the industry. People want to think "This works… and if it doesn't, then it's not my fault". That might be true, but it doesn't make it right.
So the first thing was figuring out what was wrong with the data. It looked like a data problem, so that's where I started digging. Pretty soon, I realized that the source of the data, Salesforce.com, wasn't returning anything. It was saying that the HTTP GET was invalid, but that a POST was acceptable. I looked at the code, saw where we were doing GETs, and figured out that we had the ability to do POSTs as well. I changed them, retried, and still no good.
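The pessimistic alternative to that kind of optimistic coding doesn't take much. Here's a minimal sketch in Ruby of checking a response before trusting it, assuming the rejection arrives as a non-2xx status (something like a 405 with an Allow header, which is an assumption on my part and not something the logs confirmed). The names `UpstreamRequestError` and `ensure_success!` are made up for illustration and aren't from our code or any Salesforce client; only the `Net::HTTP` response classes are real.

```ruby
require 'net/http'

# Hypothetical error class for this sketch.
class UpstreamRequestError < StandardError; end

# Return the response untouched if it is a 2xx; otherwise raise right away
# with the status line and, if the server sent one, the Allow header.
def ensure_success!(response, description)
  return response if response.is_a?(Net::HTTPSuccess)

  allowed = response['Allow']
  detail = "#{description} failed: HTTP #{response.code} #{response.message}"
  detail += " (server allows: #{allowed})" if allowed
  raise UpstreamRequestError, detail
end
```

Wrapping each call, say `ensure_success!(Net::HTTP.get_response(uri), 'Salesforce fetch')` with a `uri` of your own, would make a run stop at the first refused request with the reason in the message, instead of carrying on and looking like a data problem later.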
I got onto Campfire to explain the situation and try to find help. Clearly, something at Salesforce.com had changed overnight, and it was no longer accepting the calls that had worked yesterday.
After a lot of failed attempts, I was finally able to convince myself that there was nothing wrong with our code, and that Salesforce.com was simply refusing the same API calls that had worked yesterday. I confirmed this with one of our Salesforce support guys. He thought he knew the problem, but not the solution, so off he went to figure it out.
In the end, it turned out that when you deploy code to Salesforce, you have to manually recompile everything, or manually run all the tests, to activate all the URLs in the code. Interesting.
Once that was fixed, the calls worked and everything was able to run. I finished the production runs at about 11:00 am.
What a morning.