Tracking Down a Tricky Bug

I've been chasing a nasty communications bug in the code I'm working on (the MarketMash server). I thought I'd solved it by eliminating a possible problem in the serialization of a list of pointers, but I was wrong: the bug is still there.

The basic protocol for the communication of one unit of work between the server and the calculation engine goes something like this:

  • the server sends the engine "Are you ready?"
  • the engine responds to the server "Yup, I'm ready"
  • the server sends a complete description of the calculation(s) to perform - serializing them out over the socket in a byte stream
  • the engine gets the request, processes it, and streams back the response
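
As a rough illustration of the four steps above, here is a minimal Python sketch. It is not the MarketMash code: the length-prefixed framing, the helper names (send_msg, recv_msg, engine_loop, server_run, run_demo) and the upper-casing "calculation" are all assumptions made up for the example, and a local socket pair stands in for the real server-to-engine connection.

```python
import socket
import struct
import threading

def send_msg(sock, payload):
    # Assumed framing: 4-byte big-endian length, then the payload bytes.
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exact(sock, n):
    # recv() may return fewer bytes than asked for, so loop until we have n.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def recv_msg(sock):
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)

def engine_loop(sock, units):
    # Engine side: one pass through the loop per unit of work.
    for _ in range(units):
        if recv_msg(sock) != b"Are you ready?":
            raise RuntimeError("unexpected message at top of loop")
        send_msg(sock, b"Yup, I'm ready")
        request = recv_msg(sock)
        # Hypothetical stand-in for the real calculation.
        send_msg(sock, request.upper())

def server_run(sock, requests):
    # Server side: handshake, send the request, read the response.
    responses = []
    for request in requests:
        send_msg(sock, b"Are you ready?")
        if recv_msg(sock) != b"Yup, I'm ready":
            raise RuntimeError("engine not ready")
        send_msg(sock, request)
        responses.append(recv_msg(sock))
    return responses

def run_demo(requests):
    # A connected socket pair plays the role of the network link.
    server_sock, engine_sock = socket.socketpair()
    engine = threading.Thread(target=engine_loop,
                              args=(engine_sock, len(requests)))
    engine.start()
    try:
        return server_run(server_sock, requests)
    finally:
        server_sock.close()
        engine.join()
        engine_sock.close()
```

Calling run_demo([b"abc", b"price x"]) drives two units of work through the loop and returns the engine's two responses in order.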

This exchange then repeats for every unit of work. The bug shows up as a stall where each side is waiting on the other: the engine is sitting at the top of its loop waiting for an "Are you ready?" message, while the server is waiting for something from the engine. To find out whether the response is actually being sent to the server and received properly, I've modified the protocol to look like this:

  • the server sends the engine "Are you ready?"
  • the engine responds to the server "Yup, I'm ready"
  • the server sends a complete description of the calculation(s) to perform - serializing them out over the socket in a byte stream
  • the engine gets the request, processes it, and streams back the response
  • the server receives the complete response and sends the engine "Thank You"
  • the engine logs the "Thank You" and responds to the server "Welcome"
  • the server receives the "Welcome", or logs an error if it never arrives
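
The extra acknowledgment steps might look something like the Python sketch below. Again, this is only an illustration under assumptions: the framing, the function names (engine_unit, server_unit, run_demo), the logger name and the five-second timeout the server uses to decide that "Welcome" is not coming are all made up for the example, not taken from the real server.

```python
import logging
import socket
import struct
import threading

log = logging.getLogger("handshake")  # hypothetical logger name

def send_msg(sock, payload):
    # Assumed framing: 4-byte big-endian length, then the payload bytes.
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_msg(sock):
    def recv_exact(n):
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed the connection")
            buf += chunk
        return buf
    (length,) = struct.unpack(">I", recv_exact(4))
    return recv_exact(length)

def engine_unit(sock, acks):
    # One unit of work on the engine side, including the new ack steps.
    if recv_msg(sock) != b"Are you ready?":
        raise RuntimeError("unexpected message at top of loop")
    send_msg(sock, b"Yup, I'm ready")
    request = recv_msg(sock)
    send_msg(sock, request.upper())  # hypothetical stand-in calculation
    ack = recv_msg(sock)
    log.info("engine received %r", ack)  # the engine logs the Thank You
    acks.append(ack)
    send_msg(sock, b"Welcome")

def server_unit(sock, request, timeout=5.0):
    # The timeout is an assumption: it bounds how long we wait for Welcome.
    sock.settimeout(timeout)
    send_msg(sock, b"Are you ready?")
    if recv_msg(sock) != b"Yup, I'm ready":
        raise RuntimeError("engine not ready")
    send_msg(sock, request)
    response = recv_msg(sock)
    send_msg(sock, b"Thank You")  # sent only once the full response is in
    try:
        welcome_ok = recv_msg(sock) == b"Welcome"
    except (socket.timeout, ConnectionError):
        welcome_ok = False
    if not welcome_ok:
        log.error("server did not get Welcome from the engine")
    return response, welcome_ok

def run_demo(request):
    server_sock, engine_sock = socket.socketpair()
    acks = []
    engine = threading.Thread(target=engine_unit, args=(engine_sock, acks))
    engine.start()
    try:
        response, welcome_ok = server_unit(server_sock, request)
    finally:
        server_sock.close()
        engine.join()
        engine_sock.close()
    return response, welcome_ok, acks
```

The point of the design is that the "Thank You" can only go out after the server has read the whole response, so a missing "Thank You" in the engine log points at the response leg, and a missing "Welcome" in the server log points at the acknowledgment leg.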

The goal is to see whether the response is getting back to the server and being received properly. If it isn't, the server will never send the "Thank You", and I'll be able to tell from its absence in the engine logs.

I sure hope this helps me track down what the problem really is.