Why Are Good Designs So Bloody Hard to Find?

I'm no prima donna of systems design. I know I've put together plenty of things that were good for their intended purpose, were pushed to their limits and then some, and showed the effects of time. Everyone has. But what I have a hard time dealing with is the kind of interaction I've had this morning about something as simple as a DatingService built by the guys here at The Shop.

It's pretty standard stuff in the industry - you want to have a single place where you can get the definitive word on the business dates for any given calendar. You need this to calculate times to expiration, maturity, etc. It's static stuff, so it doesn't make sense to have it sitting in a database for thousands of people to read, though that's been done at other places I've worked. Better to have it loaded in memory on a box that answers requests for dates and delivers them very quickly.

But that's about the last good idea the DatingService had. Some time ago, the developers here, or maybe the management, decided that Google Protocol Buffers was The Way. There's nothing wrong with using them as a serialization/de-serialization scheme, but you really shouldn't force the user to know that you're using them, because that exposes unnecessary implementation details.

If you want to make a service - make a Client too.

Have that client take only those parameters needed to identify the version/type/etc. of the service and then have all the calls return either objects defined in the API, or language-standard data types. It's simple to use, your client code can include auto-reconnect, auto-failover, all the things that a client might want to have. You are in control.

If you leave too many details to the user, then if and when you need to change any one of them, you're sunk - all users have to update, which is a major coordination issue. Not ideal.
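To make that concrete for a date service, here's a minimal sketch of the shape I have in mind. Every name in it (DatingClient, Env, Transport, DatingException) is made up for illustration and is not taken from the real DatingService. The retry count of 3 and the ISO date strings on the wire are assumptions too.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the real DatingService.
class DatingClient {

    /** The only thing a caller picks: a named environment. */
    enum Env { CHICAGO_PROD, CHICAGO_TEST }

    /** The wire format (protobuf, RPC, whatever) hides behind this interface. */
    interface Transport {
        List<String> fetchHolidays(String calendarName) throws Exception;
    }

    /** Unchecked only to keep the sketch short. */
    static class DatingException extends RuntimeException {
        DatingException(String message, Throwable cause) { super(message, cause); }
    }

    private final Transport transport;
    private final int maxAttempts;

    /** What users call: no hosts, ports, channels, or builders. */
    DatingClient(Env env) {
        this(transportFor(env), 3);
    }

    /** Lets the owner (or a test) swap the transport without touching callers. */
    DatingClient(Transport transport, int maxAttempts) {
        this.transport = transport;
        this.maxAttempts = maxAttempts;
    }

    /** Placeholder: the owner maps each Env to real connection details here. */
    private static Transport transportFor(Env env) {
        return calendarName -> {
            throw new UnsupportedOperationException("no wire code in this sketch for " + env);
        };
    }

    /** Returns plain java.time dates; retries the fetch before giving up. */
    List<LocalDate> getHolidays(String calendarName) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                List<LocalDate> holidays = new ArrayList<>();
                for (String iso : transport.fetchHolidays(calendarName)) {
                    holidays.add(LocalDate.parse(iso)); // assumes ISO dates on the wire
                }
                return holidays;
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw new DatingException("getHolidays(" + calendarName + ") failed after "
                + maxAttempts + " attempts", last);
    }
}
```

A caller would write `new DatingClient(DatingClient.Env.CHICAGO_PROD).getHolidays(name)` and get back a list of standard dates. If the owner swaps the serialization scheme tomorrow, only the Transport changes and nobody downstream has to know.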

Take this example of a client I wrote to my CacheStation - a Reuters-based ticker plant that I wrote at my last position:

  CSClient    svr = null;
  try {
    // the constructor either returns a connected client or throws,
    // so there is no null to check for afterwards
    svr = new CSClient(CSClient.CHICAGO_PROD);
  } catch (BKException bke) {
    // handle any exceptions like missing server
  }

All details about the communication are handled in the constructor. There are exceptions thrown for a missing server, but you don't have to configure anything - the "servers" are defined as parameters in the client code so that I (as the owner of the code) can choose to have the real connection parameters in the code, or in a file, or in a database - and there's no difference to the user of the service.

The client auto-reconnects, auto-retries, and has methods for blocking and non-blocking calls so that the user can choose exactly how they want to get data from the server. Just as simply, the data is returned in objects that are defined in the package as well:

  Vector     symbols = new Vector();
  symbols.add("IBM");
  symbols.add("GOOG");
  Vector     prices = null;
  try {
    prices = svr.grab(symbols);
  } catch (BKException bke) {
    // ...handle the possible exception
  }
 
  // a raw Vector hands back Object, so cast to the package's price type
  MMPrice   ibm = (MMPrice) prices.get(0);
  MMPrice   goog = (MMPrice) prices.get(1);
  System.out.println("IBM volume: " + ibm.getVolume());
  // etc.

The point is that the API/Client should be simple, simple, simple. The more complex it is, the worse it is. So let's look at what I'm trying to deal with today.

To be fair, the code presented here is not mine, and had I written it, I'd have broken it up into logical steps, logging the success or failure of each. That would have helped in knowing what failed, though it wouldn't change the fact that it failed. So here's the code for connecting to the DatingServer written here at The Shop:

  private void initializeCalendar() throws Exception {
    boolean  shouldInitializeService = false;
    try {
      readWriteLock.readLock().lock();
      shouldInitializeService = !serviceInitialized;
    } finally {
      readWriteLock.readLock().unlock();
    }
 
    if (shouldInitializeService) {
      try {
        readWriteLock.writeLock().lock();
 
        this.rpcClient = new RpcMessagingClient(transportEnvironment,
                                                messengerFileLocation);
        HolidayService holidayService = HolidayService.newStub(
                         rpcClient.getRpcChannel(
                             HolidayService.getDescriptor().getName(), ""));
        rpcHolidayServiceProxy = new RpcSyncProxy(holidayService);
        HolidayCalcRequest.Builder holidayCalcRequestBuilder =
                         HolidayCalcRequest.newBuilder();
        holidayCalcRequestBuilder.setHolidayCenter(holidayCalendarName);
        holidayCenterData = rpcHolidayServiceProxy.rpcCall("getHolidays",
                         holidayCalcRequestBuilder.build());
 
        serviceInitialized = true;
      } finally {
        readWriteLock.writeLock().unlock();
      }
    }
  }

I don't think I could imagine a more Rube Goldberg scheme for connecting to something as simple as a DatingService and getting a list of holidays for a given calendar name. And this doesn't even include the dozen lines for parsing the output of the RPC call which is another serious problem.

Why do I need to know about the RPC? Why do I need to make builders that serve no other purpose than to get the data? It's a mess, a joke, and that people think this is good is just depressing.
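For what it's worth, here's a rough sketch of how I'd have split that initialization into named steps, so a failure says which step died and the connection gets closed either way. I don't know what RpcMessagingClient really exposes, so the RpcClient interface below is an assumed stand-in, and SteppedInit, step and loadHolidays are names I made up. It also assumes the connection is only needed for the one load of static data.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.logging.Logger;

// Hypothetical sketch: RpcClient stands in for whatever RpcMessagingClient offers.
class SteppedInit {
    private static final Logger LOG = Logger.getLogger("SteppedInit");

    /** Assumed shape of a closable RPC client; the real class may differ. */
    interface RpcClient extends AutoCloseable {
        Object openChannel(String serviceName) throws Exception;
        List<String> call(Object channel, String method, String arg) throws Exception;
        @Override void close();
    }

    /** Runs one named step, logs the outcome, and rethrows with the step name attached. */
    static <T> T step(String name, Callable<T> body) {
        try {
            T result = body.call();
            LOG.info("step ok: " + name);
            return result;
        } catch (Exception e) {
            LOG.warning("step FAILED: " + name + " (" + e + ")");
            throw new IllegalStateException("failed at step: " + name, e);
        }
    }

    /** Connects, opens the channel, fetches the holidays, and always closes the client. */
    static List<String> loadHolidays(Callable<RpcClient> connect, String calendar) {
        RpcClient client = step("connect", connect);
        try {
            Object channel = step("open channel",
                    () -> client.openChannel("HolidayService"));
            return step("getHolidays",
                    () -> client.call(channel, "getHolidays", calendar));
        } finally {
            client.close(); // release the socket whether the later steps worked or not
        }
    }
}
```

With something like this, the message handed to the service owners names the step that failed and carries the underlying exception as its cause, and a failed attempt doesn't leave a connection behind.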

And so we have my continuing problem with this service - when I was starting this application at 5:45 am, it was fine. The users wanted to have it running nearly continually, so I moved the start time to 00:45, and there, it's not working. Sure, the code isn't that great - does the RpcMessagingClient need to be shut down if the constructor succeeds but the other calls fail? The code ends up getting run many times, making a lot of connections with the RpcMessagingClient, and then we run out of sockets and the web server is useless.

But if I restart it at 5:45 am, it's all fine and there are no errors.

So clearly there's a problem on their end of things, but with the code written the way it is, I can't give them any more information than that the sequence of steps failed. I would think that's enough. But it's not.

No, they are pushing back saying it's fine on their end and that I need to provide them with the detailed error logs and stack traces. WHAT? I have to provide them with the reasons their code is not working? What's wrong with this picture?

Clearly, this place has a lot of dysfunction and a good deal more issues. I am playing by the rules my manager gives me, and if that means that the early morning hours of the web server are shot because they can't fix their stuff... well... I can't help that.

I give up.