Archive for the ‘Coding’ Category

Erlang Ring Benchmark from Chapter 8 of J Armstrong Book

Thursday, March 3rd, 2011

I've been trying to learn erlang from the book Programming Erlang by J Armstrong, and one of the first real challenges was the exercise in Chapter 8, where he asks the reader to:

Write a ring benchmark. Create N processes in a ring. Send a message round the ring M times so that a total of N * M messages get sent. Time how long this takes for different values of N and M.

Write a similar program in some other programming language you are familiar with. Compare the results. Write a blog, and publish the results on the Internet!

One thing I think he missed in the problem statement is that a message passed from one node to another really should have a response sent by the receiver. In my design, I planned to have a 'ping' message sent, received, and a 'pong' message sent back. The receipt of the 'pong' message would be a no-op, but it needed to be received. Other than that, the design was just like the problem statement.

My solution to the problem is this:

  1. -module(ring).
  2. -export([start/0, test/2]).
  3. 
  4. %% make the state container for the ring node
  5. -record(state, {next, origin, caller}).
  6. 
  7. %% standard entry point for a 1000 node, 500 cycle test
  8. start() ->
  9.     test(1000, 500).
  10. 
  11. %% make a synchronous message call to the pid and wait for response
  12. rpc(Pid, Request) ->
  13.     Pid ! {self(), Request},
  14.     receive
  15.         {Pid, Response} ->
  16.             Response
  17.     end.
  18. 
  19. %% main messaging loop for all nodes in the ring
  20. loop(State) ->
  21.     receive
  22.         %% the head of the ring needs to know it's the origin
  23.         {From, {origin, Origin}} ->
  24.             From ! {self(), Origin},
  25.             loop(State#state{origin=Origin});
  26. 
  27.         %% building the ring is a countdown of creations
  28.         {From, {build, Count}} when Count > 1 ->
  29.             Node = spawn(fun() -> loop(State) end),
  30.             rpc(Node, {build, Count-1}),
  31.             From ! {self(), Count},
  32.             loop(State#state{next=Node});
  33.         %% ...to the final node that circles back to the origin
  34.         {From, {build, Count}} ->
  35.             From ! {self(), Count},
  36.             loop(State#state{next=State#state.origin});
  37. 
  38.         %% starting the test kicks it off and saves the caller
  39.         {From, {go}} ->
  40.             State#state.next ! {self(), {ping}},
  41.             loop(State#state{caller=From});
  42. 
  43.         %% the ping needs to answer and then stop or continue
  44.         {From, {ping}} ->
  45.             From ! {self(), {pong}},
  46.             if
  47.                 State#state.origin =:= self() ->
  48.                     State#state.caller ! {self(), 1};
  49.                 true ->
  50.                     State#state.next ! {self(), {ping}}
  51.             end,
  52.             loop(State);
  53.         %% ...the response to a pong is to do nothing
  54.         {_, {pong}} ->
  55.             loop(State)
  56.     end.
  57. 
  58. %% build a ring of 'N' nodes, and run through this 'M' times...
  59. test(Nodes, Cycles) ->
  60.     io:format("starting the build and exercise of the ring...~n"),
  61.     statistics(runtime),
  62.     statistics(wall_clock),
  63.     State = #state{},
  64.     Head = spawn(fun() -> loop(State) end),
  65.     rpc(Head, {origin, Head}),
  66.     rpc(Head, {build, Nodes}),
  67.     _ = [rpc(Head, {go}) || _ <- lists:seq(1,Cycles)],
  68.     {_, Runtime} = statistics(runtime),
  69.     {_, Walltime} = statistics(wall_clock),
  70.     U1 = Runtime * 1000 / (Nodes*Cycles),
  71.     U2 = Walltime * 1000 / (Nodes*Cycles),
  72.     io:format("total cpu=~pms ... ~pus/op and wall=~pms ... ~pus/op~n",
  73.               [Runtime, U1, Walltime, U2]).

There are several watershed moments in this code that really started to solidify my understanding of erlang, and I think it's worth going over them to make the code easy to follow.

There are Only Functions

Seems odd, but really the entire language is a series of functions. This may seem obvious to someone thinking Hey! It's a functional language, Bob! but it wasn't clear to me as I started this exercise. There are variables, but their scope is so limited that it's really just a series of function calls. If you want to build the structure of a ring, you have to have some idea of the head of the ring, the N-1 'other' nodes, and then loop it back to the head. This 'next' state is essential for a node, and it's not at all obvious where it's stored.

In truth, it's stored in the arguments to the loop() function. This was my first Ah! Ha! moment:

All state is maintained as function arguments.

Seems silly, but I wish he'd said that in the book - it sure would have made things a lot easier. Think of a superconductor: the state is maintained in the "execution ring" of the typical loop() function. Once I got that, I stopped fighting the language with my other approaches.
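
To make that concrete, here's a minimal sketch of the idea (a hypothetical counter module of mine, not from the book) where the only 'state' is the argument to loop/1 - every handler just tail-calls loop() with whatever the new value should be:

  -module(counter).
  -export([start/0]).
  
  %% the current count lives nowhere but the argument to loop/1
  start() ->
      spawn(fun() -> loop(0) end).
  
  loop(Count) ->
      receive
          {bump} ->
              %% "updating" the state is just recursing with a new argument
              loop(Count + 1);
          {From, {value}} ->
              From ! {self(), Count},
              loop(Count)
      end.

Nothing is ever mutated - sending {bump} just changes which value the next trip around the loop carries.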

State is Held in Records

Passing all this state-based data as arguments to functions gets ugly very fast. So the solution was to create records. Second Ah! Ha! moment:

State is conveniently held in records that are easily updated in parts.

This was major, as the book never comes out and says that state maintenance is the reason these records exist. A sentence or two on that point would have made the major ideas far easier to catch.
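
Here's a tiny sketch of why that's so convenient (a hypothetical demo module, reusing the same record as the listing above) - each #state{...} expression builds a copy with only the named fields changed:

  -module(rec_demo).
  -export([demo/0]).
  
  -record(state, {next, origin, caller}).
  
  demo() ->
      S0 = #state{},                     %% every field starts as 'undefined'
      S1 = S0#state{next = self()},      %% copy S0, replacing only 'next'
      S2 = S1#state{caller = self()},    %% 'next' and 'origin' carry over
      {S1#state.next, S2#state.caller}.  %% field access: State#state.field

Records are as immutable as everything else in the language, so each "update" is really a fresh record handed to the next loop() call.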

Initializing Processes is a Method Call (or Two)

Because a process is created with the spawn() function, if you want it to refer to itself by anything other than the self() function, you have to send it a message. Lines 22-25 handle the message that's used to tell the Head of the ring that it is, in fact, the head of the ring. Since there's no state in the process other than what it maintains in its calling loop, you have to start that loop, and then "feed it" the data that it can "piece together" to form the complete state you want it to have.

This is more than a little complicated, because you can have state in a process, but that state is held in a "ring" of looping calls, like electrons in a superconductor. You have to set up the conditions under which they will flow, and then insert the data that flows.
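
Boiled down to a sketch (hypothetical module name, but the same shape as lines 22-25 and 64-65 in the listing above): spawn the bare loop with an empty record, then mail the process the piece of state it couldn't have at spawn time:

  -module(init_demo).
  -export([start/0]).
  
  -record(state, {origin}).
  
  start() ->
      %% step 1: the process starts life with an empty state...
      Pid = spawn(fun() -> loop(#state{}) end),
      %% step 2: ...and is then told its own identity in a message
      Pid ! {self(), {origin, Pid}},
      receive
          {Pid, Origin} -> Origin
      end.
  
  loop(State) ->
      receive
          {From, {origin, Origin}} ->
              From ! {self(), Origin},
              loop(State#state{origin = Origin})
      end.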

I get it, but larger, more complex systems might be a real pain to keep straight. We'll have to see how things go.

Results

When I ran this test I got the following:

  29> c(ring).     
  {ok,ring}
  30> ring:start().
  starting the build and exercise of the ring...
  total cpu=2040ms ... 4.08us/op and wall=1692ms ... 3.384us/op
  ok
  31>

Now I haven't written my C++ equivalent - yet - but there's no way I'm not going to be able to beat this. First off, the CPU time is longer than the wall clock time? That made no sense to me at first. I've double-checked the code, but yeah, it's longer - most likely it's SMP erlang summing the runtime across its scheduler threads, so CPU time can legitimately exceed wall clock. Even so, 4 μsec/op is not all that fast for something this simple. Again, I'll have to write the C++ version and see, but I'm guessing I'll be able to beat this handily.
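
As a sanity check on the timing itself, timer:tc/3 gives a second opinion on the wall-clock number - a sketch of a shell session, not part of the listing above:

  %% timer:tc/3 runs M:F(A) and returns {Microseconds, Result}
  32> {Micros, ok} = timer:tc(ring, test, [1000, 500]).
  33> Micros / (1000 * 500).   %% wall-clock microseconds per message

If that roughly agrees with the wall ~pus/op figure the module prints, the oddity really is in how the runtime is accounted, not in the clock.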

We'll see.

[3/14] UPDATE: I just made a C++ equivalent of this erlang code and it's not too bad. Yeah, it's about twice as long as the erlang code - in terms of number of lines, but it's clean, and it's got a lot more error checking than the erlang code does.

/**
 * ring.cpp - this is the C++ equivalent of the Armstrong Chapter 8
 *            exercise where you are supposed to make a ring of 'n'
 *            objects and have one fire another for a total of 'm' laps.
 */
//  System Headers
#include <stdint.h>
#include <iostream>
#include <sys/time.h>
 
//  Third-Party Headers
 
//  Other Headers
 
//  Forward Declarations
 
//  Public Constants
 
//  Public Datatypes
 
//  Public Data Constants
/**
 * These are the different messages that we're going to pass around
 * from Node to Node. They will be simple uint8_t values as they don't
 * need to be anything special.
 */
#define PING    0
#define PONG    1
 
 
 
/**
 * This is the node that will make up the ring. It's got a nice pointer
 * to the next Node in the ring and a few simple methods to make the
 * ring a little easier to build and use.
 */
class Node {
    public:
        // Constructors and Destructors
        Node() : mNext(NULL), mStopOnPing(false) { }
        ~Node() { }
 
        // Accessor Methods
        void setNext( Node *aNode ) { mNext = aNode; }
        void setStopOnPing( bool aFlag ) { mStopOnPing = aFlag; }
        bool stopOnPing() { return mStopOnPing; }
 
        // send the message to the target where it can respond
        bool send( Node *aTarget, uint8_t aMessage ) {
            bool        error = false;
            if (aTarget == NULL) {
                error = true;
            } else {
                error = !aTarget->onMessage(this, aMessage);
            }
            return !error;
        }
 
        // this method is called when a message is sent to this guy
        bool onMessage( Node *aSource, uint8_t aMessage ) {
            bool        error = false;
            switch (aMessage) {
                case PING:
                    if (((error = !send(aSource, PONG)) == false) &&
                        !mStopOnPing) {
                        error = !send(mNext, PING);
                    }
                    break;
                case PONG:
                    break;
                default:
                    error = true;
                    break;
            }
            return !error;
        }
 
        // this is a simple way to send a ping around the ring
        bool ping() {
            return send(mNext, PING);
        }
 
    private:
        // this is the next node in the ring - wrapping back around
        Node    *mNext;
        // ...lets me know if I need to stop on a PING (loop done)
        bool    mStopOnPing;
};
 
 
/**
 * This method just gives me a nice microseconds since epoch that I can
 * use for timing the operations.
 */
uint64_t snap() {
    struct timeval tp;
    gettimeofday(&tp, NULL);
    // do the math in 64 bits - seconds-since-epoch times a million
    // overflows a 32-bit value
    return ((uint64_t)tp.tv_sec * 1000000) + tp.tv_usec;
}
 
 
/**
 * This is the main entry point that will build up the ring and then fire
 * it off 'm' times and then we'll see how fast it runs.
 */
int main(int argc, char *argv[]) {
    bool        error = false;
 
    // start off with the defaults for the program
    uint16_t    n = 1000;
    uint16_t    m = 500;
 
    // start the timer
    uint64_t    click = snap();
 
    // now, let's make the ring of the right size, holding onto the head
    Node    *head = NULL;
    if (!error) {
        std::cout << "Building the " << n << " element ring..."
                  << std::endl;
        if ((head = new Node()) == NULL) {
            error = true;
        } else {
            head->setStopOnPing(true);
        }
    }
    Node    *tail = head;
    for (uint16_t i = 0; !error && (i < (n - 1)); ++i) {
        Node    *newbie = new Node();
        if (newbie == NULL) {
            error = true;
            break;
        } else {
            tail->setNext(newbie);
            tail = newbie;
            tail->setNext(head);
        }
    }
 
    // now let's run it the right number of times
    if (!error) {
        std::cout << "Running the " << n << " element ring "
                  << m << " times..." << std::endl;
        for (uint16_t i = 0; i < m; ++i) {
            head->ping();
        }
    }
 
    // stop the timer
    if (!error) {
        click = snap() - click;
        std::cout << "Took " << click << " usec or "
                  << click*1000.0/(n*m) << " nsec/op"
                  << std::endl;
    }
 
    return (error ? 1 : 0);
}

When I run this guy, I get a much different runtime:

  peabody{drbob}23: c++ ring.cpp -o ring
  peabody{drbob}24: ring
  Building the 1000 element ring...
  Running the 1000 element ring 500 times...
  Took 16742 usec or 33.484 nsec/op
  peabody{drbob}25: 

So C++ did the work in 33.484 nsec/op where erlang took 3.384 μsec/op - a difference of about 100x in favor of C++. Yeah, it's that much different. I'm shocked, but only by the margins. I expected erlang to have the code-size advantage, but not by a factor of two. And I expected C++ to beat erlang in speed, but not by a factor of 100.

Wild stuff. Very neat.

Integrating Vim with Gist at GitHub

Wednesday, March 2nd, 2011

This morning I expanded my world considerably by happening across a Vim plugin for access to Gist. This is one of the services that I've been amazed at for a long while. GitHub is simply amazing, and I really should just give them money because I love what they are doing and want to support their work. But gists, in particular, are exceptionally cool.

Sure, there are a lot of places where you can throw up text and then look at it. But GitHub is so clean and focused on what they are doing, it's a joy to use. So here's how I got it working:

First, follow the instructions on this page to download the plugin to your ~/.vim/plugin/ directory. You'll need to make a few additions to your ~/.vimrc file:

  let g:gist_clip_command = 'pbcopy'
  let g:gist_detect_filetype = 1
  let g:github_user = 'yourname'
  let g:github_token = '...big long hex number...'

and the instructions for getting your token are on the plugin page. Pretty simple stuff.

One thing I didn't like: when a new gist was downloaded, it was put into a split window. I use MacVim, and I open up new tabs and use them. So I changed the code in the plugin just a little. It was originally:

  if winnum != -1
    if winnum != bufwinnr('%')
      exe "normal \<c-w>".winnum."w"
    endif
    setlocal modifiable
  else
    exec 'silent split gist:'.a:gistid
  endif

and I changed the split on line 299 of the plugin to an edit:

  if winnum != -1
    if winnum != bufwinnr('%')
      exe "normal \<c-w>".winnum."w"
    endif
    setlocal modifiable
  else
    exec 'silent edit gist:'.a:gistid
  endif

and everything worked just like I wanted it to. What an amazing little plugin for Vim! I can now edit, post, update, pull - all the things I'd like to be able to do on a gist, now from within Vim. What a treat.

Google Chrome dev 11.0.686.1 is Out

Wednesday, March 2nd, 2011

Seems Google Chrome dev 11.0.686.1 is out with a fix for an HTML5 issue with playing videos on Vimeo.com. I guess they had to plug it into their Flash player or something. It's sad they dropped H.264 support from the embedded video tag, but they did, and I'm guessing this is collateral damage. So they quickly pushed out an update that fixes the issue. No surprise there.

Switched Back to ZeroMQ for my Ticker Plants

Wednesday, March 2nd, 2011

This morning I swapped out the UDP transport system I'd written over the last few days for the ZeroMQ-based one that I'd been using for months prior to that. I have the feeling that because of the bugs I fixed in the rest of the codebase, it's very likely that the ZeroMQ transport system will work just fine. Additionally, I've been able to get the master of ZeroMQ on GitHub to compile and work with OpenPGM, so it's possible to update all our installs of ZeroMQ as well.

We'll see what happens today. If we can run all day at the same levels as we did yesterday with the UDP transport, then I can stop trying to put reliability into my UDP transport and just go with ZeroMQ. That would be nice. One less thing to do.

Hammering, Banging, Pushing and Pulling – Searching for Speed

Tuesday, March 1st, 2011

Today has been a reasonably successful day, as long as we stretch the meaning of 'today'. Everything has been running well for the next round of tests, but I was still trying to get just a little more speed out of the UDP broadcaster, or maybe the NBBO engine. When all you do is look at the logs of a process, you get to thinking that maybe you can cut that 200 msec to 100 msec, and if you can do that, then you can speed everything up. It's a vicious cycle.

So I spent a lot of the day working up different approaches and testing them against what I've already got in the test system. In every case, what I had was faster. Darn. But maybe that's good news: I'd done a good job already, so there wasn't any real improvement left to make. Like I said, it depends on how you define 'today'.

I have a successful system, but it was already that way. I just proved that I couldn't make it any better. Well... that's something.

Google Chrome dev 11.0.686.0 is Out

Tuesday, March 1st, 2011

This morning I noticed that Google Chrome dev 11.0.686.0 was out and with it a few really nice features: the V8 engine is now 3.1.6.1, and the accelerated compositing is on by default. They also fixed some bugs in the Mac code related to the new infobar. In all, a nice upgrade and I'm hoping to see a little bit of that faster rendering, but we'll have to wait and see.

ZeroMQ master at GitHub has OpenPGM Working Again

Monday, February 28th, 2011

Now that I have good results with the UDP-based delivery protocol, I thought I'd take the time to go back and see if the master branch of ZeroMQ at GitHub was working yet. The problem I've been having in the past is that there were some significant changes in ZeroMQ post-2.1.0 release that broke the OpenPGM protocols. I've been working with the maintainers to try and fix these things, but the best I can do is test as I haven't dug into the code and unraveled all the changes they made.

Last time I checked, it was still broken, and then I started on the UDP-based transport and haven't looked back - until now. It's pretty easy:

  $ git pull
  $ ./autogen.sh
  $ ./configure --with-pgm
  $ make clean
  $ make

and then in src/.libs I have all the libraries I need. I simply put that directory on the LD_LIBRARY_PATH ahead of the installed libraries and I'm in business.

I'm very pleased to say that as of this morning, it's working again! Yup, working like a champ. I'm not positive what all the changes include, but they are significant, and possibly much needed. There's still the issue of everything being so "hidden" in ZeroMQ, but that's nothing new. It is, however, a downside when we have a competing transport that we can use that's not so heavily encapsulated.

Still... I haven't made the UDP transport reliable, and I know that's needed. So the question will be, what's the effect of swapping out the UDP-based transport with the latest ZeroMQ transport? I'll have to wait until everyone is happy with the UDP-based transport, and then swap back in the ZeroMQ one and see the difference.

If I had to bet, I'd say the difference will be minimal. I believe all the problems were really mine. Humbling to admit, but I believe it's the case.

Finally a Win for the Good Guys!

Monday, February 28th, 2011

Well... it's been a while getting here, but I honestly think that this morning was a break-through for my ticker plants. The maximum delay was considerably less than in the past (thanks to finding that silly second consumer thread) and even far better than I remembered from the other group's tests. It's a big day for the good guys, and it's been a long time coming.

To be sure, a lot of the problems I came across were because of the changes in the target market, and it took me quite a while to get all the details worked out. But Holy Cow! It's a great feeling.

Multiple Transports for Ticks – Political Issues Abound

Friday, February 25th, 2011

Well... with the last threading bug figured out for my new UDP transport, I have to wonder if ZeroMQ was really just fine all along. There are several reasons to use it - OK, one: reliability, but that's a biggie. Still, it might be nice to try it again. The UDP transport, if I can add in the necessary reliability, will be better, for sure, because it's less "black box" and more transparent to all involved. It's certainly already quieted a few nagging naysayers who feel that PGM is not reliable at all and would rather have straight UDP. Giving that to them lets them feel they have a victory in this "battle", and so adoption will go a little easier.

In short: Politics.

Every time I come to a crossroads like this where a technical decision is given a back-seat to a political decision, I've always felt the wrong decision was made. Always. I'm hoping to give them their UDP transport, and make it technically better than what they have, and so not have to come to this decision.

Alternatively, since they have been so negative about this adoption, it would be nice to simply do all the work myself and then point out to management how we could "re-task" these "resources" to "different efforts". The problem with not being part of the solution is that after the solution is done, you've proven yourself unnecessary.

Right out of a job.

Anyway... for now, I have one good transport, and I think I have two. But I'll have to run with these changes for a bit to convince people it's as good as what they have. Gotta prove they aren't needed any more. Sad but true.

So Amazingly Easy to Make Threading Mistakes

Friday, February 25th, 2011

Today I've been fighting a lot of issues in my new UDP-based transport for my ticker plants, and I've tried changing queues (multiple times), more logging, structural changes - all kinds of things that seemed like they might be the issue. But the real problem turned out to be that I was not being careful enough starting my threads. Yup... threading is hard, and it's easy to make mistakes.

The setup is that I have a lot of "single-consumer" queues in my system; hit them with multiple consuming threads and you're going to be very sorry. Couple that with the fact that I was trying to be clever in my thread initialization and processing code, "covering all my bases" by putting redundant startup code in both the initialization and the processing blocks. The problem is that once initialization completes, the next thing to run is the first trip through the processing code - and if the startup code is duplicated there, guarded by a flag whose cached value is stale, it can look as though the threads were never properly initialized.

So you start more.

Ouch.

The obvious answer is don't try to be cute. Have code in one place and one place only and then make sure it's successful. When I did that, all the other problems "magically" disappeared. Amazing.

Not really. It's obvious that if you have multiple consumers on a single-consumer queue you're going to get into trouble. Also, it'll appear as though you are losing data when you really aren't. It's a mess.

Well... it seems to have cleared up and I'm glad. It's been a major pain.