Archive for the ‘Coding’ Category

Erlang Ring Benchmark from Chapter 8 of J Armstrong Book

Thursday, March 3rd, 2011

I've been trying to learn erlang from the book Programming Erlang by J Armstrong, and one of the first real challenges was the exercise in Chapter 8, where he asks the reader to:

Write a ring benchmark. Create N processes in a ring. Send a message round the ring M times so that a total of N * M messages get sent. Time how long this takes for different values of N and M.

Write a similar program in some other programming language you are familiar with. Compare the results. Write a blog, and publish the results on the Internet!

One thing I think he missed in the problem statement is that a message passed from one node to another really should have a response sent by the receiver. In my design, I planned to have a 'ping' message sent, received, and a 'pong' message sent back. The receipt of the 'pong' message would be a no-op, but it needed to be received. Other than that, the design was just like the problem statement.

My solution to the problem is this:

  1. -module(ring).
  2. -export([start/0, test/2]).
  3. 
  4. %% make the state container for the ring node
  5. -record(state, {next, origin, caller}).
  6. 
  7. %% standard entry point for a 1000 node, 500 cycle test
  8. start() ->
  9.     test(1000, 500).
  10. 
  11. %% make a synchronous message call to the pid and wait for response
  12. rpc(Pid, Request) ->
  13.     Pid ! {self(), Request},
  14.     receive
  15.         {Pid, Response} ->
  16.             Response
  17.     end.
  18. 
  19. %% main messaging loop for all nodes in the ring
  20. loop(State) ->
  21.     receive
  22.         %% the head of the ring needs to know it's the origin
  23.         {From, {origin, Origin}} ->
  24.             From ! {self(), Origin},
  25.             loop(State#state{origin=Origin});
  26. 
  27.         %% building the ring is a countdown of creations
  28.         {From, {build, Count}} when Count > 1 ->
  29.             Node = spawn(fun() -> loop(State) end),
  30.             rpc(Node, {build, Count-1}),
  31.             From ! {self(), Count},
  32.             loop(State#state{next=Node});
  33.         %% ...to the final node that circles back to the origin
  34.         {From, {build, Count}} ->
  35.             From ! {self(), Count},
  36.             loop(State#state{next=State#state.origin});
  37. 
  38.         %% starting the test kicks it off and saves the caller
  39.         {From, {go}} ->
  40.             State#state.next ! {self(), {ping}},
  41.             loop(State#state{caller=From});
  42. 
  43.         %% the ping needs to answer and then stop or continue
  44.         {From, {ping}} ->
  45.             From ! {self(), {pong}},
  46.             if
  47.                 State#state.origin =:= self() ->
  48.                     State#state.caller ! {self(), 1};
  49.                 true ->
  50.                     State#state.next ! {self(), {ping}}
  51.             end,
  52.             loop(State);
  53.         %% ...the response to a pong is to do nothing
  54.         {_, {pong}} ->
  55.             loop(State)
  56.     end.
  57. 
  58. %% build a ring of 'N' nodes, and run through this 'M' times...
  59. test(Nodes, Cycles) ->
  60.     io:format("starting the build and exercise of the ring...~n"),
  61.     statistics(runtime),
  62.     statistics(wall_clock),
  63.     State = #state{},
  64.     Head = spawn(fun() -> loop(State) end),
  65.     rpc(Head, {origin, Head}),
  66.     rpc(Head, {build, Nodes}),
  67.     _ = [rpc(Head, {go}) || _ <- lists:seq(1,Cycles)],
  68.     {_, Runtime} = statistics(runtime),
  69.     {_, Walltime} = statistics(wall_clock),
  70.     U1 = Runtime * 1000 / (Nodes*Cycles),
  71.     U2 = Walltime * 1000 / (Nodes*Cycles),
  72.     io:format("total cpu=~pms ... ~pus/op and wall=~pms ... ~pus/op~n",
  73.               [Runtime, U1, Walltime, U2]).

There are several watershed moments in this code that really started to solidify my understanding of erlang, and I think it's worth going over them to make the code easy to follow.

There are Only Functions

Seems odd, but really the entire language is a series of functions. This may seem obvious to someone thinking Hey! It's a functional language, Bob! but it wasn't clear to me as I started this exercise. There are variables, but their scope is so limited that it's really just a series of function calls. If you want to build the structure of a ring, you have to have some idea of the head of the ring, the N-1 'other' nodes, and then loop it back to the head. This 'next' state is essential for a node, and it's not at all obvious where it's stored.

In truth, it's stored in the arguments to the loop() function. This was my first Ah! Ha! moment:

All state is maintained as function arguments.

Seems silly, but I wish he'd said that in the book - it sure would have made things a lot easier. Think of a superconductor: the state is maintained in the "execution ring" of the typical loop() function. Once I got that, I stopped fighting the language with my other approaches.
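
To make that concrete, here's a minimal sketch of the idea (a hypothetical counter module of mine, not from the book) where the only 'state' is the argument to loop/1 - every handler just tail-calls loop() with whatever the new value should be:

  -module(counter).
  -export([start/0]).
  
  %% the current count lives nowhere but the argument to loop/1
  start() ->
      spawn(fun() -> loop(0) end).
  
  loop(Count) ->
      receive
          {bump} ->
              %% "updating" the state is just recursing with a new argument
              loop(Count + 1);
          {From, {value}} ->
              From ! {self(), Count},
              loop(Count)
      end.

Nothing is ever mutated - sending {bump} just changes which value the next trip around the loop carries.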

State is Held in Records

Passing all this state-based data as arguments to functions gets ugly very fast. So the solution was to create records. Second Ah! Ha! moment:

State is conveniently held in records that are easily updated in parts.

This was major, as the book never comes out and says that state maintenance is the reason these records exist. A sentence or two on that point would have made the major ideas far easier to catch.
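
Here's a tiny sketch of why that's so convenient (a hypothetical demo module, reusing the same record as the listing above) - each #state{...} expression builds a copy with only the named fields changed:

  -module(rec_demo).
  -export([demo/0]).
  
  -record(state, {next, origin, caller}).
  
  demo() ->
      S0 = #state{},                     %% every field starts as 'undefined'
      S1 = S0#state{next = self()},      %% copy S0, replacing only 'next'
      S2 = S1#state{caller = self()},    %% 'next' and 'origin' carry over
      {S1#state.next, S2#state.caller}.  %% field access: State#state.field

Records are as immutable as everything else in the language, so each "update" is really a fresh record handed to the next loop() call.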

Initializing Processes is a Method Call (or Two)

Because a process is created with the spawn() function, if you want it to refer to itself by anything other than the self() function, you have to send it a message. Lines 22-25 handle the message that's used to tell the Head of the ring that it is, in fact, the head of the ring. Since there's no state in the process other than what it maintains in its calling loop, you have to start that loop, and then "feed it" the data that it can "piece together" to form the complete state you want it to have.

This is more than a little complicated, because you can have state in a process, but that state is held in a "ring" of looping calls, like electrons in a superconductor. You have to set up the conditions under which they will flow, and then insert the data that flows.
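
Boiled down to a sketch (hypothetical module name, but the same shape as lines 22-25 and 64-65 in the listing above): spawn the bare loop with an empty record, then mail the process the piece of state it couldn't have at spawn time:

  -module(init_demo).
  -export([start/0]).
  
  -record(state, {origin}).
  
  start() ->
      %% step 1: the process starts life with an empty state...
      Pid = spawn(fun() -> loop(#state{}) end),
      %% step 2: ...and is then told its own identity in a message
      Pid ! {self(), {origin, Pid}},
      receive
          {Pid, Origin} -> Origin
      end.
  
  loop(State) ->
      receive
          {From, {origin, Origin}} ->
              From ! {self(), Origin},
              loop(State#state{origin = Origin})
      end.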

I get it, but larger, more complex systems might be a real pain to keep straight. We'll have to see how things go.

Results

When I ran this test I got the following:

  29> c(ring).     
  {ok,ring}
  30> ring:start().
  starting the build and exercise of the ring...
  total cpu=2040ms ... 4.08us/op and wall=1692ms ... 3.384us/op
  ok
  31>

Now I haven't written my C++ equivalent - yet - but there's no way I'm not going to be able to beat this. First off, the CPU time is longer than the wall clock time? That made no sense to me at first. I've double-checked the code, but yeah, it's longer - most likely it's SMP erlang summing the runtime across its scheduler threads, so CPU time can legitimately exceed wall clock. Even so, 4 μsec/op is not all that fast for something this simple. Again, I'll have to write the C++ version and see, but I'm guessing I'll be able to beat this handily.
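
As a sanity check on the timing itself, timer:tc/3 gives a second opinion on the wall-clock number - a sketch of a shell session, not part of the listing above:

  %% timer:tc/3 runs M:F(A) and returns {Microseconds, Result}
  32> {Micros, ok} = timer:tc(ring, test, [1000, 500]).
  33> Micros / (1000 * 500).   %% wall-clock microseconds per message

If that roughly agrees with the wall ~pus/op figure the module prints, the oddity really is in how the runtime is accounted, not in the clock.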

We'll see.

[3/14] UPDATE: I just made a C++ equivalent of this erlang code and it's not too bad. Yeah, it's about twice as long as the erlang code - in terms of number of lines, but it's clean, and it's got a lot more error checking than the erlang code does.

/**
 * ring.cpp - this is the C++ equivalent of the Armstrong Chapter 8
 *            exercise where you are supposed to make a ring of 'n'
 *            objects and have one fire another for a total of 'm' laps.
 */
//  System Headers
#include <stdint.h>
#include <iostream>
#include <sys/time.h>
 
//  Third-Party Headers
 
//  Other Headers
 
//  Forward Declarations
 
//  Public Constants
 
//  Public Datatypes
 
//  Public Data Constants
/**
 * These are the different messages that we're going to pass around
 * from Node to Node. They will be simple uint8_t values as they don't
 * need to be anything special.
 */
#define PING    0
#define PONG    1
 
 
 
/**
 * This is the node that will make up the ring. It's got a nice pointer
 * to the next Node in the ring and a few simple methods to make the
 * ring a little easier to build and use.
 */
class Node {
    public:
        // Constructors and Destructors
        Node() : mNext(NULL), mStopOnPing(false) { }
        ~Node() { }
 
        // Accessor Methods
        void setNext( Node *aNode ) { mNext = aNode; }
        void setStopOnPing( bool aFlag ) { mStopOnPing = aFlag; }
        bool stopOnPing() { return mStopOnPing; }
 
        // send the message to the target where it can respond
        bool send( Node *aTarget, uint8_t aMessage ) {
            bool        error = false;
            if (aTarget == NULL) {
                error = true;
            } else {
                error = !aTarget->onMessage(this, aMessage);
            }
            return !error;
        }
 
        // this method is called when a message is sent to this guy
        bool onMessage( Node *aSource, uint8_t aMessage ) {
            bool        error = false;
            switch (aMessage) {
                case PING:
                    if (((error = !send(aSource, PONG)) == false) &&
                        !mStopOnPing) {
                        error = !send(mNext, PING);
                    }
                    break;
                case PONG:
                    break;
                default:
                    error = true;
                    break;
            }
            return !error;
        }
 
        // this is a simple way to send a ping around the ring
        bool ping() {
            return send(mNext, PING);
        }
 
    private:
        // this is the next node in the ring - wrapping back around
        Node    *mNext;
        // ...lets me know if I need to stop on a PING (loop done)
        bool    mStopOnPing;
};
 
 
/**
 * This method just gives me a nice microseconds since epoch that I can
 * use for timing the operations.
 */
uint64_t snap() {
    struct timeval tp;
    gettimeofday(&tp, NULL);
    // do the math in 64 bits - seconds-since-epoch times a million
    // overflows a 32-bit value
    return ((uint64_t)tp.tv_sec * 1000000) + tp.tv_usec;
}
 
 
/**
 * This is the main entry point that will build up the ring and then fire
 * it off 'm' times and then we'll see how fast it runs.
 */
int main(int argc, char *argv[]) {
    bool        error = false;
 
    // start off with the defaults for the program
    uint16_t    n = 1000;
    uint16_t    m = 500;
 
    // start the timer
    uint64_t    click = snap();
 
    // now, let's make the ring of the right size, holding onto the head
    Node    *head = NULL;
    if (!error) {
        std::cout << "Building the " << n << " element ring..."
                  << std::endl;
        if ((head = new Node()) == NULL) {
            error = true;
        } else {
            head->setStopOnPing(true);
        }
    }
    Node    *tail = head;
    for (uint16_t i = 0; !error && (i < (n - 1)); ++i) {
        Node    *newbie = new Node();
        if (newbie == NULL) {
            error = true;
            break;
        } else {
            tail->setNext(newbie);
            tail = newbie;
            tail->setNext(head);
        }
    }
 
    // now let's run it the right number of times
    if (!error) {
        std::cout << "Running the " << n << " element ring "
                  << m << " times..." << std::endl;
        for (uint16_t i = 0; i < m; ++i) {
            head->ping();
        }
    }
 
    // stop the timer
    if (!error) {
        click = snap() - click;
        std::cout << "Took " << click << " usec or "
                  << click*1000.0/(n*m) << " nsec/op"
                  << std::endl;
    }
 
    return (error ? 1 : 0);
}

When I run this guy, I get a much different runtime:

  peabody{drbob}23: c++ ring.cpp -o ring
  peabody{drbob}24: ring
  Building the 1000 element ring...
  Running the 1000 element ring 500 times...
  Took 16742 usec or 33.484 nsec/op
  peabody{drbob}25: 

So C++ did the work in 33.484 nsec/op where erlang took 3.384 μsec/op - a difference of about 100x in favor of C++. Yeah, it's that much different. I'm shocked, but only by the margins. I expected erlang to have the code-size advantage, but not by a factor of two. And I expected C++ to beat erlang in speed, but not by a factor of 100.

Wild stuff. Very neat.

Integrating Vim with Gist at GitHub

Wednesday, March 2nd, 2011

This morning I expanded my world considerably by happening across a Vim plugin for access to Gist. This is one of the services that I've been amazed at for a long while. GitHub is simply amazing, and I really should just give them money because I love what they are doing and want to support their work. But gists, in particular, are exceptionally cool.

Sure, there are a lot of places where you can throw up text and then look at it. But GitHub is so clean and focused on what they are doing, it's a joy to use. So here's how I got it working:

First, follow the instructions on this page to download the plugin to your ~/.vim/plugin/ directory. You'll need to make a few additions to your ~/.vimrc file:

  let g:gist_clip_command = 'pbcopy'
  let g:gist_detect_filetype = 1
  let g:github_user = 'yourname'
  let g:github_token = '...big long hex number...'

and the instructions for getting your token are on the plugin page. Pretty simple stuff.

One thing I didn't like: when a new gist was downloaded, it was put into a split window. I use MacVim, and I open up new tabs and use them. So I changed the code in the plugin just a little. It was originally:

  if winnum != -1
    if winnum != bufwinnr('%')
      exe "normal \<c-w>".winnum."w"
    endif
    setlocal modifiable
  else
    exec 'silent split gist:'.a:gistid
  endif

and I changed the split on line 299 of the plugin to an edit:

  if winnum != -1
    if winnum != bufwinnr('%')
      exe "normal \<c-w>".winnum."w"
    endif
    setlocal modifiable
  else
    exec 'silent edit gist:'.a:gistid
  endif

and everything worked just like I wanted it to. What an amazing little plugin for Vim! I can now edit, post, update, pull - all the things I'd like to be able to do on a gist, now from within Vim. What a treat.

Google Chrome dev 11.0.686.1 is Out

Wednesday, March 2nd, 2011

Seems Google Chrome dev 11.0.686.1 is out with a fix for an HTML5 issue with playing videos on Vimeo.com. I guess they had to plug it into their Flash player or something. It's sad they dropped H.264 support from the embedded video tag, but they did, and I'm guessing this is collateral damage. So they quickly pushed out an update that fixes the issue. No surprise there.

Switched Back to ZeroMQ for my Ticker Plants

Wednesday, March 2nd, 2011

This morning I swapped out the UDP transport system I'd written over the last few days for the ZeroMQ-based one that I'd been using for months prior to that. I have the feeling that because of the bugs I fixed in the rest of the codebase, it's very likely that the ZeroMQ transport system will work just fine. Additionally, I've been able to get the master of ZeroMQ on GitHub to compile and work with OpenPGM, so it's possible to update all our installs of ZeroMQ as well.

We'll see what happens today. If we can run all day at the same levels as we did yesterday with the UDP transport, then I can stop trying to put reliability into my UDP transport and just go with ZeroMQ. That would be nice. One less thing to do.

Hammering, Banging, Pushing and Pulling – Searching for Speed

Tuesday, March 1st, 2011

Today has been a reasonably successful day, as long as we stretch the meaning of 'today'. Everything has been running well for the next round of tests, but I was still trying to get just a little more speed out of the UDP broadcaster, or maybe the NBBO engine. When all you do is look at the logs of a process, you get to thinking that maybe you can cut that 200 msec to 100 msec, and if you can do that, then you can speed everything up. It's a vicious cycle.

So I spent a lot of the day working up different approaches and testing them against what I've already got in the test system. In every case, what I had was faster. Darn. But maybe that's good news: I'd done a good job already, so there wasn't any real improvement left to make. Like I said, it depends on how you define 'today'.

I have a successful system, but it was already that way. I just proved that I couldn't make it any better. Well... that's something.

Google Chrome dev 11.0.686.0 is Out

Tuesday, March 1st, 2011

This morning I noticed that Google Chrome dev 11.0.686.0 was out and with it a few really nice features: the V8 engine is now 3.1.6.1, and the accelerated compositing is on by default. They also fixed some bugs in the Mac code related to the new infobar. In all, a nice upgrade and I'm hoping to see a little bit of that faster rendering, but we'll have to wait and see.

ZeroMQ master at GitHub has OpenPGM Working Again

Monday, February 28th, 2011

Now that I have good results with the UDP-based delivery protocol, I thought I'd take the time to go back and see if the master branch of ZeroMQ at GitHub was working yet. The problem I've been having in the past is that there were some significant changes in ZeroMQ post-2.1.0 release that broke the OpenPGM protocols. I've been working with the maintainers to try and fix these things, but the best I can do is test as I haven't dug into the code and unraveled all the changes they made.

Last time I checked, it was still broken, and then I started on the UDP-based transport and haven't looked back - until now. It's pretty easy:

  $ git pull
  $ ./autogen.sh
  $ ./configure --with-pgm
  $ make clean
  $ make

and then in src/.libs I have all the libraries I need. I simply put that directory on the LD_LIBRARY_PATH ahead of the installed libraries and I'm in business.

I'm very pleased to say that as of this morning, it's working again! Yup, working like a champ. I'm not positive what all the changes include, but they are significant, and possibly much needed. There's still the issue of everything being so "hidden" in ZeroMQ, but that's nothing new. It is, however, a downside when we have a competing transport that we can use that's not so heavily encapsulated.

Still... I haven't made the UDP transport reliable, and I know that's needed. So the question will be, what's the effect of swapping out the UDP-based transport with the latest ZeroMQ transport? I'll have to wait until everyone is happy with the UDP-based transport, and then swap back in the ZeroMQ one and see the difference.

If I had to bet, I'd say the difference will be minimal. I believe all the problems were really mine. Humbling to admit, but I believe it's the case.

Finally a Win for the Good Guys!

Monday, February 28th, 2011

Well... it's been a while getting here, but I honestly think that this morning was a break-through for my ticker plants. The maximum delay was considerably less than in the past (thanks to finding that silly second consumer thread) and even far better than I remembered from the other group's tests. It's a big day for the good guys, and it's been a long time coming.

To be sure, a lot of the problems I came across were because of the changes in the target market, and it took me quite a while to get all the details worked out. But Holy Cow! It's a great feeling.

Multiple Transports for Ticks – Political Issues Abound

Friday, February 25th, 2011

Well... with the last threading bug figured out for my new UDP transport, I have to wonder if ZeroMQ was really just fine all along. There are several reasons to use it - OK, one: reliability, but that's a biggie. Still, it might be nice to try it again. The UDP transport, if I can add in the necessary reliability, will be better, for sure, because it's less "black box" and more transparent to all involved. It's certainly already quieted a few nagging naysayers who feel that PGM is not reliable at all and would rather have straight UDP. Giving that to them lets them feel they have a victory in this "battle", and so adoption will go a little easier.

In short: Politics.

Every time I come to a crossroads like this where a technical decision is given a back-seat to a political decision, I've always felt the wrong decision was made. Always. I'm hoping to give them their UDP transport, and make it technically better than what they have, and so not have to come to this decision.

Alternatively, since they have been so negative about this adoption, it would be nice to simply do all the work myself and then point out to management how we could "re-task" these "resources" to "different efforts". The problem with not being part of the solution is that after the solution is done, you've proven yourself unnecessary.

Right out of a job.

Anyway... for now, I have one good transport, and I think I have two. But I'll have to run with these changes for a bit to convince people it's as good as what they have. Gotta prove they aren't needed any more. Sad but true.

So Amazingly Easy to Make Threading Mistakes

Friday, February 25th, 2011

Today I've been fighting a lot of issues in my new UDP-based transport for my ticker plants, and I've tried changing queues (multiple times), more logging, structural changes - all kinds of things that seemed like they might be the issue. But the real problem turned out to be that I was not being careful enough starting my threads. Yup... threading is hard, and it's easy to make mistakes.

The setup is that I have a lot of "single-consumer" queues in my system; hit them with multiple consuming threads and you're going to be very sorry. Couple that with the fact that I was trying to be clever in my thread initialization and processing code, "covering all my bases" by putting redundant startup code in both the initialization and the processing blocks. The problem is that once initialization completes, the next thing to run is the first trip through the processing code - and if the startup code is duplicated there, guarded by a flag whose cached value is stale, it can look as though the threads were never properly initialized.

So you start more.

Ouch.

The obvious answer is don't try to be cute. Have code in one place and one place only and then make sure it's successful. When I did that, all the other problems "magically" disappeared. Amazing.

Not really. It's obvious that if you have multiple consumers on a single-consumer queue you're going to get into trouble. Also, it'll appear as though you are losing data when you really aren't. It's a mess.

Well... it seems to have cleared up and I'm glad. It's been a major pain.