Erlang Ring Benchmark from Chapter 8 of Joe Armstrong's Book


I've been trying to learn erlang from the book Programming Erlang by Joe Armstrong, and one of the first real challenges was the exercise in Chapter 8, where he asks the reader to:

Write a ring benchmark. Create N processes in a ring. Send a message round the ring M times so that a total of N * M messages get sent. Time how long this takes for different values of N and M.

Write a similar program in some other programming language you are familiar with. Compare the results. Write a blog, and publish the results on the Internet!

One thing I think he missed in the problem statement is that a message passed from one node to another really should have a response sent by the receiver. In my design, I planned to have a 'ping' message sent, received, and a 'pong' message sent back. The receipt of the 'pong' message would be a no-op, but it needed to be received. Other than that, the design was just like the problem statement.

My solution to the problem is this:

  1. -module(ring).
  2. -export([start/0, test/2]).
  3. 
  4. %% make the state container for the ring node
  5. -record(state, {next, origin, caller}).
  6. 
  7. %% standard entry point for a 1000 node, 500 cycle test
  8. start() ->
  9.     test(1000, 500).
  10. 
  11. %% make a synchronous message call to the pid and wait for response
  12. rpc(Pid, Request) ->
  13.     Pid ! {self(), Request},
  14.     receive
  15.         {Pid, Response} ->
  16.             Response
  17.     end.
  18. 
  19. %% main messaging loop for all nodes in the ring
  20. loop(State) ->
  21.     receive
  22.         %% the head of the ring needs to know it's the origin
  23.         {From, {origin, Origin}} ->
  24.             From ! {self(), Origin},
  25.             loop(State#state{origin=Origin});
  26. 
  27.         %% building the ring is a countdown of creations
  28.         {From, {build, Count}} when Count > 1 ->
  29.             Node = spawn(fun() -> loop(State) end),
  30.             rpc(Node, {build, Count-1}),
  31.             From ! {self(), Count},
  32.             loop(State#state{next=Node});
  33.         %% ...to the final node that circles back to the origin
  34.         {From, {build, Count}} ->
  35.             From ! {self(), Count},
  36.             loop(State#state{next=State#state.origin});
  37. 
  38.         %% starting the test kicks it off and saves the caller
  39.         {From, {go}} ->
  40.             State#state.next ! {self(), {ping}},
  41.             loop(State#state{caller=From});
  42. 
  43.         %% the ping needs to answer and then stop or continue
  44.         {From, {ping}} ->
  45.             From ! {self(), {pong}},
  46.             if
  47.                 State#state.origin =:= self() ->
  48.                     State#state.caller ! {self(), 1};
  49.                 true ->
  50.                     State#state.next ! {self(), {ping}}
  51.             end,
  52.             loop(State);
  53.         %% ...the response to a pong is to do nothing
  54.         {_, {pong}} ->
  55.             loop(State)
  56.     end.
  57. 
  58. %% build a ring of 'N' nodes, and run through this 'M' times...
  59. test(Nodes, Cycles) ->
  60.     io:format("starting the build and exercise of the ring...~n"),
  61.     statistics(runtime),
  62.     statistics(wall_clock),
  63.     State = #state{},
  64.     Head = spawn(fun() -> loop(State) end),
  65.     rpc(Head, {origin, Head}),
  66.     rpc(Head, {build, Nodes}),
  67.     _ = [rpc(Head, {go}) || _ <- lists:seq(1, Cycles)],
  68.     {_, Runtime} = statistics(runtime),
  69.     {_, Walltime} = statistics(wall_clock),
  70.     U1 = Runtime * 1000 / (Nodes*Cycles),
  71.     U2 = Walltime * 1000 / (Nodes*Cycles),
  72.     io:format("total cpu=~pms ... ~pus/op and wall=~pms ... ~pus/op~n",
  73.               [Runtime, U1, Walltime, U2]).

There are several watershed moments in this code that really started to solidify my understanding of erlang, and I think it's worth going over them to make sure it's easy to follow along.

There are Only Functions

Seems odd, but really the entire language is a series of functions. This may seem obvious to someone thinking Hey! It's a functional language, Bob! but it wasn't clear to me as I started this exercise. There are variables, but their scope is so limited that it's really just a series of function calls. If you want to build the structure of a ring, you have to have some idea of the head of the ring, the N-1 'other' nodes, and then loop it back to the head. This 'next' state is essential for a node, and it's not at all obvious where that's stored.

In truth, it's stored in the arguments to the loop() function. This was my first Ah! Ha! moment:

All state is maintained as function arguments.

Seems silly, but I wish he'd said that in the book. It sure would have made things a lot easier. Think of a superconductor: the state is maintained in the "execution ring" of the typical loop() function. Once I got that, it was clear I could stop fighting the language with my other approaches.
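To make that concrete, here's a minimal sketch, not part of the ring code and with a made-up counter/1 name, of a process whose entire "state" is the argument to its own loop:

  %% illustrative only: a counter process whose only state is its argument
  counter(Count) ->
      receive
          {From, increment} ->
              From ! {self(), Count + 1},
              counter(Count + 1);   %% "store" the new state by recursing with it
          {From, value} ->
              From ! {self(), Count},
              counter(Count)        %% unchanged state, same argument
      end.

You'd kick it off with something like spawn(fun() -> counter(0) end), and every message it handles simply re-enters the loop carrying whatever the new state should be.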

State is Held in Records

Passing all this state-based data as arguments to functions gets ugly very fast. So the solution was to create records. Second Ah! Ha! moment:

State is conveniently held in records that are easily updated in parts.

This was major, because the book never quite comes out and says why these records exist: state maintenance. A sentence or two on that point would have made the major ideas far easier to catch.
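To see what that buys you, here's a tiny throwaway function (example/2 is mine, not part of the ring module) that works the #state{} record from the listing one field at a time:

  %% illustrative only: building up a #state{} record a piece at a time
  example(NodePid, HeadPid) ->
      S0 = #state{},                   %% every field starts out as 'undefined'
      S1 = S0#state{next = NodePid},   %% copy S0 with just 'next' changed
      S2 = S1#state{origin = HeadPid}, %% change 'origin'; 'next' carries over
      S2#state.next.                   %% read a single field back out

That Record#state{field = ...} form is exactly what lines 25, 32, 36, and 41 of the listing are doing: take the old state, change one field, and hand the new state to the next trip around loop().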

Initializing Processes is a Method Call (or Two)

Because a process is created with the spawn() function, if you want it to know anything about itself beyond what self() gives it, you have to send it a message. Lines 22-25 handle the message that tells the Head of the ring that it is, in fact, the head of the ring. Since there's no state in the process other than what it maintains in its calling loop, you have to start that loop and then "feed it" the data that it can "piece together" into the complete state you want it to have.

This is more than a little complicated, because you really can have state in a process, but that state is really just held in a "ring" of looping calls like electrons in a superconductor. You have to set up the conditions under which they will flow, and then insert the data that flows.
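Stripped down to just that hand-off, here are the relevant lines from test/2 again (lines 63-66 of the listing): spawn the process with an empty state, then use rpc/2 to feed it the pieces it can't know on its own:

  State = #state{},                        %% nothing is known yet
  Head = spawn(fun() -> loop(State) end),  %% the loop starts with an empty state
  rpc(Head, {origin, Head}),               %% tell the head that it is the origin
  rpc(Head, {build, Nodes}),               %% have it build the rest of the ring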

I get it, but larger, more complex systems might be a real pain to keep straight. We'll have to see how things go.

Results

When I ran this test I got the following:

  29> c(ring).     
  {ok,ring}
  30> ring:start().
  starting the build and exercise of the ring...
  total cpu=2040ms ... 4.08us/op and wall=1692ms ... 3.384us/op
  ok
  31>

Now I haven't written my C++ equivalent yet, but there's no way I'm not going to be able to beat this. First off, the CPU time is longer than the wall clock time? That makes no sense. I've double-checked the code, but yeah, it's longer. Even so, roughly 4 μsec/op is not all that fast for something as simple as this. Again, I'll have to write the C++ version and see, but I'm guessing I'll be able to beat this handily.
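If I want a second opinion on that wall-clock number later, one option (just a sketch, I haven't actually run this) is timer:tc/1 from the shell, which reports the elapsed wall time in microseconds for a single call:

  %% sketch: time one full run and derive a per-operation figure from it
  {Micros, ok} = timer:tc(fun() -> ring:test(1000, 500) end).
  Micros / (1000 * 500).   %% microseconds per operation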

We'll see.

[3/14] UPDATE: I just made a C++ equivalent of this erlang code and it's not too bad. Yeah, it's about twice as long as the erlang code in terms of line count, but it's clean, and it's got a lot more error checking than the erlang code does.

/**
 * ring.cpp - this is the C++ equivalent of the Armstrong Chapter 8
 *            exercise where you are supposed to make a ring of 'n'
 *            objects and have one fire another for a total of 'm' laps.
 */
//  System Headers
#include <stdint.h>
#include <iostream>
#include <sys/time.h>
 
//  Third-Party Headers
 
//  Other Headers
 
//  Forward Declarations
 
//  Public Constants
 
//  Public Datatypes
 
//  Public Data Constants
/**
 * These are the different messages that we're going to pass around
 * from Node to Node. They will be simple uint8_t values as they don't
 * need to be anything special.
 */
#define PING    0
#define PONG    1
 
 
 
/**
 * This is the node that will make up the ring. It's got a nice pointer
 * to the next Node in the ring and a few simple methods to make the
 * ring a little easier to build and use.
 */
class Node {
    public:
        // Constructors and Destructors
        Node() : mNext(NULL), mStopOnPing(false) { }
        ~Node() { }
 
        // Accessor Methods
        void setNext( Node *aNode ) { mNext = aNode; }
        void setStopOnPing( bool aFlag ) { mStopOnPing = aFlag; }
        bool stopOnPing() { return mStopOnPing; }
 
        // send the message to the target where it can respond
        bool send( Node *aTarget, uint8_t aMessage ) {
            bool        error = false;
            if (aTarget == NULL) {
                error = true;
            } else {
                error = !aTarget->onMessage(this, aMessage);
            }
            return !error;
        }
 
        // this method is called when a message is sent to this guy
        bool onMessage( Node *aSource, uint8_t aMessage ) {
            bool        error = false;
            switch (aMessage) {
                case PING:
                    if (((error = !send(aSource, PONG)) == false) &&
                        !mStopOnPing) {
                        error = !send(mNext, PING);
                    }
                    break;
                case PONG:
                    break;
                default:
                    error = true;
                    break;
            }
            return !error;
        }
 
        // this is a simple way to send a ping around the ring
        bool ping() {
            return send(mNext, PING);
        }
 
    private:
        // this is the next node in the ring - wrapping back around
        Node    *mNext;
        // ...lets me know if I need to stop on a PING (loop done)
        bool    mStopOnPing;
};
 
 
/**
 * This method just gives me a nice microseconds since epoch that I can
 * use for timing the operations.
 */
uint32_t snap() {
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return (tp.tv_sec * 1000000) + tp.tv_usec;
}
 
 
/**
 * This is the main entry point that will build up the ring and then fire
 * it off 'm' times and then we'll see how fast it runs.
 */
int main(int argc, char *argv[]) {
    bool        error = false;
 
    // start off with the defaults for the program
    uint16_t    n = 1000;
    uint16_t    m = 500;
 
    // start the timer
    uint32_t    click = snap();
 
    // now, let's make the ring of the right size, holding onto the head
    Node    *head = NULL;
    if (!error) {
        std::cout << "Building the " << n << " element ring..."
                  << std::endl;
        if ((head = new Node()) == NULL) {
            error = true;
        } else {
            head->setStopOnPing(true);
        }
    }
    Node    *tail = head;
    for (uint16_t i = 0; !error && (i < (n - 1)); ++i) {
        Node    *newbie = new Node();
        if (newbie == NULL) {
            error = true;
            break;
        } else {
            tail->setNext(newbie);
            tail = newbie;
            tail->setNext(head);
        }
    }
 
    // now let's run it the right number of times
    if (!error) {
        std::cout << "Running the " << n << " element ring "
                  << m << " times..." << std::endl;
        for (uint16_t i = 0; i < m; ++i) {
            head->ping();
        }
    }
 
    // stop the timer
    if (!error) {
        click = snap() - click;
        std::cout << "Took " << click << " usec or "
                  << click*1000.0/(n*m) << " nsec/op"
                  << std::endl;
    }
 
    return (error ? 1 : 0);
}

When I run this guy, I get a much different runtime:

  peabody{drbob}23: c++ ring.cpp -o ring
  peabody{drbob}24: ring
  Building the 1000 element ring...
  Running the 1000 element ring 500 times...
  Took 16742 usec or 33.484 nsec/op
  peabody{drbob}25: 

So the time it took for C++ to do the work was 33.484 nsec/op, and the erlang took 3.384 μsec/op, a difference of roughly 100x (3384 / 33.484 ≈ 101) in favor of C++. Yeah, it's that much different. I'm shocked, but only by the margin. I expected erlang to have the advantage in code size, but not by a factor of two. And I expected C++ to beat erlang on speed, but not by a factor of 100.

Wild stuff. Very neat.