Archive for April, 2008

Coalescing Queues and the Myth of Real-Time Data

Tuesday, April 22nd, 2008

It's interesting working with developers and users when it comes to market data. They are often convinced that they need real-time data for prices, and for the calculations driven off those prices, and they don't really stop to ask Why? I ask that a lot, and I've come to the conclusion that most applications - and by 'most' I mean anything with a human being in the loop - do not need real-time data. Period. Here's why.

First, there are the calculations. Most applications aren't simple tickers - those are the trading apps, and they need prices as close to the market as possible. But even then, if you're trying to watch a few hundred symbols, the odds that you have a desktop machine powerful enough to actually keep up with the ticks from a data source like Reuters are iffy at best. You need the data to be as close to the market as possible without crushing your machine and turning it into a single-use data terminal.

So if you have calculations, like exposure or running P/L, and they're aggregated in any way, then there's very little chance that your system is efficient enough to actually handle all the ticks a real-time price feed can dish out. Getting backed up isn't the answer either, because then you're behind the market and still have to play catch-up. Nope... you need to be intelligent about what you do.

Secondly, even if you could keep up with the flow, the human watching isn't going to be able to respond to all the ticks individually - heck, it takes us about 0.7 sec to hit the brake in an accident situation, so there's no way someone is going to respond to a very liquid stock ticking twice (or more) a second. No way. Automated trading systems are a different beast, but they don't have a human in the loop.

So... the reality of the situation is that for price data feeds you really need a good set of prices. Something that's very close to the market, say less than 3 sec behind, but not real-time, because that's too much. The problem is that developers want to build real-time systems because they sound neat. Yeah... I can see that, but it's not reasonable, and when you try to tell them this they aren't at all interested. Rather, they tell you that they can make it happen... that they've thought it all through and have all it takes to do it. This is most likely when the buzzwords come out, newfangled messaging systems and all.

So you have to back up and explain the realities of these feeds to them. It takes about 30 to 45 mins to get through to most decent developers, and then they start to see the real scale of the problem. Statements like "But I'm only registering for 400 symbols" turn into "Yes, but you're registering for the 400 most liquid symbols, and that's going to be a significant real-time load." Oh... I didn't think of that. Yes... I know.

Once it's explained, you end up with the standard market data 'bet': "Let's try it my way, OK? It's already built, debugged, and ready to go. If this isn't good enough for you and your customers, then we'll do it the other way." In all my experience I've never had to come up with the 'other' way of doing things.

But today I thought 'Why even get into it? Make something that appears to be streaming, even if it's not?' and so I did. I started by creating a nifty set of coalescing queues (FIFO and LIFO) where the push() method takes a key and a value. The key is the primary identifier and the property on which the coalescing will take place. For prices, this is the name of the ticker, but for other things it could be an address, or a primary key from a database. The idea is that the order of the queue is preserved, but if you push() a value whose key is already in the queue, the value is replaced in place and its position in the queue doesn't change. This means that if you're using this for market data, the prices keep updating even when you're not servicing the queue, so that when you do service it, the order is maintained and the data is the most recent possible.
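
Here's a minimal sketch of the idea - not the actual class I wrote, and the names and types are purely illustrative - assuming string keys, a std::list for the FIFO order, and a side map for the in-place replacement:

    // A sketch of a coalescing FIFO - illustrative only, not the real code.
    #include <condition_variable>
    #include <iterator>
    #include <list>
    #include <map>
    #include <mutex>
    #include <string>
    #include <utility>

    template <typename V>
    class CoalescingFIFO {
    public:
        // push() preserves FIFO order by key; a repeated key just gets its value replaced
        void push(const std::string & key, const V & value) {
            std::lock_guard<std::mutex> lock(mMutex);
            auto it = mIndex.find(key);
            if (it != mIndex.end()) {
                it->second->second = value;               // coalesce: newer data, same slot in line
            } else {
                mQueue.push_back(std::make_pair(key, value));
                mIndex[key] = std::prev(mQueue.end());
            }
            mCond.notify_one();
        }

        // pop() blocks until something is available, then hands back the oldest entry
        std::pair<std::string, V> pop() {
            std::unique_lock<std::mutex> lock(mMutex);
            mCond.wait(lock, [this]{ return !mQueue.empty(); });
            std::pair<std::string, V> item = mQueue.front();
            mQueue.pop_front();
            mIndex.erase(item.first);
            return item;
        }

    private:
        typedef std::list<std::pair<std::string, V> >            Queue;
        typedef std::map<std::string, typename Queue::iterator>  Index;
        Queue                    mQueue;
        Index                    mIndex;
        std::mutex               mMutex;
        std::condition_variable  mCond;
    };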

I put this into the client code for my price server and made it possible for users to subscribe to prices, 'turn on' the delivery of updates, and simply "watch the queue" for them. The queue has all the thread-safety and condition handling in it, so it's very easy to ask it for something and, as soon as something is ready, have it returned so you can process it and start at the top of the loop again. It's easy to put this in a simple service thread that does nothing but pick things off the queue as they arrive.
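
The consuming side really is nothing more than a loop - something along these lines (again a hypothetical sketch; Price and handlePrice() are stand-ins, not names from my actual code):

    // A hypothetical service thread body - just drain the queue as updates arrive.
    void serviceUpdates(CoalescingFIFO<Price> & queue) {
        while (true) {
            // blocks until an update is ready, already coalesced to the latest value
            std::pair<std::string, Price> update = queue.pop();
            handlePrice(update.first, update.second);   // e.g. repaint the row, recalc P/L
        }
    }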

It's the illusion of real-time without the headaches. I let the users think they're getting real-time prices rather than polling - they really don't know the mechanism that's getting those prices into the queue in the first place. Additionally, they aren't having to deal with identical prices and filter them out - I do that before I put the prices on the queue. It really short-cuts the argument: I say "try this, and let me know" - and then I never hear from them again.

Beautiful solution. I love it.

Interesting Evolutions in Technologies – Chat, FTP, Telnet

Tuesday, April 22nd, 2008

I've been involved with computers since before the days of widespread Chat, ftp and telnet. It was initially modems with teletypes or (if you were lucky) a glass tty (terminal). There was nothing to download to - save the box you were logged into, and then it was pretty much your entire world. If you wanted to look at another system you hung up and dialed it. Period. It was all we had, and in relation to what came before (keycards and terminal rooms) it was heaven on earth. But things changed.

Ethernet changed a lot. We got networks of machines in college - not yet PCs, and the old hobby machines of the 70's didn't have ethernet cards in them. If you wanted to have them talk to one another you made up a special serial cable with Tx/Rx crossed and then you were good to go. But by the time you had widespread adoption of ethernet, you had networks, and when you had networks it didn't take long to have telnet, ftp and chat.

I can remember first using ftp to get at things across the globe while in grad school. There were newsgroups that might publish the site and directory of something useful, and then with anonymous ftp, you could go get it. It wasn't fast, and there were no Google-like search engines in place... you had to log in, search the directory tree for the file you were looking for... download it and hope it made it down before you lost your connection. Still, this was big computer to big computer; it wasn't until the PCs came out that you really saw the growth of Kermit and the XModem, YModem, and ZModem file transfer protocols. FTP was almost forgotten.

But IRC Chat saw massive growth, and telnet was the way to get from one machine to another. So not everything was forgotten. But fast-forward to today. It's amazing to me to see that ftp, chat, and telnet (ssh) are as strong now as they were in the early days - not because something better hasn't come along, but because of the work put into the documentation way back then.

Look at the RFC for FTP or Telnet someday... it's amazing the detail they went into. I've implemented both, and was immediately impressed that the RFC was right on point with the intended audience - people wanting to understand and implement the protocol. The docs are very well written, complete and detailed, but not overly verbose and wordy. These things were written by people wanting to pass on this knowledge to others in the industry and make sure that there were no lingering questions and problems.

FTP is, to this day, a great way to get files around, and is in every web browser around. Telnet (SSH) is virtually unchanged - save for the additional security, but is just as useful. It's Chat that seems to have taken on almost mythical proportions.

Look at all the Twitter clients, IMs, and IRC Chat clients. It's all the same basic premise - I type and you see it, you type and I see it. The first version I used was even called type on the Unix BSD4.x systems at Purdue. But look how many ways you can now communicate through this little concept. You've got the store-and-forward of SMS or Twitter... you've got IRC Chat and a ton of different IMs. It's as if this - communication between people - was the real killer app of the network. Sharing a computer meant the files and resources of one person were available to another. Putting the machines on different desks took that away - only to be brought back by the network.

I use Chat (Colloquy), IM (Adium), and Twitter (Twitterrific) all day every day. It's fun to think about where things started and how they have evolved over the years. I'm sure the video chat will be bigger when the bandwidth is there, but right now, it's just not. Give it time, though, and it'll be just like all the Sci-Fi movies you've seen. It's a great time to be alive.

Getting Followed on Twitter by a Stranger

Monday, April 21st, 2008

OK, this is a really odd feeling... I got an email that someone is following me on Twitter, which is not a big deal, but the odd thing is that I don't know this guy, and I can't imagine why he thinks the tweets I've sent are of such quality that they all deserve to be read.

I mean, it's not that I didn't want people to follow me - but I expected friends, co-workers, ex-co-workers, that kind of stuff. I didn't expect total strangers to think what I had to say 140 bytes at a time was worth really reading. Kind of wild. No... really wild. I guess this might be what it's like to have people want to read a comic you wrote. You did it for a reason, but didn't really expect some stranger to like it. Wild.

Who knows... it could be someone following 32,768 people, so it's not like there's something valuable I have to say. Could be someone that follows everyone. Anyway, it was odd enough to warrant a post.

Totally Blown Away by MacVim

Friday, April 18th, 2008

I've been a big fan of vi and Vim for a while. I started using it back in grad school and it's been on every Unix system (or Windows box, for that matter) that I've ever come across. Mac OS X has shipped with it in 'console' mode since it came out, and yet Apple hasn't spent the time to make it a real Cocoa app - which is understandable. Priorities.

So when I came across the Vim for Mac OS X web site I was really jazzed. They had, essentially, gvim for Mac OS X. Nice. There were several things that kept me from using it full-time on the Mac and those were primarily limitations in gvim itself - to have multiple windows you had to have multiple gvim instances. But then today I was checking to see if the Vim for Mac OS X web site had an update from the 7.0.224 it's had for a while, and I went to the Vim wiki and it led me to MacVim.

Amazing. Nothing short of brilliant and stunning.

This guy has brought gVim up to the level of a regular Mac OS X text editor. Multiple windows in the same running application instance... tabs to show multiple file buffers in the same window... transparency on the windows - it's amazing! It's a complete Mac app, but it's Vim!

I can leave it running without a window open, I can open multiple files in a window - open multiple windows, Cmd-W to close the window... it's everything that I had hoped for in Vim and it's working on Mac OS X now. It's even got code to check for updates! This is without a doubt the way to enjoy Vim on the Mac.

UPDATE: OK... I'm about as jazzed as I've been in a long time. This release of MacVim is amazing! For BBEdit, I built etags files and got them into the Makefiles of a few projects. While it's not perfect - it'd be very difficult to determine the context of a method invocation - for a lot of things, tags are really useful. It's nice to be able to jump around the code easily without having to move your hands from the keyboard. I guess that's the thing I like most about Vim - it's all Old School - just like me.

Update Fever Week – Like Shark Week only Better

Friday, April 18th, 2008

Yesterday Apple updated Safari to 3.1.1, fixing two bugs but not putting in the new WebKit that's passing the Acid3 test 100/100. Too bad. I was looking forward to that, but they may have a lot more work to do on that guy before it's ready for release. Still, it's nice that they keep on top of the problems folks have found and release updates as often as they do. Sometimes it's still amazing how big a memory footprint Safari can get to be on my laptop - easily passing 200MB with only two tabs open. The cache and 'history' of the working browser have got to be enormous. Anyway, it's great to have that updated.

The next update I got this morning was Transmit 3.6.5, and while I really like the work the Panic guys do, I'm still a little surprised that they don't have a more 'minimal' interface to Transmit. What they have is fine if you're using an FTP client like a file browser, but most of the time for me it's more like a small extension to the system. Make it as seamless as possible - no need for big borders - more like Cyberduck. That's a minimalistic interface that I like for file transfer. Nothing that gets in the way or takes up pixels it doesn't need. I like all the features, and I appreciate what they are doing, but I think it'd be nice to have an alternate GUI that was less about border fluff and more about "just the goods".

The last update this morning was Camino 1.6. While I'm not a big user of Camino, I can appreciate what they are doing, and at times it's nice to be able to fire up another browser and see if the web site I'm having problems with works any better with Camino than Safari. It's fast, clean, and a nice Mac app, so you gotta keep up to date and make sure they didn't throw in something really cool that could make it the best browser on the platform. This release is supposed to have a few nice GUI improvements and I'm sure they're nice. I just haven't had a lot of time to play with it. Maybe this weekend.

Like any self-respecting computer goober, I love getting the updates and seeing what others have done with their code. In each of these cases, it's more than just an app - these are things that the developers are seriously proud of, and they should be. This is some of the best software out there, written by passionate, dedicated folks. And it shows. Gotta love that.

Giving Twitter (and Twitterrific) a Go

Thursday, April 17th, 2008

I was chatting with an old friend this morning and he asked if I had done anything with Twitter, and I told him that I'd looked into it, and the Mac client, Twitterrific, but hadn't done anything. He mentioned his primary reason was the same as mine - it seemed that a lot of indie Mac developers are using it, and it might be a way to learn something. However, I had looked at enough posts that I was pretty sure the tweets were not about coding as much as they were about personal stuff. Which isn't bad, it's just not technical.

But since he asked, I figured I'd give it a go. So I went to Twitter and signed up as drbobbeaty and then downloaded Twitterrific and got that configured and ready to go. Then we sent a few tweets back and forth, and realized that it's not as immediate as IM - which I use all the time - but it does have a 'chat room' component to it that IM doesn't have. You can post a tweet and many people can see it at roughly the same time. While I do wish it were more of a 'service' so the client wasn't polling the web site, I can see that it's done this way to make it more reliable - the connections don't have to stay up, and that helps get things going faster.

I was pleased that Twitterrific was a serious Mac app - looking and acting like a solid Mac app and not like something written in Java using Swing. That's always nice. Also, it seems like there's a critical mass on this app, so if anything is going to get where I'd like it to be, this is the most likely candidate. So I registered it. Only $15, and to support indie Mac developers, that's a good deal.

I'll see what happens as we go... I think the real thing will be if I monitor enough 'friends' to make it interesting, or just enough to make it like IM. And for IM I don't think it's possible to beat Adium. Well... I'll give it a go with Bret, and see where it goes from there.

Weird Ethernet Problem with Avahi and HAL on Fedora Core 5

Tuesday, April 15th, 2008

I got an email this morning from the UnixEng crew - a good lot of guys and gals, to be sure, and they were saying that one of the machines we have here in the development group was having a ton of network errors. The switch was set to 100/Full, and typically, that's what we need because we've learned that the auto/auto negotiation seems to have more problems than it's worth.

So the network guy switched the port to auto/auto, and that helped a little, but I wanted to set the linux box to 100/Full because I know that's better than leaving it full auto. Typically, we do this when we build the box, but a lot of times it gets skipped (forgotten) because things start out working, only to fail at a later date.

Anyway, the nice tools to remember here are ethtool and ifconfig. If you run ifconfig and look at the data for the ethernet port (eth0 in this case), you can see if there's a likely problem in the port.

Typically, you might see:

   eth0   Link encap:Ethernet  HWaddr 00:17:A4:99:07:EF
          inet addr:146.180.7.94  Bcast:146.180.7.127  Mask:255.255.255.128
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:123456 errors:0 dropped:0 overruns:0 frame:0
          TX packets:123456 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000

but if you see large numbers for the errors or the collisions then you're probably looking at a duplex mismatch. The way to see what your card is set at right now is to run ethtool on the port:

   ethtool eth0

and a lot of nice, useful information about eth0 is going to spill out.

If the network card needs to be forced into a mode, then the easiest way to do this is to edit /etc/sysconfig/network-scripts/ifcfg-eth0 and add the line:

   ETHTOOL_OPTS="speed 100 duplex full autoneg off"

and a quick reboot later, things should be fine. That is, they should be. Today it was something different.

When I made the changes to the /etc/sysconfig/network-scripts/ifcfg-eth0 file and rebooted, the services Avahi and HAL failed to start. When HAL failed to start, the box hung. Thankfully, I could get it to boot into single-user mode and start to look at things.

Google pointed to SELinux, but that was very clearly disabled on this box - as it should be. We looked at a lot of things - X11 drivers, etc. In the end, I was faced with the fact that that one line was the only thing that had changed on the box, so I removed it.

Bingo! That was it. Now what's odd is that the same line is on another box with the same hardware spec - so we're baffled as to why this machine has a problem with it. It could be the version of the kernel or packages - a yum update might help, but it's working now, so we may put that off. It's also possible that it's the switch on the other end... but that's going to be very hard to pin down.

In the end, we're going to keep an eye on it, but what a mess trying to fix something like this.

When Standing Still Means You’re Falling Behind

Saturday, April 12th, 2008

I really love Sun hardware and software. My first exposure to Sun workstations was the 3/50 in grad school and it was the most impressive box I'd ever seen. Oh, I'd had an Amiga, and it was nice, but when I was at school and could work on a 3/50 - or later a 3/60, it was an entirely different experience. Night and day. Nothing beat Sun stuff. Nothing.

Then I went to Auburn, and again, Sun was king. Oh, we had PCs, and I had a nice Mac II, and later a IIci, but if you wanted to crunch numbers and display them nicely, Mac OS 6 or 7, or Windows 3.11 just didn't cut it. You had to get Sun.

Little did I know at the time that had I seen an SGI workstation, my opinion of Sun would have changed, and I would probably never have looked at Macs the same again. At heart, I'm a scientist and a numbers guy. Scientific apps and data visualization are what I love doing most.

Anyway... Sun ruled the roost. Then they stopped innovating. Solaris 2.5 was nice, 2.6 was good... heck, they all are nice, but they aren't anything in comparison to linux or Mac OS X these days. And here's the case that brought it to a head this week.

I've been working on writing a C++ wrapper for libcurl, and I wrote it on the Mac where I knew I had a recent version of libcurl with 32- and 64-bit support. Then I moved it to linux - in this case Fedora Core 7 on x86_64 - and it again compiled without a hitch. Then I moved it to my Sun box to make sure it built there. I have to do this, and usually do it in this order, as it's the fastest and easiest way I've found to get the cross-platform builds working.

When I went to Solaris I had to add an include for bzero(), which on the Mac and Linux comes in with the more standard includes, but on Solaris 8 (and 9 and 10) lives in strings.h. No big deal, I added that. Then I started the build again. It compiled the 32-bit version just fine and then hit the wall at the 64-bit version.
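
The fix is just a platform conditional around the extra include - a sketch of the idea, not the exact CKit source:

    #ifdef __sun
    #include <strings.h>    /* Solaris declares bzero() here */
    #endif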

Oh, my code compiled just fine - it's generic C++ with a little STL, but it's the linking phase that messed up, and that's where Sun has been standing still.

Sun was one of the first to support 32-bit and 64-bit on the same deployment on the same hardware. It was an elegant solution - two sets of files. True, Apple had a trick up its sleeve in its historical use of the quad-FAT binary (HP-UX, Solaris, Intel, Motorola). This allowed them to jump ahead to a single file with multiple architectures and word sizes. Linux copied Sun's ideas and delivered multiple libraries pretty early on, which brings everyone up to about the same level.

Almost.

The problem is that Sun has failed to recognize that libraries like libcurl, openssl, zlib, and the others that I've gotten from SunFreeware.com need to be available in the default installation of Solaris. I've got Solaris 8 and 10 and what's missing from them is significant. I know they have SunFreeware, but even there, they don't seem to have multiple-architecture binaries of the packages. If they had, I'd be a happy camper. I have already gotten most of SunFreeware installed on my machines, but it's all the single 32-bit versions of the packages.

I can't be the only one that wants 64-bit versions of these libraries. Maybe Sun doesn't have to include them in the OS, but it'd be nice if I could get a CD of all the libraries you find on a linux box, built into packages for installation. Disk space is cheap... install all of them, both 32- and 64-bit. Then, should I want to make something in 64-bit mode, I can.

So I'm left looking at the Sun box and thinking how nice it could have been had Sun just kept moving while it was ahead. Keep moving on the UltraSPARC chips... innovate... be the best of the best. But in the end, clever people with commodity hardware have essentially put Sun in the position of an also-ran, and that's sad.

Like when I heard SGI was filing Chapter 11. Sad.

So I've emailed Steven C. at SunFreeware and asked him what it might take to build OpenSSL and zlib and curl for 64-bit on Solaris 8 and 10 SPARC. I expect that it'll be more than I want to do at this time. It's getting harder and harder to justify Sun hardware when I can get an 8-core x86_64 box with amazing speed for next to nothing. Cost will drive the market.

I'd love to keep working on Sun, and I'll never get rid of my SPARC 20, but I can see a time very soon that work will be driven off linux, and it's a solid platform that I've enjoyed using for years. It's just too bad that Sun was in the lead, and they stood still.

UPDATE: I've received email from Steve of SunFreeware that the most recent versions of Solaris do have things like OpenSSL, zlib, and curl in 64-bit as part of the install. If that's the case, then I'm happy for Sun, but a little disappointed that the same wasn't brought back to Solaris 8 and 10.

Creating a Nice C++ Binding for cURL

Friday, April 11th, 2008

The other day a co-worker came up to me and asked me if I knew anything about this particular project in the Bank for getting index decompositions - basically, the instruments that make up the different indexes on the world markets. I hadn't heard about the project, but was very interested in the idea of adding it to my market data server as an additional data source.

So I looked at their web site and they had APIs in Java and C#, but nothing in C++ for several more months. I could wait, but I decided to fire a note to the developer as I'd worked with him in the past, and didn't know if he might have had a pre-release C++ API.

He mentioned that for the request/response work I'd be doing, the WebServices API was every bit as fast as the C++ was going to be, and since I could use the WebServices API right now, I decided to give it a go. Problem was, in CKit, I didn't have a nice way to get data back from URLs where you have any kind of complex request. So I decided to build one.

I looked around, and it was pretty clear that the best web services tool was going to be based on cURL. I know it's on every linux box, I can get it from SunFreeware.com for my Solaris boxes, and it's part of Mac OS X. It would handle a ton of different protocols and options, and it was pretty simple (looking) to work with. So I set off wrapping up cURL into CKit.

I have to say that I was more than a little surprised about the state of cURL. First, it does cover a ton of platforms. It's also got a ton of features. But what amazed me was the seeming lack of attention to the details of really using the code. I mean it's not hard, but it's on ver. 7.18.1, and by that time I would have expected that they would have figured out how to get rid of these issues:

  • Global Initializer - this blows me away. I read the docs and they say that it's because some of the libraries cURL uses are not themselves thread-safe, but to say that the cURL global initializer is not only not thread-safe, but needs to be done with only one thread active in the application is downright crazy... and sad. I've done what I can to try and make it as nice as possible, but it's certainly possible that I'm going to run into serious problems because of this. I just hope it's a good plan for the simple stuff.
  • Keeping String Pointers - while I can certainly understand why they don't copy arguments passed in, there's no reason to have the 'easy' interface do that. Face it, 'easy' ought to mean 'fool-proof', and the way to do that is to control as much of the data as you can. Copy those arguments - don't require the caller to retain them for as long as you might need them. How's he to know when you're done with them?
  • URL Structure Knowledge - this is something that I think they should have done - don't require the developer to know how to form a URL. Why make the developer encode the data when you know full well how to do it? Have the user give the API the data it needs and then have the API piece it together, encode it as necessary, and then ship it off.
  • Field Manipulation - when you add the POST variables to the handle, why not make it so that you can add them as key/value pairs? Why make the user encode them as a single string (which he has to keep around) and then pass them to you? Make it smarter than that. It's not hard - a list of key/value pairs - it's all strings, anyway. Make the data in the handle more manageable than it is now.

These are just the biggest problems I have with cURL. I mean, it works, and it's found on almost all platforms, so I'll keep using it, but when I think that there have been a ton of revisions on this and these things aren't addressed, it makes me think that the person writing it isn't really thinking about how it's being used.

That said, CKURL does work, and it does overcome each of these limitations. It doesn't require the user to do any global initialization, it copies all the data it needs from the arguments passed in, it creates the URL syntax from the general data you've given it, and it allows general field manipulation. All these things make it a much more enjoyable piece of code to work with. But at the heart of it, it's still cURL. I'd just love to see them make a really 'easy' version.
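
To give a flavor of what that buys you, a request through the wrapper ends up looking something like this - a hypothetical sketch with made-up method names and a made-up host, not the actual CKURL interface:

    // Illustrative only - the real class may name things differently.
    CKURL        url;
    url.setHost("marketdata.example.com");       // hypothetical host
    url.setPath("/index/decomposition");
    url.addField("index", "SPX");                // key/value pairs, copied internally...
    url.addField("asOf", "2008-04-11");          // ...and URL-encoded for you
    std::string  response = url.doPOST();        // no global init, no pointers to babysit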

Wild IPv6 Conflict with Name Resolution

Tuesday, April 8th, 2008

Over the course of the last few days, I've been having problems with getting a certain hostname resolved on my linux workstations at work. Basically, everything was fine until I tried to reference the host chatcentral and then apps like traceroute would hang for 20 sec and then work normally. Ping worked just fine, however. Very odd.

This first came up when I was trying to connect to this host with simple C++ and Java socket connections. We did the migration to MindAlign recently, and I had written native connection clients for the MindAlign system, and when switching over on my development workstation it appeared that the connection was hanging - in reality it was timing out on the hostname resolution. I didn't give it 20 sec, and thought "Hey, it's dead". Things worked fine on the other machines I was using, so I did the rest of the migration on those. But it was still bothering me, so I came back to it.

What I found with a little bit of Googling was that IPv6 was getting in the way of the normal host name resolution chain. If I put the machine into my /etc/hosts file then all was fast and well. But if I didn't want that, I needed to do something to get rid of the timeouts. With the help of a local Unix admin, we realized that the timeout was on the name lookup for IPv6. From there, it was a simple Google to see that the solution was to add the following line to /etc/modprobe.conf:

    alias ipv6 off

After a reboot, this fixed the problem.

Interestingly, this doesn't seem to completely disable IPv6, as ifconfig -a still shows the IPv6 address for eth0, so I'm not sure exactly what's happening. But in this case, where the folks didn't put the hostname in the primary YP server, this really helps.