Archive for the ‘Vendors’ Category

Google’s GDrive Replacing PCs? Can’t Believe It

Friday, January 30th, 2009

NetworkedWorld.jpg

I was watching diggnation today and heard (again) about the rumored GDrive, so I decided to do a little reading to see what it's about. I found this article about it, and while I can see Google opening up the storage and making a desktop client with something like MacFUSE, I can't see this being what's portrayed in the article:

As the latest rumors surfaced, The Guardian told the world that Google was planning to "make PCs history." This was promptly echoed by the likes of FoxNews.

"The Google Drive, or 'GDrive,' could kill off the desktop computer, which relies on a powerful hard drive," The Guardian burbled. "Instead a user's personal files and operating system could be stored on Google's own servers and accessed via the internet."

Apparently, this is part of a Google-grab scheme to put a Googlephone into every hand. "The PC would be a simpler, cheaper device acting as a portal to the web," the paper went on, "perhaps via an adaptation of Google's operating system for mobile phones, Android."

On diggnation, Kevin pointed out why he (and about a billion other people) would never store all their information on a server they don't physically control: security. When it gets cracked, and it will, as it's only a matter of time and energy, the crackers are going to go after the people with the most to lose: celebrities. Understandable. But there's more to it than that. Everyone probably has something they don't want others to mess with, and if my GDrive data isn't also backed up in my house, then when GDrive gets nuked, all my stuff is gone.

Unacceptable.

For under $300 I can get a drive that backs up all my data and does it safely. I'll never lose my photos, the sounds of my kids, my docs, my code. No way. So why hand it over to anyone?

And as for the 'PCs are a thing of the past' bit... yeah, right... Sun tried that with the JavaStation and it failed just as badly as this will. There's no way a phone will replace a computer. The phone is nice, but it's not something you're going to type a sales report on, or use to do work for your employer. Employers are never going to adopt it, so the PC industry will survive, and as long as PCs are being built, there will be reasons to have one at home.

Still, it's an interesting idea, and I might very well throw some stuff up there, but I'll use it like a public or semi-private file repository and that's it. There's no way I'm trusting it with the most important things I have.

Making an App Fault-Tolerant with Intolerant Components

Wednesday, January 28th, 2009

SwissJupiter.jpg

I noticed this morning that one of my price injectors was hung up in a poor, sick little infinite loop: the (vendor-provided) service had died, and when I restarted it, my injector never recovered, because I had not coded up the library to close and re-open the connection. In an attempt to make my application fault-tolerant to this service's restarts, I decided to dig in and add all the pieces I needed to properly reconnect when the service was restarted.

If only it were easy.

The first thing I had to do was to unroll where I was in the processing so that I'd pause what I was doing (or trying to do) when an error with the service was detected. That didn't take too long, but I wanted to make sure I didn't put in the logical equivalent of the goto statement, so it took a little bit of work to handle it properly.
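One way to get that kind of clean unwinding is to lean on exceptions. A minimal sketch of the idea in Python for brevity (the real app wraps the vendor's C/C++ API), with hypothetical names throughout - ServiceError, process_batch() and friends are not the actual code:

  class ServiceError(Exception):
      # raised when the vendor service goes bad mid-processing
      pass

  def process_batch(conn, items):
      for item in items:
          if not conn.is_alive():      # detect the dead service first...
              raise ServiceError('service connection lost')
          conn.send(item)              # ...then do the real work

  def main_loop(conn, batches):
      for batch in batches:
          try:
              process_batch(conn, batch)
          except ServiceError:
              # unwind cleanly to here, with no goto-style flags, and
              # let the caller pause, reconnect, or exit
              return False
      return True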

Once that was done I needed to have the main thread detect this condition and then close and re-open the connection. Here's where I really started to run into problems. While the code appeared to be what I needed, I would get just a few attempts in and then hit a double-free core dump. Every time. And it was always in the vendor's API code. It seemed that no matter what I did there was no way to avoid this problem. Crud.

So what if I tried to "go around the horn": exit the app with an error condition, and let the guardian script that started the app (and restarts it in the case of a core dump) see this and restart the app? That's OK, but what if it fails right away? Well... that's the problem. So what I tried next was to put a retry loop on the creation of the connection to the service. Maybe that would work.

Better. It seems that so long as the connection isn't ever actually made, you can call the open() call as many times as you need to get the job done. I was getting a lot closer. Now that I had a way to exit the app with an error and restart it with a retry loop, all I had to do was make sure we didn't litter the directory with core files. The final problem was that trying to close a troubled connection led to the same double frees that I was getting in the first place.
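For what it's worth, the retry loop itself is simple. A minimal sketch in Python, with a hypothetical open_connection() standing in for the vendor API's open() call:

  import time

  def open_connection():
      # hypothetical stand-in for the vendor API's open() call
      raise ConnectionError('service not up yet')

  MAX_ATTEMPTS = 30
  RETRY_DELAY_SEC = 10

  def connect_with_retry():
      # keep calling open() until the service comes back; exit non-zero
      # if we give up, so the guardian script restarts the whole app
      for attempt in range(1, MAX_ATTEMPTS + 1):
          try:
              return open_connection()
          except ConnectionError:
              print('attempt %d failed; service still down' % attempt)
              time.sleep(RETRY_DELAY_SEC)
      raise SystemExit(1)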

So I had to put even more logic into the wrapper classes around the vendor's API so that I could be assured the application would exit cleanly, and then the restart would take care of waiting until the service was up again.

Finally I had something. It took a few hours, but in the end I have a system that's fault-tolerant to the vendor's service restarts, and that's what I wanted to build today. It's going to make for a much stronger system. Good news.

Getting a Really Clear View of Python

Monday, January 26th, 2009

python.jpg

I've decided this afternoon that I haven't been able to really give Python a clear chance. I've been using it for over a year as part of this vendor's application - it's the embedded language in which virtually everything is done. That, in itself, is not bad, as one of Python's strengths is how easily it can be embedded in C/C++ applications. No... it's what they have done to it that makes it hard to get a really good read on Python.

For example, you should be able to run a python script and, upon proper loading of libraries, get all the added functionality of the loaded module - like sybdb. Standard stuff. In fact, you should be able to use any python of the same version on the same box, and if you can load those libraries you should be good to go.
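As a sanity check, you can even ask the import machinery whether a given interpreter sees a module without running any vendor code. A quick sketch (using the modern importlib; older pythons had imp.find_module for the same job):

  import importlib.util

  # ask where 'sybdb' would load from, without actually importing it
  spec = importlib.util.find_spec('sybdb')
  if spec is None:
      print('sybdb is not importable from this interpreter')
  else:
      print('sybdb would load from %s' % spec.origin)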

The problem is, they have fiddled and monkeyed with the language to the point that this isn't really possible. You can get some things working, but others are only half-working, and others still are completely broken. This means you need to run their python and set up a very complex environment to get anything running.

This represents a huge initial cost to running a python script. No such thing as a quickie... no sir. You have to really want to run a new python script. It's a pain, and it's not at all flexible. It's a system that makes python look bad. And I'm only just coming to realize which part of this train wreck is the vendor's stuff and which part is python.

As I get more experience with the system I realize that there are a few things I'm not a fan of that are indeed python. However, they are completely overshadowed by the vendor's mistakes and limitations. It's amazing.

So I'm trying to give python the benefit of the doubt and realize that 99% of the problems I'm seeing with this system are not the fault of python, but of the implementation they slapped around it. Too bad. I'm sorry, python.

UPDATE: case in point: the difference between the '=='/'!=' and 'is'/'is not' operators in an if statement. For example, consider the two code samples:

  if value == None:
      print 'Value is not defined.'

and:

  if value is None:
      print 'Value is not defined.'

The difference is that the equality operators ('=='/'!=') call a method on the objects to do a value comparison, while the identity operators ('is'/'is not') do an instance comparison (effectively a pointer comparison) on the two objects. That makes the latter significantly faster, and in the case of None it's the preferred test as well, since there is one and only one None object in the python runtime. That's pretty cool.
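To see the method call actually happen, here's a toy example - the Price class is hypothetical, just there to make the call visible:

  class Price(object):
      def __init__(self, value):
          self.value = value
      def __eq__(self, other):
          print('__eq__ called')       # equality runs our method...
          return isinstance(other, Price) and self.value == other.value

  p = Price(42)
  p == None     # prints '__eq__ called' - a value comparison
  p is None     # False, with no method call - a raw identity check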

Another of the Many Ways Not to Build a System

Friday, January 23rd, 2009

SwissJupiter.jpg

I was pulled in this morning on a problem with committing trades using a Python server that interfaces to a certain vendor's application. The python server is pretty nice - it uses the built-in XMLRPC server, and it's as clean and easy as pie to work on. Python - good. The system it feeds - not so much.
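For reference, the built-in server really is that simple to stand up. A sketch along these lines, with a made-up commit_trade() and port (the module is xmlrpc.server in Python 3; in the 2.x days it was SimpleXMLRPCServer):

  from xmlrpc.server import SimpleXMLRPCServer

  def commit_trade(trade):
      # hand the trade off to the vendor's application here
      return 'OK'

  server = SimpleXMLRPCServer(('localhost', 9000))
  server.register_function(commit_trade, 'commit_trade')
  server.serve_forever()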

I was getting an exception when committing a new trade to the system. The exception message was very unhelpful, and as a result I had to run this guy several times with the same inputs and different levels of logging statements to see where in the code the exception was being thrown. It's python 2.3, so the really nice exception stack trace printing wasn't available to me, and the custom logging package in use wasn't my pick, and I had no idea if it redirected stdout/stderr.

What followed was about 30 mins of print debugging and about 5 mins of input data conditioning to make sure that none of the incoming data values were illegal and causing the problem. Good stuff to do, but given that this is all system-to-system interaction, this shouldn't have been strictly necessary. Nevertheless...

So I kept looking and finally got it down to the one field that was causing the problem. It was a string 23 characters long, and the data description in the vendor's docs said it was limited to 20. OK, that's understandable in some systems, like old-style client/server stuff. But in this day and age, why are we limiting ourselves to 20 chars when we know any decent database has varchar fields, which by their very nature are variable in length up to a point? Make that point 256 chars, or even 1k - what's the harm? Yes, it might not all fetch back in one packet, but given gigabit ethernet, is that still really a concern?

But even if it is, how about giving me a really useful exception like "Data value out of range" or something like that? Then you can use that same message for integers that are too big, strings that are too long, etc. It's pretty universal, and it really helps out the guy trying to debug the problem.
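Something like this check, with made-up field names and limits, is all it would take on their end:

  # hypothetical field names and limits - the real ones would come
  # from the vendor's docs
  FIELD_LIMITS = {'counterparty': 20, 'trade_ref': 16}

  def check_lengths(record):
      for field, limit in FIELD_LIMITS.items():
          value = record.get(field, '')
          if len(value) > limit:
              raise ValueError('Data value out of range: %s is %d chars, '
                               'limit is %d' % (field, len(value), limit))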

As it was, I was forced to answer the "Why?" myself by searching around until I found the limit in the docs. But wait... there's more.

This vendor publishes a limit of n, but the limit is really n-1. Why? Good question. If the limit is 20, make the field in the database 21 or something. In fact, most varchar fields can hold right up to their stated maximum, so why the offset? I can only imagine it's some designer/developer decision so silly as to be laughable.

I'm not laughing. I'm shaking my head.

So I finally put in the code to clamp the strings to their limits and log when data was getting truncated (the gist is sketched below). Then I passed it off to the guys that were supposed to have figured this out, and they ran the tests and things worked. But we're still not done, because with this truncation they have to make sure it's not going to break anything moving forward. At this point, I don't know and don't really care. It's a messed-up system with horrible exception messages, pitiful documentation, and tech support that's virtually non-existent. I don't like it, and I hope soon to be rid of it.
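The clamp-and-log code is nothing fancy - a sketch of the idea, with hypothetical names throughout:

  import logging

  log = logging.getLogger('trade_feed')    # hypothetical logger name

  def clamp(field, value, limit):
      # truncate to the vendor's *real* limit and log the loss, so
      # the truncation never happens silently
      if len(value) > limit:
          log.warning('%s truncated from %d to %d chars: %r',
                      field, len(value), limit, value)
          return value[:limit]
      return value

  # the docs say the limit is 20, but it's really n-1, so pass 19
  name = clamp('counterparty', 'A far-too-long counterparty name', 19)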

Making Code Deadlock-Proof

Tuesday, January 20th, 2009

SwissJupiter.jpg

I had another lock-up in my tick injector that uses the vendor library I've had so much trouble with in the past. It appeared to be in my code this time, and that surprised me quite a bit, because I was convinced there was no way I could have a deadlock: there was no section of code that held both locks at one time. If you don't have a thread that holds two locks, then there's no way you can have a deadlock. A delay, sure, but no deadlock.

So I started going through all the code and checking to see if I missed anything. Well... sure enough, I had a little section where I had a stack locker active and then called a method that grabbed the other lock for a bit. I was able to clean this up, and I'm hoping this was it, but in truth, I'd have to have another section with two locks to have a deadlock, and I didn't find that second section.
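The pattern I found looks roughly like this (a Python sketch - the real code is C++ with a stack locker, and all the names are hypothetical):

  import threading

  lock_a = threading.Lock()
  lock_b = threading.Lock()

  def update_prices():
      with lock_a:                 # the "stack locker" holds lock A...
          refresh_instrument()     # ...and this call grabs lock B inside

  def refresh_instrument():
      with lock_b:
          pass    # if another thread ever takes B then A, deadlock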

I did, however, find a few places where I thought that encapsulation was a better plan, so I made methods that were, themselves, thread-safe, and then put those into the code where I had originally had the locks. These should be essentially "no-ops", but in fact the locking will be slightly different, and in that I may have helped myself even though I never found a second section of code with two active locks.
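The encapsulated version looks something like this - again a Python sketch, with hypothetical names standing in for the C++ classes:

  import threading

  class InstrumentTable(object):
      # each method takes its own lock and never calls out while
      # holding it, so no caller ever holds two locks at once
      def __init__(self):
          self._lock = threading.Lock()
          self._instruments = {}

      def add_instrument(self, symbol, data):
          with self._lock:
              self._instruments[symbol] = data
          # any follow-up work that needs other locks happens here,
          # after our own lock has been released

      def get_instrument(self, symbol):
          with self._lock:
              return self._instruments.get(symbol)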

I've also added some more logging to my addInstrument() call - where it appears the lock-up was - just to see what happens next time. I'll run with this, and if and when it fails, we'll see what the logs tell us.

Some May Call it Tedious, I Call it Interesting

Thursday, January 15th, 2009

SwissJupiter.jpg

I'm still on the trail of the problems in the message bus API. I stayed a little late last night to try a few things while my price injector wasn't injecting ticks. Turns out, I learned a bit, but then this morning I learned quite a bit more. It's getting downright interesting.

First, last night, when I wasn't injecting prices, I would see the poll() working as it should. This morning, I noticed that as soon as I started sending price messages, I started getting tons of poll() hits on the incoming message socket. Why on earth are they doing that?

More to the point, the poll() loop was working perfectly and I could cross that off my list. It was certainly a possibility that the polling loop had simply stopped - an error, an exception, something could have killed it, and that would have explained everything. However, this morning I saw the polling loop running like a champ, yet the messages I sent were not getting delivered to my code.

Interesting fact #2: the act of sending a message with this API causes the system poll() to report the socket as readable even when there's no data waiting to be read. I can imagine a few reasons why this might be the case, but all of them are pretty bad. A positive result from poll() should always mean that there's data on the socket waiting to be read. But clearly, that's not the case with this API.
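The defensive move, then, is to treat a positive poll() as a hint rather than a guarantee, and confirm with a non-blocking peek before handing control to the handler. A sketch in Python (the actual wrapper is C++):

  import errno
  import select
  import socket

  def wait_for_real_data(sock, timeout_ms=1000):
      # poll() says "maybe readable"; MSG_PEEK says whether bytes exist
      sock.setblocking(False)
      poller = select.poll()
      poller.register(sock.fileno(), select.POLLIN)
      while True:
          if not poller.poll(timeout_ms):
              continue                 # timeout - keep waiting
          try:
              peek = sock.recv(1, socket.MSG_PEEK)
          except socket.error as e:
              if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                  continue             # spurious wake-up: no data after all
              raise
          if peek:
              return True              # real bytes are waiting
          return False                 # empty read: the peer closed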

So: one good, one bad. What I'm left with is that I need to add more logging around the processing of the message data stream and hope that it's a trappable error in code I control. If not, then there's no hope for a solution on my end, and I'll have to punt all this information back to the vendor and hope they can figure it out with the data I provide them.

I'm not holding my breath.

Tracking Down Problems in Vendor Libraries

Wednesday, January 14th, 2009

Detective.jpg

Today I got an email from a tech support guy for a vendor we use, asking if I'd tried their latest version to see if it fixed the problem I was having. Basically, it's a proprietary message bus that's very simple and, at the same time, fast. The problem is, it's not really fast enough to warrant being on its own, but it's so old that when it was new, it was probably something pretty useful - in the context of their system.

He asked a bunch of questions, all things I'd gone over with the last guy that contacted me about the 18-month-old bug, but I wrote back a detailed message saying what I'd tried and what I believed the issue to be. However, because of the way they wrote their library, there's really no way I can know what's happening under the covers.

I suspect that it's in the socket handling - specifically the use of the poll() system call and what it returns under different conditions. I know from experience that poll() can be tricky on Linux if the socket gets into a weird state. My friend who wrote our C++ wrapper around the vendor's C API used the simple poll() system call to see if there was a pending message on the socket, and if so, he'd call their handling method - but it was just the bare poll(), with no special error handling.

I decided I'd see if the improved poll() I had written for CKit would help. It's got much better error handling, and maybe that's where the issue lies. Certainly a good place to start. So I added that code, and we'll see what happens.
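To give a feel for the kind of error handling I mean, here's a Python sketch of the same idea (CKit itself is C++): retry on EINTR, and report error/hang-up conditions instead of treating every wake-up as readable data:

  import errno
  import select

  def careful_poll(fd, timeout_ms):
      poller = select.poll()
      poller.register(fd, select.POLLIN)
      while True:
          try:
              events = poller.poll(timeout_ms)
          except OSError as e:
              if e.errno == errno.EINTR:
                  continue         # interrupted by a signal: just retry
              raise
          if not events:
              return 'timeout'
          _, flags = events[0]
          if flags & (select.POLLERR | select.POLLHUP | select.POLLNVAL):
              return 'error'       # socket is in a bad state, not readable
          if flags & select.POLLIN:
              return 'readable'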

If that's not the case, then I'm going to start logging the activity around the poll() call to see if the vendor's handler function is hanging, or if there's even data at the socket to read. I'm not sure what's going to come of this, but it's an interesting diversion. Maybe I can come up with a work-around and fix this horrible problem.

Interesting Confirmation on the Dual-GPU MacBook Pros

Wednesday, January 7th, 2009

MacBookPro17.jpg

I was talking to a friend today about the new 17-inch MacBook Pro and its dual-GPU configuration, and why I thought that was Apple's design for getting more cores, as opposed to putting a quad-core CPU into the laptop. He didn't think the two GPUs could be powered at the same time, and pointed to the need for a reboot in the current hardware to switch from one to the other. I didn't remember where I'd read the news from NVidia, but I found it today.

The article points out that the NVidia representative confirmed:

Besides confirming that you'll see it in other notebooks soon, they definitively answered some lingering questions about the chip's capabilities: It can support up to 8GB of RAM. It can do on-the-fly GPU switching. And it can work together with the MacBook Pro's discrete 9600M GT. But it doesn't do any of those things. Yet.

This makes perfect sense then.

Apple invests in OpenCL, gets some of the ObjC code in the OS to use it, and all of a sudden it's got a tremendous advantage over machines that don't have a similar capability. I was a little surprised to see the dual-GPU setup in the 15-inch MacBook Pro, but wrote it off to the fact that the integrated GPU was "free", and removing it was more expensive than just using it. But it wasn't good enough on its own, so they added the "good" one as well.

Not so, I think now.

This is going to be a CPU/GPU machine where the integrated GPU is connected directly to the system RAM, and that makes perfect sense for fast processing. Sure, it's going to require a lot of work to get OpenCL integrated into the OS, or the core Frameworks, but when it does, what a boost!

Yup, I've got to order one of these boxes. Gotta.

SubEthaEdit is Up to v3.2.1

Tuesday, December 30th, 2008

subethaedit.jpg

It's been a while since I checked on SubEthaEdit, and this morning I decided to see if it had been updated recently. To my surprise, it was up to 3.2.1 - from the 3.1 I'd been running for a long time. Turns out there are a ton of changes - and all of them good.

There were a few things that I didn't really like about SubEthaEdit, and the first was that the syntax highlighting was slower than I could type. Text would look "normal" for half a second and then convert to the correct style. Not horrible, but not as fast as BBEdit or MacVim, and those are the two that I use daily. Even the editor in Xcode was faster at the change. Well... with 3.2 (and then 3.2.1) they have improved the syntax highlighting performance significantly.

They have also improved the PHP, Perl, CSS, Javascript, and a few other modes, which should make things a lot faster. I'm very excited about what they have been doing here. If only they'd add the ability to mark and jump, let me navigate by the keyboard, and fix the method sorting.

There are a few other things: the scrolling "jumps" when you get to the bottom of the screen, and while this is fine for people who never got used to 'vi' or other editors that scroll one line at a time, it's really distracting for me, as I have to refocus on the moved text.

It's getting better, and I'm really happy to support them as they get better, but for now it's not going to supplant BBEdit or MacVim.

BBEdit 9.1 is Out!

Tuesday, December 16th, 2008

BBEdit.jpg

Amazingly, BBEdit 9.1 has just the feature I was hoping it'd add - stripping whitespace at the ends of lines. I do that a lot - I even have a macro for it, but now I don't have to. Just set it up in the Text Files preferences pane and every time you save, it strips those nasty trailing whitespace characters.

The release notes have a lot more - like a new font: Consolas Regular. It's remarkably like my current favorite, Panic Sans, from the guys at Panic, who distribute it with Coda, their all-in-one web development app. It's clean and clear, and the only difference I can see is that the point sizes are a little different: Consolas 10pt is Panic Sans 9pt. That's it.

Yet there's even more. BBEdit 9.1 has a much-improved FTP/SFTP interface that is supposed to be much faster than the previous version. Excellent. RCS keywords are handled better - it's amazing, really. The list goes on and on. It's still my favorite editor on the Mac.

UPDATE: I've been looking at the new Consolas font and how it differs from Panic Sans... it's got more white space above and below the line. More, even, than the same point size in Panic Sans. Meaning that if Consolas 10pt is the same horizontal size as Panic Sans 9pt, it's got more vertical room than Panic Sans 10pt. That's very interesting. More whitespace between the lines may make it more readable, while the horizontal compression gets more columns into the same real estate. Interesting.