I've been slugging out this idea of infix to prefix conversion and this afternoon I finally cracked it. Well, to be fair, I cracked it on paper this morning and I got it all coded up this afternoon. This comes about because the lisp-like language that I put together for a project at The Shop would require that all arithmetic expressions be written in prefix. Seems obvious, but a lot of the users weren't really excited about the difficulties in putting together prefix expressions.
They seem to be pretty content to say:
(and (condition 1)
(condition 2)
(condition 3))
as they like the short-hand it gives them, but when they need to do something with more than simple addition, like, say the quadratic equation, then things get a little more difficult, and they start to get confused:
(set a 1)
(set b -7)
(set c 12)
(set x1 (/ (+ (-b) (sqrt (- (* b b) (* 4 a c)))) (* 2 a)))
(set x2 (/ (- (-b) (sqrt (- (* b b) (* 4 a c)))) (* 2 a)))
and while I can agree it's not trivial, the infix representation isn't all that easy, either:
(set x1 (/ ((-b) + (sqrt (b * b - 4 * a * c)))/(2 * a)))
(set x2 (/ ((-b) - (sqrt (b * b - 4 * a * c)))/(2 * a)))
But they wanted infix in their math expressions. But not pure infix - just selective infix to make decoding as hard as possible. In short, any expression (a section of code enclosed in parentheses) can be either prefix or infix, and it's up to the script parser to figure out which it is, and act accordingly.
Oh... and they also didn't like the required spaces.
This lead to something like this:
(set x1 (/ ((-b) + (sqrt (b*b - 4*a*c))) (* 2 a)))
(set x2 (/ ((-b) - (sqrt (b*b - 4*a*c))) (* 2 a)))
where there is infix in parts and prefix in other parts. Not ideal, to be sure, but it's the users that want this, so we need to find some way to make it happen.
Getting the Easy Stuff First
The easy stuff was missing spaces and operator ordering. The missing spaces simply meant that I had to have foreknowledge of the different operators that I could run into, and use them as separators. Not hard, but it was more time-consuming on the parsing, and made the code quite a bit uglier. Still, it was easy to make:
(*5 2)
return 10.
The next easy one was the operator order. If you have:
(10 = (* 5 2))
it's easy to see that the first token is not an operator, and to put the first token as the first argument to the expression. When you hit the second argument, it is an operator, and can be put in that place in the expression. In fact, it's easy to say:
(10 5 3 2 1 *)
as there's only one operator in the mix.
Only slightly harder is to ignore duplicated operators:
(10 + 5 + 3 + 2 + 1)
for I can look at the first operator for the expression, and if the next operator in the expression is the same, I just ignore it. We're getting pretty far, actually.
The last easy one was putting expressions before operators, but I already had that with the numbers before operators, so we get for free:
((5*2) = (2+2+2+2+2))
Now for the Hard Stuff
In truth, all that stuff only took me about an hour to figure out. The hard part is operator precedence. I struggled with operator precedence for several days until I happened to some across an idea stewing in my head. The basic problem is that I did not want to have a multi-pass parser where the first pass tokenizes the expression, the next orders the tokens based on their operator precedence, the next creates the prefix mode, the next makes the evaluation tree, etc. It's not hard to see that a multi-pass parser is a very good way to go, but I'd already put so much effort into my parser (it's a single-pass), that I didn't want to throw all that away if I didn't have to.
All I needed was to come up with a way to handle the operator precedence and I was golden. But it was a pain to come up with. In the end, I had something that worked, and it seemed to be pretty solid. I had a stack of expressions I was parsing into. I started with the top-level expression, and then when I hit a different operator of greater precedence, you pull the last argument off the expression, make a new expression, and put the new operator and argument in the new expression.
It's not really easy to see, and I'm not convinced that it's any really easier than a multi-pass parser, but it works. The reverse is to see the different, looser-binding operator, and enclose the expression in another expression and use the looser operator as the new 'main' operator. Again, not easy to see, but it's working and that's what really matters.
I'm ready to get on to something else.