Updated WordPress Code Highlighter to GeSHi 1.0.8.6

Today I've been tracking down a little rendering bug with MarsEdit and WordPress. It's in the pre tags that run through the Code Highlighter plugin that I've been using for quite a while. The culprit is the less-than sign. If I use the HTML escape code, the escape code itself shows up as literal text in both the MarsEdit preview pane and the WordPress page. But if I type the single character, MarsEdit thinks it's the start of a tag, and its syntax highlighting gets confused.

So the two questions are: why can't the Code Highlighter handle the HTML escape for the less-than-sign, and why is MarsEdit not rendering it properly? I'm going to attack the first, and let Daniel handle the second.

The first thing I noticed was that the Code Highlighter is based on the GeSHi engine, and the version bundled with it was 1.0.7.x whereas the current version is 1.0.8.6. So let's see if we can update the GeSHi engine in the plugin without breaking anything.

Turns out, the plugin is a pretty simple wrapper around the GeSHi engine. I was able to drop in the new version without too much trouble. In the process, I've considerably expanded the set of languages the plugin can handle. That's a very nice little perk.

Unfortunately, this didn't solve the problem with the less-than-sign. I took a cursory look through the GeSHi code and didn't see where it would be doing any conversions. I'll probably spend a little more time on it, just to see if a fix is possible. But even if that doesn't come to anything, we have a far better selection of supported languages:

4cs            div            lscript       python
abap           dos            lsl2          qbasic
actionscript   dot            lua           rails
actionscript3  eiffel         m68k          rebol
ada            email          make          reg
apache         erlang         mapbasic      robots
applescript    fo             matlab        rsplus
apt_sources    fortran        mirc          ruby
asm            freebasic      mmix          sas
asp            fsharp         modula3       scala
autohotkey     gambas         mpasm         scheme
autoit         gdb            mxml          scilab
avisynth       genero         mysql         sdlbasic
awk            gettext        newlisp       smalltalk
bash           glsl           nsis          smarty
basic4gl       gml            oberon2       sql
bf             gnuplot        objc          systemverilog
bibtex         groovy         ocaml-brief   tcl
blitzbasic     haskell        ocaml         teraterm
bnf            hq9plus        oobas         text
boo            html4strict    oracle11      thinbasic
c              idl            oracle8       tsql
c_mac          ini            pascal        typoscript
caddcl         inno           per           vb
cadlisp        intercal       perl          vbnet
cfdg           io             perl6         verilog
cfm            java           php-brief     vhdl
cil            java5          php           vim
clojure        javascript     pic16         visualfoxpro
cmake          jquery         pike          visualprolog
cobol          kixtart        pixelbender   whitespace
cpp-qt         klonec         plsql         whois
cpp            klonecpp       povray        winbatch
csharp         latex          powerbuilder  xml
css            lisp           powershell    xorg_conf
cuesheet       locobasic      progress      xpp
d              logtalk        prolog        z80
dcs            lolcode        properties
delphi         lotusformulas  providex
diff           lotusscript    purebasic

[5/13] UPDATE: I did some more digging, this time in the Code Highlighter plugin rather than the GeSHi engine itself, and found what looked like a good place to fix this problem. In the codehighlighter.php file, we see:

  if ($lang != null) {
    $tabstop = 2;

    $code = trim($matches[5], '\r\n');
    $code = str_replace('< /pre>', '</pre>', $code);

    $geshi =& new GeSHi($code, $lang);
    $geshi->set_tab_width($tabstop);

The comments in the file make it clear that the author is allowing for the special-case use of the pre tag, so I decided to try a simple variation of that for the less-than and greater-than signs I'm having trouble with:

  if ($lang != null) {
    $tabstop = 2;

    // double quotes, so trim() strips real CR/LF rather than the literal characters \, r and n
    $code = trim($matches[5], "\r\n");
    $code = str_replace('< /pre>', '</pre>', $code);
    // turn the escaped angle brackets back into literal characters
    $code = str_replace('&lt;', '<', $code);
    $code = str_replace('&gt;', '>', $code);

    // plain assignment; "=& new" is deprecated as of PHP 5.3
    $geshi = new GeSHi($code, $lang);
    $geshi->set_tab_width($tabstop);

It's pretty simple to understand: you replace the HTML escape sequence with the single character in the code. From there, you let the GeSHi engine do its thing.

What I found was that it worked wonderfully! What a treat. Now I can use either method, and hopefully Daniel will have a fix for MarsEdit sooner rather than later.
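To see the replacement step in isolation, here is a minimal standalone sketch. The function name ch_unescape_angle_brackets is hypothetical and not part of the plugin; it only mirrors the two str_replace calls above, and it deliberately leaves every other entity alone.

```php
<?php
// Hypothetical helper, NOT part of the Code Highlighter plugin.
// It mirrors the two str_replace() calls added to codehighlighter.php.
function ch_unescape_angle_brackets($code)
{
    // Turn the two escaped angle brackets back into literal characters
    // before the text is handed to the highlighter.
    return str_replace(array('&lt;', '&gt;'), array('<', '>'), $code);
}

$sample = 'if (a &lt; b &amp;&amp; b &gt; c) { swap(a, b); }';
echo ch_unescape_angle_brackets($sample), "\n";
// prints: if (a < b &amp;&amp; b > c) { swap(a, b); }
```

Only the two angle-bracket entities are touched here, which matches the narrow fix described above; something like &amp; passes through unchanged.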

The next thing I wanted to tackle with the Code Highlighter was the line numbers. There was far too much space between the lines in a code sample with line numbers. Turns out, there's a style for that in GeSHi. Simply change these lines in the geshi.php file:

  /**
   * Line number styles
   * @var string
   */
  var $line_style1 = 'font-weight: normal; vertical-align:top;';

  /**
   * Line number styles for fancy lines
   * @var string
   */
  var $line_style2 = 'font-weight: bold; vertical-align:top;';

to be:

  /**
   * Line number styles
   * @var string
   */
  var $line_style1 = 'margin: 0; font-weight: normal; vertical-align:top;';

  /**
   * Line number styles for fancy lines
   * @var string
   */
  var $line_style2 = 'margin: 0; font-weight: bold; vertical-align:top;';

and the extra border space that the default WordPress theme puts into the li tag will be removed and it'll look much better.
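A possible alternative to editing geshi.php is to set the styles from the calling code. As far as I can tell, GeSHi 1.0.x offers a set_line_style() method for this, and passing true as its third argument appends to the default styles instead of replacing them. The sketch below assumes that behavior; the helper name ch_tighten_line_numbers is hypothetical, and $geshi is assumed to be the GeSHi instance created in codehighlighter.php.

```php
<?php
// Sketch only: assumes $geshi is a GeSHi 1.0.x object and that
// set_line_style($style1, $style2, $preserve_defaults) behaves as described
// in the GeSHi documentation. The helper name is hypothetical.
function ch_tighten_line_numbers($geshi)
{
    // Append "margin: 0;" to both the normal and the fancy line styles,
    // keeping GeSHi's own defaults (font-weight, vertical-align) in place.
    $geshi->set_line_style('margin: 0;', 'margin: 0;', true);
    return $geshi;
}
```

The advantage of this route, if it works as assumed, is that the change survives the next GeSHi upgrade, since geshi.php itself stays untouched.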

The last little annoyance is the blank lines that start, and end, the code section when you use line numbers. It's just plain annoying. It makes it hard to get the numbers right, and it's whitespace that's not needed. It's a little more involved, but not too bad. In the geshi.php file, you need to change:

  // Get code into lines
  /** NOTE: memorypeak #2 */
  $code = explode("\n", $parsed_code);
  $parsed_code = $this->header();

to:

  // Get code into lines
  /** NOTE: memorypeak #2 */
  $code = explode("\n", $parsed_code);
  // remove a blank first and last line
  if ('' == trim($code[count($code) - 1])) {
    unset($code[count($code) - 1]);
    $code = array_values($code);
  }
  // guard: the array is empty if the only line was blank
  if (!empty($code) && '' == trim($code[0])) {
    unset($code[0]);
    $code = array_values($code);
  }
  $parsed_code = $this->header();

Now I can imagine a way that might be a little more efficient, but I'm not worried at this point. It's not all that bad, and it's very solid. If the first or last lines are empty of code, they get removed and the array is re-indexed. Simple.
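For reference, the same idea can be sketched as a standalone function using array_pop() and array_shift(), which handle the removal and the re-indexing in one call. The function name strip_blank_edge_lines is hypothetical; this is one possible variant, not the code that went into geshi.php.

```php
<?php
// Hypothetical helper illustrating the blank-line trimming outside geshi.php.
function strip_blank_edge_lines(array $lines)
{
    // Drop one blank line from the end, if there is one.
    if (count($lines) > 0 && trim($lines[count($lines) - 1]) === '') {
        array_pop($lines);
    }
    // Drop one blank line from the start, if there is one.
    // array_shift() re-indexes the remaining elements from 0.
    if (count($lines) > 0 && trim($lines[0]) === '') {
        array_shift($lines);
    }
    return $lines;
}

$lines = explode("\n", "\nint x = 1;\nreturn x;\n");
print_r(strip_blank_edge_lines($lines));
```

Like the version above, this removes at most one blank line from each end, so intentional blank lines inside the code sample are left alone.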

With this, I have a really nicely workable solution for my code. Nice.