On-the-fly all-languages syntax highlighting

by gmax (Abbot)
on Dec 13, 2003 at 14:08 UTC (#314528, CUFP)

Syntax highlighting has always been one of my favorite features in every editor I have used. After becoming addicted to it, I have always regretted not being able to share colored code with other people.

If we talk about Perl code alone, there are a few available tools to create colored code in HTML and publish it on the web.

When you need to show code in more than one language, though, things become more difficult. Think about presenting the installation and customization of a complex system. You'd need to show Perl code, Apache configuration files, HTML code, SQL queries, and perhaps some XML.

When you publish this code on the web, what was clearly highlighted and easily understandable in your editor screen becomes a flat sequence of black on white text.

Introducing Text::VimColor

Two wonderful features of Vim are its ability to highlight many different languages (384 as of today) and its ability to produce an HTML page with the same layout as the code on screen.

(And don't forget Vim's ability to highlight nested syntax, such as Perl embedded in HTML or SQL embedded in Perl.)

Producing highlighted HTML by hand in Vim is slow and not user friendly. If you need to publish code on a regular basis, it quickly becomes a hassle.

Enter Geoff Richards' Text::VimColor, a module that removes your need to remember difficult commands and to cut and paste your code snippets.

Given these code fragments:

    package CodeSamples;

    our $ctext = <<'CTEXT';
    #include <stdio.h>

    int main() {
        printf("hello world\n");
        return 0;
    }
    CTEXT

    our $perltext = <<'PTEXT';
    my $query = qq{SELECT mycol, COUNT(*)
        FROM mytable
        WHERE mycol <= 10
        GROUP BY mycol};
    print $$_,$/ for @{ $dbh->selectcol_arrayref($query) };
    # Notice that $query has nested SQL syntax
    PTEXT

    1;

This script will produce nicely highlighted code (HTML + CSS).

    #!/usr/bin/perl -w
    use strict;
    use CGI qw/:standard/;
    use Text::VimColor;
    use CodeSamples;    # contains code samples in C and Perl

    my $csyntax = Text::VimColor->new(
        string   => $CodeSamples::ctext,
        filetype => 'c'
    ) or die("can't create C object ($!)\n");

    my $perlsyntax = Text::VimColor->new(
        string   => $CodeSamples::perltext,
        filetype => 'perl'
    ) or die("can't create perl object ($!)\n");

    my $fperlsyntax = Text::VimColor->new(
        file     => $0,
        filetype => 'perl'
    ) or die("can't create perl object ($!)\n");

    print start_html(
            -title => "Text::VimColor test",
            -style => { 'src' => 'light.css' }
        ),
        h2("C"),
        pre( $csyntax->html ), hr,
        h2("Perl"),
        pre( $perlsyntax->html ), hr,
        h2("Perl (file)"),
        pre( $fperlsyntax->html ), hr,
        h2("CSS"),
        pre( Text::VimColor->new( file => 'light.css', filetype => 'css' )->html ), hr,
        h2("Perl (package)"),
        pre( Text::VimColor->new( file => '', filetype => 'perl' )->html ), hr,
        h2("Perl (another package)"),
        pre( Text::VimColor->new( file => '', filetype => 'perl' )->html );

See the colorful result.

Highlighting systems overview

Before continuing, let me show the alternatives. I have tried all of them, and I have good and bad feelings for each one of them. I am currently in favor of Text::VimColor for the reason given before, and more.

Application              Pro                                  Con
perltidy                 Fast and accurate                    Only Perl
GNU source highlight     Very fast                            Hard to customize. Only a few languages
Syntax::Highlight::Perl  Fast and customizable                Only Perl
Text::VimColor           All languages. Easily customizable   Slower than other modules. Works only on Unix (as of today)

Improving performance

As I said, the main deficiency of Text::VimColor is its poor performance compared to other modules. Although the latest version (0.07) is twice as fast as the previous one, it is still way too slow for any sensible web usage.

Therefore, I decided to create a caching object to improve on Text::VimColor's basic performance.

The simplest way I could think of was a tied hash with DB_File. I have also simplified the object interface, to make it easier to use.

    package VimColorCache;
    use strict;
    use warnings;
    use Text::VimColor;
    use Digest::MD5 qw/md5_hex/;
    use DB_File;

    our $VERSION = '0.1';

    sub new {
        my $class    = shift;
        my $filename = shift || 'VimColorCache.db';
        my %code_items;
        tie %code_items, 'DB_File', $filename
            or return undef;
        my $self = bless { code_items => \%code_items }, $class;
        return $self;
    }

    # slurps a whole file; returns undef if it cannot be opened
    sub _get_text {
        my $filename = shift;
        open my $in, '<', $filename or return undef;
        local $/;
        my $text = <$in>;
        close $in;
        return $text;
    }

    sub draw {
        my $self        = shift;
        my $text        = shift;    # either the code or the file name
        my $input       = shift;    # file or string
        my $syntax_type = shift;    # syntax type (perl, c, sql, html, xml, etc)
        my $output      = shift;    # output mode
        return undef unless $output =~ /^(?:html|xml)$/;
        return undef unless $input  =~ /^(?:file|string)$/;
        my $code = $text;
        if ( $input eq 'file' ) {
            $code = _get_text($text) or return undef;
        }
        $code =~ s/\t/    /g;       # turns tabs into 4 spaces
        my $signature = md5_hex($code);
        if ( exists $self->{code_items}->{ $output . $signature } ) {
            return $self->{code_items}->{ $output . $signature };
        }
        else {
            my $syntax = Text::VimColor->new(
                $input   => $text,
                filetype => $syntax_type
            ) or return $code;
            my $out = $syntax->$output;
            $self->{code_items}->{ $output . $signature } = $out;
            return $out;
        }
    }

    sub remove {
        my $self   = shift;
        my $text   = shift;    # either the code or the file name
        my $input  = shift;    # file or string
        my $output = shift;
        my $code   = $text;
        if ( $input eq 'file' ) {
            $code = _get_text($text) or return undef;
        }
        $code =~ s/\t/    /g;  # same normalization as draw(), so the signatures match
        my $signature = md5_hex($code);
        delete $self->{code_items}->{ $output . $signature };
    }

    1;
    __END__

    =head1 NAME

    VimColorCache - caches the result of Text::VimColor

    =head1 SYNOPSIS

        use VimColorCache;
        my $filename = 'syntax.db';
        my $vcc = VimColorCache->new($filename);
        print $vcc->draw('print $_,$/ unless m/^\s*$/g', 'string', 'perl', 'html');
        print $vcc->draw('hello.c', 'file', 'c', 'html');

    =head1 class methods

    =over 4

    =item new()

    The constructor accepts an optional filename where to store
    previously highlighted code snippets.
    The default file name is VimColorCache.db

    =item draw()

    Returns a properly highlighted text. It needs some parameters:

        - text         either a filename or a string containing
                       code to be formatted
        - input        'file' or 'string'
        - syntax_type  language to use (perl, c, php, xml, SQL)
                       The same as Vim's 'filetype'
        - output       either 'html' or 'xml'

    If the code passed as string or file has already been processed,
    then the corresponding formatted text is returned, otherwise
    a Text::VimColor object is created and the highlighted syntax is
    processed from scratch.

        print $vcc->draw('hello.c','file', 'c', 'html');

        open IN, "hello.c" or die "can't open\n";
        my $c_text = do { local $/; <IN> };
        close IN;
        print $vcc->draw($c_text,'string', 'c', 'html');

    These two instructions will print the same output. Only, the first
    one will be slow, the second one will be extremely fast.

    =item remove()

    Removes an item from the repository.
    It needs the same parameters as draw(), except "syntax_type"

        $vcc->remove('hello.c','file', 'html');

    =back

    =head1 AUTHOR

    Giuseppe Maxia, a.k.a. gmax (

    =head1 COPYRIGHT

    Same as Perl itself.

    =cut

VimColorCache is a layer between the application and the highlighting module. It works on the assumption that, in most cases, code is published once and shown many times. Sometimes it is modified, but mostly it is just published and then left on the page for public consumption. In this scenario, the first poster has to wait one or two seconds for the highlighting engine to do its job, but every further request of the same code is resolved instantly.
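
The publish-once, show-many behaviour can be sketched without Vim or DB_File. In this minimal sketch, slow_highlight is a hypothetical stand-in for the Text::VimColor call, and a plain in-memory hash takes the place of the tied DB_File hash. Only the keying scheme (output mode plus the MD5 of the code) follows VimColorCache.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my %cache;               # stand-in for the tied DB_File hash
my $engine_calls = 0;    # counts how often the slow engine really runs

# Hypothetical stand-in for the slow highlighting engine (not Text::VimColor).
sub slow_highlight {
    my ($code) = @_;
    $engine_calls++;
    return "<span>$code</span>";
}

# Returns the cached output when this code was seen before; otherwise
# calls the engine once and stores the result under mode + MD5.
sub cached_highlight {
    my ($code, $output) = @_;
    my $key = $output . md5_hex($code);
    $cache{$key} = slow_highlight($code) unless exists $cache{$key};
    return $cache{$key};
}

sub engine_calls { return $engine_calls }

# Three requests for the same snippet: only the first reaches the engine.
cached_highlight('print "hi";', 'html') for 1 .. 3;
print "engine calls: ", engine_calls(), "\n";
```

Only the first of the three requests pays for the highlighting. The other two are hash lookups, which is why the cached timings barely depend on the size of the code.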

The first example shown in this node could be rewritten using VimColorCache as follows:

    #!/usr/bin/perl -w
    use strict;
    use CGI qw/:standard/;
    use VimColorCache;
    use CodeSamples;

    my $vcc = VimColorCache->new
        or die("can't create object ($!)\n");

    print start_html(
            -title => "VimColorcache test",
            -style => { 'src' => 'light.css' }
        ),
        h2("C"),
        pre( $vcc->draw( $CodeSamples::ctext, 'string', 'c', 'html' ) ), hr,
        h2("Perl"),
        pre( $vcc->draw( $CodeSamples::perltext, 'string', 'perl', 'html' ) ), hr,
        h2("Perl (file)"),
        pre( $vcc->draw( $0, 'file', 'perl', 'html' ) ), hr,
        h2("CSS"),
        pre( $vcc->draw( 'light.css', 'file', 'css', 'html' ) ), hr,
        h2("Perl (package)"),
        pre( $vcc->draw( '', 'file', 'perl', 'html' ) ), hr,
        h2("Perl (another package)"),
        pre( $vcc->draw( '', 'file', 'perl', 'html' ) );

And the second colorful result shows exactly the same output as the previous one, except for the page title.

Update (1)
CAVEAT. If you pass a file to Text::VimColor and at the same time you are editing the same file with Vim, it will return an error. You should either ensure that your file is not currently in use by Vim before passing it to the class constructor, or slurp it into a scalar and pass it as a string.
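
The second workaround, slurping the file into a scalar, might look like the sketch below. slurp_file is a hypothetical helper name, not part of Text::VimColor, and the temporary file only makes the example self-contained. The resulting scalar would then be passed with string => instead of file =>, as in the examples above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: reads a whole file into one scalar.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!\n";
    local $/;    # undefined record separator: read everything at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Demo with a throw-away file, so the sketch runs anywhere.
my ($tmp_fh, $tmp_name) = tempfile( UNLINK => 1 );
print {$tmp_fh} "print qq{hello world\\n};\n";
close $tmp_fh;

my $code = slurp_file($tmp_name);
print "read ", length($code), " characters\n";

# Assumed usage: hand the text, not the file name, to the constructor, e.g.
#   Text::VimColor->new( string => $code, filetype => 'perl' );
```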

Update (2)
In case you are wondering just how slow Text::VimColor is without a cache, here is an example.
Processing times are acceptable for small scripts, but become unbearable for large ones.

                    Time to highlight
Application         (2.6 KB)   (22 KB)   (221 KB)   (226 KB)
perltidy                0.30      0.86       1.76       2.77
source-highlight        0.00      0.03       0.19       0.21
Text::VimColor          0.34      2.35      16.62      17.23
VimColorCache           0.01      0.01       0.01       0.02
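
Timings of this kind can be collected with the core Benchmark module. What follows is a minimal sketch, assuming a hypothetical fake_highlight subroutine in place of a real highlighter. The figures it prints depend on the machine and are unrelated to those in the table.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical stand-in for a highlighter call; swap in the real one to measure it.
sub fake_highlight {
    my ($code) = @_;
    ( my $html = $code ) =~ s/(\bprint\b)/<b>$1<\/b>/g;
    return $html;
}

# Runs $code_ref $count times and returns a one-line timing report.
sub time_it {
    my ( $label, $count, $code_ref ) = @_;
    my $t = timeit( $count, $code_ref );
    return sprintf "%-16s %s", $label, timestr($t);
}

my $sample = 'print "hello world\n";' x 1000;
print time_it( 'fake_highlight', 100, sub { fake_highlight($sample) } ), "\n";
```

Calling time_it once per tool and per input file would give one row of such a table at a time.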


 _  _ _  _  
(_|| | |(_|><

Replies are listed 'Best First'.
Re: On-the-fly all-languages syntax highlighting
by TVSET (Chaplain) on Dec 14, 2003 at 22:15 UTC
      Well done, great to show us more than Text::VimColor. I just discovered Perl::Tidy and I love it!
Re: On-the-fly all-languages syntax highlighting
by SavannahLion (Pilgrim) on Dec 16, 2003 at 02:00 UTC

    You write that your module is slower than the other highlighters. I would like to test it, but I haven't quite gotten far enough with Perl to work out how to time running scripts (it's on my TO-DO list somewhere).

    What kind of performance differences are we looking at here? I've been wanting to implement something like this for my website (specifically C/C++ and a custom script), but slogging through pages and pages of search engine results yielded really great color highlighters. But none, so far, that is a Perl script for highlighting other languages.

    At one point, I even wondered if I could utilize the B module, but I fear that's too much of a security risk.

    Is it fair to stick a link to my site here?

    Thanks for your patience.

      See the documentation on the Benchmark module that comes with Perl.

      Using B (probably with some backend such as B::Terse, I can't imagine you want the bare B module) would only be a security issue if you want to highlight code submitted to your site by anyone, and that's not because of B itself so much as because using B requires Perl to have compiled the code, which is inseparably connected with the possibility of having code run at compile-time.

      Makeshifts last the longest.

        Using B (probably with some backend such as B::Terse, I can't imagine you want the bare B module) would only be a security issue if you want to highlight code submitted to your site by anyone, and that's not because of B itself so much as because using B requires Perl to have compiled the code, which is inseparably connected with the possibility of having code run at compile-time.

        That's exactly the problem I'm worried about. The site is open ended and allows text to be uploaded. I mentioned B since I saw that it appeared to have some C specific routines that might be utilized for what I want. I toyed with that idea for all of 40 seconds when I realized that B is ultimately designed to compile C and that someone might be able to invoke the compiler through some bug somewhere I wasn't aware of. A risk I didn't want to take.

        About Text::VimColor. I looked over the documentation for Text::VimColor and I don't think this is a viable alternative for my needs. The way I figure it, I suppose I could get away with caching the output (I already have stubbed code for exactly that sort of thing), but am I correct in understanding that this module utilizes Vim?

        Is it fair to stick a link to my site here?

        Thanks for your patience.

Emacs fontified buffer => HTML
by calin (Deacon) on Dec 17, 2003 at 19:12 UTC

    There is an Emacs package called htmlize that can be used to export to HTML any fontified Emacs buffer, including font-locked programming modes of course.

code completion
by oha (Friar) on Dec 20, 2003 at 12:44 UTC
    I've seen a Vim plugin, written in Ruby, that lets you use code completion in Vim with OO languages. The project has not been maintained for about a year, though.

    It might be interesting to write a Perl plugin that does something similar, for example completing the names of the Perl variables in scope and the like (at least for people like me who don't know Ruby :)

Re: On-the-fly all-languages syntax highlighting
by shushu (Scribe) on Dec 25, 2003 at 08:13 UTC
    In my product we used PerlTidy for highlighting for a long time, but unfortunately found that its performance is much slower than that of Syntax::Highlight::Perl.
    When performance matters, I really recommend the latter.
    On the other hand, PerlTidy is very useful for its main purpose: beautifying code.
Re: On-the-fly all-languages syntax highlighting
by Anonymous Monk on Dec 17, 2003 at 22:02 UTC
    Another Perl script for doing syntax highlighting in various languages that you might want to be aware of is Peter Palfrader's Code2HTML. I wrapped it into a single .pm perl module and regularly use it to generate dynamic formatting from apache/mod_perl.

    You can find the code and see it in action at

      The "Anonymous Monk" suggesting Code2HTML is, in this case, me.
      (I'd foolishly forgotten to log on before posting.)
Re: On-the-fly all-languages syntax highlighting
by Anonymous Monk on Dec 16, 2003 at 06:39 UTC
    Note also the ( currently somewhat stalled for lack of coder time ) PPI project has an alternative syntax highlighter.

      Isn't PPI meant just for Perl though? The original poster says that his module highlights something on the order of 300+ different languages.

      Is it fair to stick a link to my site here?

      Thanks for your patience.
