If you're working with RTF, you'll sometimes want to compare two RTF files to see whether they differ. A traditional diff falls down here, because RTF can contain all sorts of whitespace, some of it significant and some of it not. rtfdiff, below, flattens the token stream of each RTF file into one token per line and then diffs those, so you can easily see whether two RTF files are the same :-) Tada!
#!/usr/bin/perl

# Compares a tokenized view of two RTF files

use strict;
use RTF::Tokenizer;
use Text::Diff;

my $first_file  = pretty_print( $ARGV[0] );
my $second_file = pretty_print( $ARGV[1] );

print diff \$first_file, \$second_file;

sub pretty_print {
    my $filename = shift;
    my $output;

    my $tokenizer = RTF::Tokenizer->new( file => $filename );

    while (1) {
        my ( $type, $token, $argument ) = $tokenizer->get_token();
        last if $type eq 'eof';
        $argument =~ s/\n/[n]/g;
        $argument =~ s/\t/[t]/g;
        $argument =~ s/\r/[r]/g;
        $output .= "($type) $token $argument\n";
    }

    return $output;
}
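
To make the diffed format concrete, here is a minimal sketch of the per-token formatting step pulled out into a standalone helper. The name format_token and the sample triples are made up for illustration; the triples are hand-written to resemble what get_token() hands back, not captured from a real file, and the undef guard is an addition that the script above does not have.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of rtfdiff): formats one
# (type, token, argument) triple the way the loop in pretty_print does.
sub format_token {
    my ( $type, $token, $argument ) = @_;

    # Assumption: treat a missing argument as an empty string so the
    # substitutions below never see undef.
    $argument = '' unless defined $argument;

    # Make whitespace visible so it shows up clearly in the diff.
    $argument =~ s/\n/[n]/g;
    $argument =~ s/\t/[t]/g;
    $argument =~ s/\r/[r]/g;

    return "($type) $token $argument\n";
}

# Hand-written sample triples, standing in for tokenizer output.
my @sample = (
    [ 'group',   1,       '' ],
    [ 'control', 'b',     1 ],
    [ 'text',    'Hello', '' ],
    [ 'group',   0,       '' ],
);

print format_token(@$_) for @sample;
```

Each token ends up on its own line, so whitespace between tokens in the source file never reaches the diff; only whitespace carried inside a token's argument survives, as a visible [n], [t] or [r] marker.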
a handy utility, sheriff. i've got a couple of suggestions:
use warnings: there's no reason not to, since this code is clean under warnings.
specify the module version for RTF::Tokenizer, as in "use RTF::Tokenizer 1.01 qw();". the behaviour of new has changed to allow the syntax you've used, and i had to upgrade mine from 1.00 to get this to work correctly.