http://www.perlmonks.org?node_id=942277

Since Shakespeare is in the public domain, you can find it on any number of websites, and one of the most popular is the MIT Shakespeare. Just because a thing is popular doesn't mean it's actually good, though, and the MIT Shakespeare editions get a lot of flak for not being the best editions on the net. But they are, quite frankly, good enough for most applications, and they are highly convenient. That last point matters to theatre folk.

While I was poking around the MIT Shakespeare editions, I noticed that they are actually very nicely coded pieces of HTML. One of the complaints about these editions is that they lack the line numbers that people who study or use Shakespearean texts have come to expect. The truth is that the line numbers are there, just encoded in the HTML, in the name of the anchor that wraps each line of text. Perl can fix that.

#!/usr/bin/perl
#
# addFifthLine
#
# Parse an HTML file formatted according to the MIT Shakespeare format,
# and print every fifth line number at the end of the line.
#
# Input:  an HTML file using the MIT Shakespeare format
# Output: that same file, but with every fifth line number printed after
#         the line.
#
# Add the following to the <head> element of your resulting HTML file to
# offset the line numbers to the right of the text. You may need to
# increase or decrease the left offset (default is 550px).
#
#   <style type="text/css">
#     .lineNum {
#       position: absolute;
#       left: 550px;
#     }
#   </style>

use strict;
use warnings;

my $input_file = $ARGV[0]
    or die "File must be given as first arg\n";

open my $in, '<', $input_file or die "Couldn't open file: $!\n";

# Read each line of the file. If it's a line of text, we have
# to do some math; otherwise we can just print the line.
while (my $line = <$in>) {
    chomp $line;

    # A line of text starts with an anchor whose name ends in the line
    # number (act.scene.line, or a bare line number). \d* rather than \d?
    # lets two-digit act or scene numbers match as well.
    if (   $line =~ /^<a name="[B]*\d*\.\d*\.(\d+)">/
        || $line =~ /^<a name="(\d+)">/ )
    {
        # If it's a 5th line, insert the line number just before the
        # closing </a><br>; any other line is printed unchanged.
        my $line_num = $1;
        if ($line_num % 5 == 0) {
            $line =~ s{</a><br>$}{<span class="lineNum">$line_num</span></a><br>};
        }
    }

    # Every line, modified or not, goes to STDOUT.
    print "$line\n";
}

close $in;
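To see what the substitution actually does, here is a small sketch that runs the same kind of match-and-substitute on a few hand-written lines. The `number_line` helper and the sample lines are hypothetical: the lines are only shaped to fit the script's regular expressions, not copied from an MIT file, and the act and scene parts use `\d*` so that two-digit scene numbers also match.

```perl
#!/usr/bin/perl
# Sketch: apply addFifthLine-style numbering to single lines so the
# before/after is visible. Sample lines are hypothetical placeholders.
use strict;
use warnings;

# Hypothetical helper: return the line unchanged, or with a lineNum span
# inserted before the closing </a><br> when its number is a multiple of 5.
sub number_line {
    my ($line) = @_;
    if (   $line =~ /^<a name="[B]*\d*\.\d*\.(\d+)">/
        || $line =~ /^<a name="(\d+)">/ )
    {
        my $line_num = $1;
        if ( $line_num % 5 == 0 ) {
            $line =~ s{</a><br>$}{<span class="lineNum">$line_num</span></a><br>};
        }
    }
    return $line;
}

my @samples = (
    '<a name="1.1.4">Line four</a><br>',
    '<a name="1.1.5">Line five</a><br>',
    '<a name="10">Line ten</a><br>',
);

# Prints the first line unchanged, then:
# <a name="1.1.5">Line five<span class="lineNum">5</span></a><br>
# <a name="10">Line ten<span class="lineNum">10</span></a><br>
print number_line($_), "\n" for @samples;
```

Only lines whose number is a multiple of five change; everything else passes through untouched, which is why the main loop can simply print every line.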

Download your favorite Shakespeare play (or scene), run it through the parser, and capture the output in a new HTML file. Then insert this quick little style sheet into the head of that file:

<style type="text/css">
  .lineNum {
    position: absolute;
    left: 550px;
  }
</style>
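If you'd rather not paste the style sheet in by hand each time, a small filter can splice it in just before the closing `</head>` tag. This is a minimal sketch, not part of the original script; the `add_style` helper is hypothetical, and it assumes the file has a `</head>` tag, matched case-insensitively.

```perl
#!/usr/bin/perl
# Sketch (not part of addFifthLine): insert the lineNum style sheet into
# the <head> of an HTML file. Assumes the file has a closing </head> tag.
use strict;
use warnings;

# The same style sheet given above; adjust "left" to suit your layout.
my $STYLE = <<'CSS';
<style type="text/css">
  .lineNum {
    position: absolute;
    left: 550px;
  }
</style>
CSS

# Hypothetical helper: return $html with $STYLE inserted immediately
# before the first </head> (case-insensitive). If there is no </head>,
# the HTML is returned unchanged.
sub add_style {
    my ($html) = @_;
    $html =~ s{(</head>)}{$STYLE$1}i;
    return $html;
}

# Process the file named on the command line, if any, writing to STDOUT.
if (@ARGV) {
    my $file = $ARGV[0];
    open my $in, '<', $file or die "Couldn't open file: $!\n";
    my $html = do { local $/; <$in> };    # slurp the whole file
    close $in;
    print add_style($html);
}
```

Run it on the numbered output, e.g. `perl addStyle.pl numbered.html > final.html`; the script and file names here are placeholders.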

And voilà! You've just added line numbers to the MIT Shakespeare edition. Granted, this only applies to a local copy of the file, but it's easy enough to upload that copy to your own website for convenient access.

Updated 16 Dec. 2011 to incorporate the fix suggested below and to accommodate different scene-numbering schemes.

Re: Adding Line Numbers to the MIT Shakespeare
by tobyink (Canon) on Dec 07, 2011 at 23:27 UTC

    The id attribute is supposed to be unique per-document. That is, you must not have two elements in the same document with the same value for id.

    Try replacing id="lineNum" with class="lineNum" in your HTML, and replacing #lineNum with .lineNum in your stylesheet.

      Hi,

      Sorry if this is a very simple question, but where do I enter the file (or the file name) to be parsed?

      Thanks in advance,

      Eduardo

        Input: an HTML file using the MIT Shakespeare Format ... die "File must be given as first arg\n";

        Provide the input file when running the script, like this:

        perl scriptname.pl inputfile.html