Well, after some reading, I ended up trying Parse::RecDescent (only because it was first on my list), and have started as follows:
#!/usr/bin/perl
use vars qw(%VARIABLE);
use Data::Dumper;
use Parse::RecDescent;
$::RD_ERRORS = 1;
$::RD_HINT = 1;
$::RD_WARN = 1;
$::RD_TRACE = 1;
my %album = (
Title => 'The Collected Works of Mozart',
Performer => 'The Royal Symphonic Orchestra',
Barcode => '1234567890123',
);
my %hash1 = (
Title => 'Disk 1',
Type => 'Audio',
Foo => 'bar',
);
my %hash2 = (
Title => 'Disk 2',
Type => 'Audio',
Foo => 'FooBar',
);
my %hash3 = (
Title => 'Chopsticks',
Performer => 'Pascal Roge',
ISRC => 'AABBB1122222',
); # Example data for illustration purposes.
$album{'Disc'}[0]=\%hash1; # Example data, stored as Disc[0].
$album{'Disc'}[1]=\%hash2; # Example data, stored as Disc[1].
$album{'Disc'}[1]{'Track'}[0]=\%hash3; # Disc 2 Track 1
#=========== Start of actual parsing code ====================================
my $file = '/home/Media/Music/tmp/01.toc';
{
local $/; # Undefine the record separator so each read slurps the whole file.
open my $grammarfh, '<', 'TOC.bnf' or die "Arghh! Cannot open grammar.\n";
$grammar = <$grammarfh>;
open my $fh, '<', $file or die "Arghh! Cannot open file.\n";
$text = <$fh> ;
}
my $parser = Parse::RecDescent->new($grammar) or die "Bad Grammar!\n";
my $cd = $parser->contents($text);
push @{$album{'Disc'}}, $cd; # Not quite right! Check.. Copy data, not store a reference.
print Dumper(\%album);
print Dumper(\%VARIABLE); # Perhaps we should store the parsed info in here?
print Dumper($cd);
sub subroutine {
shift; print "Entered Subroutine\n"; # Discard the rule name that RecDescent puts in $item[0].
my ($foo, $bar) = @_;
return $foo;
}
It has been drafted specifically to load the grammar from an external file. That makes the grammar a little easier to edit, and also lets me reuse the same code for parsing a CUE file later. However, it is the grammar that is proving frustrating. This is what I have so far...
#===============================================#
# RecDescent grammar to parse a CD TOC file. #
#===============================================#
{
# Nothing here yet.
}
# Grammar:
contents: line(s?) # <skip: qr/[^\S\n]/>
line: text { }
| Parameter {$return = $item{'Parameter'};}
| word foo { $main::VARIABLE{$item{'word'}}=$item{'foo'} } # not quite sure how this will be useful...
| text
| word {
$return = $item{'word'};
}
| BlankLine
# | Comment
| <error>
# Next line not quite right. Consider using $VARIABLE
Parameter: word qstring { $return = { $item{'word'} => $item{'qstring'} }; }
# CD_TEXT is *always* followed by a <CR>, then LANGUAGE_MAP or LANGUAGE.
# Should I be considering recursion here?
text: /CD_TEXT \{/ { $return = main::subroutine(@item); }
setting: /LANGUAGE_MAP \d/ { print "Map\n"; }
| /LANGUAGE \d/ { print "Lang\n"; }
# Tokens:
BlankLine: <skip: q{}> /^\s+$/m
Comment: <skip: qr{\s* (/[*] .*? [*]/ \s*)*}x>
word: /\w+/
msf: /\d\d:\d\d:\d\d/
newline: /\n/
number: /\d+/
qstring: '"'/[^"]+/'"' { $return = $item[2]; }
#qstring: <perl_quotelike> # See http://www.perlmonks.org/?node_id=485933
# { my ($marker, $quote, $text) = @{$item[0]}[0..2]; }
foo: /\d+.\d+.\d+/ # This will match both 14:43:00 and 38935137
Apologies - it is quite awful at the moment, but I am too tired and confused to start tidying it up... If you have the time, I could do with a pointer or two. I have a feeling that I should be calling a rule recursively to parse the CD_TEXT block, but I am afraid I don't know RecDescent well enough.
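To make the recursion question concrete, here is a minimal sketch of the kind of thing I have in mind. It is not taken from my TOC.bnf: the rule names (block, entry, pair) and the sample input are made up for illustration, and it assumes that a nested section always looks like a name, an optional number, and a brace-delimited body. The only recursion is that an entry may itself be a block.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Parse::RecDescent;

# Hypothetical grammar (rule names are illustrative only).
# A block is: word, optional number, then entries inside braces.
# An entry is either another block (the recursive case) or a key/value pair.
my $grammar = <<'GRAMMAR';
block:   word number(?) '{' entry(s?) '}'
         { $return = { join(' ', $item[1], @{ $item[2] }) => $item[4] }; }
entry:   block | pair
pair:    word qstring { $return = { $item[1] => $item[2] }; }
word:    /\w+/
number:  /\d+/
qstring: /"[^"]*"/ { $return = substr($item[1], 1, -1); }
GRAMMAR

# Assumed sample input, loosely modelled on a CD_TEXT section.
my $text = <<'TOC';
CD_TEXT {
  LANGUAGE 0 {
    TITLE "Disk 2"
    PERFORMER "Pascal Roge"
  }
}
TOC

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar!\n";
my $tree   = $parser->block($text);
defined $tree or die "Parse failed.\n";

# Each block becomes { name => [ entries ] }, nested to any depth.
print Dumper($tree);
```

If this is the right idea, each level of nesting comes back as a hash with a single key (the block name) pointing at an array of its entries, so the CD_TEXT data could be walked or copied into %album afterwards rather than stuffed into a global from inside the actions.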