Re^2: Parsing and extracting data from files.

by WhiteTraveller (Novice)
on Apr 09, 2013 at 21:30 UTC (#1027849)

in reply to Re: Parsing and extracting data from files.
in thread Parsing and extracting data from files.

Hello again.

I only code because of the journey. I would say that, in the last 20 years, there has not been a single piece I am proud of. However, I generally manage to hack a workable solution, and the achievement is usually enough.

The end result is fairly well planned, as is the SQL to get it there.

    my %album = (
        name    => "Collection Name",
        upc_ean => "123456789012",
        disc    => [{
            title   => "CD 1",
            disc_id => "12345",
            track   => [{
                title => "Track 1 title",
                isrc  => "aa-aaa-13-12345"
            }]
        }],
    );

This summarises the structure well enough. Add in sections of CD-Text where appropriate. Few are mandatory, as far as I am concerned.
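To make the shape concrete, here is a minimal sketch of walking that structure and flattening it into one row per track, which is roughly what an SQL insert would need. The helper name `track_rows` and the choice of columns are assumptions for illustration only; they are not part of the planned schema.

```perl
use strict;
use warnings;

my %album = (
    name    => "Collection Name",
    upc_ean => "123456789012",
    disc    => [{
        title   => "CD 1",
        disc_id => "12345",
        track   => [{ title => "Track 1 title", isrc => "aa-aaa-13-12345" }],
    }],
);

# Hypothetical helper: one array ref per track, numbered by position.
sub track_rows {
    my ($album) = @_;
    my @rows;
    my $disc_no = 0;
    for my $disc ( @{ $album->{disc} || [] } ) {
        $disc_no++;
        my $track_no = 0;
        for my $track ( @{ $disc->{track} || [] } ) {
            $track_no++;
            push @rows, [
                $album->{upc_ean}, $disc_no, $track_no,
                $track->{title},   $track->{isrc},
            ];
        }
    }
    return @rows;
}

# Print each row as a tab-separated line.
print join( "\t", @$_ ), "\n" for track_rows( \%album );
```

Discs or tracks that are missing entirely are simply skipped, which fits the idea that few fields are mandatory.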

My current reading centres on either Marpa, Parse::RecDescent or Regexp::Grammars. I suspect one of these will do what I am looking for...

I'll update later, when I have something cobbled together...

Re^3: Parsing and extracting data from files.
by WhiteTraveller (Novice) on Apr 18, 2013 at 23:43 UTC

    Well, after some reading, I ended up attempting RecDescent (only because it ended up 1st on my list), and have started as follows:

    #!/usr/bin/perl
    use vars qw(%VARIABLE);
    use Data::Dumper;
    use Parse::RecDescent;

    $::RD_ERRORS = 1;
    $::RD_HINT   = 1;
    $::RD_WARN   = 1;
    $::RD_TRACE  = 1;

    my %album = (
        Title     => 'The Collected Works of Mozart',
        Performer => 'The Royal Symphonic Orchestra',
        Barcode   => '1234567890123',
    );
    my %hash1 = (
        Title => 'Disk 1',
        Type  => 'Audio',
        Foo   => 'bar',
    );
    my %hash2 = (
        Title => 'Disk 2',
        Type  => 'Audio',
        Foo   => 'FooBar',
    );
    my %hash3 = (
        Title     => 'Chopsticks',
        Performer => 'Pascal Roge',
        ISRC      => 'AABBB1122222',
    );

    # Example data for illustration purposes.
    $album{'Disc'}[0] = \%hash1;               # Example data, stored as Disc[0].
    $album{'Disc'}[1] = \%hash2;               # Example data, stored as Disc[1].
    $album{'Disc'}[1]{'Track'}[0] = \%hash3;   # Disc 2 Track 1

    #=========== Start of actual parsing code ====================================

    my $file = '/home/Media/Music/tmp/01.toc';
    {
        local $/;
        undef $/;
        open my $grammarfh, '<', 'TOC.bnf' or die "Arghh! Cannot open grammar.\n";
        $grammar = <$grammarfh>;
        open my $fh, '<', $file or die "Arghh! Cannot open file.\n";
        $text = <$fh>;
    }

    my $parser = new Parse::RecDescent($grammar) || die "Bad Grammar!\n";
    my $cd = $parser->contents($text);
    push @{$album{'Disc'}}, $cd;   # Not quite right! Check.. Copy data, not store a reference.
    print Dumper(\%album);
    print Dumper(\%VARIABLE);      # Perhaps we should store the parsed info in here?
    print Dumper($cd);

    sub subroutine {
        shift;
        print "Entered Subroutine\n";
        my ($foo, $bar) = @_;
        return $foo;
    }
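On the "copy data, not store a reference" comment beside the push: one possible approach is a deep copy with `dclone` from the core Storable module. This is only a sketch with made-up data; whether a copy is actually needed depends on whether `$cd` gets reused after it is pushed.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical stand-ins for the real album and the parser's result.
my %album = ( Title => 'Example album', Disc => [] );
my $cd    = { Title => 'Disk 3', Track => [ { Title => 'Track 1' } ] };

# Store an independent deep copy rather than the reference itself.
push @{ $album{Disc} }, dclone($cd);

# Later changes to $cd no longer leak into %album.
$cd->{Track}[0]{Title} = 'changed later';
print $album{Disc}[0]{Track}[0]{Title}, "\n";
```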

    It has been drafted specifically to load the grammar from an external file. That makes the grammar a little easier to edit, and also lets me reuse the same code for parsing a CUE file later. However, it is the grammar that is proving frustrating. This is what I have so far...

    #===============================================#
    # RecDescent grammar to parse a CD TOC file.    #
    #===============================================#

    {
        # Nothing here yet.
    }

    # Grammar:
    contents: line(s?)

    # <skip: qr/[^\S\n]/>
    line: text { }
        | Parameter {$return = $item{'Parameter'};}
        | word foo { $main::VARIABLE{$item{'word'}}=$item{'foo'} } # not quite sure how this will be useful...
        | text
        | word { $return = $item{'word'}; }
        | BlankLine
    #   | Comment
        | <error>

    # Next line not quite right. Consider using $VARIABLE
    Parameter: word qstring { $return = { $item{'word'} => $item{'qstring'} }; }

    # CD_TEXT is *always* followed by a <CR>, then LANGUAGE_MAP or LANGUAGE.
    # Should I be considering recursion here?
    text: /CD_TEXT {/ { return main::subroutine(@item) }

    setting: /LANGUAGE_MAP \d/ { print "Map\n"; }
           | /LANGUAGE \d/ { print "Lang\n"; }

    # Tokens:
    BlankLine: <skip: q{}> /^\s+$/m
    Comment: <skip: qr{\s* (/[*] .*? [*]/ \s*)*}x>
    word: /\w+/
    msf: /\d\d:\d\d:\d\d/
    newline: /\n/
    number: /\d+/
    qstring: '"'/[^"]+/'"' { $return = $item[2]; }
    #qstring: <perl_quotelike> # See http://www.perlmonks.org/?node_id=485933
    #    { my ($marker, $quote, $text) = @{$item[0]}[0..2]; }
    foo: /\d+.\d+.\d+/ # This will match both 14:43:00 and 38935137

    Apologies - it is quite awful at the moment, but I am too tired and confused to start tidying it up... If you have the time, I could do with a pointer or two. I have a feeling that I should be calling recursively to parse the CD_TEXT, but I am afraid I don't know RecDescent well enough.
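On the recursion question, one way to look at it: in Parse::RecDescent a nested block can be described by a rule that calls subrules for its contents, so the nesting lives in the grammar rather than in a hand-written recursive call. Below is a minimal sketch of that idea. It assumes a simplified CD_TEXT shape with only `LANGUAGE n { KEY "value" ... }` blocks; the rule names (`cd_text`, `language`, `pair`) and the returned structure are illustrative assumptions, not the real TOC grammar.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Simplified grammar: each block rule matches its braces and
# delegates the contents to a repeated subrule.
my $grammar = <<'GRAMMAR';
cd_text:  'CD_TEXT' '{' language(s?) '}'
              { $return = $item[3]; }
language: 'LANGUAGE' number '{' pair(s?) '}'
              { $return = { id => $item{number}, fields => $item[4] }; }
pair:     word qstring
              { $return = [ $item{word}, $item{qstring} ]; }
word:     /[A-Z_]+/
number:   /\d+/
qstring:  /"[^"]*"/
              { $return = substr($item[1], 1, -1); }
GRAMMAR

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar\n";

# Made-up input in the assumed simplified shape.
my $input = <<'TOC';
CD_TEXT {
  LANGUAGE 0 {
    TITLE "Chopsticks"
    PERFORMER "Pascal Roge"
  }
}
TOC

my $cd_text = $parser->cd_text($input);
defined $cd_text or die "Parse failed\n";

# Walk the result: one hash per LANGUAGE block, fields as [key, value] pairs.
for my $lang (@$cd_text) {
    print "LANGUAGE $lang->{id}\n";
    print "  $_->[0] = $_->[1]\n" for @{ $lang->{fields} };
}
```

Each rule returns a plain data structure through `$return`, so the caller gets nested Perl data back and nothing needs to be stashed in a global such as `%VARIABLE`. A real TOC grammar would still need rules for LANGUAGE_MAP, comments, and the track sections.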
