Re: Logic trouble parsing a formatted text file into hashes of hashes (of hashes, etc.)

by ambrus (Abbot)
on Oct 16, 2004 at 20:31 UTC


in reply to Logic trouble parsing a formatted text file into hashes of hashes (of hashes, etc.)

This is a tricky question, especially because some of the parenthesized entries in the input contain only a word, some contain both a word and :foo(...) pairs, and some contain only :foo(...) pairs, so it's not obvious what data structure to use.

Here's my guess at interpreting it (you might want to tidy it up a bit, of course, such as changing which characters are allowed in words, or declaring the variables with my).

use Data::Dumper;
$s = \%p; @s = ();
while (<>) {
    while (/\G\s*(?:([-\w.]+)|:([-\w.]+)\s*\(|(\)))/gc) {
        if (defined($1)) {
            defined($$s{""}) and die "parse error: two";
            $$s{""} = $1;
        } elsif (defined($2)) {
            push @s, $s;
            $s = $$s{$2} = {};
        } elsif (defined($3)) {
            @s or die "parse error: close";
            $s = pop @s;
        }
    }
    /(\S.*)/g and die "parse error: junk: $1";
}
$! and die "read error";
$s == \%p or die "parse error: open";
print Dumper(\%p);

Update 2006 jun 2: this works only for the examples in the node, not for the full example, it appears.
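To make the resulting data structure concrete, here is a minimal sketch of the same token loop wrapped in a function that reads from a string. The name parse_string and the sample input are made up for illustration; the input is only a guess at the format based on what the regex accepts, not a line from the original file. Each ":keyword (" opens a nested hash, a bare word is stored under the empty-string key, and ")" returns to the enclosing hash.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: the same token loop as the script above, but
# reading from a string and using lexical variables.
sub parse_string {
    my ($text) = @_;
    my %root;
    my $cur = \%root;    # hash currently being filled
    my @stack;           # enclosing hashes, innermost last
    while ($text =~ /\G\s*(?:([-\w.]+)|:([-\w.]+)\s*\(|(\)))/gc) {
        if (defined $1) {
            # a bare word: stored under the empty-string key
            defined $cur->{""} and die "parse error: two";
            $cur->{""} = $1;
        } elsif (defined $2) {
            # ":keyword (" opens a nested hash
            push @stack, $cur;
            $cur = $cur->{$2} = {};
        } else {
            # ")" returns to the enclosing hash
            @stack or die "parse error: close";
            $cur = pop @stack;
        }
    }
    $text =~ /\G\s*(\S.*)/gc and die "parse error: junk: $1";
    $cur == \%root or die "parse error: open";
    return \%root;
}

# Made-up sample input, assumed to resemble the format in question.
my $tree = parse_string(":host (alpha :port (8080) :opts (:debug (on)))");

$Data::Dumper::Sortkeys = 1;
print Dumper($tree);
print $tree->{host}{""}, "\n";          # the word directly inside :host (...)
print $tree->{host}{port}{""}, "\n";    # the word inside the nested :port (...)
```

An entry that holds only a word ends up as a one-key hash, one that holds only pairs has no empty-string key, and one that holds both has the word alongside the nested keys.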


Re^2: Logic trouble parsing a formatted text file into hashes of hashes (of hashes, etc.)
by idnopheq (Chaplain) on Oct 16, 2004 at 21:32 UTC
    THX, ambrus!

    I'm running with chromatic's idea for the moment.

    I like yours as well at first glance, since it only requires Data::Dumper, which every perl-enabled machine has by default.

    THX
    --
    idnopheq
    Apply yourself to new problems without preparation, develop confidence in your ability to meet situations as they arise.

Re^2: Logic trouble parsing a formatted text file into hashes of hashes (of hashes, etc.)
by ambrus (Abbot) on Jul 30, 2006 at 22:36 UTC

    Here's a corrected version that can parse the long sample on your scratchpad (which I also copy below so that it doesn't disappear unexpectedly). The original script (the one in the parent node) couldn't parse the longer sample because the format has features I couldn't have guessed from the small samples in the node, and at the time I wrote it you hadn't given us the long sample. There are three differences: first, this script expects the data to start with an opening parenthesis; second, it accepts a lone colon in place of a colon followed by a keyword; third, it accepts double-quoted strings.
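As a rough illustration of those three differences, the sketch below runs the corrected token pattern over a made-up sample and prints how each token is classified. The helper name tokens and the sample line are assumptions for the demo. Prepending ": " to the input is the same trick the script uses to turn the leading parenthesis into an ordinary opening token. Note that a lone colon yields an empty keyword, which may be why the word is stored under "@" rather than "" in this version.

```perl
use strict;
use warnings;

# Hypothetical helper: classify tokens using the corrected pattern.
sub tokens {
    my ($text) = @_;
    my @out;
    while ($text =~ /\G\s*(?:([-\w.]+|"[^"]*")|:([-\w.]*)\s*\(|(\)))/gc) {
        if    (defined $1) { push @out, "value[" . $1 . "]" }   # word or quoted string
        elsif (defined $2) { push @out, "open["  . $2 . "]" }   # keyword may be empty
        else               { push @out, "close" }
    }
    return @out;
}

# Made-up input: starts with "(", has a quoted string and a lone colon.
my $sample = '(:name ("two words") : (x))';

# Prefix ": " so the leading "(" is read as ": (".
print join(" ", tokens(": " . $sample)), "\n";
# prints: open[] open[name] value["two words"] close open[] value[x] close close
```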

    perl -we 'use Data::Dumper; $s = \%p; @s = (); while (<>) { our $f++ or $_ = ": " . $_; while (/\G\s*(?:([-\w.]+|"[^"]*")|:([-\w.]*)\s*\(|(\)))/gc) { if (defined($1)) { defined($$s{""}) and die "parse error: two"; $$s{"@"} = $1; } elsif (defined($2)) { push @s, $s; $s = $$s{$2} = {}; } elsif (defined($3)) { @s or die "parse error: close"; $s = pop @s; } } /(\S.*)/g and die "parse error: junk: $1"; } $! and die "read error"; $s == \%p or die "parse error: open"; print Dumper(\%p);'

    A historical note: I made the correction because someone asked on an IRC channel how to parse a file of this exact format.

    Here's the long sample

    Update: here is a version of the above converted to a real script (rather than a one-liner using global variables). This one also removes the double quotes from double-quoted strings and accepts multi-line strings. The file format seems to allow backslash-escaped double quotes inside double-quoted strings, and possibly other things, which this script can't parse.

    use warnings; use strict; use Data::Dumper;
    sub parse {
        my($f) = @_;
        my($s, %p, @s, $b);
        $s = \%p;
        while (<$f>) {
            $b++ or $_ = ": " . $_;
            while (/\G\s*(?:([-\w.]+)|"([^"]*)"|("[^"]*$)|:([-\w.]*)\s*\(|(\)))/gc) {
                if (defined($1) || defined($2)) {
                    defined($$s{""}) and die "parse error: two";
                    $$s{"@"} = defined($1) ? $1 : $2;
                } elsif (defined($3)) {
                    $_ = $+ . <$f>;
                } elsif (defined($4)) {
                    push @s, $s;
                    $s = $$s{$+} = {};
                } elsif (defined($5)) {
                    @s or die "parse error: close";
                    $s = pop @s;
                }
            }
            /(\S.*)/g and die "parse error: junk: $1";
        }
        $! and die "read error";
        $s == \%p or die "parse error: open";
        \%p;
    }
    my $p = parse(*ARGV);
    print Dumper($p);
    __END__

    Update: defined($$s{""}) and die "parse error: two"; should be changed to defined($$s{"@"}) and die "parse error: two"; in both scripts, I believe.
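On the backslash-escaped double quotes mentioned above: one possible way to match them is sketched below. This is not part of the scripts in this node, and it assumes that a backslash simply makes the following character literal, which has not been checked against the real file format. The helper name quoted_strings and the sample line are made up.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the scripts above: extract
# double-quoted strings that may contain backslash escapes.
sub quoted_strings {
    my ($text) = @_;
    my @found;
    # Inside the quotes, accept either a character that is neither a
    # quote nor a backslash, or a backslash followed by any character.
    while ($text =~ /"((?:[^"\\]|\\.)*)"/gs) {
        my $body = $1;
        $body =~ s/\\(.)/$1/gs;    # drop the backslash, keep the escaped character
        push @found, $body;
    }
    return @found;
}

# Made-up sample: the first string contains escaped quotes.
print "$_\n" for quoted_strings('("say \"hi\"") ("plain")');
```

The same alternation could in principle replace "([^"]*)" in the parser's token pattern, though the multi-line handling there would need to be adjusted to match.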
