PerlMonks  

Parsing to a hash suggestions

by chasdavies (Initiate)
on Nov 29, 2010 at 18:18 UTC (#874313=perlquestion)

chasdavies has asked for the wisdom of the Perl Monks concerning the following question:

I have a structured file with content similar to this:
( File "foo"
    ( Header
        ( Key ( physical ) )
        ( Version ( 1.0 ) )
        ( Revision ( log 0 ) ( phy 0 ) ( oth 0 ) )
        ( Contents ( Flags ) ( Dictionary ) ( Properties ) )
        ( Precision ( Units mil ) ( Dec 1 ) )
    )
)
I need to parse the file and put it into a hash structure that would look like this:
my $hash = {
    File => {
        value  => "foo",
        Header => {
            Key       => { value => "phy", },
            Version   => { value => 1.0, },
            Revision  => { log => { value => 0 }, phy => { value => 0 }, oth => { value => 0 } },
            Contents  => { Flags => {}, Dictionary => {}, Properties => {} },
            Precision => { Units => { value => "mil" }, Dec => { value => 1 } }
        }
    }
};
I was wondering if anyone has a recommendation on the most effective method to do this.
Many thanks for any and all responses.
Regards,
Charlie

Re: Parsing to a hash suggestions
by Anonymous Monk on Nov 29, 2010 at 19:02 UTC
    I was wondering if anyone has a recommendation on the most effective method to do this.

    The most effective method is to have someone else do it for you, seriously :)

    Your data look vaguely like Lisp, so I'd search for Lisp parsers on CPAN.
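    The sketch below shows what such a Lisp-style reader does, in plain core Perl rather than a CPAN module. The names tokenize and read_sexpr are hypothetical. It assumes that atoms are runs of non-space, non-paren characters and that double-quoted strings use backslash escapes. Each parenthesised list becomes a nested array ref, which you could then map onto whatever hash shape you want.

```perl
use strict;
use warnings;

# Hypothetical helper: split Lisp-style text into tokens.
# Each paren is its own token, a double-quoted string (with backslash
# escapes) is one token, and anything else up to white-space, a paren
# or a quote is an atom. White-space around parens is not required.
sub tokenize {
    my ($text) = @_;
    return $text =~ m/ ( [()] | "(?:[^"\\]|\\.)*" | [^\s()"]+ ) /gx;
}

# Hypothetical helper: build nested array refs from the token list,
# one array ref per parenthesised list (a minimal "Lisp reader").
sub read_sexpr {
    my ($tokens) = @_;    # array ref, consumed from the front
    my $tok = shift @$tokens;
    die "Unexpected end of input\n" unless defined $tok;
    if ($tok eq '(') {
        my @list;
        while (1) {
            die "Missing close paren\n" unless @$tokens;
            if ($tokens->[0] eq ')') { shift @$tokens; last; }
            push @list, read_sexpr($tokens);    # atom or nested list
        }
        return \@list;
    }
    die "Unexpected close paren\n" if $tok eq ')';
    return $tok;          # a plain atom
}

my @tokens = tokenize('( File "foo" ( Header ( Key ( physical ) ) ( Dec 1 ) ) )');
my $tree   = read_sexpr(\@tokens);
# $tree is [ 'File', '"foo"', [ 'Header', [ 'Key', [ 'physical' ] ], [ 'Dec', '1' ] ] ]
```

    Going through nested arrays first keeps the reading step separate from the question of which hash layout you want at the end.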

    For something similar, see Re: String Search and Re^5: String Search.

Re: Parsing to a hash suggestions
by sundialsvc4 (Abbot) on Nov 29, 2010 at 20:10 UTC

    Also consider Parse::RecDescent, a general-purpose (albeit complicated) parsing solution that handles most things well.

Re: Parsing to a hash suggestions
by BrowserUk (Pope) on Dec 01, 2010 at 05:45 UTC

    With the exception of converting the token "physical" into "phy", which I assume is a typo, the code below seems to do a good job of meeting your requirements. The output at the top is from my program; at the bottom is your 'desired output', reformatted to match:

    C:\test>874313
    {
      File => {
        Header => {
          Contents  => { Dictionary => {}, Flags => {}, Properties => {} },
          Key       => { value => "physical" },
          Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
          Revision  => { "log" => { value => 0 }, oth => { value => 0 }, phy => { value => 0 } },
          Version   => { value => "1.0" },
        },
        value => "\"foo\"",
      },
    }

    my $hash = {
      File => {
        Header => {
          Contents  => { Flags => {}, Dictionary => {}, Properties => {} },
          Key       => { value => "phy", },
          Precision => { Units => { value => "mil" }, Dec => { value => 1 } },
          Revision  => { log => { value => 0 }, phy => { value => 0 }, oth => { value => 0 } },
          Version   => { value => 1.0, },
        },
        value => "foo",
      }
    };

    Your 'desired output' has an inconsistency. It requires that a list containing a single single-element list (e.g. ( Key ( physical ) )) become a hash with the key 'value' and that single element as its value: Key => { value => "physical", }, rather than Key => { physical => {} }.

    By contrast, a list containing more than one single-element list (e.g. ( Contents ( Flags ) ( Dictionary ) ( Properties ) )) must become a hash with the single elements as keys and empty hashes as their values: Contents => { Flags => {}, Dictionary => {}, Properties => {} }.
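    The rule described above can be isolated as a small stand-alone function. This is a sketch that mirrors the "fix up" block at the end of parse(); it is not a drop-in replacement. The name collapse_single is hypothetical, and unlike the original it returns a new hash rather than modifying the node in place.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the "single, single" rule:
# a node with exactly one key, whose value is an empty hash (i.e. it came
# from a one-element list such as "( physical )"), becomes { value => KEY }.
sub collapse_single {
    my ($node) = @_;
    if (keys(%$node) == 1) {
        my ($key) = keys %$node;
        my $child = $node->{$key};
        if (ref $child eq 'HASH' and keys(%$child) == 0) {
            return { value => $key };
        }
    }
    return $node;    # anything else is returned unchanged
}

# ( Key ( physical ) ) parses to { physical => {} } and is collapsed
my $key = collapse_single({ physical => {} });

# ( Contents ( Flags ) ( Dictionary ) ( Properties ) ) has three keys: kept as-is
my $contents = collapse_single({ Flags => {}, Dictionary => {}, Properties => {} });

# ( Units mil ) parses to { value => 'mil' }: that value is not a hash, so kept as-is
my $units = collapse_single({ value => 'mil' });
```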

    Without that anomaly, the parse sub could be simplified by the removal of the final nested if block. It would then produce:

    {
      File => {
        Header => {
          Contents  => { Dictionary => {}, Flags => {}, Properties => {} },
          Key       => { physical => {} },
          Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
          Revision  => { "log" => { value => 0 }, oth => { value => 0 }, phy => { value => 0 } },
          Version   => { "1.0" => {} },
        },
        value => "\"foo\"",
      },
    }

    which is more consistent, and I think would therefore be easier to use. If that makes sense to you, just remove the final nested ifs.

    The parser should be robust in the face of variable white-space, but it does require all tokens to be white-space delimited.
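    Your real files may not always put white-space around the parentheses, as in (Key(physical)). One possible workaround is to pad every paren before parsing. This is a minimal sketch that assumes parentheses never occur inside quoted strings. The helper pad_parens is hypothetical and not part of the code below. It deliberately keeps a trailing space, because getNextToken() requires white-space after every token, including the final ')'.

```perl
use strict;
use warnings;

# Hypothetical pre-processing step: pad every parenthesis with spaces and
# squeeze runs of white-space, so that input such as "(Key(physical))"
# becomes white-space delimited before it reaches the parser.
# Assumption: parentheses never appear inside quoted strings.
sub pad_parens {
    my ($text) = @_;
    $text =~ s/([()])/ $1 /g;    # surround each paren with spaces
    $text =~ s/\s+/ /g;          # collapse all white-space runs to one space
    $text =~ s/^ //;             # drop the leading space, keep the trailing one
    return $text;
}

my $padded = pad_parens("(Key(physical))\n(Dec 1)");
# $padded is "( Key ( physical ) ) ( Dec 1 ) "
```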

    The code:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];
    $|++;

    ## Peek at the next white-space delimited token without consuming it
    sub seeNextToken {
        my( $next ) = $_[0] =~ m[\s*(\S+)];
        return $next;
    }

    ## Remove the next token (and the white-space after it) from the input
    sub getNextToken {
        $_[0] =~ s[\s*(\S+)\s+][] or die 'Unexpected end of input';
        return $1;
    }

    #my $depth = 0;
    sub parse {
        local $^W;
        ## alias rather than copy the input, so that we can modify it
        our $in; local *in = \$_[0];

        my $ref = {};
        my $token = getNextToken( $in );
        die 'No open paren' unless $token eq '(';

        ## the token after '(' is the name; an optional non-paren token is the value
        my $name = getNextToken( $in );
        my $value;
        if( seeNextToken( $in ) !~ '[()]' ) {
            $value = getNextToken( $in );
        }
        $ref->{ value } = $value if defined $value;

    #    printf "%s n:$name v:$value (next:%s) in:$in\n",
    #        ' .' x $depth++, seeNextToken( $in );

        ## each nested list becomes a sub-hash keyed by its name
        while( seeNextToken( $in ) eq '(' ) {
            my( $name, $value ) = parse( $in );
            $ref->{ $name } = $value;
        }
        die 'Missing close paren' unless getNextToken( $in ) eq ')';

        ## fix up the single, single anomaly
        if( keys( %$ref ) == 1 ) {
            my( $key, $value ) = each %$ref;
            if( ref $value eq 'HASH' and keys( %$value ) == 0 ) {
                delete $ref->{ $key };
                $ref->{ value } = $key;
            }
        }
    #    --$depth;
        return $name, $ref;
    }

    my $input = do{ local $/; <DATA> };
    $input =~ s[\s+][ ]gsm;    ## normalise all white-space to single spaces

    my $ref = { parse( $input ) };
    pp $ref;

    __DATA__
    ( File "foo" ( Header ( Key ( physical ) ) ( Version ( 1.0 ) ) ( Revision ( log 0 ) ( phy 0 ) ( oth 0 ) ) ( Contents ( Flags ) ( Dictionary ) ( Properties ) ) ( Precision ( Units mil ) ( Dec 1 ) ) ) )
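    The program above keeps the double quotes around the file name (value => "\"foo\""), whereas the desired output has a plain "foo". Below is a hedged sketch of a post-processing pass. It assumes quotes only ever wrap a whole 'value' token, and unquote_values is a hypothetical helper, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical post-processing pass over the hash built by parse():
# remove one pair of surrounding double quotes from every 'value' string,
# so that '"foo"' becomes 'foo'. Other values are left untouched.
sub unquote_values {
    my ($node) = @_;
    for my $key (keys %$node) {
        my $val = $node->{$key};
        if (ref $val eq 'HASH') {
            unquote_values($val);                  # recurse into nested lists
        }
        elsif ($key eq 'value' and defined $val and $val =~ /^"(.*)"$/s) {
            $node->{$key} = $1;                    # keep only the inner text
        }
    }
    return $node;
}

# A cut-down example of parse()'s output
my $ref = { File => { value => '"foo"', Header => { Dec => { value => 1 } } } };
unquote_values($ref);
```

    In the program above, it would be called as unquote_values( $ref ); just before pp $ref;.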

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thank you very much. This is exactly what I needed. I have tested it on a more extensive sample of data and it worked flawlessly, although I did need to tweak it a bit. Many, many thanks, and I hope you have a great day.

      Best regards,
      Charlie
Re: Parsing to a hash suggestions
by aquarium (Curate) on Nov 30, 2010 at 00:35 UTC
    Define "effective"... It would possibly be more effective if you specified which program the input file came from: MATLAB? Lisp? Something else? If it comes from any well-known program, someone has most likely already written a parser for it, in Perl or another language, that you could adapt.
    the hardest line to type correctly is: stty erase ^H
      The sample file is a snippet from one of our CAD tools. It is a Lisp structure. There is a Lisp-based programming API to access the data from within the CAD tool, but we need to access this data externally. The CAD company does not provide externally accessible APIs. I will check on the CAD forums to see if someone may have written a Perl parser. Thanks for the suggestion.

Approved by marto