With the exception of converting the token "physical" into "phy", which I assume to be a typo, the code below seems to do a good job of matching your requirements. The output at the top is from my program; at the bottom is your 'desired output', reformatted to match:

    C:\test>874313
    {
      File => {
        Header => {
          Contents  => { Dictionary => {}, Flags => {}, Properties => {} },
          Key       => { value => "physical" },
          Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
          Revision  => {
                         "log" => { value => 0 },
                         oth   => { value => 0 },
                         phy   => { value => 0 }
                       },
          Version   => { value => "1.0" },
        },
        value => "\"foo\"",
      },
    }

    my $hash = {
        File => {
            Header => {
                Contents  => { Flags => {}, Dictionary => {}, Properties => {} },
                Key       => { value => "phy", },
                Precision => { Units => { value => "mil" }, Dec => { value => 1 } }
                Revision  => { log => { value => 0 }, phy => { value => 0 }, oth => { value => 0 } },
                Version   => { value => 1.0, },
            }
            value => "foo",
        }
    };

Your 'desired output' has an inconsistency, inasmuch as it requires that a list containing a single, single-element value, e.g. ( Key ( physical ) ), become a hash containing the key 'value' with the single element as its value: Key => { value => "physical", } rather than Key => { physical => {} }.

Whereas a list containing more than one single-element value, e.g. ( Contents ( Flags ) ( Dictionary ) ( Properties ) ), becomes a hash with the single elements as keys and empty hashes as their values: Contents => { Flags => {}, Dictionary => {}, Properties => {} }.
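To illustrate why the two shapes are awkward for calling code, here is a minimal sketch. The helper name bare_names is hypothetical (it is not part of the original post), and the data is just the Key and Contents fragments quoted above. Any code that wants "the names listed under a node" has to test for both shapes:

```perl
use strict;
use warnings;

# The two shapes required by the 'desired output'.
my $header = {
    Key      => { value => 'physical' },
    Contents => { Flags => {}, Dictionary => {}, Properties => {} },
};

# Hypothetical helper: return the bare names listed under a node,
# whichever of the two shapes the node happens to have.
sub bare_names {
    my( $node ) = @_;

    # Shape 1: a lone name was folded into { value => NAME }.
    return $node->{ value }
        if keys( %$node ) == 1 and exists $node->{ value };

    # Shape 2: each name is a key whose value is an empty hash.
    return sort grep {
        ref $node->{ $_ } eq 'HASH' and !%{ $node->{ $_ } }
    } keys %$node;
}

print join( ', ', bare_names( $header->{ Key } ) ), "\n";
print join( ', ', bare_names( $header->{ Contents } ) ), "\n";
```

Note that shape 1 cannot be told apart from a node that genuinely carries a scalar value, such as ( Dec 1 ), which is part of what makes the anomaly inconvenient.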

Without that anomaly, the parse sub could be simplified by the removal of the final nested if block. It would then produce:

    {
      File => {
        Header => {
          Contents  => { Dictionary => {}, Flags => {}, Properties => {} },
          Key       => { physical => {} },
          Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
          Revision  => {
                         "log" => { value => 0 },
                         oth   => { value => 0 },
                         phy   => { value => 0 }
                       },
          Version   => { "1.0" => {} },
        },
        value => "\"foo\"",
      },
    }

which is more consistent, and I think would therefore be easier to use. If that makes sense to you, just remove the final nested ifs.
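As a rough, self-contained sketch of that simplified variant: the sub names see_next_token, get_next_token, parse_consistent and parse_text are illustrative renamings, and the reference-based aliasing and the parse_text wrapper are assumptions made to keep the example short. The logic is the same parse loop with the single-element fix-up left out:

```perl
use strict;
use warnings;

# Peek at the next white-space delimited token without consuming it.
sub see_next_token {
    my( $next ) = $_[0] =~ m[\s*(\S+)];
    return $next;
}

# Remove and return the next token; every token must be followed by white-space.
sub get_next_token {
    $_[0] =~ s[\s*(\S+)\s+][] or die 'Out of tokens';
    return $1;
}

sub parse_consistent {
    my $in  = \$_[0];    # reference to the caller's string, consumed in place
    my $ref = {};

    die 'No open paren' unless get_next_token( $$in ) eq '(';
    my $name = get_next_token( $$in );

    # An optional scalar value may follow the name.
    my $next = see_next_token( $$in ) // '';
    $ref->{ value } = get_next_token( $$in )
        if length $next and $next !~ /[()]/;

    # Each nested list becomes a key holding its own hash.
    while( ( see_next_token( $$in ) // '' ) eq '(' ) {
        my( $child, $sub ) = parse_consistent( $$in );
        $ref->{ $child } = $sub;
    }

    die 'Missing close paren' unless get_next_token( $$in ) eq ')';
    return $name, $ref;    # no single-element fix-up here
}

# Hypothetical wrapper: normalise white-space, then parse.
sub parse_text {
    my( $text ) = @_;
    $text =~ s/\s+/ /g;
    $text .= ' ';        # get_next_token expects white-space after every token
    my %tree = parse_consistent( $text );
    return \%tree;
}
```

With this version, parse_text( '( Header ( Key ( physical ) ) )' ) yields Key => { physical => {} }, so a bare name is always a key with an empty hash, however many siblings it has.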

The parser should be robust in the face of variable white-space, but it does require all tokens to be white-space delimited.
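If the real input is not guaranteed to have white-space around every paren, one possible workaround is to pad the parens before parsing. This is only a sketch under assumptions: pad_parens is a hypothetical helper, not part of the posted code, and it would also split parens that appear inside quoted strings, so it only suits data where that cannot happen.

```perl
use strict;
use warnings;

# Hypothetical pre-processing step: make every paren a white-space
# delimited token, so that "(Key (physical))" can be tokenised.
sub pad_parens {
    my( $text ) = @_;
    $text =~ s/([()])/ $1 /g;    # surround every paren with spaces
    $text =~ s/\s+/ /g;          # collapse runs of white-space
    $text =~ s/^ //;             # drop the leading space, keep the trailing one
    return $text;
}

print pad_parens( '(Key (physical))' ), "\n";
```

The trailing space is kept deliberately, because the tokeniser requires white-space after the final close paren.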

The code:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];
    $|++;

    sub seeNextToken {
        my( $next ) = $_[0] =~ m[\s*(\S+)];
        return $next;
    }

    sub getNextToken {
        $_[0] =~ s[\s*(\S+)\s+][] or die;
        return $1;
    }

    #my $depth = 0;
    sub parse {
        local $^W;
        ## alias rather than copy the input, so that we can modify it
        our $in; local *in = \$_[0];

        my $ref = {};

        my $token = getNextToken( $in );
        die 'No open paren' unless $token eq '(';

        my $name = getNextToken( $in );

        my $value;
        if( seeNextToken( $in ) !~ '[()]' ) {
            $value = getNextToken( $in );
        }
        $ref->{ value } = $value if defined $value;

    #    printf "%s n:$name v:$value (next:%s) in:$in\n", ' .' x $depth++, seeNextToken( $in );

        while( seeNextToken( $in ) eq '(' ) {
            my( $name, $value ) = parse( $in );
            $ref->{ $name } = $value;
        }
        die 'Missing close paren' unless getNextToken( $in ) eq ')';

        ## fix up the single, single anomaly
        if( keys( %$ref ) == 1 ) {
            my( $key, $value ) = each %$ref;
            if( ref $value eq 'HASH' and keys( %$value ) == 0 ) {
                delete $ref->{ $key };
                $ref->{ value } = $key;
            }
        }
    #    --$depth;
        return $name, $ref;
    }

    my $input = do{ local $/; <DATA> };
    $input =~ s[\s+][ ]gsm;

    my $ref = { parse( $input ) };
    pp $ref;

    __DATA__
    ( File "foo"
        ( Header
            ( Key ( physical ) )
            ( Version ( 1.0 ) )
            ( Revision ( log 0 ) ( phy 0 ) ( oth 0 ) )
            ( Contents ( Flags ) ( Dictionary ) ( Properties ) )
            ( Precision ( Units mil ) ( Dec 1 ) )
        )
    )

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: Parsing to a hash suggestions by BrowserUk
in thread Parsing to a hash suggestions by chasdavies
