
With the exception of converting the token "physical" into "phy", which I assume is a typo, the code below seems to do a good job of matching your requirements. The output at the top is from my program. At the bottom is your 'desired output', reformatted to match:

C:\test>874313
{
    File => {
        Header => {
            Contents => { Dictionary => {}, Flags => {}, Properties => {} },
            Key => { value => "physical" },
            Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
            Revision => { 
                "log" => { value => 0 }, 
                oth => { value => 0 }, 
                phy => { value => 0 } 
            },
            Version => { value => "1.0" },
        },
        value  => "\"foo\"",
    },
}


my $hash = {
    File => {
        Header => {
            Contents => { Flags => {}, Dictionary => {}, Properties => {} },
            Key => { value => "phy", },
            Precision => { Units => { value => "mil" }, Dec => { value => 1 } },
            Revision => { 
                log => { value => 0 }, 
                phy => { value => 0 }, 
                oth => { value => 0 } 
            },
            Version => { value => 1.0, },
        },
        value => "foo",
    }
};

Your 'desired output' has an inconsistency, inasmuch as it requires that a list containing a single, single-element value, e.g. ( Key ( physical ) ), become a hash containing the key 'value', with that single element as its value: Key => { value => "physical", } rather than Key => { physical => {} }.

By contrast, a list containing more than one single-element value, e.g. ( Contents ( Flags ) ( Dictionary ) ( Properties ) ), becomes a hash with the single elements as keys and empty hashes as their values: Contents => { Flags => {}, Dictionary => {}, Properties => {} }.
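To make the two rules concrete, here is a minimal sketch of just the "single, single" fix-up, pulled out into a hypothetical helper, fixup_single (it is not part of the code below). It assumes each node has already been built as a hash whose bare names map to empty hashes. It also shows why the ref check matters: ( log 0 ) produces { value => 0 }, which must be left alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the code below): collapse a node whose only
# entry is a bare name mapped to an empty hash into { value => name }.
sub fixup_single {
    my( $ref ) = @_;
    if( keys( %$ref ) == 1 ) {
        my( $name ) = keys %$ref;
        my $child   = $ref->{ $name };
        # Only collapse when the single entry is an empty hash (a bare name)
        if( ref $child eq 'HASH' and !%$child ) {
            delete $ref->{ $name };
            $ref->{ value } = $name;
        }
    }
    return $ref;
}

# ( Key ( physical ) ): one bare name, so it collapses to { value => 'physical' }
my $key_node = fixup_single( { physical => {} } );

# ( Contents ( Flags ) ( Dictionary ) ( Properties ) ): three names, left alone
my $contents_node = fixup_single( { Flags => {}, Dictionary => {}, Properties => {} } );

# ( log 0 ): the single entry is value => 0, not a hash, so nothing changes
my $log_node = fixup_single( { value => 0 } );

for my $node ( $key_node, $contents_node, $log_node ) {
    print join( ', ', map { "$_ => " . ( ref $node->{ $_ } ? '{}' : $node->{ $_ } ) }
                      sort keys %$node ), "\n";
}
```

The one-child case and the many-children case end up with different shapes, which is exactly the anomaly described above.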

Without that anomaly, the parse sub could be simplified by the removal of the final nested if block. It would then produce:

{
    File => {
        Header => {
            Contents => { Dictionary => {}, Flags => {}, Properties => {} },
            Key => { physical => {} },
            Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
            Revision => { 
                "log" => { value => 0 }, 
                oth => { value => 0 }, 
                phy => { value => 0 } 
            },
            Version => { "1.0" => {} },
        },
        value  => "\"foo\"",
    },
}

which is more consistent, and which I think would therefore be easier to use. If that makes sense to you, just remove the final nested ifs (the block commented "fix up the single, single anomaly").
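As an illustration of why the consistent form may be easier to use: without the fix-up, every bare name is a key mapping to an empty hash, so a single hypothetical helper, bare_names, can list the names under any node, whether it held one item or several. The $header structure is a hand-copied fragment of the simplified output above, not something the script returns under that name.

```perl
use strict;
use warnings;

# Fragment of the structure produced when the fix-up block is removed
my $header = {
    Contents => { Dictionary => {}, Flags => {}, Properties => {} },
    Key      => { physical => {} },
    Version  => { '1.0' => {} },
};

# Hypothetical helper: names of the bare (childless) items under a node.
sub bare_names {
    my( $node ) = @_;
    my @names = grep {
        my $child = $node->{ $_ };
        ref $child eq 'HASH' and !%$child;
    } keys %$node;
    return sort @names;
}

my @key_names      = bare_names( $header->{ Key } );
my @contents_names = bare_names( $header->{ Contents } );
my @version_names  = bare_names( $header->{ Version } );

print "Key: @key_names\n";
print "Contents: @contents_names\n";
print "Version: @version_names\n";
```

With the fix-up in place, Key and Version would instead need a separate `{ value }` lookup, i.e. two code paths for what is structurally the same thing.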

The parser should be robust in the face of variable white-space, but it does require all tokens to be white-space delimited.
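If your real input might not always separate the parentheses from adjacent tokens, e.g. (Key(physical)), one possible pre-pass (my assumption, not something the code below does) is to pad every paren with spaces before collapsing whitespace. The helper name normalise is hypothetical. Note that it would also split parens inside a token such as "foo(1)"; the parser as written would not accept such a token as a value anyway, because it rejects any token containing a paren.

```perl
use strict;
use warnings;

# Hypothetical pre-pass: make sure every paren is a separate,
# whitespace-delimited token, then collapse runs of whitespace
# the same way the original script does.
sub normalise {
    my( $text ) = @_;
    $text =~ s[([()])][ $1 ]g;   # pad each ( and ) with spaces
    $text =~ s[\s+][ ]g;         # collapse all whitespace to single spaces
    return $text;
}

my $tight = normalise( '(File "foo"(Header(Key(physical))(Version(1.0))))' );
print "$tight\n";
```

The padding also leaves a trailing space after the final ')', which getNextToken needs, because its regex requires whitespace after every token. The result should then be acceptable to parse() unchanged.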

The code:

#! perl -slw
use strict;
use Data::Dump qw[ pp ];

$|++;

sub seeNextToken {
    my( $next ) = $_[0] =~ m[\s*(\S+)];
    return $next;
}

sub getNextToken {
    $_[0] =~ s[\s*(\S+)\s+][] or die;
    return $1;
}

#my $depth = 0;
sub parse { local $^W;
    ## alias rather than copy the input, so that we can modify it
    our $in; local *in = \$_[0];
    my $ref = {};

    my $token = getNextToken( $in );
    die 'No open paren' unless $token eq '(';

    my $name = getNextToken( $in );
    my $value;
    if( seeNextToken( $in ) !~ '[()]' ) {
        $value = getNextToken( $in );
    }
    $ref->{ value } = $value if defined $value;

#    printf "%s n:$name v:$value (next:%s) in:$in\n",
#         ' .' x $depth++, seeNextToken( $in );

    while( seeNextToken( $in ) eq '(' ) {
        my( $name, $value ) = parse( $in );
        $ref->{ $name } = $value;
    }
    die 'Missing close paren' unless getNextToken( $in ) eq ')';

    ## fix up the single, single anomaly
    if( keys( %$ref ) == 1 ) {
        my( $key, $value ) = each %$ref;
        if( ref $value eq 'HASH' and keys( %$value ) == 0 ) {
            delete $ref->{ $key };
            $ref->{ value } = $key;
        }
    }
#    --$depth;
    return $name, $ref;
}

my $input = do{ local $/; <DATA> };
$input =~ s[\s+][ ]gsm;

my $ref = { parse( $input ) };
pp $ref;

__DATA__

( File "foo"
    ( Header
        ( Key
            ( physical )
        )
        ( Version
            ( 1.0 )
        )
        ( Revision
            ( log 0 )
            ( phy 0 )
            ( oth 0 )
        )
        ( Contents
            ( Flags )
            ( Dictionary )
            ( Properties )
        )
        ( Precision
            ( Units mil )
            ( Dec 1 )
        )
    )
)

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: Parsing to a hash suggestions by BrowserUk
in thread Parsing to a hash suggestions by chasdavies
