With the exception of your 'desired output' turning the token "physical" into "phy", which I assume to be a typo, the code below seems to do a good job of matching your requirements. The first dump is the output from my program; the second is your 'desired output', reformatted to match:
C:\test>874313
{
  File => {
    Header => {
      Contents => { Dictionary => {}, Flags => {}, Properties => {} },
      Key => { value => "physical" },
      Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
      Revision => {
        "log" => { value => 0 },
        oth => { value => 0 },
        phy => { value => 0 }
      },
      Version => { value => "1.0" },
    },
    value => "\"foo\"",
  },
}
my $hash = {
  File => {
    Header => {
      Contents => { Flags => {}, Dictionary => {}, Properties => {} },
      Key => { value => "phy", },
      Precision => { Units => { value => "mil" }, Dec => { value => 1 } },
      Revision => {
        log => { value => 0 },
        phy => { value => 0 },
        oth => { value => 0 }
      },
      Version => { value => 1.0, },
    },
    value => "foo",
  }
};
Your 'desired output' has an inconsistency, inasmuch as it requires that a list containing a single, single-element value, e.g. ( Key ( physical ) ), become a hash containing the key 'value' with the single element as its value: Key => { value => "physical", } rather than Key => { physical => {} }.
Whereas a list containing more than one single-element value, e.g. ( Contents ( Flags ) ( Dictionary ) ( Properties ) ), becomes a hash with the single elements as keys and empty hashes as their values: Contents => { Flags => {}, Dictionary => {}, Properties => {} }.
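To make the special case concrete, here is a minimal sketch of that rule applied in isolation to an already-built node. The name collapse_single is hypothetical and is not part of the parser; it only mirrors the rule described above. Note one consequence: after collapsing, ( Key ( physical ) ) yields the same hash that ( Key physical ) would.

```perl
use strict;
use warnings;

## Hypothetical helper, for illustration only: apply the
## "single, single" rule to a node that has already been built.
sub collapse_single {
    my( $node ) = @_;
    my @keys = keys %$node;

    ## Only a node with exactly one child is a candidate
    return $node unless @keys == 1;

    ## ...and only if that child is an empty hash
    my $child = $node->{ $keys[0] };
    return $node unless ref $child eq 'HASH' and !%$child;

    return { value => $keys[0] };
}

my $key      = collapse_single( { physical => {} } );
my $contents = collapse_single( { Flags => {}, Dictionary => {}, Properties => {} } );

print $key->{ value }, "\n";                    ## physical
print join( ' ', sort keys %$contents ), "\n";  ## Dictionary Flags Properties
```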
Without that anomaly, the parse sub could be simplified by the removal of the final nested if block. It would then produce:
{
  File => {
    Header => {
      Contents => { Dictionary => {}, Flags => {}, Properties => {} },
      Key => { physical => {} },
      Precision => { Dec => { value => 1 }, Units => { value => "mil" } },
      Revision => {
        "log" => { value => 0 },
        oth => { value => 0 },
        phy => { value => 0 }
      },
      Version => { "1.0" => {} },
    },
    value => "\"foo\"",
  },
}
which is more consistent and, I think, would therefore be easier to use. If that makes sense to you, just remove the final nested ifs (the block under the 'fix up the single, single anomaly' comment).
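The parser should be robust in the face of variable whitespace, but it does require every token, parentheses included, to be whitespace delimited. That includes the final close paren: getNextToken() expects whitespace after each token, which the trailing newline of the input normally supplies.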
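If your real input is not always that tidy, one possible workaround is to pad the parentheses before parsing. This is only a sketch: pad_parens is a hypothetical name, and it assumes that parentheses never appear inside quoted strings, which it would otherwise mangle.

```perl
use strict;
use warnings;

## Hypothetical pre-processing step: surround every parenthesis with
## spaces so that "(Key(physical))" tokenizes like "( Key ( physical ) )".
## ASSUMPTION: no parentheses occur inside quoted strings.
sub pad_parens {
    my( $text ) = @_;
    $text =~ s/([()])/ $1 /g;   ## pad each paren on both sides
    $text =~ s/\s+/ /g;         ## collapse runs of whitespace
    $text =~ s/^ //;            ## drop the leading space
    return $text;               ## one trailing space is kept on purpose
}

print '[', pad_parens( '(Key(physical))' ), "]\n";   ## [( Key ( physical ) ) ]
```

The trailing space is deliberate, because the tokenizer wants whitespace after the last close paren.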
The code:
#! perl -slw
use strict;
use Data::Dump qw[ pp ];
$|++;

## Peek at the next whitespace-delimited token without consuming it
sub seeNextToken {
    my( $next ) = $_[0] =~ m[\s*(\S+)];
    return $next;
}

## Remove the next token, and the whitespace that follows it, from the input
sub getNextToken {
    $_[0] =~ s[\s*(\S+)\s+][] or die;
    return $1;
}

#my $depth = 0;

sub parse { local $^W;
    ## alias rather than copy the input, so that we can modify it
    our $in; local *in = \$_[0];
    my $ref = {};
    my $token = getNextToken( $in );
    die 'No open paren' unless $token eq '(';
    my $name = getNextToken( $in );
    my $value;
    ## an optional plain value may follow the name
    if( seeNextToken( $in ) !~ '[()]' ) {
        $value = getNextToken( $in );
    }
    $ref->{ value } = $value if defined $value;
#    printf "%s n:$name v:$value (next:%s) in:$in\n",
#        ' .' x $depth++, seeNextToken( $in );
    ## recurse into each nested list
    while( seeNextToken( $in ) eq '(' ) {
        my( $name, $value ) = parse( $in );
        $ref->{ $name } = $value;
    }
    die 'Missing close paren' unless getNextToken( $in ) eq ')';
    ## fix up the single, single anomaly
    if( keys( %$ref ) == 1 ) {
        my( $key, $value ) = each %$ref;
        if( ref $value eq 'HASH' and keys( %$value ) == 0 ) {
            delete $ref->{ $key };
            $ref->{ value } = $key;
        }
    }
#    --$depth;
    return $name, $ref;
}

my $input = do{ local $/; <DATA> };
$input =~ s[\s+][ ]gsm;
my $ref = { parse( $input ) };
pp $ref;
__DATA__
( File "foo"
    ( Header
        ( Key
            ( physical )
        )
        ( Version
            ( 1.0 )
        )
        ( Revision
            ( log 0 )
            ( phy 0 )
            ( oth 0 )
        )
        ( Contents
            ( Flags )
            ( Dictionary )
            ( Properties )
        )
        ( Precision
            ( Units mil )
            ( Dec 1 )
        )
    )
)
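Once you have the structure, pulling values out is just nested hash lookups. As a rough sketch, with a hand-copied fragment of the dump at the top of this post and a hypothetical dig helper that is not part of the code above:

```perl
use strict;
use warnings;

## A hand-copied fragment of the structure printed at the top of the post
my $ref = {
    File => {
        Header => {
            Key       => { value => 'physical' },
            Precision => { Dec => { value => 1 }, Units => { value => 'mil' } },
            Revision  => {
                log => { value => 0 },
                oth => { value => 0 },
                phy => { value => 0 },
            },
        },
        value => '"foo"',
    },
};

## Hypothetical helper: follow a list of keys down the tree, returning
## undef (in scalar context) as soon as a step is missing.
sub dig {
    my( $node, @path ) = @_;
    for my $key ( @path ) {
        return unless ref $node eq 'HASH' and exists $node->{ $key };
        $node = $node->{ $key };
    }
    return $node;
}

print dig( $ref, qw[ File Header Precision Units value ] ), "\n";   ## mil
print join( ' ', sort keys %{ dig( $ref, qw[ File Header Revision ] ) } ), "\n";   ## log oth phy
```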