PerlMonks  

How to analyse structured data to get a hash

by merrymonk (Friar)
on Jun 03, 2017 at 14:44 UTC (#1192056)
merrymonk has asked for the wisdom of the Perl Monks concerning the following question:

I have a file with rows of data which are typically like

Attr num="101" name="Created" desc="Time file was created." type="t" ord="3" value="2017-06-03T11:27:23+01:00"

I want to analyse this so that I can get a hash with data like

Attr num -> 101
Name -> Created
Desc -> Time file was created
Type -> t
Ord -> 3
Value -> 2017-06-03T11:27:23+01:00

Therefore it would be the equivalent of

$attr{Name} = 'Created', etc.

What is the simplest way of analysing the line to get the hash data?

Replies are listed 'Best First'.
Re: How to analyse structured data to get a hash
by Marshall (Abbot) on Jun 04, 2017 at 00:22 UTC
    To show another common Perl technique for your toolbox...
    Use a global match (/g) in list context to create a list of pairs (key, value, key2, value2, etc.) and simply assign that list to a hash variable. Bingo, a multi-key hash with the value assignments from one line of data. I recommend careful study of the regex to understand the specifics of exactly how it works. Update: the concept of a minimal vs. a maximal match applies.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    while (<DATA>)
    {
       my %hash = /\s*(.*?)="(.*?)"/g;
       print Dumper \%hash;
    }
    =Prints:
    $VAR1 = {
              'value' => '2017-06-03T11:27:23+01:00',
              'type' => 't',
              'desc' => 'Time file was created.',
              'ord' => '3',
              'Attr num' => '101',
              'name' => 'Created'
            };
    =cut
    __DATA__
    Attr num="101" name="Created" desc="Time file was created." type="t" ord="3" value="2017-06-03T11:27:23+01:00"
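To illustrate the minimal vs. maximal point, here is a small sketch that runs the same pattern twice on the sample line, once with the non-greedy `.*?` used above and once with a greedy `.*`. The variable names `%minimal` and `%maximal` are only for illustration and are not part of the original reply.

```perl
use strict;
use warnings;

# Sample line from the question.
my $line = 'Attr num="101" name="Created" desc="Time file was created." '
         . 'type="t" ord="3" value="2017-06-03T11:27:23+01:00"';

# Minimal (non-greedy) match: each (.*?) stops at the first place the
# rest of the pattern can match, so every key="value" pair is captured.
my %minimal = $line =~ /\s*(.*?)="(.*?)"/g;

# Maximal (greedy) match: the first (.*) runs as far right as it can,
# so everything up to the last ="..." is swallowed into a single key.
my %maximal = $line =~ /\s*(.*)="(.*)"/g;

printf "minimal: %d keys\n", scalar keys %minimal;   # minimal: 6 keys
printf "maximal: %d keys\n", scalar keys %maximal;   # maximal: 1 keys
```

With the greedy version the single key is the whole line up to the final `value`, which is rarely what is wanted.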
Re: How to analyse structured data to get a hash
by hippo (Abbot) on Jun 05, 2017 at 08:23 UTC
    What is the simplest way of analysing the line to get the hash data?

    Here's an SSCCE showing this by combining split with hash assignment. Note that this is to answer precisely the question you asked - namely it is the simplest method. That doesn't mean that it is the most efficient or secure or robust or best documented or the winner in any other criterion.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Test::More tests => 1;

    my $in = q/Attr num="101" name="Created" desc="Time file was created." type="t" ord="3" value="2017-06-03T11:27:23+01:00"/;
    my %want = (
        'Attr num' => 101,
        name       => 'Created',
        desc       => 'Time file was created.',
        type       => 't',
        ord        => 3,
        value      => '2017-06-03T11:27:23+01:00'
    );
    my %have = split (/="|" ?/, $in);
    is_deeply (\%have, \%want);
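To see why the split approach works, it may help to look at the intermediate list before it becomes a hash. The sketch below uses a shortened input line and a hypothetical `@fields` array purely for illustration.

```perl
use strict;
use warnings;

# Shortened sample input (an assumption, to keep the output readable).
my $in = 'Attr num="101" name="Created" type="t"';

# The separator is either  ="  (between a key and its value)
# or  "  followed by an optional space (after a value).
my @fields = split /="|" ?/, $in;

# split drops the trailing empty field left by the final quote,
# so the list alternates key, value, key, value, ...
print join('|', @fields), "\n";   # Attr num|101|name|Created|type|t

# An even-length list can be assigned straight to a hash.
my %have = @fields;
print "$_ => $have{$_}\n" for sort keys %have;
```

Note that this relies on values never containing a double quote or the sequence `="`; either would shift the key/value alternation.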

    Update: typo fix (thanks, choroba)

Re: How to analyse structured data to get a hash
by huck (Vicar) on Jun 03, 2017 at 16:12 UTC

    my %datahash;
    while (my $line=<DATA>) {
      chomp $line;
      $line=~s/^\s+//;
      $line=~s/\s+$//;
      my ($var,$val0)=split('=',$line,2);
      $val0='' unless (defined $val0);
      if ($val0=~m/^\"(.*)\"$/) { $datahash{$var}=$1;}
      else {$datahash{$var}=$val0;}
    }
    use Data::Dumper;
    print Dumper({datahash=>\%datahash});
    __DATA__
    Attr num="101"
    name="Created"
    desc="Time file was created."
    type="t"
    ord="3"
    value="2017-06-03T11:27:23+01:00"
    noeq
    noval=
    noquotes=noquotes
    If you don't do your own homework you won't learn anything.

    Result

    $VAR1 = {
              'datahash' => {
                              'name' => 'Created',
                              'Attr num' => '101',
                              'noeq' => '',
                              'noquotes' => 'noquotes',
                              'ord' => '3',
                              'noval' => '',
                              'value' => '2017-06-03T11:27:23+01:00',
                              'desc' => 'Time file was created.',
                              'type' => 't'
                            }
            };

      A small point:
      chomp $line;       #### not needed with below ###
      $line=~s/^\s+//;
      $line=~s/\s+$//;   # this deletes all line-ending characters,
                         # so the chomp is unnecessary.
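A quick way to check this claim is to trim the same string with and without the chomp and compare the results. The variable names here are hypothetical and the input string is made up for the test.

```perl
use strict;
use warnings;

# Two copies of a line with surrounding spaces and a CRLF ending.
my $with_chomp    = "  name=\"Created\"  \r\n";
my $without_chomp = $with_chomp;

chomp $with_chomp;               # only one copy gets the chomp

# Apply the same two trims to both copies ($_ aliases each variable).
for ($with_chomp, $without_chomp) {
    s/^\s+//;
    s/\s+$//;                    # \s also matches \r and \n
}

print $with_chomp eq $without_chomp ? "same\n" : "different\n";   # same
```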
Re: How to analyse structured data to get a hash
by LanX (Bishop) on Jun 03, 2017 at 14:49 UTC
Re: How to analyse structured data to get a hash
by sundialsvc4 (Abbot) on Jun 05, 2017 at 13:09 UTC

    Something else that immediately occurs to me:   “is this XML?”

    For instance, if the data actually looked like:

    <Attr num="101" name="Created" desc="Time file was created." type="t" ord="3" value="2017-06-03T11:27:23+01:00"></Attr>
    ... then it would be in a very well-known data format (XML) that is well-supported by Perl and most other languages.   So, take a moment to be sure that you are not, in fact, dealing with a problem that has already been well solved before.   “Structured data” is very often expressed in XML because this format allows for the easy expression of nested data structures.

      And even if it isn't XML, you could jam it into XML in order to take advantage of existing XML parsing facilities. To wit:

      use XML::Simple;
      while (<>) {
          chomp;
          my $hashref = XMLin("<$_/>");
          ...
      }

Node Type: perlquestion [id://1192056]
Approved by Marshall