PerlMonks

parsing metadata

by Anonymous Monk
on Sep 05, 2014 at 14:08 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

Could someone help me parse this metadata? I have slurped the metadata into an array:

my @results = qx(/software/bin/imeta ls -d $path);
my @created_date = grep ( /^(value: )(\d{4}-\d{2}-\d{2}T\w+)/, @results );

But I want to get more data out of @results. The metadata actually looks like this:

AVUs defined for dataObj 3a/73/2c/metadata.csv:
attribute: dcterms:created
value: 2014-08-13T00:00:10
units:
----
attribute: control
value: 0
units:
----
attribute: md5
value: 3a732cd0fddaa80fa65fcf28664eaf6d
units:
----
I want to get both dcterms:created and md5. I guess putting them in a hash keyed by attribute name would help, but I don't know how to do that. Any help or suggestions, please?

Re: parsing metadata
by toolic (Bishop) on Sep 05, 2014 at 14:20 UTC
    This works for the small snippet of your input:
    use warnings;
    use strict;

    my %atts;
    my $att;
    while (<DATA>) {
        chomp;
        $att = $1 if /^attribute:\s+(.+)/;
        $atts{$att} = $1 if /^value:\s+(.+)/;
    }

    use Data::Dumper;
    $Data::Dumper::Sortkeys = 1;
    print Dumper(\%atts);

    __DATA__
    AVUs defined for dataObj 3a/73/2c/metadata.csv:
    attribute: dcterms:created
    value: 2014-08-13T00:00:10
    units:
    ----
    attribute: control
    value: 0
    units:
    ----
    attribute: md5
    value: 3a732cd0fddaa80fa65fcf28664eaf6d
    units:
    ----

    Outputs:

    $VAR1 = {
              'control' => '0',
              'dcterms:created' => '2014-08-13T00:00:10',
              'md5' => '3a732cd0fddaa80fa65fcf28664eaf6d'
            };

    Your input probably has several records, in which case have a look at perldsc (the Perl Data Structures Cookbook).

    Of course, if this is some standard format, there's probably a better solution on CPAN.

      Alternatively, you could write the loop like this:

      my @interesting = ();
      while (<DATA>) {
          chomp;
          push @interesting, $1 if m/^(?:attribute|value): (.*)$/;
      }
      my %attributes = @interesting;    # magic

      It's a kind of magic. :) You could even turn this into a one-liner, using e.g. grep and map, and employing some more dirty tricks along the way:

      my %attributes = grep { length } map { m/^(?:attribute|value): (.*)$/ and $1 } <DATA>;
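The `my %attributes = @interesting;` line works because assigning a flat list to a hash consumes it two elements at a time, as key then value. A minimal sketch of that pairing, assuming a hard-coded `@lines` array as a stand-in for the imeta output (the array and the odd-length check are illustrative additions, not part of the replies). It swaps the one-liner's `and $1` plus `grep { length }` trick for `? $1 : ()`, so an empty captured value is kept rather than silently dropped, which would shift every later name/value pair.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the imeta output (lines keep their "\n", as qx returns them).
my @lines = (
    "AVUs defined for dataObj 3a/73/2c/metadata.csv:\n",
    "attribute: dcterms:created\n",
    "value: 2014-08-13T00:00:10\n",
    "units:\n",
    "----\n",
    "attribute: md5\n",
    "value: 3a732cd0fddaa80fa65fcf28664eaf6d\n",
    "units:\n",
    "----\n",
);

# Keep only the text after "attribute: " or "value: ";
# non-matching lines contribute an empty list, not a false value.
my @pairs = map { m/^(?:attribute|value): (.*)$/ ? $1 : () } @lines;

# An odd count means a name lost its value (or vice versa),
# and every later pair would be misaligned.
die "unbalanced attribute/value list\n" if @pairs % 2;

# List assignment to a hash takes the elements two at a time: key, value.
my %attributes = @pairs;

print "$_ => $attributes{$_}\n" for sort keys %attributes;
```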

      Anyhow, going back to the while loop version, adding support for parsing the attributes of several objects at the same time is also fairly straightforward. E.g.:

      my %dataObjs = ();
      my @interesting = ();
      my $dataObj;
      while (<DATA>) {
          chomp;
          if (m/^AVUs defined for dataObj (.*):$/) {
              if (defined $dataObj) {
                  # store the previous object's pairs, then start a fresh list
                  $dataObjs{$dataObj} = { @interesting };
                  @interesting = ();
              }
              $dataObj = $1;
          }
          push @interesting, $1 if m/^(?:attribute|value): (.*)$/;
      }
      $dataObjs{$dataObj} = { @interesting } if defined $dataObj;

      Though in this case I'd use your solution instead, since it leads to much simpler code:

      my %dataObjs = ();
      my $dataObj;
      my $att;
      while (<DATA>) {
          chomp;
          $dataObj = $1 if m/^AVUs defined for dataObj (.*):$/;
          $att = $1 if m/^attribute:\s+(.+)/;
          $dataObjs{$dataObj}->{$att} = $1 if /^value:\s+(.+)/;
      }

      No magic, alas. :)
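That loop leaves one inner hash per data object, so pulling out the two attributes from the original question becomes a nested lookup. A minimal sketch, reusing the loop from the reply above but reading a hypothetical `@lines` array instead of `<DATA>`; the second object (`example/other.csv`) and its md5 value are invented purely for illustration, and `//` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical imeta output for two objects; the second object is made up.
my @lines = (
    "AVUs defined for dataObj 3a/73/2c/metadata.csv:\n",
    "attribute: dcterms:created\n",
    "value: 2014-08-13T00:00:10\n",
    "units:\n",
    "----\n",
    "attribute: md5\n",
    "value: 3a732cd0fddaa80fa65fcf28664eaf6d\n",
    "units:\n",
    "----\n",
    "AVUs defined for dataObj example/other.csv:\n",
    "attribute: md5\n",
    "value: fake-md5-for-illustration\n",
    "units:\n",
    "----\n",
);

# Same parsing loop as in the reply, over an array.
my %dataObjs;
my ($dataObj, $att);
for (@lines) {
    chomp(my $line = $_);
    $dataObj = $1 if $line =~ m/^AVUs defined for dataObj (.*):$/;
    $att     = $1 if $line =~ m/^attribute:\s+(.+)/;
    $dataObjs{$dataObj}{$att} = $1 if $line =~ m/^value:\s+(.+)/;
}

# One report line per object; '(none)' marks a missing attribute.
my @report;
for my $obj (sort keys %dataObjs) {
    my $attrs   = $dataObjs{$obj};
    my $created = $attrs->{'dcterms:created'} // '(none)';
    my $md5     = $attrs->{md5} // '(none)';
    push @report, "$obj\t$created\t$md5";
}
print "$_\n" for @report;
```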

      Side note: according to my favorite WWW search engine, this sort of data is generated by iRODS. CPAN doesn't have any related modules, so here's a good opportunity to contribute to the Perl ecosystem for anyone who works with that system.

        Hi both, thanks!

        Yes, it is iRODS metadata. But it is printed on the screen, and I don't want to write it to a file and create more overhead. I was trying something like this:

        foreach my $re (@results) {
            next if $re =~ /^AVUs defined/;
            warn $re;
            my $attribute = $1 if ($re =~ /^attribute:\s+(.*)/);
            my $value = $1 if ($re =~ /^value: (.*)/);
            warn $attribute, $value;
        }
        And I get:

        attribute: md5
        EVAL_ERROR: Use of uninitialized value $value in warn at Access.pl line 125.
        A problem occurred at /nfs/users/nfs_a/aj6/CGP/Fluidigm/perl/scripts/LoadGenotypingResults.pl line 61.
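The warning comes from the loop body seeing only one line per pass: on the `attribute: md5` line, `/^value: (.*)/` does not match, so `$value` is never assigned and `warn $attribute, $value` complains. On top of that, perlsyn states that the behaviour of `my` with a statement-modifier conditional (`my $x = ... if ...`) is undefined. A minimal sketch of one fix, keeping the current attribute name in a variable declared outside the loop as toolic's reply does; `@results` is filled with sample lines only so the sketch runs on its own, and `%meta` is a name chosen for illustration.

```perl
use strict;
use warnings;

# In the real script this would be:
#   my @results = qx(/software/bin/imeta ls -d $path);
# Sample lines stand in for that output here.
my @results = (
    "AVUs defined for dataObj 3a/73/2c/metadata.csv:\n",
    "attribute: dcterms:created\n",
    "value: 2014-08-13T00:00:10\n",
    "units:\n",
    "----\n",
    "attribute: control\n",
    "value: 0\n",
    "units:\n",
    "----\n",
    "attribute: md5\n",
    "value: 3a732cd0fddaa80fa65fcf28664eaf6d\n",
    "units:\n",
    "----\n",
);

my %meta;         # attribute name => value
my $attribute;    # declared outside the loop so it survives to the value line

foreach my $re (@results) {
    chomp(my $line = $re);    # work on a copy; qx lines end in "\n"
    next if $line =~ /^AVUs defined/;
    if ($line =~ /^attribute:\s+(.*)/) {
        $attribute = $1;      # remember the name until its value arrives
    }
    elsif ($line =~ /^value:\s+(.*)/ and defined $attribute) {
        $meta{$attribute} = $1;
    }
}

printf "created: %s\nmd5:     %s\n", $meta{'dcterms:created'}, $meta{md5};
```

No temporary file is involved: the `<DATA>` loops in the replies work the same way over `@results`. If holding the whole listing in memory is a concern, `open(my $fh, '-|', '/software/bin/imeta', 'ls', '-d', $path) or die "imeta: $!"` followed by `while (my $line = <$fh>) { ... }` reads the command's output line by line, and the list form of the pipe open also bypasses the shell.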

Node Type: perlquestion [id://1099672]
Approved by hominid
Front-paged by toolic