Extracting array of hashes from data

by nysus (Priest)
on Jun 10, 2001 at 00:22 UTC (#87227=perlquestion)
nysus has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to become better at creating nested data structures. I came across this practice problem and decided to try it out. After a couple of hours, I'm stuck trying to figure out how to extract an array of hashes from the data. See the non-functioning second "elsif" conditional in my code below. Here's my attempt (with data):
#!/usr/bin/perl -w
use strict;

my %alldata = ();

while (<DATA>) {
    chomp;
    my ($key, $value) = split/\t+/;   ## Splits string on one or more tabs
    if ( grep { $key eq $_ } qw(ID TITLE GENE CYTOBAND LOCUSLINK CHROMOSOME SCOUNT) ) {   ## Procedure for simple hashes
        $alldata{$key} = $value;
    } elsif ($key eq 'EXPRESS') {   ## Procedure for simple arrays
        s/^;+//;   ## Gets rid of any leading semicolons
        $alldata{$key} = [split /;/, $value]
    } elsif ( grep { $key eq $_ } qw(SEQUENCE PROTSIM) ) {   ## Procedure for array of hashes
        my %temphash = ();
        my @splitvalue = split /;/, $value;
        foreach my $split2 (@splitvalue) {
            my ($label, $content) = split/=/, $split2;
            $temphash{$label} = $content;
        }
        $alldata{$key} = push [ %temphash ];
    }
}

__DATA__
ID	Hs.22
TITLE	transglutaminase 1 (K polypeptide epidermal type I, protein-glutamine-gamma-glutamyltransferase)
GENE	TGM1
CYTOBAND	14q11.2
LOCUSLINK	7051
EXPRESS	erwewe;Esophagus;Germ Cell;Larynx;Pancreas;Uterus;colon;head_neck;uterus
CHROMOSOME	14
PROTSIM	ORG=Homo sapiens; PROTGI=1070465; PROTID=PIR:TGHUM1; PCT=100; ALN=816
PROTSIM	ORG=Mus musculus; PROTGI=730933; PROTID=SP:Q08189; PCT=39; ALN=662
PROTSIM	ORG=Rattus norvegicus; PROTGI=135697; PROTID=SP:P23606; PCT=91; ALN=815
SCOUNT	24
SEQUENCE	ACC=M62925; NID=g339603; PID=g339604
SEQUENCE	ACC=M98447; NID=g186734; PID=g1256959
SEQUENCE	ACC=D90287; NID=g219631; PID=g219632
SEQUENCE	ACC=M55183; NID=g186789; PID=g186790
SEQUENCE	ACC=X57974; NID=g510524; PID=g510525
SEQUENCE	ACC=BF155997; NID=g11051180; LID=4808
SEQUENCE	ACC=AW083702; NID=g6038854; CLONE=IMAGE:2587766; END=3'; LID=728
SEQUENCE	ACC=AI652954; NID=g4736933; CLONE=IMAGE:2306445; END=3'; LID=698
SEQUENCE	ACC=BF155987; NID=g11051170; LID=4808
SEQUENCE	ACC=AI239574; NID=g3834971; CLONE=IMAGE:1846343; END=3'; LID=600
SEQUENCE	ACC=AW265414; NID=g6642230; CLONE=IMAGE:2754263; END=3'; LID=1370
SEQUENCE	ACC=BE182598; LID=3549
SEQUENCE	ACC=AI269864; NID=g3889031; CLONE=IMAGE:2005287; END=3'; LID=705
SEQUENCE	ACC=AW085789; NID=g6040941; CLONE=IMAGE:2588217; END=3'; LID=728
SEQUENCE	ACC=BF156003; NID=g11051186; LID=4808
SEQUENCE	ACC=AW194040; NID=g6472771; CLONE=IMAGE:2683894; END=3'; LID=760
SEQUENCE	ACC=AL039214; NID=g5408290; CLONE=DKFZp727C171; END=5'; LID=860
SEQUENCE	ACC=BE934356; NID=g10460432; LID=4595
SEQUENCE	ACC=AW796622; NID=g7848492; LID=2510
SEQUENCE	ACC=BE293065; NID=g9175931; CLONE=IMAGE:3349385; END=5'; LID=3594
SEQUENCE	ACC=AA583940; NID=g2368549; CLONE=IMAGE:1088673; END=3'; LID=567
SEQUENCE	ACC=NM_000359; NID=g4507474; PID=g4507475
SEQUENCE	ACC=BF155992; NID=g11051175; LID=4808
SEQUENCE	ACC=BF089798; NID=g10895508; LID=4808

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar";
$nysus = $PM . $MCF;

Replies are listed 'Best First'.
Re: Extracting array of hashes from data
by wog (Curate) on Jun 10, 2001 at 00:32 UTC
    This line seems to be your problem:

    $alldata{$key} = push [ %temphash ];

    You probably meant to push %temphash onto the array referenced by $alldata{$key}. You can dereference that array with @{ $alldata{$key} } and then push a reference to %temphash onto it:

    push @{ $alldata{$key} }, \%temphash;

    (the array will be automatically created as needed)
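
A minimal sketch of the autovivification mentioned above. The two records in @records are hypothetical stand-ins for parsed PROTSIM lines, not the poster's actual parsing code; the point is only that the array under $alldata{PROTSIM} does not need to exist before the first push.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %alldata;    # starts empty: there is no PROTSIM key yet

# Hypothetical records, already split into label/content pairs.
my @records = (
    { ORG => 'Homo sapiens', PCT => 100 },
    { ORG => 'Mus musculus', PCT => 39  },
);

for my $record (@records) {
    my %temphash = %$record;    # a fresh hash on every pass through the loop

    # Dereferencing the undefined $alldata{PROTSIM} as an array creates it
    # (autovivification); push then appends a reference to this pass's hash.
    push @{ $alldata{PROTSIM} }, \%temphash;
}

my $count = scalar @{ $alldata{PROTSIM} };
print "records: $count\n";
print "second ORG: $alldata{PROTSIM}[1]{ORG}\n";
```

Declaring %temphash with my inside the loop matters here: each pass gets its own hash, so the stored references do not all point at the same data.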

    Of course see perldsc and perlref if you haven't already.

    update: Typo fixed. Thanks srawls.

      push @{ $alldata{$key }, \%temphash

      whoops, minor typo, big error; change the above line to this:

      push @{$alldata{$key }}, \%temphash

      or, you can try this:

      $hash{$key} = \%temphash;        # $hash{$key} is a reference to %temphash
      $hash{$key} = \@array;           # $hash{$key} is a reference to @array
      $hash{$key} = {KEY => 'value'};  # $hash{$key} is a reference to an anonymous hash
      $hash{$key} = [1,2,3];           # $hash{$key} is a reference to an anonymous array

      Update: Cleaned up code formatting a bit

      The 15 year old, freshman programmer,
      Stephen Rawls

      I'm still running into a problem using push @{ $alldata{$key} }, \%temphash;

      For instance, try: print "$alldata{PROTSIM}[2]{PCT}\n"; at the end of the file. You'll get a "Use of uninitialized value at line 24, <DATA> chunk 35." warning. What's strange is that print "$alldata{PROTSIM}[2]{ORG}\n"; works fine.

      Anyone have any ideas?

      I have not read perldsc but I certainly will. Looks useful.

      $PM = "Perl Monk's";
      $MCF = "Most Clueless Friar";
      $nysus = $PM . $MCF;

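        Took me a sec, but I got it. In your data, sometimes you have spaces after the ; and sometimes you don't, so a label can end up as " PCT" (with a leading space) instead of "PCT". ORG works because it is the first field on the line. After each of your splits (or at least the split /;/ ones), add \s* as in: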
        my @splitvalue = split /;\s*/, $value;
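
To see the difference in isolation, here is a small sketch that parses one PROTSIM value from the data both ways. The variable names %loose and %tight are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $value = 'ORG=Rattus norvegicus; PROTGI=135697; PROTID=SP:P23606; PCT=91; ALN=815';

# Splitting on ';' alone leaves a space in front of every field but the first,
# so the labels become ' PROTGI', ' PCT', and so on.
my %loose = map { split /=/, $_, 2 } split /;/, $value;

# Splitting on ';' plus any following whitespace gives clean labels.
my %tight = map { split /=/, $_, 2 } split /;\s*/, $value;

# Brackets make any stray leading space visible.
print join(',', map { "[$_]" } sort keys %loose), "\n";
print join(',', map { "[$_]" } sort keys %tight), "\n";

print exists $loose{PCT} ? "loose has PCT\n" : "loose lacks PCT\n";
print "tight PCT = $tight{PCT}\n";
```

With the plain /;/ split, $loose{PCT} does not exist, which is what produces the uninitialized-value warning when it is printed.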

        The 15 year old, freshman programmer,
        Stephen Rawls

Re: Extracting array of hashes from data
by Abigail (Deacon) on Jun 10, 2001 at 17:42 UTC
    #!/opt/perl/bin/perl -w

    use strict;

    my %keys = (map ({$_ => "SCALAR"} qw /ID TITLE GENE CYTOBAND
                                          LOCUSLINK CHROMOSOME SCOUNT/),
                map ({$_ => "ARRAY"}  qw /EXPRESS/),
                map ({$_ => "HASH"}   qw /SEQUENCE PROTSIM/));

    my %alldata;

    while (<DATA>) {
        chomp;
        my ($key, $value) = split /\t+/ => $_, 2;
        if    ($keys {$key} eq "SCALAR") {$alldata {$key} = $value}
        elsif ($keys {$key} eq "ARRAY")  {
            $value =~ s/^;+//;
            $alldata {$key} = [split /;/ => $value];
        }
        elsif ($keys {$key} eq "HASH")   {
            push @{$alldata {$key}} => {map {split /=/} split /;\s*/ => $value};
        }
        else {
            die "Unknown key $key found in data.\n"
        }
    }

    __DATA__

    -- Abigail

Re: Extracting array of hashes from data
by Zaxo (Archbishop) on Jun 10, 2001 at 01:27 UTC

    You use a good technique for setting $alldata{'EXPRESS'}, splitting on a field delimiter inside array ref brackets:

    $alldata{$key} = [split /;/, $value]

    You can place a SEQUENCE or PROTSIM element in a hash reference the same way:

    # replaces everything in the last elsif(){...}
    push @{$alldata{$key}}, {split /;|=/, $value};

    This works because a list with an even number of elements can always be treated as a hash. If you want to trim whitespace or something, that can be map'ped inside the braces.
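
A short sketch of the even-length-list idea, with the whitespace trimming folded into the split pattern instead of a separate map. The pattern /\s*;\s*|=/ is one possible variation, not the code posted above, and the sample value is one SEQUENCE line from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $value = 'ACC=M62925; NID=g339603; PID=g339604';

# split returns one flat list: label, content, label, content, ...
# The \s* on either side of ';' swallows the spaces between fields.
my @flat = split /\s*;\s*|=/, $value;

# An even-length list inside { } becomes an anonymous hash.
my $record = { @flat };

print scalar(@flat), " items\n";
print "NID: $record->{NID}\n";
```

This relies on every field having exactly one = and no = or ; inside a content value; a field that breaks that assumption would shift the label/content pairing for the rest of the line.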

    After Compline,

      Very cool.

      $PM = "Perl Monk's";
      $MCF = "Most Clueless Friar";
      $nysus = $PM . $MCF;

(zdog) Re: Extracting array of hashes from data
by zdog (Priest) on Jun 10, 2001 at 01:33 UTC
    You may be looking to do something like this:

    $alldata{$key} = \%temphash;

    Instead of:

    $alldata{$key} = push [ %temphash ];

    It sets $alldata{$key} as a reference to %temphash. So the content is accessed using $alldata{$key}{$key2}.
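
One thing to keep in mind, sketched below with two sample SEQUENCE values: the data has many SEQUENCE and PROTSIM lines, and a plain assignment replaces the stored reference on each one, so only the last record survives. Pushing onto an array reference keeps them all. The hashes %assigned and %pushed are illustrative names only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my (%assigned, %pushed);

for my $line ('ACC=M62925; NID=g339603', 'ACC=M98447; NID=g186734') {
    my %temphash = map { split /=/, $_, 2 } split /;\s*/, $line;

    $assigned{SEQUENCE} = \%temphash;           # replaces the previous record
    push @{ $pushed{SEQUENCE} }, \%temphash;    # appends, keeping every record
}

print "assigned: $assigned{SEQUENCE}{ACC}\n";
print "pushed:   ", join(' ', map { $_->{ACC} } @{ $pushed{SEQUENCE} }), "\n";
```

With assignment the content is read as $assigned{SEQUENCE}{ACC}; with push an index is needed as well, as in $pushed{SEQUENCE}[0]{ACC}.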

    Zenon Zabinski | zdog |
