I'm trying to become better at creating nested data structures. I came across this practice problem and decided to try it out. After a couple of hours, I'm stuck trying to figure out how to extract an array of hashes from the data. See the non-functioning second "elsif" conditional in my code below. Here's my attempt (with data):
#!/usr/bin/perl -w
use strict;
my %alldata = ();
while (<DATA>) {
chomp;
my ($key, $value) = split/\t+/; ## Splits string on one or more t
+abs
if ( grep { $key eq $_ } qw(ID TITLE GENE CYTOBAND
LOCUSLINK CHROMOSOME SCOUNT) ) { ## Procedure for simple hash
+es
$alldata{$key} = $value;
} elsif ($key eq 'EXPRESS') { ## Procedure for simple arrays
s/^;+//; ## Gets rid of any leading semicolons
$alldata{$key} = [split /;/, $value]
} elsif ( grep { $key eq $_ } qw(SEQUENCE PROTSIM) ) { ## Procedur
+e for array of hashes
my %temphash = ();
my @splitvalue = split /;/, $value;
foreach my $split2 (@splitvalue) {
my ($label, $content) = split/=/, $split2;
$temphash{$label} = $content;
}
$alldata{$key} = push [ %temphash ];
}
}
__DATA__
ID Hs.22
TITLE transglutaminase 1 (K polypeptide epidermal type I, prote
+in-glutamine-gamma-glutamyltransferase)
GENE TGM1
CYTOBAND 14q11.2
LOCUSLINK 7051
EXPRESS erwewe;Esophagus;Germ Cell;Larynx;Pancreas;Uterus;colon
+;head_neck;uterus
CHROMOSOME 14
PROTSIM ORG=Homo sapiens; PROTGI=1070465; PROTID=PIR:TGHUM1; PC
+T=100; ALN=816
PROTSIM ORG=Mus musculus; PROTGI=730933; PROTID=SP:Q08189; PCT=
+39; ALN=662
PROTSIM ORG=Rattus norvegicus; PROTGI=135697; PROTID=SP:P23606;
+ PCT=91; ALN=815
SCOUNT 24
SEQUENCE ACC=M62925; NID=g339603; PID=g339604
SEQUENCE ACC=M98447; NID=g186734; PID=g1256959
SEQUENCE ACC=D90287; NID=g219631; PID=g219632
SEQUENCE ACC=M55183; NID=g186789; PID=g186790
SEQUENCE ACC=X57974; NID=g510524; PID=g510525
SEQUENCE ACC=BF155997; NID=g11051180; LID=4808
SEQUENCE ACC=AW083702; NID=g6038854; CLONE=IMAGE:2587766; END=3'; L
+ID=728
SEQUENCE ACC=AI652954; NID=g4736933; CLONE=IMAGE:2306445; END=3'; L
+ID=698
SEQUENCE ACC=BF155987; NID=g11051170; LID=4808
SEQUENCE ACC=AI239574; NID=g3834971; CLONE=IMAGE:1846343; END=3'; L
+ID=600
SEQUENCE ACC=AW265414; NID=g6642230; CLONE=IMAGE:2754263; END=3'; L
+ID=1370
SEQUENCE ACC=BE182598; LID=3549
SEQUENCE ACC=AI269864; NID=g3889031; CLONE=IMAGE:2005287; END=3'; L
+ID=705
SEQUENCE ACC=AW085789; NID=g6040941; CLONE=IMAGE:2588217; END=3'; L
+ID=728
SEQUENCE ACC=BF156003; NID=g11051186; LID=4808
SEQUENCE ACC=AW194040; NID=g6472771; CLONE=IMAGE:2683894; END=3'; L
+ID=760
SEQUENCE ACC=AL039214; NID=g5408290; CLONE=DKFZp727C171; END=5'; LI
+D=860
SEQUENCE ACC=BE934356; NID=g10460432; LID=4595
SEQUENCE ACC=AW796622; NID=g7848492; LID=2510
SEQUENCE ACC=BE293065; NID=g9175931; CLONE=IMAGE:3349385; END=5'; L
+ID=3594
SEQUENCE ACC=AA583940; NID=g2368549; CLONE=IMAGE:1088673; END=3'; L
+ID=567
SEQUENCE ACC=NM_000359; NID=g4507474; PID=g4507475
SEQUENCE ACC=BF155992; NID=g11051175; LID=4808
SEQUENCE ACC=BF089798; NID=g10895508; LID=4808
Thanks.
$PM = "Perl Monk's";
$MCF = "Most Clueless Friar";
$nysus = $PM . $MCF;
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.
|