I'm trying to get better at building nested data structures. I came across this practice problem and decided to try it. After a couple of hours, I'm stuck on how to build an array of hashes from the SEQUENCE and PROTSIM records. The part I couldn't get working is the second "elsif" branch in the code below. Here's my attempt (with data):
#!/usr/bin/perl -w
use strict;
use Data::Dumper;    ## To inspect the finished structure
my %alldata = ();
while (<DATA>) {
    chomp;
    my ($key, $value) = split /\t+/, $_, 2;    ## Splits string on one or more tabs
    if ( grep { $key eq $_ } qw(ID TITLE GENE CYTOBAND LOCUSLINK CHROMOSOME SCOUNT) ) {    ## Procedure for simple hashes
        $alldata{$key} = $value;
    } elsif ($key eq 'EXPRESS') {    ## Procedure for simple arrays
        $value =~ s/^;+//;    ## Gets rid of any leading semicolons (in $value, not $_)
        $alldata{$key} = [split /;/, $value];
    } elsif ( grep { $key eq $_ } qw(SEQUENCE PROTSIM) ) {    ## Procedure for array of hashes
        my %temphash = ();
        my @splitvalue = split /;\s*/, $value;    ## Also drops the space after each ";"
        foreach my $split2 (@splitvalue) {
            my ($label, $content) = split /=/, $split2, 2;
            $temphash{$label} = $content;
        }
        ## Was: $alldata{$key} = push [ %temphash ];
        ## push needs a real array, so dereference the array ref stored under $key
        ## (autovivified on first use) and push a reference to a copy of the hash.
        push @{ $alldata{$key} }, { %temphash };
    }
}
print Dumper(\%alldata);
__DATA__
ID Hs.22
TITLE transglutaminase 1 (K polypeptide epidermal type I, protein-glutamine-gamma-glutamyltransferase)
GENE TGM1
CYTOBAND 14q11.2
LOCUSLINK 7051
EXPRESS erwewe;Esophagus;Germ Cell;Larynx;Pancreas;Uterus;colon;head_neck;uterus
CHROMOSOME 14
PROTSIM ORG=Homo sapiens; PROTGI=1070465; PROTID=PIR:TGHUM1; PCT=100; ALN=816
PROTSIM ORG=Mus musculus; PROTGI=730933; PROTID=SP:Q08189; PCT=39; ALN=662
PROTSIM ORG=Rattus norvegicus; PROTGI=135697; PROTID=SP:P23606; PCT=91; ALN=815
SCOUNT 24
SEQUENCE ACC=M62925; NID=g339603; PID=g339604
SEQUENCE ACC=M98447; NID=g186734; PID=g1256959
SEQUENCE ACC=D90287; NID=g219631; PID=g219632
SEQUENCE ACC=M55183; NID=g186789; PID=g186790
SEQUENCE ACC=X57974; NID=g510524; PID=g510525
SEQUENCE ACC=BF155997; NID=g11051180; LID=4808
SEQUENCE ACC=AW083702; NID=g6038854; CLONE=IMAGE:2587766; END=3'; LID=728
SEQUENCE ACC=AI652954; NID=g4736933; CLONE=IMAGE:2306445; END=3'; LID=698
SEQUENCE ACC=BF155987; NID=g11051170; LID=4808
SEQUENCE ACC=AI239574; NID=g3834971; CLONE=IMAGE:1846343; END=3'; LID=600
SEQUENCE ACC=AW265414; NID=g6642230; CLONE=IMAGE:2754263; END=3'; LID=1370
SEQUENCE ACC=BE182598; LID=3549
SEQUENCE ACC=AI269864; NID=g3889031; CLONE=IMAGE:2005287; END=3'; LID=705
SEQUENCE ACC=AW085789; NID=g6040941; CLONE=IMAGE:2588217; END=3'; LID=728
SEQUENCE ACC=BF156003; NID=g11051186; LID=4808
SEQUENCE ACC=AW194040; NID=g6472771; CLONE=IMAGE:2683894; END=3'; LID=760
SEQUENCE ACC=AL039214; NID=g5408290; CLONE=DKFZp727C171; END=5'; LID=860
SEQUENCE ACC=BE934356; NID=g10460432; LID=4595
SEQUENCE ACC=AW796622; NID=g7848492; LID=2510
SEQUENCE ACC=BE293065; NID=g9175931; CLONE=IMAGE:3349385; END=5'; LID=3594
SEQUENCE ACC=AA583940; NID=g2368549; CLONE=IMAGE:1088673; END=3'; LID=567
SEQUENCE ACC=NM_000359; NID=g4507474; PID=g4507475
SEQUENCE ACC=BF155992; NID=g11051175; LID=4808
SEQUENCE ACC=BF089798; NID=g10895508; LID=4808
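For comparison, here is a minimal, self-contained sketch of the array-of-hashes pattern on its own, using three records copied from the data above and assuming the key and value on each line are separated by a tab. The helper names `parse_pairs` and `add_record` are hypothetical, introduced only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn "ACC=M62925; NID=g339603; PID=g339604" into a
# hash reference. Splits on ";" plus any following whitespace, then splits
# each pair on the first "=" only (limit 2), so values containing "=" survive.
sub parse_pairs {
    my ($value) = @_;
    my %record;
    for my $pair (split /;\s*/, $value) {
        my ($label, $content) = split /=/, $pair, 2;
        $record{$label} = $content;
    }
    return \%record;
}

# Hypothetical helper: append one record to the array stored under $key.
# If $data->{$key} does not exist yet, autovivification creates the array ref.
sub add_record {
    my ($data, $key, $value) = @_;
    push @{ $data->{$key} }, parse_pairs($value);
    return;
}

my %alldata;
my @lines = (
    "SEQUENCE\tACC=M62925; NID=g339603; PID=g339604",
    "SEQUENCE\tACC=BF155997; NID=g11051180; LID=4808",
    "PROTSIM\tORG=Homo sapiens; PROTGI=1070465; PROTID=PIR:TGHUM1; PCT=100; ALN=816",
);
for my $line (@lines) {
    my ($key, $value) = split /\t+/, $line, 2;
    add_record(\%alldata, $key, $value);
}

# Each array element is a hash reference: use -> or chained subscripts.
for my $seq (@{ $alldata{SEQUENCE} }) {
    print join(', ', map { "$_=$seq->{$_}" } sort keys %$seq), "\n";
}
print "First PROTSIM organism: $alldata{PROTSIM}[0]{ORG}\n";
```

It prints:

    ACC=M62925, NID=g339603, PID=g339604
    ACC=BF155997, LID=4808, NID=g11051180
    First PROTSIM organism: Homo sapiens

The key line is `push @{ $data->{$key} }, parse_pairs($value);`. `push` expects an array as its first argument, not an anonymous array reference like `[ %temphash ]`, and it returns the new element count rather than the array, so assigning its result to `$alldata{$key}` would store a number. Splitting on `/;\s*/` instead of `/;/` also keeps the leading space out of labels such as `NID`.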
Thanks.
$PM = "Perl Monk's";
$MCF = "Most Clueless Friar";
$nysus = $PM . $MCF;