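use strict;
use warnings;
use Data::Dumper;   # provides Dumper(), used at the end

# Assumption: the input file name is given on the command line
my $file = shift @ARGV or die "Usage: $0 FILE\n";
open my $ufh, '<', $file or die "Cannot open $file: $!";

my %hash = ();
my $seq  = '';
my $flag = 0;       # 1 while we are reading sequence lines
while (<$ufh>) {
    chomp;
    if ( /^(AC|OS|OX|ID|GN|SQ)\s+(.*)/ ) {
        print "<$1> <$2>\n";      # debug: show the tag and its value
        $hash{$1} = $2;
        $flag = 1 if $1 eq 'SQ';  # sequence data starts after the SQ line
    } elsif (/^K\s+/) {
        $flag = 0;                # end of the sequence block
    } elsif ($flag == 1) {
        s/ +//g;                  # remove spaces
        $seq .= $_ . "\n";
    }
}
close $ufh;
print Dumper \%hash;
print $seq;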
poj
Hi poj! Thank you so much for your help! Could you explain what the flag is actually doing? I'm reading the code and having difficulty understanding it. Also, what is Dumper? I'm trying to format my code so that I can get this type of output once I parse the headers and sequence:
>NM_012514 | Rattus norvegicus | breast cancer 1 (Brca1) | mRNA
CGCTGGTGCAACTCGAAGACCTATCTCCTTCCCGGGGGGGCTTCTCCGGCATTTAGGCCT
CGGCGTTTGGAAGTACGGAGGTTTTTCTCGGAAGAAAGTTCACTGGAAGTGGAAGAAATG
GATTTATCTGCTGTTCGAATTCAAGAAGTACAAAATGTCCTTCATGCTATGCAGAAAATC
TTGGAGTGTCCAATCTGTTTGGAACTGATCAAAGAACCGGTTTCCACACAGTGCGACCAC
ATATTTTGCAAATTTTGTATGCTGAAACTCCTTAACCAGAAGAAAGGACCTTCCCAGTGT
CCTTTGTGTAAGAATGAGATAACCAAAAGGAGCCTACAAGGAAGTGCAAGG
Once you find the SQ line, you set the flag. From then on, every line is stripped of its spaces and appended to $seq, until you reach a line that begins with K followed by at least one whitespace character (that is what /^K\s+/ matches). That line unsets the flag, so nothing after it is added to the sequence.
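To see the flag in action, here is a stripped-down sketch of the same loop that only tracks the flag and prints its value after each line. The sample record is made up for illustration (the real input file was not posted), and it is read from a string through an in-memory filehandle instead of a real file.

```perl
use strict;
use warnings;

# Hypothetical sample record, made up for illustration only.
my $sample = <<'END';
ID   EXAMPLE_ID
SQ   SEQUENCE
     ACGT ACGT
     TTGG
K    end marker
OS   not part of the sequence
END

# Read the string as if it were a file.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my $flag = 0;   # 1 while we are inside the sequence block
my $seq  = '';
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^SQ\s+/) {
        $flag = 1;                        # sequence lines start on the next line
    } elsif ($line =~ /^K\s+/) {
        $flag = 0;                        # end of the sequence block
    } elsif ($flag) {
        (my $bases = $line) =~ s/ +//g;   # drop the spaces
        $seq .= $bases;
    }
    # Show the flag value after this line has been handled.
    printf "flag=%d  %s\n", $flag, $line;
}
close $fh;

print "sequence: $seq\n";   # prints "sequence: ACGTACGTTTGG"
```

Only the two lines between SQ and K are collected; the OS line after K is ignored because the flag is back to 0.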
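Dumper comes from the Data::Dumper module (loaded with use Data::Dumper;). It is a handy way to print out arrays and hashes for inspection, and it works with scalars too. You pass it a reference to a hash or array, which is why the code says Dumper \%hash.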
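A small sketch of Dumper on a hash, an array and a scalar. The values are placeholders for illustration; $Data::Dumper::Sortkeys and $Data::Dumper::Indent are standard Data::Dumper settings that make the output sorted and a bit more compact.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order
$Data::Dumper::Indent   = 1;   # more compact indentation

# Placeholder values, just to have something to look at.
my %hash = (AC => 'NM_012514', OS => 'Rattus norvegicus');
my @list = (1, 2, 3);
my $text = 'plain scalar';

# Pass references for hashes and arrays so each one is dumped
# as a single structure instead of a flat list of values.
print Dumper(\%hash);
print Dumper(\@list);
print Dumper($text);
```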
As for your output, it will look something like this:
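# Header line: '>' followed by the fields separated by ' | '
print '>', join(' | ', map { $hash{$_} // '' } qw(AC OS OX ID GN)), "\n";

my $split_after = 60;         # your sample output has 60 characters per line
$seq =~ s/\s+//g;             # join the collected lines into one string
while (length($seq) > $split_after) {
    print substr($seq, 0, $split_after) . "\n";
    $seq = substr($seq, $split_after);
}
print $seq . "\n";            # whatever is left (at most $split_after chars)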
Since your sample input doesn't seem to contain the values in your sample output, I can't say exactly which fields should be printed or in what order. $split_after sets the line width; the lines in your sample output are 60 characters wide.
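A minimal sketch, assuming you want FASTA-style records like your sample output. The helper to_fasta is hypothetical (not from this thread): it builds the > header line and wraps the sequence with a regex instead of the substr loop, defaulting to 60 characters per line to match your sample.

```perl
use strict;
use warnings;

# Hypothetical helper: build one FASTA record from a header string and a
# sequence, wrapping the sequence at $width characters per line.
sub to_fasta {
    my ($header, $seq, $width) = @_;
    $width = 60 unless defined $width;   # your sample output uses 60
    (my $clean = $seq) =~ s/\s+//g;      # drop spaces and newlines first
    my $out = ">$header\n";
    # Grab up to $width characters at a time; the last chunk may be shorter.
    $out .= "$1\n" while $clean =~ /(.{1,$width})/g;
    return $out;
}

my %hash   = (AC => 'NM_012514', OS => 'Rattus norvegicus');  # example values
my $header = join ' | ', map { defined $hash{$_} ? $hash{$_} : '' } qw(AC OS);
print to_fasta($header, "ACGT ACGT\nACGT\n", 5);
# >NM_012514 | Rattus norvegicus
# ACGTA
# CGTAC
# GT
```

The {1,N} quantifier takes as many characters as it can, up to N, on each match, so the final shorter chunk comes out without any extra length bookkeeping.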