PerlMonks  

Re^2: Converting Uniprot File to a Fasta File in Perl

by pearllearner315 (Acolyte)
on Feb 27, 2017 at 19:06 UTC


in reply to Re: Converting Uniprot File to a Fasta File in Perl
in thread Converting Uniprot File to a Fasta File in Perl

For the last group, which belongs to the "SQ" line, how would I capture the multi-line sequence into a variable? I would think to use $line =~ /^SQ\s+(.*)/ again, but that regex would also capture the whitespace in between the parts of the sequence.

Replies are listed 'Best First'.
Re^3: Converting Uniprot File to a Fasta File in Perl
by poj (Abbot) on Feb 27, 2017 at 19:43 UTC

    Use a flag to capture the multiple lines. Remove the spaces with a regex.

    use Data::Dumper;    # provides Dumper, used at the end

    my %hash = ();
    my $seq;
    my $flag = 0;
    while (<$ufh>) {
        chomp;
        if ( /^(AC|OS|OX|ID|GN|SQ)\s+(.*)/ ) {
            print "<$1> <$2>\n";
            $hash{$1} = $2;
            $flag = 1 if /SQ/;    # sequence lines start after the SQ line
        }
        elsif ( /^K\s+/ ) {
            $flag = 0;            # end of the sequence block
        }
        elsif ( $flag == 1 ) {
            s/ +//g;              # remove spaces
            $seq .= $_ . "\n";
        }
    }
    print Dumper \%hash;
    print $seq;
    poj
      Hi poj! Thank you so much for your help! Could you explain what the flag is actually doing? I'm reading the code and I'm having difficulty understanding it. Also, what is Dumper? I'm trying to format my code so that I can get this type of output once I parse the headers and sequence:

      >NM_012514 | Rattus norvegicus | breast cancer 1 (Brca1) | mRNA
      CGCTGGTGCAACTCGAAGACCTATCTCCTTCCCGGGGGGGCTTCTCCGGCATTTAGGCCT
      CGGCGTTTGGAAGTACGGAGGTTTTTCTCGGAAGAAAGTTCACTGGAAGTGGAAGAAATG
      GATTTATCTGCTGTTCGAATTCAAGAAGTACAAAATGTCCTTCATGCTATGCAGAAAATC
      TTGGAGTGTCCAATCTGTTTGGAACTGATCAAAGAACCGGTTTCCACACAGTGCGACCAC
      ATATTTTGCAAATTTTGTATGCTGAAACTCCTTAACCAGAAGAAAGGACCTTCCCAGTGT
      CCTTTGTGTAAGAATGAGATAACCAAAAGGAGCCTACAAGGAAGTGCAAGG

        Once you find the SQ line you set the flag. Then you accumulate all further lines (without the spaces) into $seq until you find a line that begins with K followed by whitespace. At that point you unset the flag, so later lines are no longer added to $seq.
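        To see the flag on its own, here is a minimal sketch that runs on an in-memory record instead of a file. The sample record, the sub name read_sequence, and the variable $in_seq are made up for illustration. It also assumes the record ends with a "//" line, as in the standard UniProt flat-file layout, which may not match your file; adjust the terminator test if it does not.

```perl
use strict;
use warnings;

# Hypothetical sample record. The "//" terminator is an assumption based on
# the standard UniProt flat-file layout, not on the poster's actual file.
my $record = <<'END';
ID   TEST_HUMAN
SQ   SEQUENCE   20 AA;
     MKTAYIAKQR QISFVKSHFS
//
END

sub read_sequence {
    my ($text) = @_;
    my $seq    = '';
    my $in_seq = 0;    # the flag: true only while inside the sequence block

    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^SQ\s/ ) {      # SQ header line: switch the flag on
            $in_seq = 1;
        }
        elsif ( $line =~ m{^//} ) {    # end of record: switch the flag off
            $in_seq = 0;
        }
        elsif ($in_seq) {              # any other line seen while the flag is on
            ( my $clean = $line ) =~ s/\s+//g;    # strip all whitespace
            $seq .= $clean;
        }
    }
    return $seq;
}

print read_sequence($record), "\n";
```

        The ID line is ignored because the flag is still 0 when it is read; only the line between SQ and the terminator ends up in the result, so this should print MKTAYIAKQRQISFVKSHFS.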

        Dumper (from the Data::Dumper module) is a nice way to print out arrays and hashes for inspection. It works with scalars too.
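        A small sketch of Dumper in use, with made-up hash values just to show the shape of what it prints. Setting $Data::Dumper::Sortkeys is optional; it only makes the key order predictable.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order

# Made-up values, only to show what the dump looks like.
my %hash = ( AC => 'P00001;', OS => 'Homo sapiens (Human).' );

# Dumper takes a reference and returns a string describing the structure.
my $dump = Dumper( \%hash );
print $dump;
```

        The result is a readable, Perl-syntax rendering of the hash, which is handy for checking what your regex actually captured before you start formatting output.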

        As for your output, it will look something like:

        print $hash{AC}; print ' | ';
        print $hash{OS}; print ' | ';
        print $hash{OX}; print ' | ';
        print $hash{ID}; print ' | ';
        print $hash{GN};
        print "\n";

        $seq =~ s/\s+//g;    # drop the newlines added while accumulating
        my $split_after = 50;
        while ( length($seq) > $split_after ) {
            print substr( $seq, 0, $split_after ) . "\n";
            $seq = substr( $seq, $split_after );
        }
        print $seq . "\n";

        Since your sample input doesn't seem to contain the values in your sample output, I can't say what the order of the print statements should be. You may also need to adjust $split_after; I didn't feel like counting.
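        The same output step can also be wrapped in a sub, sketched below. The name fasta_record and its arguments are hypothetical, and the default width of 60 is an assumption taken from the line length in the sample output posted above.

```perl
use strict;
use warnings;

# Build one FASTA record: a '>' header line, then the sequence wrapped
# at $width characters per line. fasta_record is a hypothetical helper.
sub fasta_record {
    my ( $fields, $seq, $width ) = @_;
    $width ||= 60;                              # assumed default line length
    $seq =~ s/\s+//g;                           # drop spaces and newlines
    my $header = '>' . join( ' | ', @$fields ); # fields separated by ' | '
    my @lines  = $seq =~ /(.{1,$width})/g;      # chunks of up to $width chars
    return join( "\n", $header, @lines ) . "\n";
}

# Made-up header fields and sequence, for illustration only.
print fasta_record( [ 'P00001', 'Homo sapiens' ], 'ACGT' x 20, 30 );
```

        Passing the header fields as an array reference means the order of the fields is decided by the caller, so you can rearrange them without touching the wrapping code.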
