
It would help if you wrapped your sample data in <code> or <pre> tags, so we can see where lines actually break.

Update:

Thanks for making your data easier to read. I'm still not sure whether by "spaces between the sequences" you mean the sequences are really on a single line, instead of broken into multiple lines as you have them here, but it doesn't matter for my solution. Instead of reading the file line-by-line, trying to determine which lines are IDs and which are sequences, and (possibly) concatenating the pieces of each sequence together, I think it's much simpler to change the input record separator ($/) from the default newline to what's actually separating your records: the > character. Then you've got a pretty standard key-value layout, making it easy to break each record into its two parts and take out anything that shouldn't be in the second part (like newlines). And as Kenosis pointed out, if you only want the longest sequence for each ID, there's no need to build a hash of arrays and find the longest ones later. Just compare lengths as you go, and replace the stored sequence whenever you find a longer one. Like so:

#!/usr/bin/env perl
use Modern::Perl;

my %seqs;
$/ = '>';                 # break lines on this instead of newline
while(my $line = <DATA>){
    chomp $line;          # remove any trailing >
    next unless $line;    # skip leading blank record before first >
    my($id, $seq) = split /\s+/, $line, 2;
    $seq =~ s/[\r\n]//g;  # strip newlines and/or carriage returns from sequence
    unless($seqs{$id} and length($seqs{$id}) > length($seq)){
        $seqs{$id} = $seq;  # save it if it's a new ID or a longer sequence
    }
}

say ">$_ $seqs{$_}" for keys %seqs;

__DATA__
>ENSG00000010072
MDDDLMLALRLQEEWNLQEAERDHAQESLSLVDASWELVDPTPDLQALFVQFNDQFFWGQ
LEAVEVKWSVRMTLCAGICSYEGKGGMCSIRLSEPLLKLRPRKDLVEVYHTFHDEVDEYR
RHWWRCNGPCQHRPPYYGYVKRATNREPSAHDYWWAEHQKTCGGTYIKIKEPENYSKKGK
GKAKLGKEPVLAAENKGTFVYILLIFM*
>ENSG00000067082
Sequence unavailable
>ENSG00000147724
MSEIQGTVEFSVELHKFYNVDLFQRGYYQIRVTLKVSSRIPHRLSASIAGQTESSSLHSA
CVHDSTVHSRVFQILYRNEEVPINDAVVFRVHLLLGGERMEDALSEVDFQLKVDLHFTDS
EQQLRDVAGAPMVSSRTLGLHFHPRNGLHHQVP
>ENSG00000010072
MDDDLMLALRLQEEWNLQEAERDHAQESLSLVDASWELVDPTPDLQALFVQFNDQFFWGQ
LEAVEVKWSVRMTLCAGICSYEGKGGMCSIRLSEPLLKLRPRKDLVETLLHEMIHAYLFV
TNNDKDREGHGPEFCKHMHRINSLTGANITVYHTFHDEVDEYRRHWWRCNGPCQHRPPYY
GYVKRATNREPSAHDYWWAEHQKTCGGTYIKIKEPENYSKKGKGKAKLGKEPVLAAENKD
KPNRGEAQLVIPFSGKGYVLGETSNLPSPGKLITSHAINKTQDLLNQNHSANAVRPNSKI
KVKFEQNGSSKNSHLVSPAVSNSHQNVLSNYFPRVSFANQKAFRGVNGSPRISVTVGNIP
KNSVSSSSQRRVSSSKISLRNSSKVTESASVMPSQDVSGSEDTFPNKRPRLEDKTVFDNF
FIKKEQIKSSGNDPKYSTTTAQNSSSSSSQSKMVNCPVCQNEVLESQINEHLDWCLEGDS
IKVKSEESL*
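If you end up using this inside a larger program, it may be worth scoping the change to $/ so the rest of your code still reads line-by-line. Here's a rough sketch of the same idea wrapped in a sub, using only core Perl. The sub name longest_seqs and the tiny sample input are made up for illustration, and unlike the script above, this version keeps the first sequence seen when two have the same length.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): returns a hashref
# mapping each ID to its longest sequence, read from an open filehandle.
sub longest_seqs {
    my ($fh) = @_;
    local $/ = '>';    # '>' is the record separator only inside this sub
    my %seqs;
    while (my $rec = <$fh>) {
        chomp $rec;                       # remove the trailing '>'
        next unless $rec =~ /\S/;         # skip the empty record before the first '>'
        my ($id, $seq) = split /\s+/, $rec, 2;
        $seq = '' unless defined $seq;    # tolerate an ID with no sequence
        $seq =~ s/[\r\n]//g;              # join wrapped sequence lines
        # store a new ID, or replace only when strictly longer (ties keep the first)
        $seqs{$id} = $seq
            if !exists $seqs{$id} or length($seq) > length($seqs{$id});
    }
    return \%seqs;
}

# Made-up sample input, read through an in-memory filehandle.
my $fasta = ">id1\nAAAA\nCC\n>id2\nGG\n>id1\nTTT\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my $longest = longest_seqs($fh);
close $fh;

print ">$_\n$longest->{$_}\n" for sort keys %$longest;
```

To run it against a real file, open that file instead of the in-memory string and pass the handle to the sub; because of the local, $/ goes back to newline as soon as the sub returns.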

Aaron B.
My Woefully Neglected Blog, where I occasionally mention Perl.


In reply to Re^3: Saving different values for the same key by using Hash of Arrays by aaron_baugher
in thread Saving different values for the same key by using Hash of Arrays by beginner27
