Re: Reading file into a hash

by hippo (Vicar)
on May 28, 2014 at 14:51 UTC


in reply to Reading file into a hash

I think the problem is that you set $seq to '' but then test it with defined, which (other than the first time through) will always be true. Change your test from if ( defined $seq ) to just if ($seq) and see what difference that makes.
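To see the difference between the two tests, here is a minimal sketch. The helper names check_defined and check_true are made up for this illustration and are not from the original code.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
sub check_defined { my ($seq) = @_; return defined $seq ? 1 : 0 }
sub check_true    { my ($seq) = @_; return $seq         ? 1 : 0 }

# Compare both tests on the values $seq might hold.
for my $seq ( undef, '', 'ACGT' ) {
    my $label = defined $seq ? "'$seq'" : 'undef';
    printf "%-6s defined: %d  true: %d\n",
        $label, check_defined($seq), check_true($seq);
}
```

The empty string is defined but false, so once $seq has been reset to '' a defined test can no longer tell "nothing collected yet" apart from "something collected". Bear in mind that the string '0' is also false in Perl, although that should not matter for sequence data.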


Re^2: Reading file into a hash
by PerlSufi (Friar) on May 28, 2014 at 14:53 UTC
    Hi hippo,
    Thanks for the response. I changed it and still did not get the last header and sequence.
      That's a very common problem: you are trying to add a record only when its successor is about to be parsed, but the last record has no successor (sic ;)!

      Most people try to solve this by repeating the code that adds the last record after the loop.
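For comparison, here is a rough sketch of that "repeat it after the loop" style. It is not the original poster's code, and the sub name parse_with_flush and the sample input are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: store each record only when the next header is seen,
# then store the last record once more after the loop.
sub parse_with_flush {
    my @lines = @_;
    my ( %sequence, $header );
    my $seq = '';

    for my $line (@lines) {
        chomp $line;
        if ( $line =~ /^>(.*)/ ) {
            # A new header means the previous record is complete.
            $sequence{$header} = $seq if defined $header;
            ( $header, $seq ) = ( $1, '' );
        }
        else {
            $seq .= $line;
        }
    }

    # The same statement again: without it the last record is lost.
    $sequence{$header} = $seq if defined $header;

    return \%sequence;
}

my $parsed = parse_with_flush( ">a\n", "AC\n", "GT\n", ">b\n", "TTT\n" );
print "$_ => $parsed->{$_}\n" for sort keys %$parsed;
```

It works, but the store statement appears twice and the two copies have to be kept in sync.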

      But it's much cleaner this way (avoiding a posteriori state logic):

          use strict;
          use warnings;
          use Data::Dump;

          my $header;
          my %sequence;

          while ( my $line = <DATA> ) {
              chomp $line;
              if ( $line =~ /^>(.*)/ ) {
                  $header = $1;
              }
              else {
                  $sequence{$header} .= $line;
              }
          }

          dd \%sequence;

          __DATA__
          >sequence_5849
          CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
          TCCCACTAATAATTCTGAGG
          >sequence_5959
          CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
          ATATCCATTTGTCAGCAGACACGC
          >sequence_0808
          CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
          TGGGAACCTGCGGGCAGTAGGTGGAAT

      Output:

          {
            sequence_0808 => "CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT",
            sequence_5849 => "CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG",
            sequence_5959 => "CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC",
          }

      Cheers Rolf

      ( addicted to the Perl Programming Language)

      update

      A *general pattern* for solving such problems while staying DRY is to use references:

          if ( $line =~ /^(HEAD_PATTERN)/ ) {
              $data = \ $deeply{nested}{structure}{$1};    # reference data
          }
          else {
              $$data .= $line;                             # dereference data
          }

      This way you don't need to repeat the path into a deeply nested data structure, which might vary in multiple dimensions.
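A small runnable sketch of the reference pattern follows. The ">species/gene" header layout, the sub name parse_nested and the sample lines are assumptions made up for this example, not part of the original data.

```perl
use strict;
use warnings;

# Hypothetical layout: a line like ">species/gene" starts a record that is
# stored at $data{species}{gene}.
sub parse_nested {
    my @lines = @_;
    my %data;
    my $slot;    # reference to the string currently being filled

    for my $line (@lines) {
        chomp $line;
        if ( $line =~ m{^>(\w+)/(\w+)} ) {
            # Taking the reference autovivifies the path; keep the leaf.
            $slot = \ $data{$1}{$2};
        }
        elsif ( defined $slot ) {
            # Append through the reference, no path repeated here.
            $$slot .= $line;
        }
    }
    return \%data;
}

my $tree = parse_nested( ">human/abc\n", "AC\n", "GT\n", ">mouse/abc\n", "TT\n" );
print "$tree->{human}{abc} $tree->{mouse}{abc}\n";
```

The nested path is written only once, in the header branch. The elsif guard is an addition in this sketch so that lines appearing before any header are skipped rather than dereferencing an undefined $slot.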

      update
      added some explanation
