PerlMonks  

regex record separator is certainly possible

by inq123 (Sexton)
on Mar 20, 2005 at 01:33 UTC (#440966)


in reply to Input record separator

but certainly not recommended. :) Using File::Stream, one can set $/ to a regex, but that approach comes with several caveats, so I wouldn't reach for it first.

Aside from this, the suggestions above are quite good and cover most of what I would suggest. But just to add a bit of value to the discussion: purely IMHO, the best approach (as already suggested) might be to use Bioperl, 'cause who doesn't want somebody else to take care of any format changes and deal with the potential problems therein? :)

Another thing: setting $/ = "\n>" is a correct approach, but setting it to ">" is not, since the FASTA format does not demand that a sequence description be free of '>'. I would also certainly make performance the highest priority when dealing with FASTA (if I chose not to use Bioperl for some reason), so code like the following would be an OK alternative to Bioperl:

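$/ = "\n>";                 # records are separated by a newline followed by '>'
while (<DATA>) {
    chomp;                  # strips the trailing "\n>" separator
    s/\n\z//;               # the last record ends in "\n" rather than "\n>"
    # only the first record still carries its leading '>'
    my $seq = /^>/ ? "$_\n" : ">$_\n";
    print "seq is:\n$seq";
}
__DATA__
>Record 1
AGTCTAGTCAT
CATCATAAGAT
CATCAATCACA
>Record 2
ATGAACAGCAG
ATGAAGAATGG
ATAG
>Record 3
AGTCTAGTCAT
CATCATAAGAT
CATCAATCACA
>Record 4
ATGAACAGCAG
ATGAAGAATGG
ATAG
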
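As an aside on the separator choice, here is a small sketch of why a bare ">" goes wrong. The two-record input and the count_records helper are made up for illustration; the first description deliberately contains a '>'.

```perl
use strict;
use warnings;

# Hypothetical input: two records, the first description contains a '>'.
my $fasta = ">seq1 length>10\nAGTC\n>seq2\nATGA\n";

# Count the non-empty records that readline yields for a given separator.
sub count_records {
    my ($sep, $text) = @_;
    local $/ = $sep;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $n = 0;
    while (my $rec = <$fh>) {
        chomp $rec;               # removes a trailing $sep, if present
        $n++ if length $rec;      # skip the empty chunk before the first '>'
    }
    close $fh;
    return $n;
}

printf "separator %-5s gives %d records\n", '">"',   count_records(">",   $fasta);
printf "separator %-5s gives %d records\n", '"\n>"', count_records("\n>", $fasta);
```

With ">" the first description is cut in two, so the count comes out too high; with "\n>" only a '>' at the start of a line ends a record.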
Now the $/ = "\n>" solution is all good, until you consider using it on a FASTA file generated on a (classic) Mac, where lines end in "\r" rather than "\n". So maybe we need File::Stream after all? But it is not a solution for huge files. So maybe we just use Bioperl and hope they have dealt with this issue?
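For what it's worth, one way around the line-ending problem without File::Stream is to read fixed-size chunks and split them on any of "\r\n", "\r" or "\n". The following is only a sketch under that assumption, not what Bioperl does; read_fasta, its callback interface and the 64 KB default chunk size are all made up here.

```perl
use strict;
use warnings;

# Calls $callback->($header, $sequence) for each FASTA record read from $fh.
# Accepts "\n", "\r\n" and bare "\r" line endings; memory use is bounded by
# the chunk size plus the longest line and the current record.
sub read_fasta {
    my ($fh, $callback, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;    # assumed default, tune as needed

    my ($buf, $header, $seq) = ('', undef, '');

    my $handle_line = sub {
        my ($line) = @_;
        if ($line =~ /^>(.*)/) {
            # A '>' at the start of a line opens a new record.
            $callback->($header, $seq) if defined $header;
            ($header, $seq) = ($1, '');
        }
        elsif (defined $header) {
            $line =~ s/\s+//g;    # drop stray whitespace inside sequence lines
            $seq .= $line;
        }
    };

    while (read($fh, my $chunk, $chunk_size)) {
        $buf .= $chunk;
        # Hold back a trailing "\r": its "\n" may arrive in the next chunk.
        my $held = $buf =~ s/\r\z// ? "\r" : '';
        my @lines = split /\r\n|\r|\n/, $buf, -1;
        # The last piece may be an incomplete line; keep it for the next round.
        $buf = (@lines ? pop @lines : '') . $held;
        $handle_line->($_) for @lines;
    }

    # Flush whatever is left in the buffer, then the final record.
    $handle_line->($_) for split /\r\n|\r|\n/, $buf;
    $callback->($header, $seq) if defined $header;
    return;
}

# Usage with a classic-Mac style ("\r" only) in-memory file.
my $mac_text = ">Record 1\rAGTCTAGTCAT\rCATCATAAGAT\r>Record 2\rATGAACAGCAG\r";
open my $in, '<', \$mac_text or die "Cannot open in-memory handle: $!";
read_fasta($in, sub {
    my ($header, $sequence) = @_;
    print "header: $header\nseq:    $sequence\n";
});
close $in;
```

For a real file you would probably want to binmode the handle first so that no I/O layer translates the line endings before the split sees them.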

Or maybe I'm just making this simple issue sound more and more complicated? Now that's my only gift. :)

Still, hope it helps. ;)

