Beefy Boxes and Bandwidth Generously Provided by pair Networks kudra
No such thing as a small change
 
PerlMonks  

Re: replace FASTA sequence headers

by Athanasius (Vicar)
on Jun 10, 2012 at 17:23 UTC ( #975462=note: print w/ replies, xml ) Need Help??


in reply to replace FASTA sequence headers

Hello onlyIDleft,

There are a number of problems with your code. For example, lines such as

%head_seqs = @headers_seqs;

make no sense. You need to read up on perl data structures to understand arrays and hashes.

I think the following script does what you need:

use strict; use warnings; open(my $in1, '<', $ARGV[0]) or die "Can't read from multifasta file '$ARGV[0]': $!"; my @multifasta = <$in1>; close $in1; open(my $in2, '<', $ARGV[1]) or die "Can't read from header replacement file '$ARGV[1]': $!"; my %headers; while (<$in2>) { my ($orig, $new) = split; warn "Duplicate replacement: $orig --> $new\n" if exists $headers{ +$orig}; $headers{$orig} = $new; } close $in2; my $destination = $ARGV[0] . '_headers-replaced.fasta'; open (my $out, '>', $destination) or die "Can't write to file '$destination': $!"; foreach my $line (@multifasta) { if ($line =~ /^\>/) # it is a header { foreach my $key (keys %headers) { if ($line =~ /$key/) { $line =~ s/$key/$headers{$key}/; last; } } } print $out $line; } close $out;

HTH,

Athanasius <°(((><contra mundum


Comment on Re: replace FASTA sequence headers
Select or Download Code
Re^2: replace FASTA sequence headers
by Anonymous Monk on Jun 10, 2012 at 18:14 UTC

    Thanks a lot Athanasius

    There are so many ideas that are new to me here, from just one script of yours! Thanks :)

Re^2: replace FASTA sequence headers
by jwkrahn (Prior) on Jun 10, 2012 at 20:38 UTC
    lines such as
    %head_seqs = @headers_seqs;
    make no sense.

    It makes perfect sense.    Copying values between lists, arrays and hashes is a normal idiom in Perl.

    if ($line =~ /$key/) { $line =~ s/$key/$headers{$key}/; last; }

    Your $key values may contain regular expression meta-characters so you should use quotemeta on them.

    Your loop exits on the first match but not necessarily the correct match.    You should anchor the patterns.

      Thanks for your replies jwkrahn.

      Actually, EACH of my values in the hash of new / old headers have # character in them, some of them have other special characters such as _ etc

      I was under the impression that because my keys are all letter/number strings, that there should not be any problem with having special characters in the values, BUT not in the keys themselves

      Am I missing something here about needing "quotemeta" for the hash's values as well?

      I hope I am explaining the header pairs properly

      Here is an example below:

      cluster14621 \t Helitron#element_1

        Am I missing something here about needing "quotemeta" for the hash's values as well?

        I was suggesting quotemeta not because of the hash, but because of their use in a regular expression.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://975462]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (6)
As of 2013-05-26 04:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best material for plates (tableware) is:









    Results (524 votes), past polls