
add letters to the begin of the sentence

by intect (Initiate)
on Sep 12, 2013 at 16:24 UTC ( #1053758=perlquestion )
intect has asked for the wisdom of the Perl Monks concerning the following question:

Hi, my fasta file looks like:

>dddd
abcdef
>eeee
bcdef

There is no space between lines. I want to add 'aaaa' to the beginning of the sequences (i.e., abcdef and bcdef in the example). I used the following script but it gives me a connected long sequence. Can anyone help me identify the problem? Thanks XF

use warnings;
use strict;

my $read_mid1         = 'Reads.fna';
my $read_mid1_correct = 'Reads_correct.fa';

open ( my $input_fh, "<", $read_mid1 );
open ( my $output_fh, ">", $read_mid1_correct);

while (my $line = <$input_fh> ) {
    unless ($line =~ /^>/) {
        $line =~ s/^/ACGAGTGCGT/;
        print $output_fh $line;
    }
}

close ( $input_fh );
close ( $output_fh );

Replies are listed 'Best First'.
Re: add letters to the begin of the sentence
by choroba (Chancellor) on Sep 12, 2013 at 16:31 UTC
    Move the print after the unless block. You should only substitute for non-headers, but the unchanged lines should be printed as well.
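
    A minimal sketch of that change applied to the original loop. The temporary directory, the two-record sample input, and the `or die` checks on `open` are assumptions added so the sketch runs on its own; they are not part of the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a throwaway directory and a tiny sample stand in for the real Reads.fna.
my $dir      = tempdir( CLEANUP => 1 );
my $in_path  = "$dir/Reads.fna";
my $out_path = "$dir/Reads_correct.fa";

open( my $sample_fh, '>', $in_path ) or die "Cannot write $in_path: $!";
print $sample_fh ">dddd\nabcdef\n>eeee\nbcdef\n";
close($sample_fh) or die "Cannot close $in_path: $!";

open( my $input_fh,  '<', $in_path )  or die "Cannot read $in_path: $!";
open( my $output_fh, '>', $out_path ) or die "Cannot write $out_path: $!";

while ( my $line = <$input_fh> ) {
    unless ( $line =~ /^>/ ) {
        $line =~ s/^/ACGAGTGCGT/;    # only sequence lines get the prefix
    }
    print $output_fh $line;          # header and sequence lines are both written
}

close($input_fh);
close($output_fh) or die "Cannot close $out_path: $!";
```

    With the print outside the unless block, the header lines reach the output file unchanged and only the sequence lines are modified.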
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: add letters to the begin of the sentence
by Kenosis (Priest) on Sep 12, 2013 at 17:56 UTC

    Since you're working with fasta files, consider becoming familiar with Bio::SeqIO--a module that's a powerful tool for working with fasta and other such formats.

    For example, to achieve your desired results using the module, you can do the following:

    use strict;
    use warnings;
    use Bio::SeqIO;

    my $in  = Bio::SeqIO->new( -file => 'file1.fasta',  -format => 'Fasta' );
    my $out = Bio::SeqIO->new( -file => '>file2.fasta', -format => 'Fasta' );

    while ( my $seq = $in->next_seq() ) {
        my $newSeq = Bio::Seq->new(
            -display_id => $seq->id,
            -seq        => 'aaaa' . $seq->seq
        );
        $out->write_seq($newSeq);
    }

    Output results:

    >dddd
    aaaaabcdef
    >eeee
    aaaabcdef

    Notice that the module permits you to access sequences and ids as objects, using the "->" notation. For example, in the above, we prepend "aaaa" to the original sequence by doing the following:

    'aaaa' . $seq->seq

    Then, create a new object with the modified sequence that's written to a new file.

    The module offers many more options for working with fasta (and other similarly formatted) files.

    Hope this helps!

Re: add letters to the begin of the sentence
by kcott (Chancellor) on Sep 12, 2013 at 21:46 UTC

    G'day intect,

    This technique should do what you want:

    $ perl -Mstrict -Mwarnings -e '
        my @fasta = (">dddd\n", "abcdef\n", ">eeee\n", "bcdef\n");
        for (@fasta) {
            print "aaaa" unless /^>/;
            print;
        }
    '
    >dddd
    aaaaabcdef
    >eeee
    aaaabcdef

    Using your filehandles, that would be:

    while (<$input_fh>) {
        print $output_fh 'aaaa' unless /^>/;
        print $output_fh $_;
    }

    If the ACGAGTGCGT in your code is supposed to be the aaaa in your description, then just make the appropriate substitution. If it isn't, a clarification would be useful.
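
    One way to make that substitution easy is to keep the prefix in a variable. A minimal sketch, assuming a hypothetical helper named `add_prefix` (not something from this thread or from any module) that takes the prefix and a list of lines:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines with $prefix added to every
# non-header line; header lines (starting with ">") pass through unchanged.
sub add_prefix {
    my ( $prefix, @lines ) = @_;
    my @out;
    for my $line (@lines) {
        $line =~ s/^/$prefix/ unless $line =~ /^>/;
        push @out, $line;    # every line is kept, header or not
    }
    return @out;
}

# Same sample data as above; swap 'aaaa' for 'ACGAGTGCGT' as needed.
print add_prefix( 'aaaa', ">dddd\n", "abcdef\n", ">eeee\n", "bcdef\n" );
```

    Changing the prefix then means editing a single argument rather than the loop body.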

    -- Ken

      Thank you, Ken. It works very well. Would you help me identify the problem in my script. I moved the print after unless block but it still does not work.
        "Thank you, Ken. It works very well."

        You're welcome.

        "Would you help me identify the problem in my script."

        Happy to.

        "I moved the print after unless block but it still does not work."

        Unfortunately, that doesn't help me to help you. What exactly did you move? Where precisely did you move it to? What does "it still does not work" mean?

        You need to provide code along with expected results, actual results, and any error or warning messages. This is all described in more detail in the "How do I post a question effectively?" guidelines.

        Here's what choroba was referring to:

        $ perl -Mstrict -Mwarnings -e '
            my @fasta = (">dddd\n", "abcdef\n", ">eeee\n", "bcdef\n");
            for (@fasta) {
                unless (/^>/) {
                    s/^/ACGAGTGCGT/;
                }
                print;
            }
        '
        >dddd
        ACGAGTGCGTabcdef
        >eeee
        ACGAGTGCGTbcdef

        That may give you your answer. If not, post the requested information so that we can provide genuine help as opposed to guesswork.

        -- Ken

Re: add letters to the begin of the sentence
by Laurent_R (Abbot) on Sep 12, 2013 at 17:12 UTC

    Hopefully choroba's post gave you the solution, but if not, or if you need more, please explain what a connected long sequence is. Or give an example.

Node Type: perlquestion [id://1053758]
Front-paged by Arunbear