PerlMonks  

Text manipulation

by viktor (Acolyte)
on Nov 09, 2012 at 14:22 UTC (#1003134=perlquestion)
viktor has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have two files like this:

1st file:

    >qppq
    ATATATTTATTATTA
    TATATATTATATTAT
    TA
    >lsl
    ATTATTATTATTATT
    AGGAGGAG

2nd file:

    >dfj
    TATTATTATTTT
    ATAT
    >ghg
    ATATATAT

I want output like this:

    >qppq
    ATATATTTATTATTA
    TATATATTATATTAT
    TA
    >dfj
    TATTATTATTTT
    ATAT
    ============
    >lsl
    ATTATTATTATTATT
    AGGAGGAG
    >ghg
    ATATATAT
    ===========

    #!/usr/bin/perl -w
    use strict;
    my $file1=$ARGV[0];
    my $file2=$ARGV[1];
    open(FILE1,"$file1");
    open(FILE2,"$file2");
    while (my $line1=<FILE1>){
        my $line2=<FILE2>;
        if($line1=~/^>/){
            print "$line1";
            do{
                $line1=<FILE1>;
                print $line1;
            }until $line1=~/^>/;
            print "$line2";
            do{
                $line2=<FILE2>;
                print $line2;
            }until $line2=~/^>/;
        }
        print "==========\n";
    }

But this code is not working. Can somebody help me with this?
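A minimal sketch of one way to get the desired output, assuming both inputs are FASTA files whose records should be paired in order. In the code above, each `do { } until` loop prints the next record's `>` header before testing it, so that header lands in the wrong block and is lost to the outer `while`. The sketch instead keeps the header it read ahead in a variable and hands it to the next call. `next_record` and `interleave` are hypothetical helper names, not from any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the next FASTA record (header plus its sequence lines) from $fh,
# or undef at end of file. $pending is a reference to a scalar holding a
# header line that was read ahead while finding the end of the last record.
sub next_record {
    my ($fh, $pending) = @_;
    my $record = defined $$pending ? $$pending : <$fh>;
    $$pending = undef;
    return unless defined $record;
    while (defined(my $line = <$fh>)) {
        if ($line =~ /^>/) {    # start of the next record: keep it for later
            $$pending = $line;
            last;
        }
        $record .= $line;
    }
    $record .= "\n" unless $record =~ /\n\z/;    # last line may lack "\n"
    return $record;
}

# Print one record from each handle, then a separator, until either runs out.
sub interleave {
    my ($fh1, $fh2, $out) = @_;
    my ($pending1, $pending2);
    while (1) {
        my $rec1 = next_record($fh1, \$pending1);
        my $rec2 = next_record($fh2, \$pending2);
        last unless defined $rec1 && defined $rec2;
        print {$out} $rec1, $rec2, "============\n";
    }
}

if (@ARGV == 2) {
    open my $fh1, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $fh2, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    interleave($fh1, $fh2, \*STDOUT);
}
```

Run it as `perl interleave.pl file1 file2 > out.txt`. If one file has more records than the other, the extra records are silently dropped; print the leftovers after the loop if you need them.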

Re: Text manipulation
by zentara (Archbishop) on Nov 09, 2012 at 14:28 UTC
      Ya, here is a BioPerl solution to interleave two sequence files:

      #!/usr/bin/env perl
      use strict;
      use warnings;
      use Bio::SeqIO;

      my $fasta_in_1 = $ARGV[0];
      my $fasta_in_2 = $ARGV[1];
      my $fasta_out  = ">interleave_1_2.fa";    # leading '>' opens for writing

      my $seqio_in_1 = Bio::SeqIO->new(
          -file   => $fasta_in_1,
          -format => 'Fasta',
      );
      my $seqio_in_2 = Bio::SeqIO->new(
          -file   => $fasta_in_2,
          -format => 'Fasta',
      );
      my $seqio_out = Bio::SeqIO->new(
          -file   => $fasta_out,
          -format => 'Fasta',
      );

      # write one sequence from each input in turn
      while ( my $seq_obj_1 = $seqio_in_1->next_seq() ) {
          my $seq_obj_2 = $seqio_in_2->next_seq();
          $seqio_out->write_seq($seq_obj_1);
          $seqio_out->write_seq($seq_obj_2);
      }

      __END__
      >qppq
      ATATATTTATTATTATATATATTATATTATTA
      >dfj
      TATTATTATTTTATAT
      >lsl
      ATTATTATTATTATTAGGAGGAG
      >ghg
      ATATATAT

      However, I'm not entirely sure of the best way to delimit the pairs with ============ using this approach.

Re: Text manipulation
by grizzley (Chaplain) on Nov 09, 2012 at 14:30 UTC

    On the contrary: it is working! Maybe not the way you expect, but you didn't tell us what you expect, did you?

    Update: sorry, the desired output is there, but the node was poorly formatted and I didn't notice it.
      Yeah, it runs, but it doesn't produce the output I described :(

        Hi, viktor,
        Please reformat the output you want using code tags.

        If you tell me, I'll forget.
        If you show me, I'll remember.
        If you involve me, I'll understand.
        --- Author unknown to me
Re: Text manipulation
by Anonymous Monk on Nov 09, 2012 at 14:57 UTC
    #!/usr/bin/perl --
    use strict;
    use warnings;

    Main( @ARGV );
    exit( 0 );

    sub Main {
        @_ or die "\nUsage: $0 infile infile infile > outfile\n";
        my @files = map {
            open my $fh, '<', $_ or die "Cannot open $_: $^E";
            $fh;
        } @_;
        # a bare eof tests the last file read, so pass each handle explicitly
        while( my @ofh = grep { !eof($_) } @files ) {
            print getOne( shift @ofh ) while @ofh;
        }
    } ## end sub Main

    sub getOne {
        ...;
    }

    ## perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END" -otr -opr -ce -nibc -i=4 -pt=0 "-nsak=*"
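The `getOne` body in that skeleton is deliberately left as `...`. Purely as an illustration, and not the poster's intended code, one way to fill it in: read the header line, keep reading, and when a line starting with `>` appears, `seek` back to where that line began so the next call reads it again. This assumes seekable input (regular files, not pipes). `interleave_all` is a hypothetical driver following the same round-robin idea that also prints the `============` separator after each round.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical getOne: read one FASTA record from $fh. When the next
# record's header is read, seek back so that line is re-read next time.
# Requires a seekable handle (a plain file or in-memory handle, not a pipe).
sub getOne {
    my ($fh) = @_;
    my $record = <$fh>;                       # the header line
    return '' unless defined $record;
    while (1) {
        my $pos  = tell $fh;                  # where the next line starts
        my $line = <$fh>;
        last unless defined $line;
        if ($line =~ /^>/) {
            seek $fh, $pos, 0 or die "seek failed: $!";    # push header back
            last;
        }
        $record .= $line;
    }
    $record .= "\n" unless $record =~ /\n\z/;
    return $record;
}

# Round-robin over handles that still have data; separator after each round.
sub interleave_all {
    my ($out, @files) = @_;
    while (my @ofh = grep { !eof($_) } @files) {
        print {$out} getOne(shift @ofh) while @ofh;
        print {$out} "============\n";
    }
}

if (@ARGV) {
    my @files = map { open my $fh, '<', $_ or die "Cannot open $_: $!"; $fh } @ARGV;
    interleave_all(\*STDOUT, @files);
}
```

The seek-back trick keeps `getOne` stateless, at the cost of not working on piped input; a read-ahead buffer per handle avoids that restriction.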
Re: Text manipulation
by Anonymous Monk on Nov 09, 2012 at 17:46 UTC
    use 5.014;
    use strict;
    use warnings;

    my @f1 = qw( >qppq ATATATTTATTATTA TATATATTATATTAT TA >lsl ATTATTATTATTATT AGGAGGAG );
    my @f2 = qw( >dfj TATTATTATTTT ATAT >ghg ATATATAT );

    {
        my $i = 1;

        sub get_lines {
            my $array_ref = shift;
            $i = shift if @_;
            my $old = $i - 1;
            # advance $i to the next header ('>' entry) or past the end
            while (exists $array_ref->[$i] and chr ord $array_ref->[$i] ne '>') {
                $i++;
            }
            # dereference for splice: splice on a plain reference (added in
            # 5.14) was removed in Perl 5.24
            return splice(@$array_ref) if $#{$array_ref} <= $old;
            return splice(@$array_ref, $old, $i - $old);
        }

        while (@f1 && @f2) {
            my $j = $i;
            say join("\n", get_lines(\@f1, $j), get_lines(\@f2, $j));
            say "============";
        }
    }
    # TZN
Re: Text manipulation
by space_monk (Chaplain) on Nov 09, 2012 at 18:47 UTC
    It helps if you give us non-biologists some context about what data the files contain. If you had said they were FASTA format files, you might have got a "fasta" response, geddit? :-)

    You're essentially taking one entry from each file at a time, so you could do this by reading each file into an array of entries, shifting an entry off each array as $d1 and $d2, and then doing:

    print <<EOF;
    $d1
    $d2
    ========
    EOF
    A Monk aims to give answers to those who have none, and to learn from those who know more.
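space_monk's suggestion (read each file into an array of entries, then shift one entry off each array per round and print them with a here-doc) might look like the sketch below. It assumes each file fits in memory and that entries start at lines beginning with `>`; `slurp`, `read_records` and `pair_up` are hypothetical names. The separator is the OP's `============` rather than the shorter one above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a whole file into one string (hypothetical helper).
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;    # undefined input record separator: read the entire file
    return scalar <$fh>;
}

# Split FASTA text into entries; each entry starts at a line beginning with '>'.
sub read_records {
    my ($text) = @_;
    return grep { /\S/ } split /^(?=>)/m, $text;
}

# Shift one entry off each array per round and append the pair plus a
# separator, stopping as soon as either array is empty.
sub pair_up {
    my ($recs1, $recs2) = @_;
    my $out = '';
    while (@$recs1 && @$recs2) {
        chomp(my $d1 = shift @$recs1);    # drop trailing newlines so the
        chomp(my $d2 = shift @$recs2);    # here-doc adds no blank lines
        $out .= <<EOF;
$d1
$d2
============
EOF
    }
    return $out;
}

if (@ARGV == 2) {
    my @recs1 = read_records(slurp($ARGV[0]));
    my @recs2 = read_records(slurp($ARGV[1]));
    print pair_up(\@recs1, \@recs2);
}
```

Slurping keeps the pairing logic trivial; for very large files a streaming reader that holds only one record per file is kinder to memory.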
      Nice pun!

      For anyone who doesn't know, FASTA format is a flat-file way to store any number of DNA/RNA/protein sequences. Each entry consists of (1) a sequence ID line that starts with a '>' character and (2) one or more lines containing the actual sequence.
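To make that description concrete, here is a minimal parser sketch that turns FASTA text into `[id, sequence]` pairs, joining the sequence lines into one string the way the BioPerl output above does. `parse_fasta` is a hypothetical helper; it skips blank lines and tolerates Windows line endings, but does not validate the sequence characters.

```perl
use strict;
use warnings;

# Hypothetical helper: parse FASTA text into [id, sequence] pairs.
# The ID is the header line minus its leading '>'; the sequence lines that
# follow are concatenated with the line breaks removed.
sub parse_fasta {
    my ($text) = @_;
    my @entries;
    for my $line (split /\r?\n/, $text) {
        next unless $line =~ /\S/;              # skip blank lines
        if ($line =~ /^>(.*)/) {
            push @entries, [ $1, '' ];          # start a new entry
        }
        elsif (@entries) {
            $entries[-1][1] .= $line;           # extend the current sequence
        }
    }
    return @entries;
}

my $example = ">qppq\nATATATTTATTATTA\nTATATATTATATTAT\nTA\n";
for my $entry (parse_fasta($example)) {
    print "$entry->[0]\t$entry->[1]\n";
}
```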

Node Type: perlquestion [id://1003134]
Approved by Corion