PerlMonks  

Text manipulation

by viktor (Acolyte)
on Nov 09, 2012 at 14:22 UTC (#1003134)
viktor has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have two files like this:

1st file

    >qppq
    ATATATTTATTATTA
    TATATATTATATTAT
    TA
    >lsl
    ATTATTATTATTATT
    AGGAGGAG

2nd file

    >dfj
    TATTATTATTTT
    ATAT
    >ghg
    ATATATAT

I want to have an output like this:

    >qppq
    ATATATTTATTATTA
    TATATATTATATTAT
    TA
    >dfj
    TATTATTATTTT
    ATAT
    ============
    >lsl
    ATTATTATTATTATT
    AGGAGGAG
    >ghg
    ATATATAT
    ===========

    #!/usr/bin/perl -w
    use strict;
    my $file1=$ARGV[0];
    my $file2=$ARGV[1];
    open(FILE1,"$file1");
    open(FILE2,"$file2");
    while (my $line1=<FILE1>){
        my $line2=<FILE2>;
        if($line1=~/^>/){
            print "$line1";
            do{
                $line1=<FILE1>;
                print $line1;
            }until $line1=~/^>/;
            print "$line2";
            do{
                $line2=<FILE2>;
                print $line2;
            }until $line2=~/^>/;
        }
        print "==========\n";
    }

But this code is not working. Can somebody help me with this?

Re: Text manipulation
by zentara (Archbishop) on Nov 09, 2012 at 14:28 UTC
      Ya, here is a BioPerl solution to interleave two sequence files:
          #!/usr/bin/env perl
          use strict;
          use warnings;
          use Bio::SeqIO;

          my $fasta_in_1 = $ARGV[0];
          my $fasta_in_2 = $ARGV[1];
          my $fasta_out  = ">interleave_1_2.fa";

          my $seqio_in_1 = Bio::SeqIO->new(
              -file   => $fasta_in_1,
              -format => 'Fasta',
          );
          my $seqio_in_2 = Bio::SeqIO->new(
              -file   => $fasta_in_2,
              -format => 'Fasta',
          );
          my $seqio_out = Bio::SeqIO->new(
              -file   => $fasta_out,
              -format => 'Fasta',
          );

          while ( my $seq_obj_1 = $seqio_in_1->next_seq() ) {
              my $seq_obj_2 = $seqio_in_2->next_seq();
              $seqio_out->write_seq($seq_obj_1);
              $seqio_out->write_seq($seq_obj_2);
          }

          __END__
          >qppq
          ATATATTTATTATTATATATATTATATTATTA
          >dfj
          TATTATTATTTTATAT
          >lsl
          ATTATTATTATTATTAGGAGGAG
          >ghg
          ATATATAT

      However, I'm not entirely sure of the best way to delimit the pairs with ============ using this approach.

Re: Text manipulation
by grizzley (Chaplain) on Nov 09, 2012 at 14:30 UTC

    On the contrary: it is working! Maybe not the way you expect, but you didn't tell us what you expect, did you?

    Update: sorry, the desired output is there, but the node was poorly formatted and I didn't notice it.
      Yeah, it is working, but not according to the output I mentioned :(

        Hi, viktor,
        Please, reformat the output you desire, using the code tags.

        If you tell me, I'll forget.
        If you show me, I'll remember.
        If you involve me, I'll understand.
        --- Author unknown to me
Re: Text manipulation
by Anonymous Monk on Nov 09, 2012 at 14:57 UTC
        #!/usr/bin/perl --
        use strict;
        use warnings;

        Main( @ARGV );
        exit( 0 );

        sub Main {
            @_ or die "\nUsage: $0 infile infile infile > outfile\n";
            my @files = map { open my( $fh ), '<', $_ or die $^E; $fh; } @_;
            while( my @ofh = grep not eof, @files ) {
                print getOne( shift @ofh ) while @ofh;
            }
        } ## end sub Main

        sub getOne {
            ...;
        }
        ## perltidy -olq -csc -csci=10 -cscl="sub : BEGIN END" -otr -opr -ce -nibc -i=4 -pt=0 "-nsak=*"
Re: Text manipulation
by Anonymous Monk on Nov 09, 2012 at 17:46 UTC
        use 5.014;
        use strict;
        use warnings;

        my @f1 = qw(
            >qppq
            ATATATTTATTATTA
            TATATATTATATTAT
            TA
            >lsl
            ATTATTATTATTATT
            AGGAGGAG
        );
        my @f2 = qw(
            >dfj
            TATTATTATTTT
            ATAT
            >ghg
            ATATATAT
        );

        {
            my $i = 1;

            sub get_lines {
                my $array_ref = shift;
                $i = shift if @_;
                my $old = $i - 1;
                while (exists $array_ref->[$i] and chr ord $array_ref->[$i] ne '>') {
                    $i++;
                }
                return splice($array_ref) if $#{$array_ref} <= $old;
                return splice($array_ref, $old, $i - $old);
            }

            while (@f1 && @f2) {
                my $j = $i;
                say join("\n", get_lines(\@f1, $j), get_lines(\@f2, $j));
                say "============";
            }
        }
        # TZN
Re: Text manipulation
by space_monk (Chaplain) on Nov 09, 2012 at 18:47 UTC
    It helps if you give us non-biologists context as to what data the files contain. If you had said they were FASTA format files, then you might have got a "fasta" response, geddit? :-)

    You're essentially taking one entry from each file at a time. You could do this by reading each file into an array, shifting an entry off each array as $d1 and $d2, and then doing:

        print <<EOF;
        $d1
        $d2
        ========
        EOF
    A Monk aims to give answers to those who have none, and to learn from those who know more.
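
A minimal sketch of the array approach described above, using only core Perl. The sub names read_entries and interleave are made up for this illustration and do not come from the thread. It assumes every entry starts with a '>' header line, and it inlines the sample data from the question as in-memory strings so that it runs as is; real files would be opened by name instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read every FASTA entry from a filehandle into a list.
# One entry = its '>' header line plus all sequence lines that follow it.
# (read_entries is a hypothetical helper name, not from the thread.)
sub read_entries {
    my ($fh) = @_;
    my @entries;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;            # skip blank lines
        if ( $line =~ /^>/ ) {
            push @entries, "$line\n";        # a header starts a new entry
        }
        elsif (@entries) {
            $entries[-1] .= "$line\n";       # sequence line joins the current entry
        }
    }
    return @entries;
}

# Take one entry from each list in turn and follow each pair with a separator.
# Stops as soon as either list runs out; the caller's arrays are consumed.
sub interleave {
    my ( $first, $second ) = @_;             # array references
    my $out = '';
    while ( @$first and @$second ) {
        my $d1 = shift @$first;
        my $d2 = shift @$second;
        $out .= $d1 . $d2 . "============\n";
    }
    return $out;
}

# Sample data from the question, held in strings instead of files (assumption).
my $file1 = ">qppq\nATATATTTATTATTA\nTATATATTATATTAT\nTA\n>lsl\nATTATTATTATTATT\nAGGAGGAG\n";
my $file2 = ">dfj\nTATTATTATTTT\nATAT\n>ghg\nATATATAT\n";

open my $fh1, '<', \$file1 or die "Cannot open first input: $!";
open my $fh2, '<', \$file2 or die "Cannot open second input: $!";

my @entries1 = read_entries($fh1);
my @entries2 = read_entries($fh2);
my $result   = interleave( \@entries1, \@entries2 );
print $result;
```

Reading both files fully into memory keeps the pairing logic trivial. For very large FASTA files, reading one entry at a time from each handle would be the more economical choice.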
      Nice pun!

      For anyone that doesn't know, FASTA format is a flat-file way to store any number of DNA/RNA/protein sequences and each entry consists of (1) a sequence ID line that starts with a '>' character and (2) one or more lines containing the actual sequence.
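
To make that description concrete, here is a small sketch that splits FASTA text into records, each holding the ID and the sequence with its line breaks removed. The sub name parse_fasta and the hash keys id and seq are assumptions chosen for this example; the sample text is the first file from the question.

```perl
use strict;
use warnings;

# Turn FASTA text into a list of { id => ..., seq => ... } records.
# (parse_fasta is a hypothetical name used only for this illustration.)
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line ( split /\r?\n/, $text ) {
        next if $line =~ /^\s*$/;                    # ignore blank lines
        if ( $line =~ /^>(.*)/ ) {
            push @records, { id => $1, seq => '' };  # (1) the sequence ID line
        }
        elsif (@records) {
            $records[-1]{seq} .= $line;              # (2) one or more sequence lines
        }
    }
    return @records;
}

my @records = parse_fasta(
    ">qppq\nATATATTTATTATTA\nTATATATTATATTAT\nTA\n>lsl\nATTATTATTATTATT\nAGGAGGAG\n"
);

# Print each ID with the length of its joined sequence.
printf "%s: %d characters\n", $_->{id}, length $_->{seq} for @records;
```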
