PerlMonks  

Re^14: Combining 3 files

by garyboyd (Acolyte)
on Jun 28, 2011 at 15:48 UTC [id://911795]


in reply to Re^13: Combining 3 files
in thread Combining 3 files

Ok, now I'm more confused than ever and chasing my tail in frustration. This is where I am with the code. Any helpful suggestions are appreciated.

#!/usr/bin/perl
#22/06/2011

#use strict;
use warnings;
use File::Slurp;
use Data::Dumper;

my @data;
my @col;
my $dataset;
my @fields;
#my %out;
my $Hashref;
my $fileCount;
my @out;
my @results;

open INFILE, "<Primer-Rev1" or die $!;
open my $outfh, '>', "outputfile.txt" or die $!;

for my $nr (1..2) {
    for my $line (read_file('Primer-For'.$nr)) {
        my @col = split(/\t/,$line);
        push @{$data[$nr - 1]->{shift(@col)}},\@col;
    }
}

while (<INFILE>){
    @col = split(/\t+/, $_);
    chomp (@col);
    my ($header, $length, $tm, $sequence) = @col[0..3];
    # expecting file3 line in @col
    my @results = ( $col[1], $col[2] );
    #print Dumper (\@results)
}

for my $dataset (@data) {
    my @out = push @{ $data[ $fileCount ]->{ shift @col } }, \@col;
    #print Dumper (\@data);
}

@out = sort {
    my $diff_a = $col[2] - $a->[1];
    $diff_a *= -1 if $diff_a < 0;
    my $diff_b = $col[2] - $b->[1];
    $diff_b *= -1 if $diff_b < 0;
    $diff_a <=> $diff_b;
} @out;

print Dumper (\@out);
push @results, $out[0]->[2];
#print Dumper (\@results);
#}

Replies are listed 'Best First'.
Re^15: Combining 3 files
by Anonymous Monk on Jun 28, 2011 at 17:16 UTC

    Ok, now I'm more confused than ever and chasing my tail in frustration.

    You still have

    my @out = push @{ $data[ $fileCount ]->{ ...
    Why?
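
    For context on the question above: Perl's built-in push returns the new number of elements in the array, not the element that was pushed, so assigning its result to @out leaves @out holding a single count. A minimal sketch of that behaviour (the variable @rows and its contents are made up for illustration, not taken from the script):

```perl
use strict;
use warnings;

# Illustrative data only: a list of row references
my @rows = ( [ 'a', 1 ] );

# push returns the array's new length, so @out becomes a
# one-element list holding that count, not the pushed row
my @out = push @rows, [ 'b', 2 ];

print "out holds: @out\n";
print "rows has ", scalar(@rows), " elements\n";
```

    Sorting @out afterwards therefore sorts a single number rather than the rows stored in @data.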

      Thanks anonymous monk!!! I now have a breakthrough!!! I think it helped me to sleep on things as well!

      So I removed the:

      my @out = push @{ $data[ $fileCount ]->{ ...

      and now have :

      #!/usr/bin/perl
      #29/06/2011

      #use strict;
      use warnings;
      use File::Slurp;
      use Data::Dumper;

      my @data, my @col, my @fields;
      my $dataset;
      #my %out;
      my $Hashref;
      my $fileCount;
      my @out;
      my @results;

      open INFILE, "<Primer-Rev1" or die $!;
      open my $outfh, '>', "outputfile.txt" or die $!;

      for my $nr (1..2) {
          for my $line (read_file('Primer-For'.$nr)) {
              my @col = split(/\t/,$line);
              push @{$data[$nr - 1]->{shift(@col)}},\@col;
          }
      }

      while (<INFILE>){
          @col = split(/\t+/, $_);
          chomp (@col);
          my ($header, $length, $tm, $sequence) = @col[0..3];
          # expecting file3 line in @col
          #}
          my @results = ( $col[0], $col[3] );

          for my $dataset (@data) {
              my @beef = @{ $dataset->{ $col[0] } };
              @beef = sort {
                  my $diff_a = $col[2] - $a->[1];
                  $diff_a *= -1 if $diff_a < 0;
                  my $diff_b = $col[2] - $b->[1];
                  $diff_b *= -1 if $diff_b < 0;
                  $diff_a <=> $diff_b;
              } @beef;
              push @results, $beef[0]->[2];
              #print Dumper (\@results);
              foreach (@results){ print $_."\n";}
          }
      }

      I checked the output from @results and it is almost generating what I want. There are, however, strange things going on.

      output looks like:

      contig03841
      CCAGGTTATTTATTTCAGCGGGAACT
      AGTAGTTCATAATAAAGAGGAGGCTGGT
      contig03841
      CCAGGTTATTTATTTCAGCGGGAACT
      AGTAGTTCATAATAAAGAGGAGGCTGGT
      AGTAGTTCATAATAAAGAGGAGGCTGGA
      contig06486
      GCAAATGGCTCTAAGGATCAGCC
      TTTTCCTGAGCGTTTTCCTGAGC
      contig06486
      GCAAATGGCTCTAAGGATCAGCC
      TTTTCCTGAGCGTTTTCCTGAGC
      CATTTTTCCTGAGCGTTTTCCTGAGT
      contig09294
      GTCGGAGCTCTCTCAGAACCC
      GCCCCAGAAGACATCACCTTCAT
      contig09294
      GTCGGAGCTCTCTCAGAACCC
      GCCCCAGAAGACATCACCTTCAT
      contig100253
      CACTCGAGTTGCAGTTATGTTCCTC
      AGATGATTTGTGCATTATAATTGTAATTTGGGC
      contig100253
      CACTCGAGTTGCAGTTATGTTCCTC
      AGATGATTTGTGCATTATAATTGTAATTTGGGC
      GAGATGATTTGTGCATTATAATTGTAATTTGGGT

      I think the gaps are where entries are missing from the files. Is there a way to print only those results in @results that have data from all 3 input files? E.g., it would output

      contig100253 CACTCGAGTTGCAGTTATGTTCCTC AGATGATTTGTGCATTATAATTGTAATTTGGGC GAGATGATTTGTGCATTATAATTGTAATTTGGGT

      rather than......

      contig100253 CACTCGAGTTGCAGTTATGTTCCTC AGATGATTTGTGCATTATAATTGTAATTTGGGC contig100253 CACTCGAGTTGCAGTTATGTTCCTC AGATGATTTGTGCATTATAATTGTAATTTGGGC GAGATGATTTGTGCATTATAATTGTAATTTGGGT
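
      One possible way to get only the complete records, sketched under some assumptions: the repeated blocks in the output appear because the print loop sits inside the loop over @data, so @results is printed once per forward-primer file. Checking every hash in @data for the contig before building @results, and storing the record once after the inner loop, would avoid both the partial records and the repeats. The sample data and the names closest_seq, @rev and @complete below are hypothetical stand-ins, not part of the original script; the row layout [length, tm, sequence] is assumed from the indexes the script uses.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @data: one hash per Primer-For file,
# keyed by contig name, each value a list of [length, tm, sequence] rows
my @data = (
    {
        contigA => [ [ 25, 60.1, 'AAAA' ], [ 24, 58.0, 'AAAT' ] ],
        contigB => [ [ 22, 59.0, 'CCCC' ] ],
    },
    {
        contigA => [ [ 26, 61.0, 'GGGG' ] ],    # contigB is missing here
    },
);

# Hypothetical stand-in for the lines of Primer-Rev1,
# already split into [header, length, tm, sequence]
my @rev = (
    [ 'contigA', 25, 60.5, 'TTTT' ],
    [ 'contigB', 23, 59.5, 'TTTA' ],
);

# Return the sequence (index 2) of the row whose Tm (index 1) is closest to $tm
sub closest_seq {
    my ( $tm, $rows ) = @_;
    my ($best) = sort { abs( $tm - $a->[1] ) <=> abs( $tm - $b->[1] ) } @$rows;
    return $best->[2];
}

my @complete;
for my $col (@rev) {
    my ( $header, $length, $tm, $sequence ) = @$col;

    # Skip this contig unless every forward-primer file has an entry for it
    next if grep { !exists $_->{$header} } @data;

    my @results = ( $header, $sequence );
    push @results, closest_seq( $tm, $_->{$header} ) for @data;

    # Store the record once per contig, after all files were consulted
    push @complete, \@results;
}

# Print each complete record, one field per line as in the original script
print "$_\n" for map { @$_ } @complete;
```

      The exists test also avoids dereferencing a missing hash entry, which would stop the script if use strict were switched back on.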

      I also want to output the data on one line, tab-delimited, for example:

      contig100253 AGATGATTTGTGCATTATAATTGTAATTTGGGC GAGATGATTTGTGCATTATAATTGTAATTTGGGT CACTCGAGTTGCAGTTATGTTCCTC
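
      A minimal sketch of the one-line, tab-delimited form, assuming @results is filled in the order the script above builds it (contig name, reverse primer, then one forward primer per Primer-For file). The names $contig, $rev, @forward and $line are illustrative:

```perl
use strict;
use warnings;

# One record in the order the script builds @results:
# contig name, reverse primer, forward primer from each Primer-For file
my @results = (
    'contig100253',
    'CACTCGAGTTGCAGTTATGTTCCTC',
    'AGATGATTTGTGCATTATAATTGTAATTTGGGC',
    'GAGATGATTTGTGCATTATAATTGTAATTTGGGT',
);

# Reorder to: contig, forward primers, reverse primer
my ( $contig, $rev, @forward ) = @results;

# join places a tab between fields; the newline is added once per record
my $line = join "\t", $contig, @forward, $rev;
print "$line\n";
```

      To write to the output file instead of the screen, the same line could be printed to the $outfh handle the script already opens, e.g. print {$outfh} "$line\n";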
