PerlMonks  

Matching strings from three files (three hashes)

by FluffyBunny (Acolyte)
on Aug 16, 2010 at 20:00 UTC (#855348=perlquestion)

FluffyBunny has asked for the wisdom of the Perl Monks concerning the following question:

UPDATE:

Hello Perl monks. I noticed that I need to study data structures more, but I got stuck on a problem with a complex structure (foreach / nested if). I've been debugging with the senior Perl monks' suggestions, and I really appreciate them.

This is what I am trying to do:

1) The input#1 (bwa) file is read and the important string data is stored, including the amplicon sequence ID (the ID used in input#2) and the input sequence ID (the ID used in input#3).

This means that I need to extract these IDs to match them later.

2) Using the input#1 key, match the input sequence ID (input#3 ID) from the input#3 file.

3) Using the 2nd element of the input#1 hash (the stored reference amplicon ID), match the reference amplicon ID (input#2 ID) from the input#2 file.

4) Print the sequences (2nd element) from input#2 and input#3.

The main problem is that the key for the Amp hash is wrong, because it is not the same ID that the bwa and Input hashes use. I am not sure how to solve this problem. I cannot match input#2 and input#3 directly because their IDs are different. The first element of bwa (input#1) and the first element of Input (input#3) hold the input sequence ID, while the 2nd element of bwa ([1]) and the 1st element of Amp (input#2) hold the same reference ID.

Basically, I am trying to extract the two different IDs from the first file, find one ID in the 2nd file, find the other ID in the 3rd file, and then match their corresponding sequences.

So far, I have figured out that the foreach loop at least works, but the nested ifs do not, because of the Amp hash. The script is supposed to print every 'print' marker that I added, but what I get is

the error message 'Use of uninitialized value in string eq at BSanalyzer.pl line 127' (something is wrong with the if ($bwa{$ID}[1] eq $Amp{$ID}[0]) line in my foreach loop) and

the output "1233out1233out1233out1233out1233out1233out1233out", which means that it does not get into my 3rd if. Why is this happening?

#!/usr/bin/perl -w
use warnings;
use strict;

# BWA alignment output (.sam)
my %bwa = ();
my $file1 = shift;
open (FILE1, "$file1") || die "Failed to open $file1 for reading : $!"; # Open second file
while (<FILE1>) { # Reading second hash
    if ($_ =~ /^[^@]/s) {
        chomp;
        my @line = split /\s+/, $_;
        my $ID;
        if ($line[2] =~ /[^*]/) {
            $ID = $line[0];
            $bwa{$ID}[0] = $line[0];  # seq ID
            $bwa{$ID}[1] = $line[2];  # Ref ID
            $bwa{$ID}[2] = $line[5];  # CIGAR ID for insertion
            #$bwa{$ID}[3] = @line[9]; # Processed seq (already C->T)
            $bwa{$ID}[3] = $line[12]; # Edit distance (edited area by # of base) : NM
            $bwa{$ID}[4] = $line[15]; # No. of mismatches in the alignment : XM
            $bwa{$ID}[5] = $line[16]; # No. of gap opens for insertion : XO
            $bwa{$ID}[6] = $line[17]; # No. of gap extensions for deletion :XG
            $bwa{$ID}[7] = $line[18]; # Mismatching positions / bases : MD
        }
    }
}
close FILE1 || die "Failed to close $file1 : $!";

# ORIGINAL Reference Amplicon File (.fa)
my %Amp = ();
my $file2 = shift;
open (FILE2, "$file2")|| die "Failed to open $file2 for reading : $!"; # Open first file
local $/= ">";
my $first=<FILE2>;
while (<FILE2>) { # Reading first hash
    chomp;
    my ($ID, $Seq) = split("\n");
    $Amp{$ID}[0] = $ID;
    $Amp{$ID}[1] = $Seq;
}
close FILE2 || die "Failed to close $file2 : $!";

# ORIGINAL Input FASTQ Sequencing File (.fq)
my %Input = ();
my $file3 = shift;
open (FILE3, "$file3")|| die "Failed to open $file3 for reading : $!"; # Open first file
local $/= "@";
$first=<FILE3>;
while (<FILE3>) { # Reading first hash
    chomp;
    my ($ID, $Seq,undef,undef) = split("\n");
    $Input{$ID}[0] = $ID;
    $Input{$ID}[1] = $Seq;
}
close FILE3 || die "Failed to close $file3 : $!";

foreach my $ID (keys %bwa) {
    print "1";
    if (exists $Input{$ID}[0] ){
        print "2";
        if ($bwa{$ID}[0] eq $Input{$ID}[0]){
            print "3";
            if ($bwa{$ID}[1] eq $Amp{$ID}[0]){
                print "4";
                if ($bwa{$ID}[3] eq "NM:i:0" && $bwa{$ID}[4] eq "XM:i:0" && $bwa{$ID}[5] eq "XO:i:0" && $bwa{$ID}[6] eq "XG:1:0") {
                    print "$Amp{$ID}[1]\n$Input{$ID}[1]";
                }
                else {print "4out";}
            }
            else {print "3out";}
        }
        else {print "2out";}
    }
    else {print "1out";}
}
exit;

Here are the three input files.

input#1:

@SQ SN:TMEM200B LN:293 @SQ SN:B3GAT2-2_P001 LN:204 Seq1Perfect 0 B3GAT2-2_P001 1 37 204M * 0 0 + GGTTGGTTTTTATTTTTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAAGAAGAGTACG +TCGGGTTGCGCGCGTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTG +GCGCGCGGTAGTTCGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT + &a==aa=====a======aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$ +a=$aaa==a$a$a$a==aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa= +aa$a$a$aa=aa==$aaa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aaa=aa== + XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 + MD:Z:204 Seq2MM 0 B3GAT2-2_P001 1 37 204M * 0 0 GGTT +AATTTTTATTTTTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAAGAAGAGTACGTCGGG +TTGCGCGCGTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTGGCGCG +CGGTAGTTCGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT &a= +=aa=====a======aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$a=$aa +a==a$a$a$a==aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa=aa$a$ +a$aa=aa==$aaa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aaa=aa== XT +:A:U NM:i:2 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 M +D:Z:4G0G198 Seq3In 0 B3GAT2-2_P001 1 37 12M1I192M * 0 0 + GGTTGGTTTTTAGTTTTTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAAGAAGAGTAC +GTCGGGTTGCGCGCGTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGT +GGCGCGCGGTAGTTCGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT + &a==aa===a==a======aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa= +a$a=$aaa==a$a$a$a==aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===a +a=aa$a$a$aa=aa==$aaa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aaa=aa= += XT:A:U NM:i:1 X0:i:1 X1:i:0 XM:i:0 XO:i:1 XG:i +:1 MD:Z:204 Seq4Del 0 B3GAT2-2_P001 1 37 55M6D143M * 0 0 + GGTTGGTTTTTATTTTTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAGTACGTCGGG +TTGCGCGCGTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTGGCGCG +CGGTAGTTCGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT &a= +=aa=====a======aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$a=$aa 
+a==a$a$a$a==aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa=aa$a$ +a$aa=aa==$aaa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aa XT:A:U + NM:i:6 X0:i:1 X1:i:0 XM:i:0 XO:i:1 XG:i:6 MD:Z:55 +^GAAGAA143 Seq5Partial 0 B3GAT2-2_P001 1 37 204M * 0 0 + GGTTGGTTTTTATTTTTTGGAAGAGTTTTAGATTATAGGTGTTGTTGTTGTTAGCGAAGAAGAGTACG +TCGGGTTGCGCGCGTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTG +GCGCGCGGTAGTTCGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT + &a==aa=====a======aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$ +a=$aaa==a$a$a$a==aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa= +aa$a$a$aa=aa==$aaa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aaa=aa== + XT:A:U NM:i:2 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 + MD:Z:45C2C155 Seq6TruncB 0 B3GAT2-2_P001 1 37 189M * 0 0 +GGTTGGTTTTTATTTTTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAAGAAGAGTACGT +CGGGTTGCGCGCGTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTGG +CGCGCGGTAGTTCGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGT &a==aa=====a== +====aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$a=$aaa==a$a$a$a= +=aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa=aa$a$a$aa=aa==$a +aa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa= XT:A:U NM:i:0 X0:i:1 + X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:189 Seq7TruncF 0 B3GAT2-2_P001 16 37 189M * 0 0 + TTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAAGAAGAGTACGTCGGGTTGCGCGCGT +TGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTGGCGCGCGGTAGTTCG +GGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT ===aaaaaaa=== +=aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$a=$aaa==a$a$a$a==aa=a==aa=a=== +==aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa=aa$a$a$aa=aa==$aaa=$a===a=aa$a +=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aaa=aa== XT:A:U NM:i:0 X0:i:1 + X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:189 Seq8Incomplete 4 * 0 0 * * 0 0 GGTTGGTTCTTA +TTCCTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAAGAAGAGTACGTCGGGTTGCGCGC +GTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTGGCGCGCGGTAGTT +CGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT &a==aa===== 
+a======aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$a=$aaa==a$a$a +$a==aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa=aa$a$a$aa=aa= +=$aaa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aaa=aa==

input#2:

>TMEM200B
CTCCTCTGCCTGGCTGGTCTTGATCCGAGCGGTCTTCCCGGTGTCTAGCTCAAGTCGCTCCTGCTGCAGCTTCGCTGCGGGCGGAGGAGGTCTGGAAGGAGGGGGCGGGCAGGGAGAGGCTGGAGCCGGTGACGCCCCCTCCTCCCGCGCTGCGGTATGTAAAGCACAGTAGGGGGGAGGTGGGGCCCGGCGAGCGACCCCTGCGGACCTGGGAGGCCCGAGCGCCCCCGCCCCATTTGCTACGGTGCAGCCACGTGCGGGGGTGGGGTCGAGCCCGGGAGGTACTTACCCTGGAGA
>B3GAT2-2_P001
GGCTGGCCTTTACCTCCTGGAAGAGCTCCAGACTATAGGTGTTGTCGTCGTCAGCGAAGAAGAGCACGCCGGGCTGCGCGCGCTGGTGCTGGTGCCTCTGGCGCAGCCAGGCGAGGCCCGCGTTGCGCTGCTCAGTGGCGCGCGGCAGCCCGGGCCGCTTGTAGCGCCGCGGCGTGGGCACGTGCAGGTGAGTGCTGGGCAGCC

input#3:

@Seq1Perfect GGTTGGTTTTTATTTTTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAAGAAGAGTACGTC +GGGTTGCGCGCGTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTGGC +GCGCGGTAGTTCGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT +Seq1Perfect &a==aa=====a======aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$a=$ +aaa==a$a$a$a==aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa=aa$ +a$a$aa=aa==$aaa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aaa=aa== @Seq2MM GGTTAATTTTTATTTTTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAAGAAGAGTACGTC +GGGTTGCGCGCGTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTGGC +GCGCGGTAGTTCGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT +Seq2MM &a==aa=====a======aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$a=$ +aaa==a$a$a$a==aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa=aa$ +a$a$aa=aa==$aaa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aaa=aa== @Seq3In GGTTGGTTTTTAGTTTTTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAAGAAGAGTACGT +CGGGTTGCGCGCGTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTGG +CGCGCGGTAGTTCGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT +Seq3In &a==aa===a==a======aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$a= +$aaa==a$a$a$a==aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa=aa +$a$a$aa=aa==$aaa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aaa=aa== @Seq4Del GGTTGGTTTTTATTTTTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAGTACGTCGGGTTG +CGCGCGTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTGGCGCGCGG +TAGTTCGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT +Seq4Del &a==aa=====a======aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$a=$ +aaa==a$a$a$a==aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa=aa$ +a$a$aa=aa==$aaa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aa @Seq5Partial GGTTGGTTTTTATTTTTTGGAAGAGTTTTAGATTATAGGTGTTGTTGTTGTTAGCGAAGAAGAGTACGTC +GGGTTGCGCGCGTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTGGC +GCGCGGTAGTTCGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT +Seq5Partial 
&a==aa=====a======aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$a=$ +aaa==a$a$a$a==aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa=aa$ +a$a$aa=aa==$aaa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aaa=aa== @Seq6TruncB GGTTGGTTTTTATTTTTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAAGAAGAGTACGTC +GGGTTGCGCGCGTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTGGC +GCGCGGTAGTTCGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGT +Seq6TruncB &a==aa=====a======aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$a=$ +aaa==a$a$a$a==aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa=aa$ +a$a$aa=aa==$aaa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa= @Seq7TruncF TTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAAGAAGAGTACGTCGGGTTGCGCGCGTTG +GTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTGGCGCGCGGTAGTTCGGG +TCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT +Seq7TruncF ===aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$a=$aaa==a$a$a$a==a +a=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa=aa$a$a$aa=aa==$aaa +=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aaa=aa== @Seq8Incomplete GGTTGGTTCTTATTCCTTGGAAGAGTTTTAGATTATAGGTGTTGTCGTCGTTAGCGAAGAAGAGTACGTC +GGGTTGCGCGCGTTGGTGTTGGTGTTTTTGGCGTAGTTAGGCGAGGTTCGCGTTGCGTTGTTTAGTGGC +GCGCGGTAGTTCGGGTCGTTTGTAGCGTCGCGGCGTGGGTACGTGTAGGTGAGTGTTGGGTAGTT +Seq8Incomplete &a==aa=====a======aaaaaaa====aaa==a=aaa=a==a=$a=$a==aa$aaaaaaaaa=a$a=$ +aaa==a$a$a$a==aa=a==aa=a=====aa$a=aa==aaa$aaaa==$a$a==a$a==a===aa=aa$ +a$a$aa=aa==$aaa=$a===a=aa$a=$a$aa$a=aaa=a$a=a=aaa=aaa=a==aaa=aa==

Replies are listed 'Best First'.
Re: Matching strings from three files (three hashes)
by toolic (Bishop) on Aug 16, 2010 at 20:27 UTC
    I'm having a hard time following your code and question, but a simple thing for you to try quickly is Data::Dumper:
    use Data::Dumper;
    print 'bwa ', Dumper(\%bwa);
    print 'Input ', Dumper(\%Input);
    print 'Amp ', Dumper(\%Amp);

    See also Basic debugging checklist
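When several hashes are dumped to compare their keys, it can help to sort the keys and to print a one-line key summary per hash before the full dump. The following is only a sketch: the three small hashes are made-up stand-ins for %bwa, %Input and %Amp, and key_summary is a hypothetical helper, not part of Data::Dumper.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys and use compact indentation so that dumps of
# different hashes are easier to compare side by side.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

# Hypothetical miniature versions of the three hashes in the question.
my %bwa   = ( Seq1Perfect => [ 'Seq1Perfect', 'B3GAT2-2_P001' ] );
my %Input = ( Seq1Perfect => [ 'Seq1Perfect', 'GGTTGG' ] );
my %Amp   = ( 'B3GAT2-2_P001' => [ 'B3GAT2-2_P001', 'GGCTGG' ] );

# Return the sorted keys of a hash on one line, for a quick key comparison.
sub key_summary {
    my ($name, $hash) = @_;
    return "$name keys: " . join(', ', sort keys %$hash);
}

print key_summary('bwa',   \%bwa),   "\n";
print key_summary('Input', \%Input), "\n";
print key_summary('Amp',   \%Amp),   "\n";

# Named dump: prints "$bwa = {...}" and "$Amp = {...}" instead of $VAR1, $VAR2.
print Data::Dumper->Dump([ \%bwa, \%Amp ], [ 'bwa', 'Amp' ]);
```

With the key summaries next to each other, a mismatch such as %bwa being keyed by read ID while %Amp is keyed by reference ID tends to stand out before reading the full dump.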

      Hello toolic, thank you for that useful code. I found out that my bwa and Input hashes are okay because they share the same key (ID), but the Amp hash does not share that key, so it ends up with a weird data structure like this:

      Amp $VAR1 = {
        'Seq1Perfect' => [],
        'Seq6TruncB' => [],
        'Seq7TruncF' => [],
        'Seq3In' => [],
        'TMEM200B' => [
          'TMEM200B',
          'TTTTTTTGTTTGGTTGGTTTTGATTTGAGTGGTTTTTTTGGTGTTTAGTTTAAGTTGTTTTTGTTGTAGTTTTGTTGTGGGTGGAGGAGGTTTGGAAGGAGGGGGTGGGTAGGGAGAGGTTGGAGTTGGTGATGTTTTTTTTTTTTGTGTTGTGGTATGTAAAGTATAGTAGGGGGGAGGTGGGGTTTGGTGAGTGATTTTTGTGGATTTGGGAGGTTTGAGTGTTTTTGTTTTATTTGTTATGGTGTAGTTATGTGTGGGGGTGGGGTTGAGTTTGGGAGGTATTTATTTTGGAGA'
        ],
        'Seq4Del' => [],
        'Seq2MM' => [],
        'B3GAT2-2_P001' => [
          'B3GAT2-2_P001',
          'GGTTGGTTTTTATTTTTTGGAAGAGTTTTAGATTATAGGTGTTGTTGTTGTTAGTGAAGAAGAGTATGTTGGGTTGTGTGTGTTGGTGTTGGTGTTTTTGGTGTAGTTAGGTGAGGTTTGTGTTGTGTTGTTTAGTGGTGTGTGGTAGTTTGGGTTGTTTGTAGTGTTGTGGTGTGGGTATGTGTAGGTGAGTGTTGGGTAGTT'
        ],
        'Seq5Partial' => []
      };
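The empty arrays under the read IDs in this dump are consistent with autovivification: in Perl, reading a nested element such as $Amp{$ID}[0] creates $Amp{$ID} as an empty array reference when the key does not exist yet, and the element itself comes back undef, which would also explain the 'uninitialized value' warning. Here is a minimal sketch with made-up data; amp_field is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical miniature %Amp, keyed by reference amplicon ID.
my %Amp = ( 'B3GAT2-2_P001' => [ 'B3GAT2-2_P001', 'GGCTGG' ] );

# Reading through a missing key creates it: the nested lookup
# autovivifies $Amp{Seq1Perfect} as an empty array reference.
my $read_id = 'Seq1Perfect';
my $value   = $Amp{$read_id}[0];    # undef, but %Amp has gained a key

my $keys_after_careless = scalar keys %Amp;

# A guarded lookup: test the top-level key first, so nothing is created.
sub amp_field {
    my ($amp, $id, $index) = @_;
    return unless exists $amp->{$id};
    return $amp->{$id}[$index];
}

my %clean = ( 'B3GAT2-2_P001' => [ 'B3GAT2-2_P001', 'GGCTGG' ] );
my $missing          = amp_field(\%clean, 'Seq2MM', 0);
my $keys_after_guard = scalar keys %clean;

print "careless lookup left $keys_after_careless keys\n";
print "guarded lookup left $keys_after_guard keys\n";
```

The same effect applies to exists $Input{$ID}[0] in the original script: checking exists $Input{$ID} on the top-level key avoids creating entries as a side effect.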
      Let me explain again. bwa is an alignment program that I have already run, so the bwa file (input file #1, the .sam file) contains both the reference amplicon ID and the input sequence ID. The hash 'Amp' is keyed by reference amplicon ID and the hash 'Input' is keyed by input sequence ID, so each record from the bwa file carries both IDs. I want to use these two IDs to pick a sequence from input file #2 (by reference amplicon ID) and a sequence from input file #3 (by input sequence ID), and print them so that I can compare them later (the code for this comparison is not added yet). Should I create a nested foreach to use two different keys, or is there an easier way?
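A nested foreach should not be needed for this. Each bwa record already stores the reference ID, so that stored value can serve directly as the key into the Amp hash while the read ID serves as the key into Input. The sketch below uses tiny made-up sequences and simplified hashes (ID mapped straight to sequence); paired_sequences is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical miniature data in the same spirit as the question's hashes.
my %bwa = (
    Seq1Perfect => [ 'Seq1Perfect', 'B3GAT2-2_P001' ],    # [read ID, reference ID]
    Seq2MM      => [ 'Seq2MM',      'B3GAT2-2_P001' ],
);
my %Input = ( Seq1Perfect => 'GGTTGG', Seq2MM => 'GGTTAA' );    # read ID => read sequence
my %Amp   = ( 'B3GAT2-2_P001' => 'GGCTGG' );                    # reference ID => reference sequence

# For each aligned read, fetch the read sequence by read ID and the
# reference sequence by the reference ID stored in the alignment record.
sub paired_sequences {
    my ($bwa, $input, $amp) = @_;
    my @pairs;
    for my $read_id (sort keys %$bwa) {
        my $ref_id = $bwa->{$read_id}[1];
        next unless exists $input->{$read_id} && exists $amp->{$ref_id};
        push @pairs, [ $read_id, $ref_id, $amp->{$ref_id}, $input->{$read_id} ];
    }
    return @pairs;
}

for my $pair (paired_sequences(\%bwa, \%Input, \%Amp)) {
    my ($read_id, $ref_id, $ref_seq, $read_seq) = @$pair;
    print "$read_id vs $ref_id\n$ref_seq\n$read_seq\n";
}
```

In terms of the original script, this corresponds to looking up $Amp{ $bwa{$ID}[1] } rather than $Amp{$ID}.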
Re: Matching strings from three files (three hashes)
by roboticus (Chancellor) on Aug 17, 2010 at 00:21 UTC

    FluffyBunny:

    Unless you have an odd distro of perl, you do have a debugger. Just start your script as:

    perl -d your_script.pl arguments

    Give it a try, and you'll be able to figure out most problems pretty easily by looking at the data as you go through your program.

    ...roboticus

Re: Matching strings from three files (three hashes)
by murugu (Curate) on Aug 17, 2010 at 05:33 UTC
    FluffyBunny,

    I think you are populating the bwa hash with different values and then comparing different values of the bwa hash with 0. As others suggested, use the debugger or the Dumper module to see what is in the populated hash.

    I have tweaked some of your code. Check whether this is what you need. I was not able to understand your requirement fully, so you can modify the code below to suit it.

    #!/usr/bin/perl -w
    use warnings;
    use strict;
    use Data::Dumper;

    # BWA alignment output (.sam)
    my %bwa = ();
    my $file1 = shift;
    open (FILE1, "$file1") || die "Failed to open $file1 for reading : $!"; # Open second file
    while (<FILE1>) { # Reading second hash
        if ($_ =~ /^[^@]/s) {
            chomp;
            my @line = split /\s+/, $_;
            my $ID;
            if ($line[2] =~ /[^*]/) {
                $ID = $line[0];
                push @{$bwa{$ID}}, @line[2,5,12,15,16,17,18];
            }
        }
    }
    close FILE1 || die "Failed to close $file1 : $!";

    # ORIGINAL Reference Amplicon File (.fa)
    my %Amp = ();
    my $file2 = shift;
    open (FILE2, "$file2")|| die "Failed to open $file2 for reading : $!"; # Open first file
    local $/= ">";
    my $first=<FILE2>;
    while (<FILE2>) { # Reading first hash
        chomp;
        my ($ID, $Seq) = split("\n");
        $Amp{$ID} = $Seq
    }
    close FILE2 || die "Failed to close $file2 : $!";

    # ORIGINAL Input FASTQ Sequencing File (.fq)
    my %Input = ();
    my $file3 = shift;
    open (FILE3, "$file3")|| die "Failed to open $file3 for reading : $!"; # Open first file
    local $/= "@";
    $first=<FILE3>;
    while (<FILE3>) { # Reading first hash
        chomp;
        my ($ID, $Seq,undef,undef) = split("\n");
        $Input{$ID} = $Seq;
    }
    close FILE3 || die "Failed to close $file3 : $!";

    foreach my $ID (keys %bwa) {
        if (exists $Input{$ID}){
            print "Key : $ID\t AMP:\t$Amp{$bwa{$ID}[0]}\nInput\t$Input{$ID}" if (exists $Amp{$bwa{$ID}[0]});
        }
    }

    Regards,
    Murugesan Kandasamy
    use perl for(;;);
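Both versions of the script pick the bwa tags by column position ($line[12], $line[15], and so on) and the original compares whole fields as strings, where a literal such as "XG:1:0" can never equal the "XG:i:0" that appears in the data. One possible alternative, sketched here under the assumption that the file follows the usual tab-separated SAM layout (11 mandatory columns followed by TAG:TYPE:VALUE fields), is to store the optional fields by tag name and compare the values numerically. parse_sam_line and is_perfect are hypothetical names, and the sample record is a shortened, made-up line modelled on the first read in input#1.

```perl
use strict;
use warnings;

# Split one SAM alignment line into a few mandatory fields and a hash of
# optional TAG:TYPE:VALUE fields keyed by tag name.
sub parse_sam_line {
    my ($line) = @_;
    chomp $line;
    my @field = split /\t/, $line;
    my %tag;
    for my $optional (@field[ 11 .. $#field ]) {
        my ($name, $type, $value) = split /:/, $optional, 3;
        $tag{$name} = $value;
    }
    return {
        read_id => $field[0],
        ref_id  => $field[2],
        cigar   => $field[5],
        tag     => \%tag,
    };
}

# True when the alignment reports no edits, mismatches, gap opens or gap extensions.
sub is_perfect {
    my ($aln) = @_;
    for my $name (qw(NM XM XO XG)) {
        my $value = $aln->{tag}{$name};
        return 0 unless defined $value && $value == 0;
    }
    return 1;
}

# Shortened, hypothetical record (6-base read) for illustration only.
my $line = join "\t", 'Seq1Perfect', 0, 'B3GAT2-2_P001', 1, 37, '6M', '*', 0, 0,
    'GGTTGG', '&a==aa', 'XT:A:U', 'NM:i:0', 'X0:i:1', 'X1:i:0',
    'XM:i:0', 'XO:i:0', 'XG:i:0', 'MD:Z:6';

my $aln = parse_sam_line($line);
print "$aln->{read_id} aligns perfectly to $aln->{ref_id}\n" if is_perfect($aln);
```

Looking tags up by name also keeps working if the aligner emits the optional fields in a different order.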

Re: Matching strings from three files (three hashes)
by ww (Archbishop) on Aug 17, 2010 at 14:40 UTC

    Please use paras (<p>...</p>) to separate your stream-of-consciousness narrative into coherent thoughts. As posted, s/it's/it was/ (see reply below) unnecessarily difficult to read.

    Second, please tell us the relevance of the error you cite:

    'Use of uninitialized value in string eq at BSanalyzer.pl line 122'

    as the code you posted may or may not be from BSanalyzer.pl but certainly contains no line 122.

    Update: tense change above (and ++ for responsiveness) ... but
    changing the error citation to refer to another line which does not exist in your posting is less useful.

      Thank you for your suggestion. I updated my post, so it is easier to read now. I also pointed out where the error message comes from.
Re: Matching strings from three files (three hashes)
by bluecompassrose (Initiate) on Aug 17, 2010 at 20:45 UTC

    The issue was that the Amp variable, when compared to the size of the others, would go out of bounds (OOB) because of the way $ID was used to reference it.

    I fixed up the code by assigning a separate array for the variable and then using a loop, like so:

    my @Amp = ();  #Setting up a dynamic Array due to changes in amounts of possible cell ID's
    my @Amps = (); #Setting up a dynamic array for the sequences under the IDs
    my $n = (0);   #Array spacer. Not a local variable; used later in execution.
    my $file2 = shift;
    open (FILE2, "$file2")|| die "Failed to open $file2 for reading : $!"; # Open first file
    local $/= ">";
    my $first=<FILE2>;
    while (<FILE2>) { # Reading first hash
        chomp;
        my ($ID, $Seq) = split("\n");
        $Amp[$n] = $ID; #Only need the ID. Sequence can be added to a different array in a similar fashion
        $Seq =~ tr/acgt/ACGT/;
        $Amps[$n] = $Seq; #As Above, but the sequence
        $n++; #Increases array holder
    }
    close FILE2 || die "Failed to close $file2 : $!";
    ...
    ...
    my $m = (0); #New variable to check against Amp
    while ($m < $n){ #while there remain more Amp ID's to check. Not using <= due to how array and perl interact.
        foreach my $ID (keys %bwa){ #for each ID
            ...
        }
        $m++;
    }

    I realize this isn't the prettiest solution, but it's worked well enough with only two different "$ID" strings in input 2. If any monks know of a neater solution, please go ahead =)
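One possibly neater variant keeps the amplicons in a hash keyed by reference ID, so each read needs a single lookup instead of an outer loop over every amplicon. The sketch below also limits the changed record separator to the reading subroutine, so later reads are not affected by it. read_fasta is a hypothetical helper, and the FASTA text is made up and read through an in-memory filehandle rather than a real file.

```perl
use strict;
use warnings;

# Read FASTA records from an open filehandle into a hash of ID => sequence.
# The record separator is changed only inside this sub, so reads elsewhere
# in the script keep the default line-by-line behaviour.
sub read_fasta {
    my ($fh) = @_;
    local $/ = "\n>";
    my %seq;
    while (my $record = <$fh>) {
        chomp $record;                     # strips a trailing "\n>" if present
        $record =~ s/^>//;                 # the first record still has its leading ">"
        my ($id, @lines) = split /\n/, $record;
        next unless defined $id && length $id;
        my $sequence = join '', @lines;    # tolerates sequences wrapped over several lines
        $sequence =~ tr/acgt/ACGT/;
        $seq{$id} = $sequence;
    }
    return %seq;
}

# Hypothetical two-record FASTA text, read through an in-memory filehandle.
my $fasta = ">TMEM200B\nctcctc\nTGCC\n>B3GAT2-2_P001\nGGCTGG\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my %Amp = read_fasta($fh);
close $fh or die "Cannot close in-memory file: $!";

# A single pass over the alignments, with one direct hash lookup per read.
my %bwa = ( Seq1Perfect => [ 'Seq1Perfect', 'B3GAT2-2_P001' ] );
for my $read_id (sort keys %bwa) {
    my $ref_id = $bwa{$read_id}[1];
    print "$read_id -> $ref_id: $Amp{$ref_id}\n" if exists $Amp{$ref_id};
}
```

With a hash, the number of amplicons in input 2 no longer affects the shape of the loop; only the lookup key changes from read to read.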
