http://www.perlmonks.org?node_id=990508

jemswira has asked for the wisdom of the Perl Monks concerning the following question:

Hi there monks.

I'm trying to combine two databases into one for my computational bio project. The first database (Test) looks something like this:

Q197F8 ORFNames=IIV3-002R
Q197F7 ORFNames=IIV3-003L
Q6GZX2 ORFNames=FV3-003R
Q6GZX1 ORFNames=FV3-004R
Q197F5 ORFNames=IIV3-005L
Q6GZX0 ORFNames=FV3-005R ;PF02393
Q91G88 ORFNames=IIV6-006L ;PF12299;PF04383
Q6GZW9 ORFNames=FV3-006R

This is just a small part of it. The six characters (.{6}) before ORFNames are the accession number. Test2 has this sort of format:

Q197F8 | PF04947.9
Q91G88 | PF01486.12 PF00319.13

The format I need in the end is this:

Q197F8 IIV3-002R PF04947.9
Q91G88 IIV6-006L PF01486.12 PF00319.13

Since the first database (Test) is going to be much larger than the second, I wanted to load it into a hash of arrays holding the accession numbers and names, then go down the list of accession numbers in Test2 and print out the matches. Like so:

#!/usr/bin/perl
use warnings;
use strict;
open DATA, "C:\\Users\\Jems\\Desktop\\Perl\\test\\test.txt" or die $!;
use Modern::Perl;
use Data::Dump qw/dump/;
our %data;
my $ac;
while (<DATA>) {
    my @splitted= split(/=|;/);
    foreach (@splitted){
        if (/^(.{6})\sORFNames/) {
            $ac = $1;
            chomp ($ac);
            next;
        }
        if (/^(.+)\s\n/) {
            #print "$ac $1\n";
            push @{ $data{$1} }, $ac;
            next;
            #print @{$data{$ac}} if exists $data{$ac};
        }
        if (/^(.+)\s;PF/) {
            push @{ $data{$1} }, $ac;
            next;
        }
        next;
    }
    next;
}
my $acn;
open ACTIVATOR, "C:\\Users\\Jems\\Desktop\\Perl\\test\\Test2.txt" or die $!;
open ACTIVOUT, ">C:\\Users\\jems\\Desktop\\Perl\\test\\ActivACNPF.txt" or die $!;
select ACTIVOUT;
while ($acn= <ACTIVATOR>){
    if($acn =~ m/^(......)\s\|/){
        my $ab = $1;
        chomp ($ab);
        #print "$ab";
        print "$acn | @{$data{$ab}}\n" if exists $data{$ab};
        next;
    }
}
print STDOUT "DONE ACTIV";

The commented-out prints were my test output. However, I just can't get the print in the second loop (line 44 in my file) to print what I want. The commented print of @{$data{$ac}} (line 23) returns blank, but the commented print of "$ac $1" (line 19) shows that both values are correct. Also, if I print %data, it does return values. Am I checking something incorrectly?

Thanks monks!
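
For comparison, here is a minimal sketch of the merge described above. In the script as posted, %data is keyed by the ORF name ($1) while the lookup uses the accession ($ab), so the exists test has nothing to find; the sketch keys the hash by accession instead. The helper names parse_orf_names and merge_records are hypothetical, the sample data is inlined rather than read from the files, and it assumes six-character accessions and exactly one ORFNames entry per accession.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map accession => ORF name from Test-style text.
# Assumes each record is a six-character accession, whitespace, ORFNames=<name>.
sub parse_orf_names {
    my ($text) = @_;
    my %orf_for;
    # (?<!\S) makes sure the accession starts at the beginning of a token.
    while ($text =~ /(?<!\S)(\S{6})\s+ORFNames=(\S+)/g) {
        $orf_for{$1} = $2;    # keyed by accession, not by ORF name
    }
    return \%orf_for;
}

# Hypothetical helper: join Test2-style lines ("ACC | PF... PF...") with the names.
sub merge_records {
    my ($orf_for, @lines) = @_;
    my @merged;
    for my $line (@lines) {
        chomp $line;          # drop the newline before building output
        next unless $line =~ /^(\S{6})\s*\|\s*(.+?)\s*$/;
        my ($acc, $pfams) = ($1, $2);
        next unless exists $orf_for->{$acc};    # skip accessions missing from Test
        push @merged, "$acc $orf_for->{$acc} $pfams";
    }
    return @merged;
}

# Sample data copied from the post (a subset of Test, all of Test2).
my $test_text = <<'END';
Q197F8 ORFNames=IIV3-002R
Q197F7 ORFNames=IIV3-003L
Q6GZX0 ORFNames=FV3-005R ;PF02393
Q91G88 ORFNames=IIV6-006L ;PF12299;PF04383
END

my @test2_lines = (
    "Q197F8 | PF04947.9\n",
    "Q91G88 | PF01486.12 PF00319.13\n",
);

my $orf_for = parse_orf_names($test_text);
print "$_\n" for merge_records($orf_for, @test2_lines);
# Prints:
# Q197F8 IIV3-002R PF04947.9
# Q91G88 IIV6-006L PF01486.12 PF00319.13
```

With real files, the same two helpers could be fed from filehandles instead of the inlined strings. Chomping the Test2 line before printing also avoids the stray newline that $acn carries in the original print.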