jemswira has asked for the wisdom of the Perl Monks concerning the following question:
Hi guys. So I have tried to make a database combining two files of data. One is AccessionNumbersfull.txt:
A0AQI4
A0AQI5
A0AQI7
.....
The other is Pfam-A.seed:
# STOCKHOLM 1.0
#=GF ID 1-cysPrx_C
#=GF AC PF10417.4
#=GF DE C-terminal domain of 1-Cys peroxiredoxin
#=GF AU Finn RD, Coggill PC
#=GF SE Gene3D, pdb_1prx
...
#=GS A3EU39_9BACT/160-195 AC A3EU39.1
#=GS Q7VQB3_BLOFL/159-194 AC Q7VQB3.1
#=GS Q057V5_BUCCC/160-195 AC Q057V5.1
#=GS A5CDZ8_ORITB/160-195 AC A5CDZ8.1
...
So what I'm supposed to do is match the accession numbers in the first file to the groups in the second. The group name (PFxxxxx) comes after "#=GF AC". The problem is that the files are huge: the first file alone is 138 MB, so I have memory issues. My code is as follows:
#!/usr/bin/perl
use warnings;
use strict;

open OUTPUT, ">C:\\Users\\Jems\\Desktop\\Perl\\PFAMin.txt" or die $!;
open ANUMBER, "C:\\Users\\Jems\\Desktop\\Perl\\AccessionNumbersfull.txt" or die $!;
our @acnumbers;
select OUTPUT;
$|=1;
foreach (<ANUMBER>){
    chomp;
    push (@acnumbers, $_);
}
$/="\/\/";
our $acnumbers;
our @list;
foreach $acnumbers(@acnumbers){
    open PFAMDB, "C:\\Users\\Jems\\Desktop\\Perl\\Pfam-A.seed" or die $!;
    my $unit;
    foreach $unit(<PFAMDB>){
        my @units= split /#/,$unit;
        my @pfx=grep(/=GF AC/,@units);
        foreach (@pfx){s/=GF AC/\x20/};
        our $units;
        foreach $units(@units){
            if ($units=~/.*AC $acnumbers/){
                push (@list, @pfx);
            }else{next}
        }
    }
    print "$acnumbers is in:";
    print "@list \n";
    undef @list;
}
Any way to streamline it?
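One possible direction, sketched here only as an illustration: read Pfam-A.seed once, line by line, into a hash keyed by sequence accession, then stream the accession list through it instead of re-opening and slurping the seed file for every accession. The sub names `index_seed` and `report_matches` are made up for this sketch. It assumes the index built from the seed file fits in memory, that every accession in the first file appears after "AC" on a "#=GS" line with a ".N" version suffix, and that `//` (Perl 5.10+) is available. The demo runs on in-memory copies of the sample lines above rather than the real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a Stockholm file line by line and map each sequence accession
# (version suffix dropped) to the set of Pfam families it appears in.
sub index_seed {
    my ($seed_fh) = @_;
    my (%families_for, $family);
    while (my $line = <$seed_fh>) {
        if ($line =~ /^#=GF\s+AC\s+(\S+)/) {
            $family = $1;                          # e.g. PF10417.4
        }
        elsif ($line =~ /^#=GS\s+\S+\s+AC\s+([^.\s]+)/) {
            $families_for{$1}{$family} = 1 if defined $family;
        }
        elsif ($line =~ m{^//}) {
            undef $family;                         # end of one alignment
        }
    }
    return \%families_for;
}

# Stream the accession list; only one accession is held at a time.
sub report_matches {
    my ($acc_fh, $families_for, $out_fh) = @_;
    while (my $acc = <$acc_fh>) {
        chomp $acc;
        next unless length $acc;
        my @hits = sort keys %{ $families_for->{$acc} // {} };
        print {$out_fh} "$acc is in: @hits\n";
    }
}

# Demo on in-memory data taken from the samples in this post.
my $seed_text = <<'END';
# STOCKHOLM 1.0
#=GF ID 1-cysPrx_C
#=GF AC PF10417.4
#=GS A3EU39_9BACT/160-195 AC A3EU39.1
#=GS Q7VQB3_BLOFL/159-194 AC Q7VQB3.1
//
END
my $acc_text = "A3EU39\nA0AQI4\n";

open my $seed_fh, '<', \$seed_text or die $!;
open my $acc_fh,  '<', \$acc_text  or die $!;
open my $out_fh,  '>', \my $report or die $!;

my $index = index_seed($seed_fh);
report_matches($acc_fh, $index, $out_fh);
close $out_fh;
print $report;
```

With the real files, the three in-memory handles would be replaced by ordinary `open` calls on Pfam-A.seed, AccessionNumbersfull.txt and the output file. The seed file is then read exactly once, and the 138 MB accession list is never held in memory as a whole.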
Another thing I need to do is add the names corresponding to the numbers. Those are in a separate file, in the same order; I took the numbers out of that file. Format:
>tr|A0FGZ9|A0FGZ9_9ARCH Methyl coenzyme M reductase (Fragment) OS=uncultured archaeon GN=mcrA PE=4 SV=1
>tr|A0FH03|A0FH03_9ARCH Methyl coenzyme M reductase (Fragment) OS=uncultured archaeon GN=mcrA PE=4 SV=1
But I don't know how to. Any ideas? Thanks!!
Sorry, but it's kinda urgent and I've been trying for ages!
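For the names, a minimal sketch of pulling the accession and the protein name out of header lines in the format shown above. `parse_header` is a hypothetical helper, and the sketch assumes the name is everything between the entry ID and the " OS=" field; headers in a different layout are simply skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a header such as ">tr|A0FGZ9|A0FGZ9_9ARCH Name OS=..." into
# (accession, name). Returns an empty list for any other line.
sub parse_header {
    my ($line) = @_;
    return unless $line =~ /^>\w+\|([^|]+)\|\S+\s+(.*?)(?:\s+OS=.*)?\s*$/;
    return ($1, $2);
}

# Demo on the two header lines from this post, read one line at a time.
my $fasta_text = <<'END';
>tr|A0FGZ9|A0FGZ9_9ARCH Methyl coenzyme M reductase (Fragment) OS=uncultured archaeon GN=mcrA PE=4 SV=1
>tr|A0FH03|A0FH03_9ARCH Methyl coenzyme M reductase (Fragment) OS=uncultured archaeon GN=mcrA PE=4 SV=1
END

open my $fasta_fh, '<', \$fasta_text or die $!;
my %name_for;
while (my $line = <$fasta_fh>) {
    my ($acc, $name) = parse_header($line) or next;
    $name_for{$acc} = $name;
    print "$acc\t$name\n";
}
close $fasta_fh;
```

Since only header lines are kept, the sequence data never needs to be stored. Whether a hash of names for every entry fits in memory depends on the real file; printing accession and name as each header is read avoids holding anything at all.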
Replies are listed 'Best First'.
Re: Using less memory with BIG files
by moritz (Cardinal) on Feb 02, 2012 at 08:11 UTC
by jemswira (Novice) on Feb 02, 2012 at 12:11 UTC
Re: Using less memory with BIG files
by GrandFather (Saint) on Feb 02, 2012 at 20:24 UTC
by jemswira (Novice) on Feb 04, 2012 at 17:02 UTC
by GrandFather (Saint) on Feb 05, 2012 at 22:45 UTC
Re: Using less memory with BIG files
by sundialsvc4 (Abbot) on Feb 02, 2012 at 14:09 UTC
by jemswira (Novice) on Feb 02, 2012 at 14:51 UTC