PerlMonks
Replacement of data in a column of a file using Hashes created from another file
by rohitmonk (Initiate) on Oct 30, 2012 at 08:59 UTC ( [id://1001475]=perlquestion )
rohitmonk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, fellow monks. I am a beginner, and this is my first post seeking your wisdom. I am trying to convert a file from one format to another. I have a list of values, each of which is to be replaced by the value together with its description, such as:

    gi|315134697|dbj|AP012030.1|=gi|315134697|dbj|AP012030.1| Escherichia coli DH1 (ME8569) DNA,...
    gi|260447279|gb|CP001637.1|=gi|260447279|gb|CP001637.1| Escherichia coli DH1, complete genome
    gi|238859724|gb|CP001396.1|=gi|238859724|gb|CP001396.1| Escherichia coli BW2952, complete g...
    gi|194400059|gb|EU855241.1|=gi|194400059|gb|EU855241.1| Shigella flexneri strain FBD047 23S...
    gi|194400053|gb|EU855235.1|=gi|194400053|gb|EU855235.1| Shigella dysenteriae strain FBD056 ...
    gi|169887498|gb|CP000948.1|=gi|169887498|gb|CP000948.1| Escherichia coli str. K12 substr. D...
    gi|85674274|dbj|AP009048.1|=gi|85674274|dbj|AP009048.1| Escherichia coli str. K12 substr. W...
    gi|48994873|gb|U00096.2|=gi|48994873|gb|U00096.2| Escherichia coli str. K-12 substr. MG1...
    gi|81239530|gb|CP000034.1|=gi|81239530|gb|CP000034.1| Shigella dysenteriae Sd197, complete...
    gi|5801828|gb|AF053967.1|AF053967=gi|5801828|gb|AF053967.1|AF053967 Escherichia coli strain ECOR ...
    gi|5801827|gb|AF053966.1|AF053966=gi|5801827|gb|AF053966.1|AF053966 Escherichia coli rrlD operon,...
    gi|406775301|gb|CP003297.1|=gi|406775301|gb|CP003297.1| Escherichia coli O104:H4 str. 2009E...
    gi|383403426|gb|CP002967.1|=gi|383403426|gb|CP002967.1| Escherichia coli W, complete genome

I need to replace the value preceding the '=' sign with the value following it, so I built a hash from this list using the split function.
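For illustration only, here is a minimal sketch of how such a lookup hash might be built with split. It is not the poster's code: the sub name build_lookup is hypothetical, and the sketch assumes one "key=value" pair per line, split at the first '=' only.

```perl
use strict;
use warnings;

# Build a lookup hash from lines of the form "key=value".
# build_lookup is a hypothetical helper name, not taken from the original post.
sub build_lookup {
    my ($fh) = @_;
    my %lookup;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /=/;    # skip lines that have no separator
        # A limit of 2 keeps any later '=' characters inside the value.
        my ($key, $value) = split /=/, $line, 2;
        $lookup{$key} = $value;
    }
    return \%lookup;
}

# Usage with an in-memory filehandle, so the sketch runs without any files.
my $sample = "gi|48994873|gb|U00096.2|=gi|48994873|gb|U00096.2| Escherichia coli str. K-12 substr. MG1...\n";
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $lookup = build_lookup($fh);
close $fh;
print "$_ => $lookup->{$_}\n" for sort keys %$lookup;
```

Passing a limit of 2 to split matters only if a description could itself contain '='; none of the sample lines do, so this is a defensive choice.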
Then I wanted to use this hash as a reference and replace every occurrence of a hash key with the corresponding hash value. The file where I want to do the replacement looks like this:

    10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 100.00 280 0 0 1 280 3402569 3402290 4e-140 506
    10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 99.64 280 1 0 1 280 227880 228159 2e-138 500
    10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 99.64 280 1 0 1 280 2704973 2704694 2e-138 500
    10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 99.64 280 1 0 1 280 4018745 4019024 2e-138 500
    10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 99.64 280 1 0 1 280 4149866 4150145 2e-138 500
    10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 99.64 280 1 0 1 280 4191268 4191547 2e-138 500
    10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 98.93 280 3 0 1 280 3924929 3925208 9e-136 491
    10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 100.00 280 0 0 1 280 459101 459380 4e-140 506
    10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 99.64 280 1 0 1 280 1156698 1156977 2e-138 500
    10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 99.64 280 1 0 1 280 3643499 3643220 2e-138 500
    10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 99.64 280 1 0 1 280 4302307 4302028 2e-138 500
    10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 99.64 280 1 0 1 280 4343709 4343430 2e-138 500
    10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 99.64 280 1 0 1 280 4474830 4474551 2e-138 500
    10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 98.93 280 3 0 1 280 4568646 4568367 9e-136 491

So I attempted to write the final code, as follows:
This works fine for a small file, but my files are more than 2 million lines each. I want to increase the speed of my program. Can you please share your wisdom on how to make it faster for larger files? Regards
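One common way to speed up a job like this, sketched here under assumptions rather than as a diagnosis of the code above: if the slowdown comes from testing every hash key against every line, it may help to extract the identifier from each line once and look it up directly in the hash. The sketch assumes the identifier is always the second whitespace-separated column, as in the sample data; the name replace_second_column is hypothetical.

```perl
use strict;
use warnings;

# Replace the second whitespace-separated column with its hash value, if any.
# replace_second_column is a hypothetical name used only for this sketch.
sub replace_second_column {
    my ($line, $lookup) = @_;
    # $1 = first column plus its separator, $2 = second column (the identifier).
    if ($line =~ /^(\S+\s+)(\S+)/ && exists $lookup->{$2}) {
        # Overwrite only the identifier; the rest of the line is left untouched.
        substr($line, length($1), length($2), $lookup->{$2});
    }
    return $line;
}

# Small in-memory demonstration using one mapping and one row from the samples.
# Tab separators are an assumption; the regex accepts any whitespace.
my %lookup = (
    'gi|260447279|gb|CP001637.1|' =>
        'gi|260447279|gb|CP001637.1| Escherichia coli DH1, complete genome',
);
my $data = join("\t", '10_25_res.txt:Locus3034v1rpkm4.98',
    'gi|260447279|gb|CP001637.1|', qw(100.00 280 0 0 1 280 459101 459380 4e-140 506)) . "\n";

open my $in, '<', \$data or die "Cannot open in-memory handle: $!";
while (my $line = <$in>) {
    print replace_second_column($line, \%lookup);
}
close $in;
```

With this shape, each of the 2 million lines costs one regex match and one hash lookup, regardless of how many keys the hash holds. Because the key is looked up as a plain string, the '|' characters in the identifiers never get interpreted as regex alternation.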