Re: Replacement of data in a column of a file using Hashes created from another file

by choroba (Abbot)
on Oct 30, 2012 at 09:22 UTC


in reply to Replacement of data in a column of a file using Hashes created from another file

It is slow because of the nested loops: for each line, you loop over all the keys. The following code builds a single regular expression that matches any of the keys, so it saves you the inner loop:

#!/usr/bin/perl
use warnings;
use strict;

open my $EQ, '<', '1.txt' or die "1: $!";
my %subst;
while (<$EQ>) {
    chomp; # <- updated
    my ($search, $replace) = split /=/;
    $subst{$search} = $replace;
}

my $regex = join '|', map quotemeta, keys %subst;

open my $LST, '<', '2.txt' or die "2: $!";
while (<$LST>) {
    s/($regex)/$subst{$1}/;
    print;
}
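One caveat with a plain alternation: Perl tries the alternatives from left to right, so if one key is a prefix of another, the shorter key could match first. The following is only a sketch, assuming in-memory data instead of 1.txt and 2.txt (the keys, values, and the helper names build_regex and replace_first are made up for illustration). It sorts the keys longest first and precompiles the pattern with qr//:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical substitution table; in the real script it is read from 1.txt.
my %subst = (
    'gi|1|'  => 'gi|1| short',
    'gi|1|x' => 'gi|1|x long',
);

# Build one alternation from all keys, longest first, so that a key
# that is a prefix of another key cannot shadow the longer one.
sub build_regex {
    my ($table) = @_;
    my $alternation = join '|',
        map quotemeta,
        sort { length($b) <=> length($a) } keys %$table;
    return qr/($alternation)/;
}

# Replace the first matching key on a line with its value.
sub replace_first {
    my ($line, $regex, $table) = @_;
    $line =~ s/$regex/$table->{$1}/;
    return $line;
}

my $regex = build_regex(\%subst);
print replace_first("q1 gi|1|x 99.64\n", $regex, \%subst);
print replace_first("q2 gi|1| 98.93\n",  $regex, \%subst);
```

If no key is a prefix of another key, the sort changes nothing and the simpler join is enough.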
لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ


Re^2: Replacement of data in a column of a file using Hashes created from another file
by rohitmonk (Initiate) on Oct 30, 2012 at 09:43 UTC

    Thank you for the reply. But I need to get an output file, and I am pretty new to this syntax.

    Also, the output that gets printed does not have any replacements in it when I run it. Where does it search for the hash key and replace it with the value?

      Not sure what you mean:

      Input:

      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 100.00 280 0 0 1 280 3402569 3402290 4e-140 506
      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 99.64 280 1 0 1 280 227880 228159 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 99.64 280 1 0 1 280 2704973 2704694 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 99.64 280 1 0 1 280 4018745 4019024 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 99.64 280 1 0 1 280 4149866 4150145 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 99.64 280 1 0 1 280 4191268 4191547 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| 98.93 280 3 0 1 280 3924929 3925208 9e-136 491
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 100.00 280 0 0 1 280 459101 459380 4e-140 506
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 99.64 280 1 0 1 280 1156698 1156977 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 99.64 280 1 0 1 280 3643499 3643220 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 99.64 280 1 0 1 280 4302307 4302028 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 99.64 280 1 0 1 280 4343709 4343430 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 99.64 280 1 0 1 280 4474830 4474551 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| 98.93 280 3 0 1 280 4568646 4568367 9e-136 491

      Output:

      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| Escherichia coli DH1 (ME8569) DNA,... 100.00 280 0 0 1 280 3402569 3402290 4e-140 506
      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| Escherichia coli DH1 (ME8569) DNA,... 99.64 280 1 0 1 280 227880 228159 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| Escherichia coli DH1 (ME8569) DNA,... 99.64 280 1 0 1 280 2704973 2704694 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| Escherichia coli DH1 (ME8569) DNA,... 99.64 280 1 0 1 280 4018745 4019024 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| Escherichia coli DH1 (ME8569) DNA,... 99.64 280 1 0 1 280 4149866 4150145 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| Escherichia coli DH1 (ME8569) DNA,... 99.64 280 1 0 1 280 4191268 4191547 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|315134697|dbj|AP012030.1| Escherichia coli DH1 (ME8569) DNA,...
      98.93 280 3 0 1 280 3924929 3925208 9e-136 491
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| Escherichia coli DH1, complete genome 100.00 280 0 0 1 280 459101 459380 4e-140 506
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| Escherichia coli DH1, complete genome 99.64 280 1 0 1 280 1156698 1156977 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| Escherichia coli DH1, complete genome 99.64 280 1 0 1 280 3643499 3643220 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| Escherichia coli DH1, complete genome 99.64 280 1 0 1 280 4302307 4302028 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| Escherichia coli DH1, complete genome 99.64 280 1 0 1 280 4343709 4343430 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| Escherichia coli DH1, complete genome 99.64 280 1 0 1 280 4474830 4474551 2e-138 500
      10_25_res.txt:Locus3034v1rpkm4.98 gi|260447279|gb|CP001637.1| Escherichia coli DH1, complete genome 98.93 280 3 0 1 280 4568646 4568367 9e-136 491

      The replacement is done in the s/($regex)/$subst{$1}/ statement, and you can direct the output to a file by:

      ./program.pl > output_file.txt
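      If you would rather write the file from inside the script than redirect on the command line, you can print to a file handle. This is only a sketch with made-up data; the helper name write_replaced is hypothetical, and output_file.txt is just the name from the command above:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: apply the substitution to each line and write
# the result to the given file handle instead of STDOUT.
sub write_replaced {
    my ($out, $regex, $subst, @lines) = @_;
    for my $line (@lines) {
        $line =~ s/$regex/$subst->{$1}/;
        print {$out} $line;
    }
}

my %subst = ('gi|1|' => 'gi|1| example description');   # made-up data
my $alternation = join '|', map quotemeta, keys %subst;
my $regex = qr/($alternation)/;

# Open the output file for writing (this overwrites an existing file).
open my $OUT, '>', 'output_file.txt' or die "output_file.txt: $!";
write_replaced($OUT, $regex, \%subst, "q1 gi|1| 100.00\n");
close $OUT or die "output_file.txt: $!";
```

      In the original script the same idea amounts to opening $OUT before the while loop and changing print; to print {$OUT} $_;.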
      There does appear to be a stray carriage return in the output - the code may need a chomp somewhere...
        True, thank you. chomp added to the code. I did not notice because the format is not familiar to me and the expected output was not included to diff against.
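        To see why the chomp matters, here is a small sketch with a made-up mapping line (parse_pair is a hypothetical helper, not part of the script above). Without chomp the newline at the end of the line stays in the replacement value, so it ends up in the middle of every replaced line:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Parse one "search=replace" line; $do_chomp toggles the fix.
sub parse_pair {
    my ($line, $do_chomp) = @_;
    chomp $line if $do_chomp;
    # Limit of 2 keeps any further "=" inside the replacement value.
    my ($search, $replace) = split /=/, $line, 2;
    return ($search, $replace);
}

my $line = "gi|1|=gi|1| example description\n";   # made-up mapping line
my (undef, $with)    = parse_pair($line, 1);
my (undef, $without) = parse_pair($line, 0);

# The brackets make the trailing newline visible in the second case.
printf "with chomp:    [%s]\n", $with;
printf "without chomp: [%s]\n", $without;
```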
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        Thanks a ton Choroba

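        Hi, try this one:

        use strict;
        use warnings;

        # Usage: script.pl MAP_FILE INPUT_FILE OUTPUT_FILE
        # Do not call these $a and $b: lexical $a/$b would break the sort block below.
        my ($map_file, $in_file, $out_file) = @ARGV;

        # Slurp the "key=value" file and build the lookup hash in one match.
        open my $MAP, '<', $map_file or die "$map_file: $!";
        my $text = do { local $/; <$MAP> };
        close $MAP;
        my %mymatch = $text =~ m/^\s*(.*?)\s*=\s*(.*?)\s*$/gm;

        # Slurp the file in which the replacements are made.
        open my $IN, '<', $in_file or die "$in_file: $!";
        my $maintext = do { local $/; <$IN> };
        close $IN;

        # Replace longer keys first so a short key cannot match inside a longer one.
        foreach my $myfnd (sort { length($b) <=> length($a) } keys %mymatch) {
            $maintext =~ s/\Q$myfnd\E/$mymatch{$myfnd}/ig;
        }

        open my $OUT, '>', $out_file or die "$out_file: $!";
        print {$OUT} $maintext;
        close $OUT or die "$out_file: $!";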

        Thank you so much for looking into the problem and helping me out. Cheers...

Re^2: Replacement of data in a column of a file using Hashes created from another file
by rohitmonk (Initiate) on Oct 31, 2012 at 05:46 UTC

    Thank you for the code, it was dope. Cool skills u got, wish to learn more. Cheers....
