PerlMonks  

using a hash to do substitution?

by mr.dunstan (Monk)
on Jun 14, 2001 at 05:10 UTC ( #88265=perlquestion )

mr.dunstan has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to do some hardcore substitution in a chunk of text, where the things I am searching for and substituting are stored in a hash ...

my %hash;
$hash{'usernamevar'} = "mr.dunstan";

Now let's look at the untouched content ...

I am usernamevar.

drum roll ...
*.*.* magical perl regexp thingy happens *.*.*
I am mr.dunstan.

"Ta da!"

The content now has the value of $hash{'usernamevar'} magically substituted wherever the text usernamevar appears in the content.

"There are only 35 chambers, there is no 36th."
"I know ... I want to create a new one ..." - Shaolin Master Killer

Replies are listed 'Best First'.
Re: using a hash to do substitution?
by no_slogan (Deacon) on Jun 14, 2001 at 05:20 UTC
    # create a regex that matches any one of the hash keys
    $regex = join("|", map(quotemeta, keys %hash));
    # substitute them
    s/($regex)/$hash{$1}/eg;
      I'd change this subtly (unless you care only about matching whole words) to:
      $regex = join("|", map(quotemeta, sort { length $b <=> length $a } keys %hash));
      s/($regex)/$hash{$1}/eg;
      That is, the longer keys will appear first in the alternatives list and thus they will be matched first.
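To make the ordering point concrete, here is a minimal sketch. The helper name substitute_longest_first and the extra key user are made up for illustration; they are not from the thread. With the keys sorted longest first, usernamevar is tried before user, so the longer key is the one that matches.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every occurrence of a key of %$map in $text
# with the corresponding value, trying longer keys before shorter ones.
sub substitute_longest_first {
    my ($text, $map) = @_;
    return $text unless %$map;    # an empty pattern would match everywhere

    my $regex = join "|",
        map  { quotemeta($_) }
        sort { length $b <=> length $a } keys %$map;

    $text =~ s/($regex)/$map->{$1}/g;
    return $text;
}

# "user" is an assumed second key that is a prefix of "usernamevar".
my %hash = (user => "guest", usernamevar => "mr.dunstan");
print substitute_longest_first("I am usernamevar.", \%hash), "\n";
# prints: I am mr.dunstan.
```

Without the sort, the order of the alternatives follows the hash's key order, so user could be tried first and the result could come out as "I am guestnamevar."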


      Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
        $code .= "s/\\b$_\\b/\$hash{'$_'}/g;\n" for keys %hash;
        eval $code;
        This way, you avoid the expense of alternation (if the keys are user-defined or contain metacharacters, then instead of quotemeta, add \Q and \E before and after, respectively, the first $_ in the substitution). You can print $code to check what was generated, and so can easily customize it for your specific needs (though if substituting is the only thing going on, this won't be a concern).
      It is likely that s/\b($regex)\b/$hash{$1}/eg; is more appropriate, so that, for example, $hash{'name'} = 'Bryan' doesn't turn 'nameserver' into 'Bryanserver'.
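A small sketch of the difference, using the name/Bryan example from above; the sample sentence and the variable names $unanchored and $anchored are made up for illustration.

```perl
use strict;
use warnings;

my %hash  = (name => "Bryan");
my $regex = join "|", map { quotemeta($_) } keys %hash;

my $text = "name lives on nameserver";

# Without \b the key also matches inside longer words.
(my $unanchored = $text) =~ s/($regex)/$hash{$1}/g;

# With \b the key only matches as a whole word.
(my $anchored = $text) =~ s/\b($regex)\b/$hash{$1}/g;

print "$unanchored\n";    # Bryan lives on Bryanserver
print "$anchored\n";      # Bryan lives on nameserver
```

Note that \b only behaves this way when the keys start and end with word characters; a key that begins or ends with punctuation would need a different kind of anchor.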
      no_slogan's regex trick is very nice (++); however, I've been led to believe that lots of ors (the | char) in a regex can blow out the stack (or take a long time) because of the backtracking required. Can any more adept monks comment on this?

      The other, more brute-force and less pretty, way to solve the problem is:

      foreach $key (keys %hash) { $text =~ s/$key/$hash{$key}/eg; }

      -I went outside... and then I came back in!!!!

        Rather than speculate on efficiency, you can always use Benchmark. One small point though: whenever you interpolate a string into an m// regex or the first half of an s/// regex, you need to backslash the special regex metacharacters $^*()+{[\|.?

        The easiest way is to use quotemeta.

        foreach $key (keys %hash) {
            # escape a copy for the pattern; look up the value with the original key
            $pattern = quotemeta $key;
            $text =~ s/$pattern/$hash{$key}/g;
        }

        This is *vital* for reliability. Otherwise you will get unexpected runtime failures when your data eventually contains metachars (typos, deliberate, malicious...)
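As a quick illustration of what goes wrong, here is a sketch with a made-up key, a.c, that contains a metacharacter. The key, the sample text, and the variable names are assumptions for the example only.

```perl
use strict;
use warnings;

my %hash = ('a.c' => 'X');    # hypothetical key containing a metacharacter
my $key  = 'a.c';

# Unescaped: "." matches any character, so "abc" is replaced too.
(my $raw = "abc a.c") =~ s/$key/$hash{$key}/g;

# Escaped with \Q...\E: only the literal text "a.c" matches. The hash is
# still looked up with the original, unescaped key.
(my $quoted = "abc a.c") =~ s/\Q$key\E/$hash{$key}/g;

print "$raw\n";       # X X
print "$quoted\n";    # abc X
```

\Q...\E inside the pattern has the same effect as calling quotemeta on a copy of the key, and it leaves $key itself untouched for the hash lookup.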

        use Benchmark;

        timethese(100000, {
            'Simple loop' => '
                $text = "This is my test string";
                %hash = qw(test text foo bar use loop the end);
                foreach $key (keys %hash) {
                    $key = quotemeta $key;
                    $text =~ s/$key/$hash{$key}/eg;
                }
            ',
            'Alternation' => '
                $text = "This is my test string";
                %hash = qw(test text foo bar use loop the end);
                $regex = join("|", map(quotemeta, keys %hash));
                $text =~ s/\b($regex)\b/$hash{$1}/eg;
            ',
        } );

        Output:

        Benchmark: timing 100000 iterations of Alternation, Simple loop...
        Alternation:  5 wallclock secs ( 4.12 usr +  0.00 sys =  4.12 CPU) @ 24271.84/s (n=100000)
        Simple loop: 10 wallclock secs ( 9.72 usr +  0.00 sys =  9.72 CPU) @ 10288.07/s (n=100000)

        So as it happens alternation is twice as fast. I thought your solution would be faster but there you go!

        If in doubt - use Benchmark

        Cheers

        tachyon

        On the other hand, an alternating regex can use the /o modifier, while your version can't. Regex compilation is a costly operation, so alternation should be much faster under /o.
           MeowChow                                   
                       s aamecha.s a..a\u$&owag.print
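On the compilation-cost point, one related approach (not the /o modifier itself, and not something proposed in the thread) is to precompile the alternation with qr// and reuse the compiled pattern. This is only a sketch; the sample data and the variable names are made up.

```perl
use strict;
use warnings;

my %hash = (test => "text", foo => "bar");

# Build the alternation once, longest keys first, and compile it with qr//.
my $alternation = join "|",
    map  { quotemeta($_) }
    sort { length $b <=> length $a } keys %hash;
my $regex = qr/\b($alternation)\b/;

# Reuse the same compiled pattern for every line.
my @lines = ("This is my test string", "foo and food");
s/$regex/$hash{$1}/g for @lines;

print "$_\n" for @lines;
# prints:
# This is my text string
# bar and food
```

Whether this beats the simple loop in practice is something to measure with Benchmark rather than assume.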
