
Re: multiple substitution

by Corion (Pope)
on Aug 25, 2012 at 09:27 UTC (#989706)

in reply to multiple substitution

See perlop on the /e switch for regular expressions. perlretut also covers it.

my %replace = (
    apples  => 'yummy',
    oranges => 'yummier',
    bananas => 'yummiest',
);
$string =~ s!(apples|oranges|bananas)!$replace{$1} || $1!e;

Replies are listed 'Best First'.
Re^2: multiple substitution
by aaron_baugher (Curate) on Aug 25, 2012 at 16:39 UTC

    I answered a similar question recently with a loop:

    $s =~ s/$_/$h{$_}/g for keys %h;

    So I wondered how that would compare to your solution of combining the searches into a single regex. I thought your way might win for a few words, but surely with a lot of words the complexity of the regex would slow it down, right?

    Well, so much for that theory. The Perl regex engine continues to amaze me. I gave it a pattern combining 676 strings (all two-letter combinations) with pipes like yours, and it blew the forloop method away (92 times faster). It also beat a regex solution using Regexp::Assemble, but I was using very simple and known search strings, so the hand-made pipe method was safe and simple. With unknown or more complex strings, making it harder to hand-make a safe and efficient search pattern, I think RA would probably come out on top eventually. Anyway, my test and results:

    abaugher@bannor> cat
    #!/usr/bin/env perl
    use Modern::Perl;
    use Benchmark qw(:all);
    use Regexp::Assemble;

    my %h = map { $_ => uc } ( 'aa' .. 'zz' );
    my $s = `cat bigfile`; # 8MB file
    say "Testing with @{[-s 'bigfile']} byte file and @{[ scalar keys %h ]} patterns";

    cmpthese( 10, {
        'forloop' => \&forloop,
        'pipes'   => \&pipes,
        'regexpa' => \&regexpa,
    });

    sub forloop {
        $s =~ s/$_/$h{$_}/g for keys %h;
    }
    sub pipes {
        my $p = join '|', keys %h;
        $s =~ s/($p)/$h{$1}/g;
    }
    sub regexpa {
        my $p = Regexp::Assemble->new->add(keys %h)->re;
        $s =~ s/($p)/$h{$1}/g;
    }

    abaugher@bannor> perl
    Testing with 8560854 byte file and 676 patterns
                  Rate forloop regexpa   pipes
    forloop 9.75e-02/s      --    -96%    -99%
    regexpa     2.40/s   2364%      --    -74%
    pipes       9.08/s   9213%    278%      --

    Aaron B.
    Available for small or large Perl jobs; see my home node.

      The pipes() and regexpa() functions used in the timing loops above both include generation of the matching regexes in each loop execution. I doubt it adds greatly to the overall execution time, but is it proper to include regex generation in the timing of a substitution operation?

      On a more critical note, a substitution is done on the $s string in each repetition of each timing loop, but will there be anything to be found for substitution after the first pass of whatever timing function happens to be executed first? Are not all subsequent passes in all functions just comparing the time it takes for a regex to find no match in a string? (Maybe take the 8MB file content and x it into three identical 200 - 500MB strings and do just one comparison pass of substitutions on each string.)
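      A minimal sketch of a benchmark that addresses both points. It assumes a small synthetic string in place of bigfile and leaves out the Regexp::Assemble variant to stay within core modules; the names $original and $rx are illustrative and do not come from the posted script:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my %h = map { $_ => uc $_ } ( 'aa' .. 'zz' );

# Stand-in for the 8MB file: a small synthetic string (an assumption).
my $original = join ' ', ( 'aa' .. 'zz' ) x 20;

# Build the alternation once, outside the timed code.
my $p  = join '|', keys %h;
my $rx = qr/($p)/;

sub forloop {
    my $s = $original;    # fresh copy, so every run still has matches
    $s =~ s/$_/$h{$_}/g for keys %h;
    return $s;
}

sub pipes {
    my $s = $original;    # fresh copy here as well
    $s =~ s/$rx/$h{$1}/g;
    return $s;
}

# A negative count asks Benchmark to run each sub for at least
# that many CPU seconds instead of a fixed number of iterations.
cmpthese( -1, { forloop => \&forloop, pipes => \&pipes } );
```

      Copying the string inside each sub adds the same cost to both candidates, so the comparison stays fair even though the absolute rates include the copy.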

      I only (re)used what the OP had as a regular expression already. But your results mesh well with When Perl Isn't Quite Fast Enough - the fewer ops you need, and the more you can do within the RE engine, the faster your Perl code is.

Re^2: multiple substitution
by naturalsciences (Beadle) on Aug 25, 2012 at 10:08 UTC

    Could you explain the code for a sec? Should those ! be /?

    I can understand that $string =~ s/(apples|oranges|bananas)/$replace{$1}/e would capture the first match from the string as $1. Then, because of the /e switch, the replacement part would be the value that corresponds to the key $1. What is the deal with the || (or?) operator? (I guess I'm mistaken about the ! characters.) Would this code of mine work?

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    my @keys = qw(F29-2 F29-3 F29-4 F44-2 F53-2 F38-3 F12-2);
    my @vals = qw(F29B2 F29B3 F29B4 F44B2 F53B2 F38B3 F12B2);
    my %replace;
    @replace{@keys} = @vals;

    while (my $line = <>) {
        if ($line =~ m/^\>/) {
            my $name = $line;
            $name =~ s/(F29-2,F29-3,F29-4,F44-2,F53-2,F38-3,F12-2)/$replace{$1}/;
            print $name;
        }
        elsif ($line !~ m/^\>/) {
            print $line;
        }
    }

    Did not want to use some convoluted regexp patterns because they might be usable this time but not always. Want to learn the technique to do such list/hash substitutions as in original question.
      Should those ! be /
      That ! is all right. Perl allows delimiters other than / for s///.

      Could you explain the code for a sec.
      $replace{$1} || $1
      This part replaces the matched string with itself in case %replace does not have a corresponding key. For example,
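      The example that belonged here did not survive in the post. A minimal sketch of the fallback, assuming a hypothetical input string and a pattern that also matches a word (pears) with no key in %replace:

```perl
use strict;
use warnings;

my %replace = (
    apples  => 'yummy',
    oranges => 'yummier',
    bananas => 'yummiest',
);

# Hypothetical input; 'pears' is matched by the pattern but is not a key.
my $string = 'apples and pears';

# With /e the replacement is evaluated as Perl code, so || can supply
# the matched text itself whenever the hash lookup gives a false value.
(my $with_fallback = $string) =~ s!(apples|pears)!$replace{$1} || $1!ge;

# Without the fallback, the missing key interpolates as an empty string
# (and would warn about an uninitialized value if warnings were left on).
my $without_fallback = $string;
{
    no warnings 'uninitialized';
    $without_fallback =~ s!(apples|pears)!$replace{$1}!g;
}

print "[$with_fallback]\n";     # [yummy and pears]
print "[$without_fallback]\n";  # [yummy and ]
```

      Note that || also falls back when the stored value is an empty string or 0. If such values are legitimate replacements, the defined-or operator // (Perl 5.10 and later) tests only for definedness.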

      Did not want to use some convoluted regexp
      Trust me, this is a simple regex. It can get a lot worse, if you delve deeper :)

      Want to learn the technique to do such list/hash substitutions as in original question
      As far as searching and replacing in strings is concerned, I guess regexes would be most helpful.
      If you are unfamiliar with s/// and s!!!, I already linked to the relevant documentation, perlop. Please read it. Regarding your own attempt, what happened when you tried it?
      Did not want to use some convoluted regexp patterns ... Want to learn the technique to do such list/hash substitutions ...

      A common approach to handling long search/replace string lists is to generate the search regex automatically from the keys of the search/replace hash. (Then you just have to worry about getting the hash right!)

      >perl -wMstrict -le
      "my @keys = qw(F29-2 F29-3 F29-4 F44-2 F53-2 F38-3 F12-2);
       my @vals = qw(F29B2 F29B3 F29B4 F44B2 F53B2 F38B3 F12B2);
       my %replace;
       @replace{@keys} = @vals;
       ;;
       my $rx_search = join q{ | }, map quotemeta, keys %replace;
       $rx_search = qr{ $rx_search }xms;
       print $rx_search;
       ;;
       my $s = 'F99-9 FF29-22 -F29-2- F29-2 F44-2 F12-2';
       print qq{'$s'};
       my $t = $s;
       $t =~ s{ ($rx_search) }{$replace{$1}}xmsg;
       print qq{'$t'};
       ;;
       $t = $s;
       $t =~ s{ \b ($rx_search) \b }{$replace{$1}}xmsg;
       print qq{'$t'};
       ;;
       $t = $s;
       $t =~ s{ (?<! \S) ($rx_search) (?! \S) }{$replace{$1}}xmsg;
       print qq{'$t'};
      "
      (?^msx: F29\-4 | F53\-2 | F44\-2 | F29\-2 | F38\-3 | F29\-3 | F12\-2 )
      'F99-9 FF29-22 -F29-2- F29-2 F44-2 F12-2'
      'F99-9 FF29B22 -F29B2- F29B2 F44B2 F12B2'
      'F99-9 FF29-22 -F29B2- F29B2 F44B2 F12B2'
      'F99-9 FF29-22 -F29-2- F29B2 F44B2 F12B2'

      Note that none of the conversion examples uses the /e switch; doing without it makes conversion slightly faster. In all the conversion examples, F99-9 is never converted: it just doesn't appear in the conversion @keys array.

      In the first conversion example, the F29-2 substring in FF29-22 and -F29-2- is converted even though it is embedded in another string: it appears in the conversion list.

      This is fixed for FF29-22 in the second example by using \b boundary assertions to allow conversion only if a search string is neither preceded nor followed by a 'word' character ([A-Za-z0-9_]), but this still allows the substring in -F29-2- to be replaced because '-' is not a word character.

      This problem (if problem it is) is fixed in the third example by using different boundary assertions: (?<! \S) and (?! \S) allow a match (and replacement) only if the potential match substring is neither preceded nor followed by a non-whitespace character.

      $name =~ s/(F29-2,F29-3,F29-4,F44-2,F53-2,F38-3,F12-2)/$replace{$1}/;

      Note that | (pipe) and not , (comma) is the alternation metacharacter.
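      A minimal sketch of the same substitution with the alternation written correctly. The header line in $name is a hypothetical stand-in, since the real input is not shown in the thread:

```perl
use strict;
use warnings;

my @keys = qw(F29-2 F29-3 F29-4 F44-2 F53-2 F38-3 F12-2);
my @vals = qw(F29B2 F29B3 F29B4 F44B2 F53B2 F38B3 F12B2);
my %replace;
@replace{@keys} = @vals;

# Hypothetical header line standing in for a line read from <>.
my $name = ">sample F29-2 contig\n";

# Alternatives are separated by | rather than by commas, so each
# key is tried on its own instead of as one long literal string.
$name =~ s/(F29-2|F29-3|F29-4|F44-2|F53-2|F38-3|F12-2)/$replace{$1}/;

print $name;    # >sample F29B2 contig
```

      With commas, the pattern would match only the literal text F29-2,F29-3,... in full, which is why the original attempt leaves the line unchanged.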

      Update: aaron_baugher, in a reply already posted, gave an example of the automatic regex generation technique discussed above, but the examples of using boundary conditions to refine a match may still be useful.

Re^2: multiple substitution
by naturalsciences (Beadle) on Aug 25, 2012 at 09:34 UTC

    OK thanks!

    quote:"s///e treats the replacement text as Perl code, rather than a double-quoted string."

    Well that could be useful!
