PerlMonks  

multiple substitution

by naturalsciences (Beadle)
on Aug 25, 2012 at 09:17 UTC (id://989705)

naturalsciences has asked for the wisdom of the Perl Monks concerning the following question:

Is it possible to do multiple substitutions in a one-liner? You can put a list of alternatives in the first part of s///, like

s/(apples|oranges|bananas)/fruit/

But if I want to substitute apples with yummy, oranges with yummier and bananas with yummiest, I can't just write

s/(apples|oranges|bananas)/(yummy|yummier|yummiest)/

because that would replace every match with the literal text (yummy|yummier|yummiest) rather than choosing the replacement case by case. Is it still possible to get this with a comparably short expression?

Re: multiple substitution
by Corion (Patriarch) on Aug 25, 2012 at 09:27 UTC

    See perlop for the /e modifier of s///; perlretut also covers it.

    my %replace = (
        apples  => 'yummy',
        oranges => 'yummier',
        bananas => 'yummiest',
    );
    $string =~ s!(apples|oranges|bananas)!$replace{$1} || $1!e;
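A self-contained sketch of the hash lookup above. It assumes the goal is to replace every occurrence, so it adds the /g modifier, which the original line does not have (without /g only the first match is replaced); the sample string is hypothetical.

```perl
use strict;
use warnings;

my %replace = (
    apples  => 'yummy',
    oranges => 'yummier',
    bananas => 'yummiest',
);

my $string = 'apples, oranges, bananas and more apples';

# /e evaluates the replacement as Perl code, so $replace{$1} looks up the
# word just matched; /g (an addition here) repeats this for every match.
$string =~ s!(apples|oranges|bananas)!$replace{$1} || $1!ge;

print "$string\n";    # yummy, yummier, yummiest and more yummy
```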

      I answered a similar question recently with a loop:

      $s =~ s/$_/$h{$_}/g for keys %h;

      So I wondered how that would compare to your solution of combining the searches into a single regex. I thought your way might win for a few words, but surely with a lot of words the complexity of the regex would slow it down, right?

      Well, so much for that theory. The Perl regex engine continues to amaze me. I gave it a pattern combining 676 strings (all two-letter combinations) with pipes like yours, and it blew the forloop method away (92 times faster). It also beat a regex built with Regexp::Assemble (RA), but I was using very simple, known search strings, so the hand-made pipe method was safe and simple. With unknown or more complex strings, where it is harder to hand-make a safe and efficient search pattern, I think RA would probably come out on top eventually. Anyway, here are my test and results:

      abaugher@bannor> cat 989705.pl
      #!/usr/bin/env perl
      use Modern::Perl;
      use Benchmark qw(:all);
      use Regexp::Assemble;

      my %h = map { $_ => uc } ( 'aa' .. 'zz' );
      my $s = `cat bigfile`; # 8MB file
      say "Testing with @{[-s 'bigfile']} byte file and @{[ scalar keys %h ]} patterns";

      cmpthese( 10, {
          'forloop' => \&forloop,
          'pipes'   => \&pipes,
          'regexpa' => \&regexpa,
      });

      sub forloop {
          $s =~ s/$_/$h{$_}/g for keys %h;
      }

      sub pipes {
          my $p = join '|', keys %h;
          $s =~ s/($p)/$h{$1}/g;
      }

      sub regexpa {
          my $p = Regexp::Assemble->new->add(keys %h)->re;
          $s =~ s/($p)/$h{$1}/g;
      }

      abaugher@bannor> perl 989705.pl
      Testing with 8560854 byte file and 676 patterns
                    Rate forloop regexpa   pipes
      forloop 9.75e-02/s      --    -96%    -99%
      regexpa     2.40/s   2364%      --    -74%
      pipes       9.08/s   9213%    278%      --

      Aaron B.
      Available for small or large Perl jobs; see my home node.
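Speed aside, the loop and the single alternation are not always interchangeable. The sketch below uses a hypothetical mapping in which one replacement (oranges) is also a search key: the loop re-scans text it has already rewritten, while the single regex makes one left-to-right pass. The keys are sorted only to make the loop's output repeatable; plain keys %h returns them in an unpredictable order.

```perl
use strict;
use warnings;

# Hypothetical mapping where a replacement is itself a search key.
my %h     = ( apples => 'oranges', oranges => 'bananas' );
my $input = 'apples oranges';

# Loop approach: one full pass per key, so later keys can match text
# that an earlier pass produced.
my $looped = $input;
$looped =~ s/$_/$h{$_}/g for sort keys %h;

# Single-regex approach: one pass; replaced text is never searched again.
my $alternation = join '|', map quotemeta, sort keys %h;
my $single = $input;
$single =~ s/($alternation)/$h{$1}/g;

print "loop:   $looped\n";    # loop:   bananas bananas
print "single: $single\n";    # single: oranges bananas
```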

        The pipes() and regexpa() functions used in the timing loops above both generate their matching regex on every call. I doubt it adds greatly to the overall execution time, but is it proper to include regex generation in the timing of a substitution operation?

        On a more critical note, a substitution is done on the $s string in each repetition of each timing loop, but will there be anything left to substitute after the first pass of whichever timing function happens to run first? Aren't all subsequent passes in all functions just measuring how long it takes a regex to find no match in a string? (Maybe take the 8MB file content, use the x operator to repeat it into three identical 200 to 500MB strings, and do just one comparison pass of substitutions on each string.)
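A sketch of a reworked benchmark along those lines, under stated assumptions: a small generated string stands in for the 8MB bigfile, Regexp::Assemble is left out so only core modules are needed, each timed call works on its own copy of the input, and the alternation is compiled once outside the timed code. The iteration count is tiny, so Benchmark may warn that the timings are unreliable; raise it (or the input size) for real measurements.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for "bigfile".
my $original = "the quick brown fox jumps over the lazy dog\n" x 2_000;

my %h = map { $_ => uc } ( 'aa' .. 'zz' );

# Compile the alternation once, outside the timed code.
my $pipes_rx = do { my $p = join '|', keys %h; qr/($p)/ };

# Both subs receive a copy of their argument, so $original is never changed
# and every timed pass has something to substitute.
sub forloop {
    my $s = shift;
    $s =~ s/$_/$h{$_}/g for keys %h;
    return $s;
}

sub pipes {
    my $s = shift;
    $s =~ s/$pipes_rx/$h{$1}/g;
    return $s;
}

cmpthese( 10, {
    forloop => sub { forloop($original) },
    pipes   => sub { pipes($original) },
});
```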

        I only (re)used the regular expression the OP already had. But your results mesh well with "When Perl Isn't Quite Fast Enough": the fewer ops you need, and the more you can do within the RE engine, the faster your Perl code is.

      Could you explain the code a bit? Should those ! be /?

      I understand that $string =~ s/(apples|oranges|bananas)/$replace{$1}/e would take the first match from the string ($1). Then, because of the /e flag, the second part of the substitution would be the value corresponding to the key ($1). What is the deal with the || (or?) operator? (I guess I'm mistaken about the ! characters.) Would this code of my own work?
      #!/usr/bin/perl -w
      use strict;
      use warnings;

      my @keys = qw(F29-2 F29-3 F29-4 F44-2 F53-2 F38-3 F12-2);
      my @vals = qw(F29B2 F29B3 F29B4 F44B2 F53B2 F38B3 F12B2);
      my %replace;
      @replace{@keys} = @vals;

      while (my $line = <>) {
          if ($line =~ m/^\>/) {
              my $name = $line;
              $name =~ s/(F29-2,F29-3,F29-4,F44-2,F53-2,F38-3,F12-2)/$replace{$1}/;
              print $name;
          }
          elsif ($line !~ m/^\>/) {
              print $line;
          }
      }
      I didn't want to use some convoluted regexp patterns because they might work this time but not always. I want to learn the technique for doing such list/hash substitutions as in the original question.
        Should those ! be /
        That ! is fine. Perl lets you use other delimiters for s///, so s!...!...! works just like s/.../.../.

        Could you explain the code for a sec.
        $replace{$1} || $1
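        This part replaces the matched string with itself when %replace has no corresponding key (more precisely, whenever $replace{$1} is false: undefined, '' or '0'). With the pattern (apples|oranges|bananas), every match is a key, so the fallback only matters if the pattern can match words the hash does not cover.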
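To see when the fallback matters, the sketch below swaps in a broader pattern, \w+, that also matches words missing from %replace (the sample string is hypothetical). It also shows the defined-or operator //, available since Perl 5.10, as an alternative that keeps a mapped value of '0'; that variant is an addition, not something from the thread.

```perl
use strict;
use warnings;

my %replace = ( apples => 'yummy', oranges => 'yummier', bananas => 'yummiest' );
my $string  = 'apples and pears';

# \w+ also matches "and" and "pears", which are not keys; for those
# $replace{$1} is undef (false), so "|| $1" puts the original word back.
( my $with_or = $string ) =~ s/(\w+)/$replace{$1} || $1/ge;
print "$with_or\n";            # yummy and pears

# || also falls back when the mapped value is a false string such as '0';
# the defined-or operator // falls back only when the value is undef.
my %zero = ( apples => '0' );
( my $zero_or         = $string ) =~ s{(\w+)}{$zero{$1} || $1}ge;
( my $zero_defined_or = $string ) =~ s{(\w+)}{$zero{$1} // $1}ge;
print "$zero_or\n";            # apples and pears
print "$zero_defined_or\n";    # 0 and pears
```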

        Did not want to use some convoluted regexp
        Trust me, this is a simple regex. It can get a lot worse if you delve deeper :)

        Want to learn the technique to do such list/hash substitutions as in original question
        As far as searching and replacing in strings is concerned, I guess regexes would be most helpful.
        If you are unfamiliar with s/// and s!!!, I already linked to the relevant documentation, perlop. Please read it. Regarding your own attempt, what happened when you tried it?
        Did not want to use some convoluted regexp patterns ... Want to learn the technique to do such list/hash substitutions ...

        A common approach to handling long search/replace string lists is to generate the search regex automatically from the keys of the search/replace hash. (Then you just have to worry about getting the hash right!)

        >perl -wMstrict -le
        "my @keys = qw(F29-2 F29-3 F29-4 F44-2 F53-2 F38-3 F12-2);
         my @vals = qw(F29B2 F29B3 F29B4 F44B2 F53B2 F38B3 F12B2);
         my %replace;
         @replace{@keys} = @vals;
         ;;
         my $rx_search = join q{ | }, map quotemeta, keys %replace;
         $rx_search = qr{ $rx_search }xms;
         print $rx_search;
         ;;
         my $s = 'F99-9 FF29-22 -F29-2- F29-2 F44-2 F12-2';
         print qq{'$s'};
         my $t = $s;
         $t =~ s{ ($rx_search) }{$replace{$1}}xmsg;
         print qq{'$t'};
         ;;
         $t = $s;
         $t =~ s{ \b ($rx_search) \b }{$replace{$1}}xmsg;
         print qq{'$t'};
         ;;
         $t = $s;
         $t =~ s{ (?<! \S) ($rx_search) (?! \S) }{$replace{$1}}xmsg;
         print qq{'$t'};
        "
        (?^msx: F29\-4 | F53\-2 | F44\-2 | F29\-2 | F38\-3 | F29\-3 | F12\-2 )
        'F99-9 FF29-22 -F29-2- F29-2 F44-2 F12-2'
        'F99-9 FF29B22 -F29B2- F29B2 F44B2 F12B2'
        'F99-9 FF29-22 -F29B2- F29B2 F44B2 F12B2'
        'F99-9 FF29-22 -F29-2- F29B2 F44B2 F12B2'

        Note that none of the conversion examples uses the /e switch; avoiding it should make conversion slightly faster. In all the conversion examples, F99-9 is never converted: it simply doesn't appear in the @keys conversion array.

        In the first conversion example, the F29-2 substring in FF29-22 and -F29-2- is converted even though it is embedded in another string: it appears in the conversion list.

        This is fixed for FF29-22 in the second example by using \b boundary assertions, which allow conversion only if a search string is neither preceded nor followed by a 'word' character ([A-Za-z0-9_]). This still allows the substring in -F29-2- to be replaced, because '-' is not a word character.

        This problem (if problem it is) is fixed in the third example by using different boundary assertions: (?<! \S) and (?! \S) allow a match (and replacement) only if the potential match substring is neither preceded nor followed by a non-whitespace character.
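The one-liner above passes each key through quotemeta before joining the keys with |. That is what keeps a generated pattern safe when keys contain regex metacharacters; in these F-numbers the only such character is -, which is harmless outside a character class anyway. The sketch below uses hypothetical keys containing . and * to show the difference.

```perl
use strict;
use warnings;

# Hypothetical keys that contain regex metacharacters.
my %h = ( 'a.b' => 'AB', 'x*y' => 'XY' );

my $raw    = join '|', keys %h;                  # . and * keep their regex meaning
my $quoted = join '|', map quotemeta, keys %h;   # a\.b and x\*y match literally

# Unquoted, 'a.b' also matches "axb" and 'x*y' also matches "xy".
my @raw_hits    = ( 'axb xy' =~ /($raw)/g );
my @quoted_hits = ( 'axb xy' =~ /($quoted)/g );
print scalar(@raw_hits), " raw vs ", scalar(@quoted_hits), " quoted matches\n";

my $s = 'a.b x*y axb xy';
$s =~ s/($quoted)/$h{$1}/g;
print "$s\n";    # AB XY axb xy
```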

        $name =~ s/(F29-2,F29-3,F29-4,F44-2,F53-2,F38-3,F12-2)/$replace{$1}/;

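        Note that | (pipe), not , (comma), is the alternation metacharacter. With commas, the pattern only matches the entire comma-separated string literally, so no header would ever be converted.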
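A corrected sketch of the OP's header conversion, assuming FASTA-style input whose header lines start with >: the alternation is generated from @keys and joined with | instead of commas, /g is added so a header with several IDs is fully converted, and hypothetical sample lines stand in for the file read via <>. Like the first conversion example above, it has no boundary assertions, so an embedded ID such as the F29-2 in FF29-22 would still be converted. The convert_line helper is a hypothetical name.

```perl
use strict;
use warnings;

my @keys = qw(F29-2 F29-3 F29-4 F44-2 F53-2 F38-3 F12-2);
my @vals = qw(F29B2 F29B3 F29B4 F44B2 F53B2 F38B3 F12B2);
my %replace;
@replace{@keys} = @vals;

# Alternation built from the keys: F29\-2|F29\-3|... (pipes, not commas).
my $alternation = join '|', map quotemeta, @keys;

# Hypothetical helper: rewrite IDs on header lines, pass other lines through.
sub convert_line {
    my ($line) = @_;
    $line =~ s/($alternation)/$replace{$1}/g if $line =~ /^>/;
    return $line;
}

# Hypothetical sample input standing in for the file read with <>.
my @lines = ( ">seq1 F29-2 sample\n", "F29-2 on a non-header line\n", ">seq2 F44-2 F12-2\n" );
print convert_line($_) for @lines;
```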

        Update: aaron_baugher, in a reply already posted, gave an example of the automatic regex generation technique discussed above, but the examples of using boundary conditions to refine a match may still be useful.

      OK thanks!

      Quoting the docs: "s///e treats the replacement text as Perl code, rather than a double-quoted string."

      Well that could be useful!
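A small illustration of that sentence, using a hypothetical price string: the same replacement text is either interpolated as a double-quoted string or evaluated as Perl code, depending on /e.

```perl
use strict;
use warnings;

my $text = 'price 10 and 25';

# Without /e the replacement is a double-quoted string: "$1*2" is just text.
( my $plain = $text ) =~ s/(\d+)/$1*2/g;

# With /e the replacement is Perl code: $1*2 is evaluated as arithmetic.
( my $evaluated = $text ) =~ s/(\d+)/$1*2/ge;

print "$plain\n";        # price 10*2 and 25*2
print "$evaluated\n";    # price 20 and 50
```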

Re: multiple substitution
by philiprbrenan (Monk) on Aug 25, 2012 at 23:50 UTC

    Or perhaps just:

    s/apples/yummy/ or s/oranges/yummier/ or s/bananas/yummiest/;
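A sketch of how this one-liner behaves on a hypothetical string containing several fruits. Because s/// returns a true value when it substitutes and or short-circuits, the chain stops at the first successful substitution, and without /g each s/// replaces only the first occurrence. Separate statements with /g are shown for comparison; that variant is an addition, not part of the reply.

```perl
use strict;
use warnings;

my $input = 'apples and oranges and apples';

# Chained with "or": the first successful s/// ends the chain.
my $chained = $input;
for ($chained) {    # alias $_ to $chained
    s/apples/yummy/ or s/oranges/yummier/ or s/bananas/yummiest/;
}
print "$chained\n";     # yummy and oranges and apples

# Separate statements with /g: every substitution runs on every occurrence.
my $separate = $input;
for ($separate) {
    s/apples/yummy/g;
    s/oranges/yummier/g;
    s/bananas/yummiest/g;
}
print "$separate\n";    # yummy and yummier and yummy
```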

Node Type: perlquestion [id://989705]
Approved by Athanasius
Front-paged by ww