
Keeping tags in regex

by Anonymous Monk
on May 17, 2012 at 15:41 UTC (#971091=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I am doing some substitutions of strings with a regex. For example, I have code like this:

    %dictionary = (foo => 'bar', baz => 'w00t');
    $str =~ s[$_][$dictionary{$_}] for keys %dictionary;

But now the problem is that some of the words have embedded XML tags. I would still like to perform the substitution but keep the XML tag. There is at most one XML tag in a word; for example:

    <tag>f</tag>oo should become <tag>b</tag>ar

    <b>fo</b>o should become <b>ba</b>r

Is there a simple way of doing this without writing out all of the possible combinations? Thanks!

Replies are listed 'Best First'.
Re: Keeping tags in regex
by choroba (Bishop) on May 17, 2012 at 15:56 UTC
    And what should happen to b<tag>a</tag>z?
      The tags always surround the first letter, formatting it to be bold, italic, red, etc.
Re: Keeping tags in regex
by choroba (Bishop) on May 17, 2012 at 22:51 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;

    my %dictionary = (foo => 'bar', baz => 'w00t');

    sub replace {
        my $str = shift;
        $str =~ m[(<[^>]+>)?([^<]+)(</[^>]+>)?(.*)]; # the key should never contain '<'
        my $key    = $2 . $4;
        my @tags   = ($1, $3);
        my $length = length $2;
        $key =~ s[$_][$dictionary{$_}] for keys %dictionary;
        if (grep $_, @tags) { # return tags to replaced string
            substr $key, $length, 0, $tags[1];
            substr $key, 0, 0, $tags[0];
        }
        return $key;
    }

    use Test::More;
    is replace('baz'), 'w00t';
    is replace('<tag>f</tag>oo'), '<tag>b</tag>ar';
    is replace('<b>fo</b>o'), '<b>ba</b>r';
    done_testing();
Re: Keeping tags in regex
by afoken (Abbot) on May 17, 2012 at 16:25 UTC
      In the thread you link to, the discussion is about looping over the keys in a hash. Perhaps I didn't make myself clear in the question. I'm wondering if there's a way to extract the non-HTML text out of the string, concatenate it, look up the substitution in the hash, and then put the HTML back in. Something along the lines of:
      s|(\w)(<.+?>)(\w+)|$dictionary{$1 . $3}| if exists $dictionary{$1 . $3};

      but now I'm at a loss as to how to put the xml tag back in. I agree that if this is not possible, the alternative is to write all the combinations with the tags into the dictionary hash.
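
One way the "extract, look up, put the tag back" idea could be sketched is a single s///gex that captures the opening tag, the tagged letters, the closing tag, and the rest of the word, then rebuilds the replacement around the same tags. This is only a sketch under assumptions: the names replace_keeping_tags and retag are made up for illustration, words are assumed to match \w+, the tag is assumed to wrap only the leading letters (as stated earlier in the thread), and the tag is wrapped around the same number of letters in the replacement, capped at the replacement's length.

```perl
use strict;
use warnings;

my %dictionary = (foo => 'bar', baz => 'w00t');

# Hypothetical helper: look up the untagged word and wrap the same number
# of leading letters of the replacement in the original tags.
sub retag {
    my ($open, $head, $close, $tail, $dict) = @_;
    my $new = $dict->{ $head . $tail };
    # Not in the dictionary: give back the text unchanged.
    return $open . $head . $close . $tail unless defined $new;
    my $n = length $head;
    $n = length $new if $n > length $new;   # replacement may be shorter
    return $open . substr($new, 0, $n) . $close . substr($new, $n);
}

# Hypothetical wrapper: handles tagged and plain words in one pass.
sub replace_keeping_tags {
    my ($str, $dict) = @_;
    $str =~ s{
        (<[^/>][^>]*>) (\w+) (</[^>]+>) (\w*)   # tagged first letters, then the rest
      | (\w+)                                   # plain word without a tag
    }{
        defined $5 ? (exists $dict->{$5} ? $dict->{$5} : $5)
                   : retag($1, $2, $3, $4, $dict)
    }gex;
    return $str;
}

print replace_keeping_tags('<tag>f</tag>oo and baz', \%dictionary), "\n";
# prints: <tag>b</tag>ar and w00t
```

Because the lookup is a direct hash access on the whole word, this avoids looping over the keys and avoids listing every tag combination in the dictionary. It does not check that the closing tag matches the opening one, and a plain word that happens to be a tag name elsewhere in the markup would still be looked up, so real XML would be better handled with a parser.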
