Keeping tags in regex

by Anonymous Monk
on May 17, 2012 at 15:41 UTC (#971091=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I am doing some substitutions of strings with a regex. For example, I have code like this:

    %dictionary = (foo => 'bar', baz => 'w00t');
    $str =~ s[$_][$dictionary{$_}] for keys %dictionary;

But now the problem is that some of the words have embedded XML tags. I would still like to perform the substitution but keep the XML tag. There is at most one XML tag in a word; for example:

    <tag>f</tag>oo should become <tag>b</tag>ar
    <b>fo</b>o should become <b>ba</b>r

Is there a simple way of doing this without writing out all of the possible combinations? Thanks!

Replies are listed 'Best First'.
Re: Keeping tags in regex
by choroba (Chancellor) on May 17, 2012 at 15:56 UTC
    And what should happen to b<tag>a</tag>z?
      The tags always surround the first letter -- formatting the letter to be bold, italic, red, etc.
Re: Keeping tags in regex
by choroba (Chancellor) on May 17, 2012 at 22:51 UTC
    #!/usr/bin/perl
    use warnings;
    use strict;

    my %dictionary = (foo => 'bar', baz => 'w00t');

    sub replace {
        my $str = shift;
        $str =~ m[(<[^>]+>)?([^<]+)(</[^>]+>)?(.*)];
        # the key should never contain '<'
        my $key = $2 . $4;
        my @tags = ($1, $3);
        my $length = length $2;
        $key =~ s[$_][$dictionary{$_}] for keys %dictionary;
        if (grep $_, @tags) { # return tags to replaced string
            substr $key, $length, 0, $tags[1];
            substr $key, 0, 0, $tags[0];
        }
        return $key;
    }

    use Test::More;
    is replace('baz'), 'w00t';
    is replace('<tag>f</tag>oo'), '<tag>b</tag>ar';
    is replace('<b>fo</b>o'), '<b>ba</b>r';
    done_testing();
Re: Keeping tags in regex
by afoken (Abbot) on May 17, 2012 at 16:25 UTC
      In the thread you link to, the discussion is about looping over the keys in a hash. Perhaps I didn't make myself clear in the question. I'm wondering if there's a way to extract the non-HTML text out of the string, concatenate it, look up the substitution in the hash, and then put the HTML back in. Something along the lines of:
      s|(\w)(<.+?>)(\w+)|$dictionary{$1$3}| if exists $dictionary{$1 . $3};

      but now I'm at a loss as to how to put the xml tag back in. I agree that if this is not possible, the alternative is to write all the combinations with the tags into the dictionary hash.
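
One possible way to do what the post above describes (take the tag out, look up the joined word, then put the tag back around the same number of leading characters) is a single s///e pass. This is only a sketch. It assumes, as stated earlier in the thread, that a tag only ever wraps the first letters of a word, and that words consist of \w characters. The names replace_keeping_tag and retag are made up for this illustration and do not come from any reply here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %dictionary = (foo => 'bar', baz => 'w00t');

# Hypothetical helper: rebuild a tagged word after the dictionary lookup.
# The tag is put back around as many leading characters as it wrapped before
# (or around the whole replacement if the replacement is shorter).
sub retag {
    my ($open, $head, $close, $tail) = @_;
    my $key = $head . $tail;
    return $open . $head . $close . $tail unless exists $dictionary{$key};
    my $new = $dictionary{$key};
    my $n   = length $head;
    $n = length $new if $n > length $new;
    return $open . substr($new, 0, $n) . $close . substr($new, $n);
}

# Hypothetical helper: replace dictionary words in a string, keeping a tag
# that wraps the leading letters of a word. Words with a tag in any other
# position are left alone.
sub replace_keeping_tag {
    my ($str) = @_;
    $str =~ s{
          (<(\w+)[^>]*>) (\w+) (</\2>) (\w*)   # tag wrapped around the leading letters
        | (<[^>]*>)                            # any other tag, copied through unchanged
        | (\w+)                                # a plain word
    }{
        defined $1 ? retag($1, $3, $4, $5)
      : defined $6 ? $6
      :              (exists $dictionary{$7} ? $dictionary{$7} : $7)
    }gex;
    return $str;
}

print replace_keeping_tag($_), "\n"
    for 'baz', '<tag>f</tag>oo', '<b>fo</b>o', 'a <b>f</b>oo and baz';
```

Because the lookup uses exists on the whole word, there is no loop over the hash keys, and a word such as b<tag>a</tag>z, where the tag is not at the start, passes through unchanged.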
