Hi
Task
I need a regex that transforms wiki markup surrounding words into HTML: * to <b>, and so on.
My problem is that the markers *, / and _ can be combined at word boundaries; see the following example:
DB<66> $_=$wiki; tf();tf();tf() ; print "'$wiki' \n=>\n'$_'"
'_*one /two/*_ _*three /four/*_ _*five /six/*_'
=>
'<u><b>one <i>two</i></b></u> <u><b>three /four/</b></u> <u><b>five <i>six</i></b></u>'
DB<67>
'_*one /two/*_ _*three /four/*_ _*five /six/*_'
=>
'one two three /four/ five six'
As you can see, I have to run the tf() transformation three times, using these definitions:
DB<40> %h = ( '*'=>'b', '/' => 'i' , '_' => 'u' )
DB<59> sub tf { s{ $pre ([_*/]) (.*?) \2 $post}{$1<$h{$2}>$3</$h{$2}>$4}xg }
DB<62> $pre = qr/(^|\s|>)/
DB<63> $post = qr/($|\s|<)/
DB<65> $wiki='_*one /two/*_ _*three /four/*_ _*five /six/*_'
Question
Is there a way to make it a one-run transformation?
The trouble is that /g continues after the inserted replacement (here the underline tags), so markers nested inside it are not rescanned in the same pass.
I was experimenting with lookaround-assertions and \G and couldn't get it done.
Approaches
The only ways I can (theoretically) think of so far are:
- to loop over /g in scalar context, while (s///g) { ... }, and to manipulate pos
- to manipulate pos in an embedded Perl code block (?{...})
- to call tf() recursively in the /e-evaluated replacement part
NB: This is more of a theoretical question, because running tf() three times doesn't pose problems.
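The third idea, recursing from the /e replacement, can be sketched roughly as follows. This is only an illustration under some assumptions, not a tested drop-in for the debugger session above: the lexical %tag table, the argument-passing form of tf() (instead of working on $_), and the use of lookarounds on both sides are choices made for the sketch. The captures are copied into lexicals before the recursive call so that the inner match cannot disturb them.

```perl
use strict;
use warnings;

# Marker-to-tag table, same mapping as %h in the post.
my %tag = ( '*' => 'b', '/' => 'i', '_' => 'u' );

# Hypothetical variant of tf(): takes the text as an argument and
# recurses into the content of every marker pair it finds.
sub tf {
    my ($text) = @_;
    $text =~ s{
        (?<![^\s>])     # start of string, whitespace or a tag end before
        ([_*/])         # opening marker
        (.*?)           # shortest possible content
        \1              # the same marker closes the pair
        (?![^\s<])      # end of string, whitespace or a tag start after
    }{
        my ($mark, $inner) = ($1, $2);   # copy captures before recursing
        "<$tag{$mark}>" . tf($inner) . "</$tag{$mark}>";
    }gex;
    return $text;
}

print tf('_*one /two/*_ _*three /four/*_ _*five /six/*_'), "\n";
```

Because the inner text is transformed before the tags are wrapped around it, a single call should expand all three levels, including /four/. The lazy (.*?) means a marker nested inside a pair of the same marker would not be handled.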
UPDATE:
I just noticed a bug: /four/ wasn't expanded.
tf() is better written with a lookahead, which doesn't consume the following whitespace:
DB<90> sub tf { s{ $pre ([_*/]) (.*?) \2 (?=$post)}{$1<$h{$2}>$3</$h{$2}>}xg }
I'll update an SSCCE soon.
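In the meantime, a self-contained script for the lookahead variant might look like the sketch below. The pattern, %h, $pre and $post are taken from the debugger lines above; the lexical declarations, the argument-passing form of tf(), and the `1 while` loop (which simply repeats the substitution until nothing changes) are assumptions added for illustration. It still makes several passes internally, so it does not answer the one-run question.

```perl
use strict;
use warnings;

my %h    = ( '*' => 'b', '/' => 'i', '_' => 'u' );
my $pre  = qr/(^|\s|>)/;    # what may precede an opening marker
my $post = qr/($|\s|<)/;    # what may follow a closing marker

# Repeats the lookahead version of the substitution until no
# marker pair is left to replace.
sub tf {
    my ($text) = @_;
    1 while $text =~ s{ $pre ([_*/]) (.*?) \2 (?=$post) }
                      {$1<$h{$2}>$3</$h{$2}>}xg;
    return $text;
}

print tf('_*one /two/*_ _*three /four/*_ _*five /six/*_'), "\n";
```

Since the lookahead leaves the following whitespace in place, that whitespace is still available to $pre for the next marker pair.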