PerlMonks |
That won't just exclude HTML entities from being matched; it will exclude any & character that has a semicolon somewhere to the right of it on the same line, because .+? also matches whitespace.

Instead, you should match for HTML/XML entities specifically. There are three forms that they can take, and the corresponding regexes for matching them would be:

/&#[0-9]+;/ - character referenced by decimal number
/&#x[0-9a-f]+;/i - character referenced by hexadecimal number
/&[a-z]+;/i - character referenced by name

Putting it together, you get this regex for matching an HTML entity:

/&(?:#(?:[0-9]+|x[0-9a-f]+)|[a-z]+);/i

Although that's kinda messy and pedantic, and you can probably get away with using this simplified version:

To do what the OP requested, wrap everything after the & in a negative look-ahead bracket like choroba suggested:
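A minimal sketch of the full entity regex and its negative look-ahead counterpart, for illustration only. The variable names and the test string in $str are made up here and are not taken from the thread; the look-ahead pattern is simply the full entity regex with everything after the & moved inside (?!...).

```perl
use strict;
use warnings;

# Full entity pattern: decimal, hexadecimal, or named reference.
my $entity = qr/&(?:#(?:[0-9]+|x[0-9a-f]+)|[a-z]+);/i;

# Same alternatives, minus the leading &, wrapped in a negative look-ahead:
# matches an & only when it does NOT start an entity.
my $bare_amp = qr/&(?!(?:#(?:[0-9]+|x[0-9a-f]+)|[a-z]+);)/i;

# Hypothetical test string (an assumption, not the one from the thread).
my $str = 'a &amp; b &#38; c &#x26; d & e &f';

# In list context, /g without capture groups returns every full match.
my @entities = $str =~ /$entity/g;

# Collect the start offset of each bare & via @- (offset of match start).
my @offsets;
while ($str =~ /$bare_amp/g) {
    push @offsets, $-[0];
}

print "entities: @entities\n";
print "bare & at offsets: @offsets\n";
```

With this test string, the three entities are skipped and only the last two & characters are reported. One caveat: some named references contain digits (for example &frac12;), so [a-z]+ would need to become something like [a-z][a-z0-9]* to cover them.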
Output:
(i.e. it only matches the last two & characters in $str)

In reply to Re^3: Perl & regex help by smls