Highlight your result

tune has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

It is really annoying, but I cannot find the right solution to my problem. Maybe it needs a very simple regexp, maybe an intermediate one; either way, I am very much a beginner regexp user. Definitely not a hacker :(

When I display search results (HTML), I would like to highlight the search word in them. For example:

$result =~ s/($seekwrd)/<b>$1<\/b>/ig;

Good, it works fine... until there is a URL and the word is found after href=" but before the closing double quote (e.g. <a href="mp3.com/artist">mp3.com/artist</a>, where the search word to highlight is "mp3"). Then it messes up the HTML code.
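A minimal sketch reproducing the failure, using the link and search word from the question (with the `/` in `</b>` escaped so the substitution compiles). The substitution knows nothing about tags, so it rewrites the word inside the href attribute as well as in the link text:

```perl
use strict;
use warnings;

# Naive highlighting: every match is wrapped, including the one inside href="...".
my $seekwrd = 'mp3';
my $result  = '<a href="mp3.com/artist">mp3.com/artist</a>';
$result =~ s/($seekwrd)/<b>$1<\/b>/ig;
print "$result\n";
# prints: <a href="<b>mp3</b>.com/artist"><b>mp3</b>.com/artist</a>
```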

I don't have a clue what the right method is in this case. Please help! TIA

--
<tune>


Re: Highlight your result by TheoPetersen (Priest) on Jul 18, 2001 at 20:15 UTC
To do this correctly, you need to handle the HTML properly. The best way to do that is to work within a module like HTML::Parser, which can tell you whether a piece of text is inside a tag or not. There's a good bit to learn in getting started with HTML::Parser, but it's a worthwhile investment of your time. If you try to handle this on your own, you'll end up having to code most or all of the cases yourself.
Re: Highlight your result by voyager (Friar) on Jul 18, 2001 at 20:15 UTC
Your requirement is to apply your substitution only when you are not inside a tag. Knowing when you are not in a tag is non-trivial. You might try HTML::Parser: your `start` and `end` handlers can simply write out what they read, while the substitution takes place in the `text` handler.
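A rough sketch of that handler layout, assuming HTML::Parser 3 is installed (`api_version => 3`). The function name `highlight_html` is made up for illustration, and two details are additions rather than part of the reply: a `default_h` handler so comments and declarations are copied through too, and `unbroken_text => 1` so a text run arrives in one `text` event and the search word cannot be split across two callbacks.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: wrap every case-insensitive occurrence of $word in the
# text parts of $html with <b>...</b>, leaving tags (and their attributes) alone.
sub highlight_html {
    my ($html, $word) = @_;
    my $out  = '';
    my $echo = sub { $out .= shift };    # copy markup through unchanged

    my $p = HTML::Parser->new(
        api_version   => 3,
        unbroken_text => 1,                 # one text event per text run
        start_h       => [ $echo, 'text' ], # e.g. <a href="..."> as written
        end_h         => [ $echo, 'text' ], # e.g. </a> as written
        default_h     => [ $echo, 'text' ], # comments, declarations, etc.
        text_h        => [
            sub {
                my ($text, $is_cdata) = @_;
                # Skip literal element contents such as <script> and <style>.
                # Note: 'text' is the raw source, so entities are not decoded.
                $text =~ s/(\Q$word\E)/<b>$1<\/b>/ig unless $is_cdata;
                $out .= $text;
            },
            'text, is_cdata',
        ],
    );
    $p->parse($html);
    $p->eof;    # flush any buffered text
    return $out;
}

print highlight_html('<a href="mp3.com/artist">mp3.com/artist</a>', 'mp3'), "\n";
```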
Re: Highlight your result by gryphon (Abbot) on Jul 18, 2001 at 21:01 UTC
Greetings tune. First off, you should definitely look into HTML::Parser if you want to play around with regexes on HTML documents. Trust me, I've been burned by this several times. However, just because I'm curious and want to practice my still-young regex skills, I decided to play around with a solution:
`$string = 'Some mp3 at <a href="mp3.com">mp3.com</a> with mp3 stuff.';`
`$search = 'mp3';`
`$string =~ s#(<[^>]*>[^<]*)($search)|^([^<]*)($search)#$1$3<b>$2$4</b>#g;`
This seems to work, as far as I can tell. I haven't really tested it with anything complex, but I'd say chances are it would break in real-world use. So use HTML::Parser instead.
-gryphon
code('Perl') || die;
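gryphon's substitution wraps at most one occurrence per run of text between tags; because `[^<]*` is greedy it is the last one, so `<p>mp3 and mp3</p>` gets only the second `mp3` highlighted. A sketch of a different, module-free approach (not from the thread): split the string into tags and text with a capturing `split`, then substitute only in the text pieces. The helper name is hypothetical, and it assumes simple markup with no `>` inside attribute values and no comments or `<script>` blocks.

```perl
use strict;
use warnings;

# Hypothetical helper: highlight $word only outside tags.
sub highlight_outside_tags {
    my ($html, $word) = @_;
    # The capturing group makes split return the tags as separate elements,
    # so text pieces land at even indices and tags at odd indices.
    # A limit of -1 keeps trailing empty fields, preserving that alternation.
    my @pieces = split /(<[^>]*>)/, $html, -1;
    for (my $i = 0; $i < @pieces; $i += 2) {
        $pieces[$i] =~ s/(\Q$word\E)/<b>$1<\/b>/ig;    # text pieces only
    }
    return join '', @pieces;
}

print highlight_outside_tags('Some mp3 at <a href="mp3.com">mp3.com</a> with mp3 stuff.', 'mp3'), "\n";
```

Unlike the single regex, every occurrence in each text run is wrapped, and tags are never touched.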
Re: Highlight your result by lshatzer (Friar) on Jul 18, 2001 at 20:18 UTC
You could use some fancy regex with lookahead or lookbehind to make sure you are not inside an HTML tag, such as `s/($seekword)(?!>)/<b>$1<\/b>/ig;` (I would also suggest either escaping the `/` in `</b>` or using another delimiter for your `s///`, such as `s{($seekword)(?!>)}{<b>$1</b>}ig;`). This code is of course untested and off the top of my head; I suggest looking up perlre and reading up on `(?!pattern)` in regexes. Update: Or, as noted above, try the HTML::Parser module. My trick is a quick and dirty approach that might break in some cases.
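The `(?!>)` lookahead only checks the single character right after the match; in the question's `href="mp3.com/artist"` that character is `.`, so the attribute is still rewritten. A sketch of a stricter lookahead (a variant, not lshatzer's code): reject a match when the next angle bracket to its right is a `>`, which means the match sits inside a tag. The helper name is hypothetical, and this remains a heuristic that breaks on `>` in text, `<` in attribute values, comments, and so on.

```perl
use strict;
use warnings;

# Hypothetical helper: lookahead-based highlighting.
sub highlight_lookahead {
    my ($html, $word) = @_;
    # (?![^<]*>) fails when a '>' appears before any '<' to the right,
    # i.e. when the match is inside a tag such as <a href="...">.
    $html =~ s{(\Q$word\E)(?![^<]*>)}{<b>$1</b>}ig;
    return $html;
}

print highlight_lookahead('<a href="mp3.com/artist">mp3.com/artist</a>', 'mp3'), "\n";
```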

