PerlMonks  

Highlight your result

by tune (Curate)
on Jul 18, 2001 at 20:07 UTC

tune has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

It is really annoying, but I cannot find the right solution to my problem. Maybe it needs a very simple regexp, maybe an intermediate one. Either way, it means I am very much a beginner with regexps. Definitely not a hacker :(

When a search word turns up in my search results (HTML), I would like to highlight it. For example:

$result =~ s/($seekwrd)/<b>$1</b>/ig;
Good, it works fine... until there is a URL, and the word is found after href=" but before the closing double quote (e.g. <a href="mp3.com/artist">mp3.com/artist</a>, where the search word to highlight is "mp3"). Then the substitution also fires inside the attribute value and messes up the HTML code.

I don't have a clue what the right method is in this case. Please help! TIA

--
<tune>

Replies are listed 'Best First'.
Re: Highlight your result
by TheoPetersen (Priest) on Jul 18, 2001 at 20:15 UTC
    To do this correctly you need to handle HTML correctly. The best way to do that is to work within the bounds of a module like HTML::Parser that can tell you when a piece of text is inside a tag or not.

    There's a good bit to learn in getting started with HTML::Parser, but it's a worthy investment of your time. If you try to handle this on your own, you'll end up having to code most or all of the cases yourself.

Re: Highlight your result
by voyager (Friar) on Jul 18, 2001 at 20:15 UTC
    Your requirement is to only apply your substitution when you are not in a tag. Knowing when you are not in a tag is non-trivial.

    You might try HTML::Parser. Your start and end handlers can just write out what they read, and your substitution can take place in the text handler.
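The handler arrangement described above can be sketched roughly as follows. This is a minimal sketch, not a tested production solution: the helper name highlight_html is hypothetical, a single default handler stands in for separate start and end handlers, and \Q...\E is an added assumption so that search words containing regex metacharacters are matched literally.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: wrap every occurrence of $word in <b>...</b>,
# but only in text that lies outside of tags.
sub highlight_html {
    my ($html, $word) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Text between tags: the only place the substitution runs.
        text_h => [ sub {
            my ($text) = @_;
            $text =~ s{(\Q$word\E)}{<b>$1</b>}ig;
            $out .= $text;
        }, 'text' ],
        # Everything else (tags, comments, declarations) is copied unchanged.
        default_h => [ sub { $out .= $_[0] }, 'text' ],
    );
    $parser->unbroken_text(1);    # deliver each run of text in one piece
    $parser->parse($html);
    $parser->eof;
    return $out;
}

print highlight_html('<a href="mp3.com/artist">mp3.com/artist</a>', 'mp3'), "\n";
```

Note that this sketch also highlights matches inside script and style elements, since their content arrives through the text handler too; a real version would probably want to skip those.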

Re: Highlight your result
by gryphon (Abbot) on Jul 18, 2001 at 21:01 UTC

    Greetings tune. First off, you should definitely look into HTML::Parser if you want to play around with regex in HTML documents. Trust me, I've been burned on this several times. However, just because I'm curious and want to practice my still young regex skills, I decided to play around with a solution:

    $string = 'Some mp3 at <a href="mp3.com">mp3.com</a> with mp3 stuff.';
    $search = 'mp3';
    $string =~ s#(<[^>]*>[^<]*)($search)|^([^<]*)($search)#$1$3<b>$2$4</b>#g;

    This seems to work, as far as I can tell. I haven't really tested it with anything complex, but I'd say that chances are it would break in real-world use. So use HTML::Parser instead.

    -gryphon
    code('Perl') || die;
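For comparison, here is a regex-only sketch that uses core Perl and handles several occurrences within one run of text (the single substitution above highlights at most one match per run). It is an illustration under assumptions, not gryphon's method: the helper name highlight_outside_tags is hypothetical, and it assumes that no attribute value contains a literal > and that every < in the input starts a tag.

```perl
use strict;
use warnings;

# Hypothetical helper, core Perl only: split the input into tags and text,
# then run the substitution on the text pieces alone.
sub highlight_outside_tags {
    my ($html, $word) = @_;
    # The capturing group makes split return the tags as well as the text.
    my @pieces = split /(<[^>]*>)/, $html;
    for my $piece (@pieces) {
        next if $piece =~ /^</;                   # a tag: leave it untouched
        $piece =~ s{(\Q$word\E)}{<b>$1</b>}ig;    # text: highlight every hit
    }
    return join '', @pieces;
}

my $string = 'Some mp3 at <a href="mp3.com">mp3.com</a> with mp3 stuff.';
print highlight_outside_tags($string, 'mp3'), "\n";
```

Like any regex-based approach to HTML, this is likely to break on comments, scripts, and malformed markup, so the HTML::Parser advice still stands.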

Re: Highlight your result
by lshatzer (Friar) on Jul 18, 2001 at 20:18 UTC
    You could use a fancier regex with a lookahead or lookbehind to make sure you are not inside an HTML tag.

    Such as s/($seekword)(?![^<>]*>)/<b>$1<\/b>/ig;

    (I would also suggest either escaping the slash in </b>, as above, or using another delimiter for your s///, such as s{($seekword)(?![^<>]*>)}{<b>$1</b>}ig;.)

    This code is of course untested and off the top of my head. I suggest looking up perlre and reading up on (?!pattern) in regexes.

    Updated: Or, as noted above, try the HTML::Parser module. My trick is a quick and dirty approach that might break in some cases.
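A lookahead of this kind can be tried out with a small script like the one below. It is only a sketch under assumptions: the helper name highlight_lookahead is hypothetical, \Q...\E is added to escape metacharacters in the search word, and the negative lookahead (?![^<>]*>) rejects a match whenever the next angle bracket after it is a closing >, which is taken here to mean "inside a tag".

```perl
use strict;
use warnings;

# Hypothetical helper: highlight $word unless the match sits inside a tag.
sub highlight_lookahead {
    my ($html, $word) = @_;
    # (?![^<>]*>) fails when only non-bracket characters separate the
    # match from a following '>', i.e. when the match is inside <...>.
    $html =~ s{(\Q$word\E)(?![^<>]*>)}{<b>$1</b>}ig;
    return $html;
}

print highlight_lookahead('<a href="mp3.com/artist">mp3.com/artist</a>', 'mp3'), "\n";
```

The lookahead looks only at the characters after the match, so a stray > in ordinary text or inside an attribute value will fool it. That is one of the cases where the quick and dirty approach breaks.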
