PerlMonks  

Highlight your result

by tune (Curate)
on Jul 18, 2001 at 16:07 UTC ( [id://97720]=perlquestion )

tune has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

It is really annoying, but I cannot find the right solution to my problem. Maybe it is a very simple regexp, maybe an intermediate one. That would make me a real beginner with regexps, then. Definitely not a hacker :(

When I have some results in a search (HTML), I would like to highlight the search word in them. For example:

$result =~ s/($seekwrd)/<b>$1</b>/ig;
Good, it works fine... until there is a URL and the word is found after href=" but before the closing double quote (e.g. <a href="mp3.com/artist">mp3.com/artist</a>, where the search word to highlight is "mp3"). Then it messes up the HTML code.

I don't have a clue what the right method is in this case. Please help! TIA

--
<tune>
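
The failure described in the question can be reproduced with a short script. This is only a sketch: the sample markup comes from the post, the surrounding "Try" text is made up for illustration, and s{}{} delimiters are used so that the slash in </b> needs no escaping.

```perl
use strict;
use warnings;

# Sample input from the question; "Try" is filler text added for the demo.
my $result  = 'Try <a href="mp3.com/artist">mp3.com/artist</a>';
my $seekwrd = 'mp3';

# Naive highlight: wraps every match, including the one inside href="...".
$result =~ s{($seekwrd)}{<b>$1</b>}ig;

print "$result\n";
# prints: Try <a href="<b>mp3</b>.com/artist"><b>mp3</b>.com/artist</a>
```

The <b> element now sits inside the attribute value, which is what breaks the link.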

Replies are listed 'Best First'.
Re: Highlight your result
by TheoPetersen (Priest) on Jul 18, 2001 at 16:15 UTC
    To do this correctly you need to handle HTML correctly. The best way to do that is to work within the bounds of a module like HTML::Parser that can tell you when a piece of text is inside a tag or not.

    There's a good bit to learn in getting started with HTML::Parser, but it's a worthy investment of your time. If you try to handle this on your own, you'll end up having to code most or all of the cases yourself.

Re: Highlight your result
by voyager (Friar) on Jul 18, 2001 at 16:15 UTC
    Your requirement is to only apply your substitution when you are not in a tag. Knowing when you are not in a tag is non-trivial.

    You might try HTML::Parser. Your start and end handlers can just write what they read, and have your substitution take place in the text handler.
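
A minimal sketch of that handler setup, assuming the CPAN module HTML::Parser is installed and using its version 3 API. The helper name highlight_html is hypothetical. Instead of separate start and end handlers, a single default handler copies every non-text event (tags, comments, declarations) through unchanged, and the substitution runs only in the text handler. The search word is passed through \Q...\E on the assumption that it should match literally.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: bold $word in text nodes only, leave markup untouched.
sub highlight_html {
    my ($html, $word) = @_;
    my $out = '';

    # Copy an event's original source text straight to the output.
    my $copy = sub { $out .= $_[0] };

    # Highlight inside character data, then append it to the output.
    my $mark = sub {
        my ($text) = @_;
        $text =~ s{(\Q$word\E)}{<b>$1</b>}gi;
        $out .= $text;
    };

    my $parser = HTML::Parser->new(
        api_version => 3,
        default_h   => [ $copy, 'text' ],   # start/end tags, comments, etc.
        text_h      => [ $mark, 'text' ],   # text between tags
    );
    $parser->unbroken_text(1);   # deliver each text run in one piece
    $parser->parse($html);
    $parser->eof;
    return $out;
}

print highlight_html(
    'Some mp3 at <a href="mp3.com/artist">mp3.com/artist</a>.', 'mp3'
), "\n";
```

Note that the text handler also sees the contents of script and style elements, and it receives entities undecoded, so a search word such as "amp" would still match inside &amp;. Real use would need to guard against both.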

Re: Highlight your result
by lshatzer (Friar) on Jul 18, 2001 at 16:18 UTC
    You could use some fancy regex with lookahead or lookbehind to make sure you are not inside an HTML tag.

    Such as s/($seekword)(?!>)/<b>$1<\/b>/ig;

    (I would also suggest either escaping the slash in </b> or using another delimiter for your s///, such as s{($seekword)(?!>)}{<b>$1</b>}ig;)

    This code is of course untested and off the top of my head. I suggest looking up perlre and reading up on (?!pattern) in regexes.

    updated: Or, as noted above, try the HTML::Parser module. My trick is a quick and dirty approach that might break in some cases.
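
One caveat on the lookahead idea: (?!>) only checks the single character right after the match, so the "mp3" in href="mp3.com/artist" is still wrapped. A slightly stricter variant, sketched here as an assumption rather than as the poster's code, rejects a match when a ">" can be reached without first crossing another "<" or ">". The helper name highlight_outside_tags is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: highlight $word unless the match appears to sit
# inside a tag, i.e. a '>' follows with no '<' or '>' in between.
sub highlight_outside_tags {
    my ($html, $word) = @_;
    $html =~ s{(\Q$word\E)(?![^<>]*>)}{<b>$1</b>}gi;
    return $html;
}

print highlight_outside_tags(
    'Some mp3 at <a href="mp3.com/artist">mp3.com/artist</a> with MP3 stuff.',
    'mp3'
), "\n";
# prints: Some <b>mp3</b> at <a href="mp3.com/artist"><b>mp3</b>.com/artist</a> with <b>MP3</b> stuff.
```

This is still a quick and dirty approach: a literal ">" in ordinary text or inside an attribute value will fool it, which is why the replies keep pointing to HTML::Parser.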
Re: Highlight your result
by gryphon (Abbot) on Jul 18, 2001 at 17:01 UTC

    Greetings tune. First off, you should definitely look into HTML::Parser if you want to play around with regexes in HTML documents. Trust me, I've been burned on this several times. However, just because I'm curious and want to practice my still-young regex skills, I decided to play around with a solution:

    $string = 'Some mp3 at <a href="mp3.com">mp3.com</a> with mp3 stuff.';
    $search = 'mp3';
    $string =~ s#(<[^>]*>[^<]*)($search)|^([^<]*)($search)#$1$3<b>$2$4</b>#g;

    This seems to work, as far as I can tell. I haven't really tested it with anything complex, but I'd say that chances are it would break in real-world use. So use HTML::Parser instead.

    -gryphon
    code('Perl') || die;
