PerlMonks  

regex question

by Samn (Monk)
on Aug 01, 2002 at 02:48 UTC ( #186662=perlquestion )
Samn has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to strip HTML image tags with a regex. The image tags will never have alt tags, size, or double quotes. The code I'm using is $body =~ s/\<img src=(.*)\>/\[image:\<a href=$1\>$1\<\/a\>\]/gi;
The goal is to change <img src=http://fuzzy.com/kittens.jpg>
which would display as a graphic to [image: <a href=http://fuzzy.com/kittens.jpg>http://fuzzy.com/kittens.jpg</a>]
which displays as text with a hyperlink. However, my regex does not work when there are two images in a string. I'm not exactly sure why, but I suspect it's matching everything from the first opening image tag through to the closing bracket of the last one. Any suggestions?

Re: regex question
by Samn (Monk) on Aug 01, 2002 at 02:50 UTC
    I should have used code tags. The regex should read $body =~ s/\<img src=(.*)\>/\[image:\<a href=$1\>$1\<\/a\>\]/gi;
Re: regex question
by krusty (Hermit) on Aug 01, 2002 at 03:15 UTC
    $body =~ s/<img src=(.*?)>/[image:<a href=$1>$1<\/a>]/gi;
    Sounds like this might be what you're looking for.

    Cheers,
    kln
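A minimal sketch of why the `?` matters, assuming a sample string with two image tags (the second URL is made up for illustration). The greedy `(.*)` runs to the last `>` in the string, while the non-greedy `(.*?)` stops at the first one:

```perl
use strict;
use warnings;

# Hypothetical sample input: two image tags in one string.
my $body = 'a <img src=http://fuzzy.com/kittens.jpg> b <img src=http://fuzzy.com/puppies.jpg> c';

# Greedy: (.*) extends to the LAST ">" in the string, so a single match
# swallows both tags and everything between them.
(my $greedy = $body) =~ s/<img src=(.*)>/[image:<a href=$1>$1<\/a>]/gi;

# Non-greedy: (.*?) stops at the FIRST ">" after "src=", so each tag is
# matched and replaced on its own.
(my $lazy = $body) =~ s/<img src=(.*?)>/[image:<a href=$1>$1<\/a>]/gi;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Note that the `/` in `</a>` has to be escaped (or a different delimiter used), since an unescaped slash would end the replacement part of `s///` early.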
Re: regex question
by Zaxo (Archbishop) on Aug 01, 2002 at 03:32 UTC

    use HTML::Parser;

    It handles maniacal markup you'll never think of in your home-rolled regexen.

    Update: ++mkmcconn suggested I add HTML::TokeParser to the recommendation, and I agree. (I knew I was forgetting a good one.)

    After Compline,
    Zaxo
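A rough sketch of what the parser route could look like with HTML::TokeParser (a CPAN module from the HTML-Parser distribution, so it has to be installed). The helper name `images_to_links` and the sample input are made up for illustration; this is one possible approach, not the only way to use the module. It walks the token stream, replaces each `img` start tag that has a `src` attribute, and copies every other token through as it appeared in the source:

```perl
use strict;
use warnings;
use HTML::TokeParser;   # CPAN, part of the HTML-Parser distribution

# Hypothetical helper: rewrite every <img> start tag as a bracketed text
# link and pass all other tokens through unchanged.
sub images_to_links {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "could not create parser";
    my $out = '';

    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        if ($type eq 'S' && $token->[1] eq 'img' && defined $token->[2]{src}) {
            # Tag and attribute names arrive lowercased.
            my $src = $token->[2]{src};
            $out .= "[image:<a href=$src>$src</a>]";
        }
        elsif ($type eq 'S')  { $out .= $token->[4] }  # original start-tag text
        elsif ($type eq 'E')  { $out .= $token->[2] }  # original end-tag text
        elsif ($type eq 'PI') { $out .= $token->[2] }  # processing instruction
        else                  { $out .= $token->[1] }  # text, comment, declaration
    }
    return $out;
}

print images_to_links('a <IMG SRC = http://fuzzy.com/kittens.jpg> b <img src=http://fuzzy.com/puppies.jpg> c'), "\n";
```

Because the parser does the tokenizing, case differences, whitespace around `=`, and extra attributes should not need special handling in the code. One caveat: attribute values are entity-decoded by the parser, so a URL containing `&amp;` would come back with a bare `&`.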

Re: regex question
by Abigail-II (Bishop) on Aug 01, 2002 at 09:44 UTC
    You already identified one of the problems (and solutions have been suggested for that), but let me point out that your regex won't work either if there's whitespace between "src" and "=".

    BTW, HTML doesn't have alt tags. HTML has alt attributes - which have been mandatory for IMG tags for quite some time.

    Abigail
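If staying with a regex, the whitespace point could be addressed along these lines. This is only a sketch under the original poster's stated assumptions (unquoted `src` value, no other attributes), and the sample string is made up:

```perl
use strict;
use warnings;

# Hypothetical sample input with whitespace around "=" and before ">".
my $body = 'a <img src = http://fuzzy.com/kittens.jpg> b <IMG  SRC=http://fuzzy.com/puppies.jpg > c';

# \s+ after "img", \s* around "=", and [^\s>]+ for the unquoted URL.
# The character class stops at the first whitespace or ">", so the match
# cannot run on into a later tag. Using s{}{} avoids escaping "/" in </a>.
$body =~ s{<img\s+src\s*=\s*([^\s>]+)\s*>}{[image:<a href=$1>$1</a>]}gi;

print "$body\n";
```

This still breaks on quoted values or tags that carry an alt attribute, which is where a real parser earns its keep.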
