Re: getting and printing form values etc from html stripping out all else

by ww (Archbishop)
on Feb 25, 2010 at 02:13 UTC ( [id://825228] )


in reply to getting and printing form values etc from html stripping out all else

For your first requirement, a regex is probably safe and effective, provided none of your image tags carries a literal ">" inside a quoted attribute value. (HTML 4.x tolerates that, but recommends writing &gt; instead, so it is rare in practice.)

One way to approach the job, therefore, is to extend your regex with non-greedy (aka "minimal") matching followed by a negated character class, [^>]+, which stops the capture just before the tag's closing ">". Here's a sketch, minus file-handling, CGI, etc:

#!/usr/bin/perl
use strict;
use warnings;
#825146

my @line = <DATA>;
for my $line (@line) {
    chomp $line;
    if ( $line =~ m/(<img .*?[^>]+)/ ) {
        print "<g:image_link> " . $1 . "> </g:image_link>\n";
    } else {
        print "\t nope: $line \n";    # you may want to send this to a different file
    }
}
__DATA__
<p><img src="http://www.mysite/graphics/blue.jpg" alt="Hey" width="100" height="100" ><br>yada yada</p>
<p><img src="../grapics/blue1.gif" alt="Yo" width="200" height="75"></p>
<p>foobar with no img</p>
<blockquote><img width="75" height="75" src="blue2.png"></blockquote>

Output:

<g:image_link> <img src="http://www.mysite/graphics/blue.jpg" alt="Hey" width="100" height="100" > </g:image_link>
<g:image_link> <img src="../grapics/blue1.gif" alt="Yo" width="200" height="75"> </g:image_link>
	 nope: <p>foobar with no img</p> 
<g:image_link> <img width="75" height="75" src="blue2.png"> </g:image_link>

BUT take the advice from pemungkah above: Use a parser! Trying to deal with all the possible unwanted tags in a form with regexen is going to get you deeper and deeper into complexities.
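
For comparison, here is a rough sketch of the parser route using HTML::TokeParser (a CPAN module from the HTML-Parser distribution, so it has to be installed). The helper name image_tags and the sample markup are my own inventions for illustration, not something from the original question. Treat it as a starting point under those assumptions rather than a finished solution:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;    # CPAN: part of the HTML-Parser distribution

# Hypothetical helper: return the original source text of every <img> start tag.
sub image_tags {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new( \$html )
        or die "Could not create a parser for the document\n";
    my @tags;
    while ( my $tag = $parser->get_tag('img') ) {
        # For a start tag, get_tag returns [ $tagname, \%attr, \@attrseq, $text ];
        # element 3 is the tag exactly as it appeared in the source.
        push @tags, $tag->[3];
    }
    return @tags;
}

# Made-up sample input, loosely modelled on the data in the regex sketch.
my $html = <<'END_HTML';
<p><img src="blue.jpg" alt="Hey" width="100" height="100"><br>yada yada</p>
<p>foobar with no img</p>
<blockquote><img width="75" height="75" src="blue2.png"></blockquote>
END_HTML

print "<g:image_link> $_ </g:image_link>\n" for image_tags($html);
```

The parser doesn't care how many tags share a line or how the attributes are quoted, which is exactly where a line-by-line regex starts to need special cases.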

And if you're planning to read user input from a form, for heaven's sake, read about untainting. You really don't want to let the fumble-fingered or malicious run around loose in your playground.
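
To make "untainting" a bit more concrete: under taint mode (perl -T), data that comes from outside the program is marked tainted, and the usual way to clear the mark is to capture the acceptable part with a regex and use the capture. The sketch below assumes the form field is supposed to hold a simple filename; the helper name untaint_filename and the allowed character set are my assumptions, so tighten or loosen them to fit your form:

```perl
#!/usr/bin/perl
# Run as "perl -T script.pl" to have Perl enforce taint checks.
use strict;
use warnings;

# Hypothetical helper: accept only a short name that starts with a word
# character and continues with word characters, dots, or hyphens.
# Returns the captured (hence untainted) value, or undef if rejected.
sub untaint_filename {
    my ($input) = @_;
    return undef unless defined $input;
    return $input =~ /\A(\w[\w.-]{0,63})\z/ ? $1 : undef;
}

# Made-up inputs standing in for values read from a form.
for my $candidate ( 'blue.jpg', '../../etc/passwd', 'x; rm -rf /' ) {
    my $clean = untaint_filename($candidate);
    print defined $clean ? "ok:       $clean\n" : "rejected: $candidate\n";
}
```

The point of the design is to whitelist what you expect rather than try to blacklist what you fear; anything that fails the pattern is simply refused.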
