PerlMonks  

Answer: How can I use Perl to strip away some nested HTML markup code, like <SCRIPT>?

(node #299416, categorized answer)

Q&A section: regular expressions. Contributed by Pedro Picasso.

Let's say you have some HTML like this:
<b>I like</b> <i>squirrels!</i>.
For simple, non-nested markup like that, you could use this substitution:
$html =~ s/<[^>]*>([^<]*)<\/[^>]*>/$1/gs;
To turn it into this:
I like squirrels!.
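A minimal, runnable check of the substitution above, written as a sketch. Because the pattern only matches an opening tag, then text containing no `<`, then a closing tag, a single `/g` pass handles only the innermost pair when tags are nested (`<b><i>nested</i></b>` becomes `<b>nested</b>`). Repeating the substitution until it stops matching, with the common `1 while s///` idiom, peels one layer per pass. The helper name `strip_paired_tags` is hypothetical, and the sketch keeps the limitations of the original regexp: for example, it keeps the text between the tags, so the body of a <SCRIPT> element survives.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the answer's substitution repeatedly so that
# nested pairs such as <b><i>x</i></b> are peeled one layer per pass.
sub strip_paired_tags {
    my ($html) = @_;
    # Each pass replaces "<tag>text</tag>" (text containing no '<') with "text".
    # s///g returns the number of replacements, so the loop stops once a pass
    # replaces nothing.
    1 while $html =~ s/<[^>]*>([^<]*)<\/[^>]*>/$1/gs;
    return $html;
}

print strip_paired_tags('<b>I like</b> <i>squirrels!</i>.'), "\n";  # prints: I like squirrels!.
print strip_paired_tags('<b><i>nested</i></b>'), "\n";              # prints: nested
```

Since the pattern never compares tag names, it also collapses mismatched markup such as `<b>x</i>` to `x`, which is part of why it is only suitable for simple HTML.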
{QandAEditors note: merlyn points out in a followup that the regexp above works only for simple HTML; on real-world HTML it cannot be relied upon. See the followup for details.}

•Re: Answer: How can I use Perl to strip away some nested HTML markup code, like <SCRIPT>?
by merlyn (Sage) on Oct 15, 2003 at 15:02 UTC
    Sure, that works for simple HTML, but real-life HTML can easily defeat such a simple regex. For example:
    <!-- > this is still the comment --> and some more text
    In that case, "this is still the comment" would be left in the output, when it shouldn't be.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.
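A regex-only sketch of how merlyn's example, and the <SCRIPT> case from the question, could be handled a little better: remove complete comments (non-greedily, from `<!--` to the first `-->`) and whole <SCRIPT> elements, body included, before applying the answer's substitution. This is an assumption-laden mitigation, not a parser; the helper name `strip_html_sketch` is hypothetical, and other constructs (for example a `>` inside a quoted attribute value, or the text `</script>` inside a script's string literal) will still defeat it. For anything beyond simple markup, a proper HTML parser is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: a slightly more defensive regex pass, still NOT a parser.
sub strip_html_sketch {
    my ($html) = @_;
    # 1. Drop complete comments; .*? stops at the first "-->", so a stray ">"
    #    inside the comment (merlyn's example) no longer ends it early.
    $html =~ s/<!--.*?-->//gs;
    # 2. Drop whole <SCRIPT>...</SCRIPT> elements, including their contents,
    #    case-insensitively and with any attributes on the opening tag.
    $html =~ s{<script\b[^>]*>.*?</script\s*>}{}gsi;
    # 3. Apply the answer's substitution, repeated to handle nesting.
    1 while $html =~ s/<[^>]*>([^<]*)<\/[^>]*>/$1/gs;
    return $html;
}

# prints: " and some more text" (without the quotes; note the leading space)
print strip_html_sketch('<!-- > this is still the comment --> and some more text'), "\n";
# prints: Hi!
print strip_html_sketch('<b>Hi</b><SCRIPT type="text/javascript">alert(1)</SCRIPT>!'), "\n";
```

Ordering matters here: comments and scripts are removed first so that the generic tag-pair substitution never sees their contents.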
