PerlMonks  

Answer: How can I use Perl to strip away some nested HTML markup code, like <SCRIPT>?


contributed by Pedro Picasso

Let's say you have some HTML like this:
<b>I like</b> <i>squirrels!</i>.
You could use this:
$html =~ s/<[^>]*>([^<]*)<\/[^>]*>/$1/gs;
To turn it into this:
I like squirrels!.
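As a minimal, runnable sketch of the substitution above: the helper name strip_simple_tags is hypothetical (it is not part of the original answer) and only wraps the same regex so it can be called on a string.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption) wrapping the regex from the answer.
# It replaces each <tag>text</tag> pair with just the text between the tags.
sub strip_simple_tags {
    my ($html) = @_;
    $html =~ s/<[^>]*>([^<]*)<\/[^>]*>/$1/gs;
    return $html;
}

my $html = '<b>I like</b> <i>squirrels!</i>.';
print strip_simple_tags($html), "\n";    # prints: I like squirrels!.
```

Note that the capture group [^<]* cannot contain another tag, so a single pass over nested markup such as <b><i>x</i></b> only removes the innermost pair and leaves <b>x</b> behind.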
{QandAEditors note: merlyn points out in a followup that the above regexp only works for simple HTML and cannot be relied upon with real-life HTML. See the followup for details.}

Re: Answer: How can I use Perl to strip away some nested HTML markup code, like <SCRIPT>?
by merlyn (Sage) on Oct 15, 2003 at 15:02 UTC
    Sure, that works for simple HTML, but such a simple regex can fail on real-life HTML. For example:
    <!-- > this is still the comment --> and some more text
    In that case, "this is still the comment" would be left in the output, when it shouldn't be.
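The failure can be reproduced with a short sketch. The three helper names below are hypothetical, and the comment-first variant is only an assumed partial mitigation, not something proposed in the thread; for real-life HTML a proper parser (for example HTML::Parser from CPAN) is the more dependable route.

```perl
use strict;
use warnings;

# All three helper names are assumptions made for illustration.

# The regex from the original answer: it only strips <tag>text</tag> pairs.
sub strip_pairs {
    my ($html) = @_;
    $html =~ s/<[^>]*>([^<]*)<\/[^>]*>/$1/gs;
    return $html;
}

# A naive "delete anything between < and >" substitution.
sub strip_naive {
    my ($html) = @_;
    $html =~ s/<[^>]*>//g;
    return $html;
}

# Assumed partial mitigation: remove <!-- ... --> comments first, then tags.
sub strip_comments_first {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;
    $html =~ s/<[^>]*>//g;
    return $html;
}

my $html = '<!-- > this is still the comment --> and some more text';

# No closing tag is present, so the pair regex leaves the input untouched.
print "[", strip_pairs($html), "]\n";

# The naive regex stops at the first ">", so the comment text leaks out:
# prints: [ this is still the comment --> and some more text]
print "[", strip_naive($html), "]\n";

# prints: [ and some more text]
print "[", strip_comments_first($html), "]\n";
```

Either way the comment text survives the simple regexes, which is the point of the followup. The comment-first variant handles this one input but is still a regex and can be defeated by other constructs.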

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.
