PerlMonks  

Re: Strip HTML tags

by davorg (Chancellor)
on Dec 15, 2000 at 13:49 UTC [id://46816]


in reply to Strip HTML tags

As discussed in many recent threads, this kind of work is really better left to the professionals, which in this case means HTML::Parser and its subclasses.

Any regex-based solution is bound to break at some point as your HTML gets more complex.
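A minimal sketch of the HTML::Parser route, assuming HTML::Parser 3.x is installed from CPAN. `strip_html` is a hypothetical helper name, not part of the module. The `text_h` handler with the `dtext` argspec collects only text (with entities such as `&amp;` decoded), and `ignore_elements` drops the contents of `script` and `style` entirely, so quoting inside tags is the parser's problem rather than a regex's.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the text content of an HTML string,
# letting HTML::Parser handle quoting, entities and comments.
sub strip_html {
    my ($html) = @_;
    my $text = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # 'dtext' passes each text chunk with entities already decoded
        text_h      => [ sub { $text .= shift }, 'dtext' ],
    );
    # Suppress everything between these start and end tags, not just the tags
    $p->ignore_elements(qw(script style));
    $p->parse($html);
    $p->eof;    # flush any text the parser is still buffering
    return $text;
}

print strip_html(q{<p>Hello <b>world</b> &amp; <a title='"' href="x>y">friends</a></p>}), "\n";
```

Comments are reported to a separate `comment_h` handler, so with no such handler set they simply disappear from the output.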

--
<http://www.dave.org.uk>

"Perl makes the fun jobs fun
and the boring jobs bearable" - me

Re: Re: Strip HTML tags
by rlk (Pilgrim) on Dec 15, 2000 at 23:03 UTC
    Well, I mentioned that this'd break on bad HTML. Specifically, it assumes that every tag has an even number of "outer" quotation marks(1) in it...

    In my defense, the complexity of the HTML is not at issue here, because I'm not really parsing it; I'm simply stripping all tags, which is a much simpler problem.

    (1) I originally hadn't realized that foo='"' was a legal attribute-value pair. This has since been fixed.
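The original regex isn't shown in this thread, so the following is only a hypothetical sketch of a quote-aware tag stripper in core Perl (`strip_tags` is an illustrative name). Instead of counting quotation marks, it consumes whole single- or double-quoted values as units, so a lone `"` inside `foo='"'`, or a `>` inside a quoted value, does not end the tag early.

```perl
use strict;
use warnings;

# Hypothetical tag stripper (not the original poster's code).
# Quoted attribute values are matched as whole units, so quotes or
# '>' characters inside them cannot terminate the tag.
sub strip_tags {
    my ($html) = @_;
    $html =~ s{
        <                   # start of a tag
        (?:
            "[^"]*"         # double-quoted value, may contain ' or >
          | '[^']*'         # single-quoted value, may contain " or >
          | [^'">]          # any other character, including newlines
        )*
        >                   # end of the tag
    }{}gx;
    return $html;
}

print strip_tags(q{<a foo='"' href="x>y">link</a> text}), "\n";    # prints "link text"
```

It still breaks on input that isn't a simple tag: `<!-- a > b -->x` comes out as ` b -->x`, because the comment's first `>` ends the match, which is exactly the kind of failure a real parser avoids.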

    --
    Ryan Koppenhaver, Aspiring Perl Hacker
    "I ask for so little. Just fear me, love me, do as I say and I will be your slave."
