
Re: Re: Re: Strip HTML tags again

by little (Curate)
on Jul 01, 2002 at 12:55 UTC (#178532)

in reply to Re: Re: Strip HTML tags again
in thread Strip HTML tags again

Look up the POD (or your preferred docs) for HTML::Tagset.
To quote: "hashset %HTML::Tagset::isKnown
This hashset lists all known HTML elements."
So you have to compare your match against that list ...
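
One way to make that comparison is to capture the tag name in the substitution and look it up in the hash. The following is only a minimal sketch: %is_known is a small stand-in hash so the example runs without the module (with HTML::Tagset installed you would look the name up in %HTML::Tagset::isKnown instead), and strip_known_tags is a name made up for illustration.

```perl
use strict;
use warnings;

# Assumption: a short stand-in for %HTML::Tagset::isKnown, so that this
# sketch runs without the module being installed.
my %is_known = map { $_ => 1 } qw(a b i p em strong code);

# Hypothetical helper: delete a tag only if its name is in the known set,
# and leave every other <...> sequence untouched.
sub strip_known_tags {
    my ($text) = @_;
    # $1 is the tag name, $& is the whole matched tag.
    $text =~ s{<\s*/?\s*([A-Za-z][A-Za-z0-9]*)\b[^>]*>}
              { $is_known{ lc $1 } ? '' : $& }ge;
    return $text;
}

print strip_known_tags('<b>bold</b> and <foo>unknown</foo>'), "\n";
# prints: bold and <foo>unknown</foo>
```

The lookup uses lc because the keys of the stand-in hash are lowercase, while tag names in real markup may be written in any case.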

Have a nice day
All decision is left to your taste


Look through the previous suggestions as well. At least try them, and ask again if you get an error or otherwise get stuck. :-)

Re: Re: Re: Re: Strip HTML tags again
by dda (Friar) on Jul 01, 2002 at 13:01 UTC
    The problem is how to extract 'my match' from the regexp shown earlier (or from another one; please suggest one). I know about that hashset; what I need is to apply it in my sub.


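      Hi! I think this does what you want:

          use strict;
          use warnings;
          use HTML::Tagset;

          # Build an alternation of all known HTML tag names, e.g. "(a|p|code|...)"
          my %tags       = %HTML::Tagset::isKnown;
          my $tagpattern = "(" . join('|', keys %tags) . ")";
          print STDERR "$tagpattern\n";

          while (<>) {
              print strip_html_tags($_);
          }

          # Remove a known opening tag and its matching closing tag,
          # keeping the text between them ($2).
          sub strip_html_tags {
              my $line = shift;
              $line =~ s/<\s*$tagpattern(?:\s*>|\s+[^>]*>)([^<]*)<\s*\/\1[^>]*>/$2/ig;
              return $line;
          }
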
      I first create the string $tagpattern by putting a "|" between all known HTML tags and surrounding the whole thing with parentheses. This gives something like "(a|p|code.....)", which is used later in the subroutine to check for valid HTML tags.

      The regex looks a bit complicated and I am sure it could be written much better, but I believe it is sufficient for your purposes.

      Note that this will only work for tags that open and close on the same line, and it could get you into trouble if there are < or > signs inside a tag (I don't know whether that is possible in HTML).
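
      The one-line limit comes from reading the input line by line rather than from the regex itself, since [^<]* and [^>]* also match newlines. A rough sketch of one way around it, under stated assumptions: the whole text is passed in as a single string (in a real script you could slurp the input with "local $/;"), the substitution is repeated until nothing changes so that nested pairs are peeled from the inside out, and a short made-up tag list stands in for keys %HTML::Tagset::isKnown.

```perl
use strict;
use warnings;

# Assumption: a short tag list standing in for keys %HTML::Tagset::isKnown.
my $tagpattern = '(' . join('|', qw(strong em b i p)) . ')';

sub strip_html_tags {
    my ($text) = @_;
    # Each pass removes the innermost known pairs; repeat until no pair is left.
    1 while $text =~ s/<\s*$tagpattern(?:\s*>|\s+[^>]*>)([^<]*)<\s*\/\1[^>]*>/$2/ig;
    return $text;
}

# The <b>...</b> pair spans two lines and is nested inside <p>...</p>.
my $html = "<p>one <b>two\nthree</b> four</p>\n";
print strip_html_tags($html);
```

      This still does nothing about < or > signs inside a tag, so it remains a sketch and not a replacement for a real parser.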


      It would probably be a lot wiser to use Ovid's code than my homegrown regex.

      ---- kurt
        I really love your idea! Thanks!


      Did you look further than ides' suggestion? Did you try Ovid's suggestion?
      Have a nice day
      All decision is left to your taste
        Yes, Ovid's solution is fine, and its rating proves it. But I wanted to hear other ideas too.

