
Re: Re: Re: Strip HTML tags again

by little (Curate)
on Jul 01, 2002 at 12:55 UTC ( #178532=note )

in reply to Re: Re: Strip HTML tags again
in thread Strip HTML tags again

Look up the POD (or your preferred docs) for HTML::Tagset.
cite: "hashset %HTML::Tagset::isKnown
This hashset lists all known HTML elements."
So you've got to compare your match with that list ...
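
A rough sketch of that comparison, assuming HTML::Tagset is installed from CPAN: capture the tag name in the regex and look it up in the hashset. The sub name strip_known_tags and the sample input are made up for illustration, and the pattern assumes no ">" appears inside an attribute value.

```perl
use strict;
use warnings;
use HTML::Tagset;    # CPAN module; provides %HTML::Tagset::isKnown

# Hypothetical helper: remove a tag only when its name is a known HTML element.
sub strip_known_tags {
    my ($text) = @_;
    # $1 is the whole tag, $2 is the tag name, i.e. the "match" to compare.
    $text =~ s{(</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>)}
              { $HTML::Tagset::isKnown{ lc $2 } ? '' : $1 }ge;
    return $text;
}

# <p> and <b> are known elements and get dropped; <foo> is not, so it stays.
print strip_known_tags("<p>Hello <b>world</b>, see <foo>this</foo></p>"), "\n";
```

The /e modifier turns the replacement into Perl code, so the hash lookup decides per tag whether to delete it or put it back unchanged.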

Have a nice day
All decision is left to your taste


Look through the previous suggestions as well. Try it at least and ask again if you get an error or get otherwise stuck. :-)

Replies are listed 'Best First'.
Re: Re: Re: Re: Strip HTML tags again
by dda (Friar) on Jul 01, 2002 at 13:01 UTC
    The problem is how to extract 'my match' from the regexp shown earlier (or from another one; please suggest one). I know about that hashset; what I need is to apply it in my sub.


      Hi! I think this does what you want:
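      use strict;
      use warnings;
      use HTML::Tagset;

      # Build an alternation of every known HTML element name: (a|p|code|...)
      my %tags       = %HTML::Tagset::isKnown;
      my $tagpattern = "(" . join('|', keys %tags) . ")";
      print STDERR "$tagpattern\n";

      while (<>) {
          print strip_html_tags($_);
      }

      # Remove a matching open/close pair of known tags, keeping the text between them.
      sub strip_html_tags {
          my $line = shift;
          # \1 is the tag name captured by $tagpattern, $2 is the enclosed text.
          $line =~ s/<\s*$tagpattern(?:\s*>|\s+[^>]*>)([^<]*)<\s*\/\1[^>]*>/$2/ig;
          return $line;
      }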
      I first create the string $tagpattern by putting a "|" between all known HTML tags and surrounding the whole thing with parentheses. This gives something like "(a|p|code.....)", which the subroutine later uses to check for valid HTML tags.

      The regex looks a bit complicated and I am sure it could be written much better, but I believe it is sufficient for your purposes.

      Note that this will only work for tags that are on one line, and it could get you into trouble if there are < or > signs inside a tag (I don't know whether that is possible in HTML).
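
      To make the one-line restriction concrete, here is a small sketch. It uses a short hand-written tag list as a stand-in for the full HTML::Tagset list (an assumption, only so it runs without the module), and the sample input is invented. It compares feeding the sub one line at a time with handing it the whole text at once.

```perl
use strict;
use warnings;

# Stand-in for join('|', keys %HTML::Tagset::isKnown); illustration only.
my $tagpattern = '(' . join('|', qw(a b i p code)) . ')';

sub strip_html_tags {
    my $line = shift;
    $line =~ s/<\s*$tagpattern(?:\s*>|\s+[^>]*>)([^<]*)<\s*\/\1[^>]*>/$2/ig;
    return $line;
}

# Invented sample: the opening and closing tags sit on different lines.
my $html = "<p>first line\nsecond line</p>\n";

# Line by line, like the while (<>) loop: neither line holds a complete pair,
# so nothing is stripped.
my $by_line = join '', map { strip_html_tags($_) } split /^/, $html;

# Whole text at once: [^<]* also matches newlines, so the pair is found.
my $slurped = strip_html_tags($html);

print "by line:\n$by_line";
print "slurped:\n$slurped";
```

      Reading the whole input into one string before calling the sub is one way around the restriction, at the cost of holding the file in memory.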


      It would probably be a lot wiser to use Ovid's code than my homegrown regex.

      ---- kurt
        I really love your idea! Thanks!


      Did you look further than ides' suggestion? Did you try Ovid's suggestion?
      Have a nice day
      All decision is left to your taste
        Yes, Ovid's solution is fine, and its rating proves it. But I wanted to hear other ideas too.

