http://www.perlmonks.org?node_id=507666


in reply to Parsing HTML tags with regex

Hi jithoosin,

Here is a regex that matches a tag along with its attribute values.

m#<([^">]+(?:"[^"]+")*[^>]+)>#

Thanks,
Gopal.R

Re^2: Parsing HTML tags with regex
by Perl Mouse (Chaplain) on Nov 14, 2005 at 11:22 UTC
    But that would match on:
    a < b implies b > a
    which does not contain an HTML tag. Oh, and it won't match all HTML tags correctly either. Consider for instance:
    <tag attr1="one" attr2="two"> <tag attr='"'> <tag attr1='"'>
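    The first one is matched only by accident: the quoted-value group only handles double-quoted values that directly follow each other, so everything after "one" is swallowed by the trailing [^>]+, which accepts quotes as well. The second goes wrong because your regex doesn't consider single-quoted values: the " inside attr='"' is taken as the start of a double-quoted value, and the match runs on into the next tag.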
    Perl --((8:>*
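For anyone who wants to check these cases, here is a minimal sketch that runs the regex from the parent post against them. The variable names and the extra `<b>` case are my own additions and not part of the posts above; treat it as an illustration, not a test suite.

```perl
use strict;
use warnings;

# The regex from the parent post, unchanged (only the delimiters differ).
my $re = qr{<([^">]+(?:"[^"]+")*[^>]+)>};

# Plain text that contains no HTML tag at all.
my $text = 'a < b implies b > a';
my ($false_hit) = $text =~ $re;
print "matched in plain text: [$false_hit]\n" if defined $false_hit;

# Two tags whose attribute value is a single-quoted double quote.
my $tags = q{<tag attr='"'> <tag attr1='"'>};
my ($overrun) = $tags =~ $re;
print "one capture for two tags: [$overrun]\n" if defined $overrun;

# Assumption for illustration: a one-letter tag. The pattern needs at
# least two characters between < and >, so this one is not matched.
my $short = '<b>';
my $short_matched = $short =~ $re ? 1 : 0;
print "one-letter tag matched: $short_matched\n";
```

The second case is the interesting one: the capture starts in the first tag and ends in the second, because the double quote inside the single-quoted value opens a "quoted value" that only closes in the next tag.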
Re^2: Parsing HTML tags with regex
by Anonymous Monk on Jan 19, 2012 at 09:11 UTC
    Thanks Gopal, the above regex was useful.
Re^2: Parsing HTML tags with regex
by jithoosin (Scribe) on Nov 11, 2005 at 09:23 UTC
    Hi gopal,
    THANK YOU VERY MUCH. I won the bet. But now I am in a bit of trouble: I do not know how to explain how it works to my friends. So could you PLEASE explain how the regular expression works? Once again, THANK YOU VERY MUCH GOPAL.
      m{
          <                # start with <
          (                # group start
            [^">]+         # text, but not " or >
            (?:"[^"]+")*   # if a " is found, match up to the closing quote; this part is optional
            [^>]+          # text, but not >
          )                # group end
          >                # end with >
      }x
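To see what the group actually captures, here is a small sketch that applies the pattern to every tag in a string. The sample `$html` and the variable names are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Same pattern as in the explanation, written on one line.
my $re = qr{<([^">]+(?:"[^"]+")*[^>]+)>};

# Hypothetical sample input, not taken from the thread.
my $html = '<a href="index.html">home</a> <img src="logo.png" alt="logo">';

# With /g in scalar context, each pass of the loop finds the next match;
# $1 holds whatever sat between < and > for that match.
my @captured;
while ($html =~ /$re/g) {
    push @captured, $1;
}

print "[$_]\n" for @captured;
```

Note that the closing `</a>` is captured too (as `/a`), since the pattern has no notion of opening versus closing tags.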
        Hi gopal,
        There is a problem if the input string is:
        my $line = "<select name=\"url><23\" style=\"width><A125px\"  >";
        But I think a slight modification to your previous answer will do the job:
        $line =~ m#<([^">]+(?:"[^"]+")*)*[^>]+>#;
        Is there any problem with it? Also, please give me an answer to the question about the <!--->---> thing.
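A quick way to compare the two patterns on that input is sketched below. The names `$original`, `$modified`, `$orig_match` and `$mod_match` are mine; the extra outer parentheses are there only to capture the whole match.

```perl
use strict;
use warnings;

my $line = "<select name=\"url><23\" style=\"width><A125px\"  >";

# Pattern from the first answer, and the modified one from this post.
my $original = qr{<([^">]+(?:"[^"]+")*[^>]+)>};
my $modified = qr{<([^">]+(?:"[^"]+")*)*[^>]+>};

# In list context the first returned element is the outer group,
# i.e. the whole matched text.
my ($orig_match) = $line =~ /($original)/;
my ($mod_match)  = $line =~ /($modified)/;

print "original: [$orig_match]\n" if defined $orig_match;
print "modified: [$mod_match]\n"  if defined $mod_match;
print "modified covers the whole tag\n"
    if defined $mod_match && $mod_match eq $line;
```

On this input the original pattern stops at the `>` inside the second quoted value, while the modified one reaches the real end of the tag. Some caveats, as far as I can tell:

- Because the group is now repeated, `$1` holds only its last repetition, not the full tag contents.
- Single-quoted values are still not handled.
- The nested quantifiers can backtrack heavily on input that has no closing `>`.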