PerlMonks  

How to find XML tags using regular expression

by satishchandra (Initiate)
on Feb 17, 2011 at 01:37 UTC (#888611)
satishchandra has asked for the wisdom of the Perl Monks concerning the following question:

Hi, suppose this is my text: <text><Find>abcdefghh</text><find>. I want to match all the tags and put them in an array. How can I do that with a regular expression?

Re: How to find XML tags using regular expression
by Anonymous Monk on Feb 17, 2011 at 01:45 UTC
Re: How to find XML tags using regular expression
by ikegami (Pope) on Feb 17, 2011 at 02:46 UTC

    Tags? Are you using the right terminology? You asked for the following output:

    my @tags = ( '<text>', '<Find>', '</text>', '<find>', );

    I'm guessing that's not what you want, but I'd rather not guess wrong, so I'll wait for clarification.

    PS — That's not valid XML.

      Hi, you are correct. I want all my tags to be in one array, so how can I do that with pattern matching?
Re: How to find XML tags using regular expression
by sundialsvc4 (Monsignor) on Feb 17, 2011 at 03:16 UTC

    To echo what others are saying: this is clearly an XML file, so you want to use an XML package. There are several of these. There are “simple parsers,” and there are also much more sophisticated ones, including packages that let you write (industry-standard) “XPath expressions” that can search an entire XML structure for you, with no special coding on your part at all. “TMTOWTDI™” ... and the choice is yours.
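As a rough illustration of the XPath route, here is a minimal sketch. It assumes the CPAN module XML::LibXML is installed (the post above does not name a specific package), and it uses a hypothetical, well-formed stand-in for the snippet in the question, since a real parser will reject the original text.

```perl
use strict;
use warnings;
use XML::LibXML;    # CPAN module; assumed to be installed

# Hypothetical well-formed sample, not the text from the question.
my $xml = '<text><Find>abcdefghh</Find><find/></text>';

my $doc = XML::LibXML->load_xml(string => $xml);

# The XPath expression //* selects every element in the document.
my @names = map { $_->nodeName } $doc->findnodes('//*');

print join(', ', @names), "\n";
```

Note that this yields element names rather than the raw tag text, and that load_xml dies on input that is not well-formed.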

      this is clearly an XML file
      It is???
Re: How to find XML tags using regular expression
by toolic (Chancellor) on Feb 17, 2011 at 03:32 UTC
Re: How to find XML tags using regular expression
by bart (Canon) on Feb 17, 2011 at 12:02 UTC
    That is not even remotely like valid XML, but anyway...

    You can try this (ignoring CDATA sections, but including other weird stuff, like the XML declaration):

    @tags = grep defined, $xml =~ /<!--.*?-->|(<(?>[^"'>]+|'[^']*'|"[^"]*")*>)/sg;
    And of course, you'll have some cleaning up to do now, because the output is very coarse.
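To see what the match returns, here is a small self-contained run. The sample string is the text from the question with a comment and a quoted attribute appended; those two additions are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Text from the question, plus an illustrative comment and quoted attribute.
my $xml = q{<text><Find>abcdefghh</text><find><!-- a <fake> tag --><a href="x>y">};

# In list context with /g, each match contributes its capture group.
# A comment matches the first alternative, so its capture is undef;
# "grep defined" drops those entries.
my @tags = grep defined,
    $xml =~ /<!--.*?-->|(<(?>[^"'>]+|'[^']*'|"[^"]*")*>)/sg;

print "$_\n" for @tags;
```

The comment is skipped, including the tag-like text inside it, and the ">" inside the quoted attribute value does not end the tag early.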

    As a first step into parsing this content, you can use the above regexp in split:

    @tokens = split /<!--.*?-->|(<(?>[^"'>]+|'[^']*'|"[^"]*")*>)/s, $xml;
    which will return a list of text and tags. Comments will be thrown away.
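A short sketch of the split form follows, using a hypothetical sample string. One detail worth knowing: when a comment is the separator, the capture group does not participate, so split puts an undef into the list at that position, and adjacent tags produce empty strings between them. The filter below removes both.

```perl
use strict;
use warnings;

# Hypothetical sample input, for illustration only.
my $xml = q{<p>Hello <!-- note --><b>world</b></p>};

# split returns the text between separators plus the captured tags.
# Drop undef entries (from comments) and empty strings (between adjacent tags).
my @tokens = grep { defined($_) && length($_) }
    split /<!--.*?-->|(<(?>[^"'>]+|'[^']*'|"[^"]*")*>)/s, $xml;

print "[$_]\n" for @tokens;
```

The result is an alternating stream of tags and text in document order, which is a reasonable starting point for a hand-rolled tokenizer.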
