PerlMonks  

Strip that XML!

(node #46517, categorized question)
Contributed by redbeard on Dec 14, 2000 at 03:48 UTC


Description:

I have an XML document held in a single string, as returned from a web service by LWP::Simple::get. It contains several repeated fields.

I would like to extract the values of those repeated fields (e.g. <email> and <name>) and put each kind into its own array (e.g. @email and @name).

Example document content:

# quoting 'EOF' stops Perl from interpolating @foo, @bar and @baz in the addresses
my $xml = <<'EOF';
<xml>
<email>toto@foo.com</email><name>Toto</name>
<email>tata@bar.com</email><name>Tata</name>
<email>tutu@baz.com</email><name>Tutu</name>
</xml>
EOF

Answer: Strip that XML!
contributed by mirod

Using XML::Twig:

use strict;
use warnings;
use XML::Twig;

# quoted 'EOF': no interpolation, so "@foo" in the addresses stays literal
my $xml = <<'EOF';
<xml>
<email>toto@foo.com</email><name>Toto</name>
<email>tata@bar.com</email><name>Tata</name>
<email>tutu@baz.com</email><name>Tutu</name>
</xml>
EOF

my @email;
my @name;
my $twig = XML::Twig->new(
    TwigHandlers => {
        # each handler is called with ($twig, $element): $_[1] is the element
        email => sub { push @email, $_[1]->text; },
        name  => sub { push @name,  $_[1]->text; },
    }
);
$twig->parse( $xml );
print "email: @email\n";
print "name: @name\n";
Answer: Strip that XML!
contributed by mirod

Using XML::Parser (basic mode, no style):

#!/usr/bin/perl -w
use strict;
use XML::Parser;

my @email;
my @name;
my $stored_content = '';    # global used to accumulate element text

sub start {
    my( $expat, $gi, %atts ) = @_;
    $stored_content = '';    # reset at each start tag
}

sub char {
    my( $expat, $string ) = @_;
    $stored_content .= $string;    # accumulate: text may arrive in several chunks
}

sub end {
    my( $expat, $gi ) = @_;
    # now we can do some "real" processing with the element content
    push @name,  $stored_content if( $gi eq 'name' );
    push @email, $stored_content if( $gi eq 'email' );
    $stored_content = '';    # reset
}

# create the parser
my $parser = XML::Parser->new(
    Handlers => {
        Start => \&start,    # called for each start tag
        Char  => \&char,     # called for all text (including \n between tags)
        End   => \&end,      # called for each end tag
    }
);

# quoted 'EOF': no interpolation of "@foo" etc.
my $xml = <<'EOF';
<xml>
<email>toto@foo.com</email><name>Toto</name>
<email>tata@bar.com</email><name>Tata</name>
<email>tutu@baz.com</email><name>Tutu</name>
</xml>
EOF

$parser->parse( $xml );
print "email: @email\n";
print "name: @name\n";
Answer: Strip that XML!
contributed by ramrod

Here's another easy method, using only regular expressions:

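# quoted 'EOF': no interpolation of "@foo" etc.
$_ = <<'EOF';
<xml>
<email>toto@foo.com</email><name>Toto</name>
<email>tata@bar.com</email><name>Tata</name>
<email>tutu@baz.com</email><name>Tutu</name>
</xml>
EOF

# in list context, m//g returns every captured value;
# the non-greedy .*? stops at the first closing tag
my @email = /<email>(.*?)<\/email>/g;
my @name  = /<name>(.*?)<\/name>/g;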
(As a reply to this answer points out, this approach leaves XML entities such as &amp; undecoded.)
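Because the regex approach returns raw text, entity references stay encoded. For simple documents like the one in the question, a minimal sketch can decode them after extraction. This assumes the values contain no CDATA sections, comments, custom DTD entities, or tag attributes. `decode_xml_entities` is a hypothetical helper written for this example and is not part of any module. The parser-based answers above do not need it, because the parser already decodes entities.

```perl
use strict;
use warnings;

# The five entities predefined by XML.
my %xml_entity = ( amp => '&', lt => '<', gt => '>', quot => '"', apos => "'" );

# Hypothetical helper (not from any module): decodes named entities and
# numeric character references (&#33; and &#x21;) in a single pass.
sub decode_xml_entities {
    my ($text) = @_;
    $text =~ s{&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|(amp|lt|gt|quot|apos));}
              { defined $1 ? chr(hex $1) : defined $2 ? chr($2) : $xml_entity{$3} }ge;
    return $text;
}

# Sample data (made up for this sketch) containing a few entities.
my $xml = <<'EOF';
<xml>
<email>toto@foo.com</email><name>Toto &amp; Co</name>
<email>tata@bar.com</email><name>&lt;Tata&gt;</name>
<email>tutu@baz.com</email><name>Tutu&#33;</name>
</xml>
EOF

# Same extraction as the regex answer, then decode each captured value.
my @email = map { decode_xml_entities($_) } $xml =~ /<email>(.*?)<\/email>/g;
my @name  = map { decode_xml_entities($_) } $xml =~ /<name>(.*?)<\/name>/g;

print "email: @email\n";
print "name: @name\n";
```

Decoding in one pass matters. If you replaced &amp; first and &lt; afterwards, literal text written as &amp;lt; would wrongly end up as < instead of &lt;.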
