redbeard has asked for the wisdom of the Perl Monks concerning the following question: (regular expressions)

I have an XML document with multiple repetitive fields all in a single string, as obtained from a web service using LWP::Simple::get.

I would like to parse out those multiple repetitive values (e.g. <email> and <name>) and put them in their own arrays (e.g. @email and @name).

Example document content:

$xml = <<EOF;
<xml>
<email></email><name>Toto</name>
<email></email><name>Tata</name>
<email></email><name>Tutu</name>
</xml>
EOF

Originally posted as a Categorized Question.

Replies are listed 'Best First'.
Re: Strip that XML!
by mirod (Canon) on Dec 14, 2000 at 04:46 UTC

    Using XML::Twig:

    use XML::Twig;

    my $xml = <<EOF;
    <xml>
    <email></email><name>Toto</name>
    <email></email><name>Tata</name>
    <email></email><name>Tutu</name>
    </xml>
    EOF

    my @email;
    my @name;

    my $twig = new XML::Twig(
        TwigHandlers => {
            email => sub { push @email, $_[1]->text; },  # $_[1] is the element
            name  => sub { push @name,  $_[1]->text; }
        }
    );

    $twig->parse( $xml );

    print "email: @email\n";
    print "name: @name\n";
Re: Strip that XML!
by mirod (Canon) on Dec 14, 2000 at 12:00 UTC

    Using XML::Parser (basic mode, no style):

    #!/bin/perl -w
    use strict;
    use XML::Parser;

    my @email;
    my @name;
    my $stored_content = '';  # global used to store text

    sub start {
        my( $expat, $gi, %atts ) = @_;
        $stored_content = '';  # reset
    }

    sub char {
        my( $expat, $string ) = @_;
        $stored_content .= $string;  # accumulate
    }

    sub end {
        my( $expat, $gi ) = @_;
        # now we can do some "real" processing with the element content
        push @name,  $stored_content if( $gi eq 'name');
        push @email, $stored_content if( $gi eq 'email');
        $stored_content = '';  # reset
    }

    # create the parser
    my $parser = new XML::Parser(
        Handlers => {
            Start => \&start,  # called for each start tag
            Char  => \&char,   # called for all text (including \n between tags)
            End   => \&end     # called for each end tag
        }
    );

    my $xml = <<EOF;
    <xml>
    <email></email><name>Toto</name>
    <email></email><name>Tata</name>
    <email></email><name>Tutu</name>
    </xml>
    EOF

    $parser->parse( $xml );

    print "email: @email\n";
    print "name: @name\n";
Re: Strip that XML!
by ramrod (Curate) on Feb 20, 2009 at 20:38 UTC
    Here's another easy method:
    $_ = <<EOF;
    <xml>
    <email></email><name>Toto</name>
    <email></email><name>Tata</name>
    <email></email><name>Tutu</name>
    </xml>
    EOF

    my @email = /<email>(.*?)<\/email>/g;
    my @name  = /<name>(.*?)<\/name>/g;
    (Note that it leaves XML entities undecoded, as noted in the reply.)
      That doesn't decode entities (it's broken).
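      To illustrate the entity problem, here is a minimal sketch of the regex approach with a decoding step added. It assumes only the five predefined XML entities and numeric character references appear. The helper decode_xml_text and the sample data (addresses and entity-laden names) are made up for illustration and are not from the original posts. It still ignores CDATA sections, attributes, comments and DTD-defined entities, so a real parser such as the XML::Twig or XML::Parser answers above remains the safer choice.

```perl
use strict;
use warnings;

# Sample data is an assumption for illustration only.
my $xml = <<'EOF';
<xml>
<email>toto@example.com</email><name>Toto &amp; Co</name>
<email>tata@example.com</email><name>Tata &lt;T&gt;</name>
<email>tutu@example.com</email><name>Tutu &#33;</name>
</xml>
EOF

# The five entities predefined by XML.
my %entity = ( amp => '&', lt => '<', gt => '>', quot => '"', apos => "'" );

# Hypothetical helper: decode predefined entities and numeric references.
sub decode_xml_text {
    my ($text) = @_;
    # A single pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
    $text =~ s{&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|(amp|lt|gt|quot|apos));}{
        defined $1 ? chr( hex $1 )
      : defined $2 ? chr( $2 )
      :              $entity{$3}
    }ge;
    return $text;
}

# Same non-greedy captures as above, with each captured value decoded.
my @email = map { decode_xml_text($_) } $xml =~ m{<email>(.*?)</email>}g;
my @name  = map { decode_xml_text($_) } $xml =~ m{<name>(.*?)</name>}g;

print "email: @email\n";
print "name: @name\n";
```

      Unknown entity names are left untouched rather than guessed at, which at least makes the failure visible in the output.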