Contributed by redbeard
on Dec 14, 2000 at 03:48 UTC
Description: I have an XML document with several repeated fields, all in a single string obtained from a web service using LWP::Simple::get.
I would like to extract those repeated values (e.g. <email> and <name>) and put them in their own arrays (e.g. @email and @name).
Example document content:
$xml = <<'EOF'; # single-quoted heredoc, so @foo, @bar and @baz are not interpolated
<xml>
<email>toto@foo.com</email><name>Toto</name>
<email>tata@bar.com</email><name>Tata</name>
<email>tutu@baz.com</email><name>Tutu</name>
</xml>
EOF
Answer: Strip that XML! contributed by mirod Using XML::Twig:
use XML::Twig;
my $xml = <<'EOF'; # single-quoted heredoc, so @foo, @bar and @baz are not interpolated
<xml>
<email>toto@foo.com</email><name>Toto</name>
<email>tata@bar.com</email><name>Tata</name>
<email>tutu@baz.com</email><name>Tutu</name>
</xml>
EOF
my @email;
my @name;
my $twig = XML::Twig->new(
    TwigHandlers => {
        email => sub { push @email, $_[1]->text; }, # $_[1] is the element
        name  => sub { push @name,  $_[1]->text; },
    }
);
$twig->parse( $xml );
print "email: @email\n";
print "name: @name\n";
Answer: Strip that XML! contributed by mirod Using XML::Parser (basic mode, no style):
#!/bin/perl -w
use strict;
use XML::Parser;
my @email;
my @name;
my $stored_content = ''; # global used to store text
sub start {
    my( $expat, $gi, %atts ) = @_;
    $stored_content = ''; # reset
}
sub char {
    my( $expat, $string ) = @_;
    $stored_content .= $string; # accumulate
}
sub end {
    my( $expat, $gi ) = @_;
    # now we can do some "real" processing with the element content
    push @name,  $stored_content if( $gi eq 'name');
    push @email, $stored_content if( $gi eq 'email');
    $stored_content = ''; # reset
}
# create the parser
my $parser = XML::Parser->new(
    Handlers => {
        Start => \&start, # called for each start tag
        Char  => \&char,  # called for all text (including \n between tags)
        End   => \&end,   # called for each end tag
    }
);
my $xml = <<'EOF'; # single-quoted: under "use strict", an interpolated @foo would not compile
<xml>
<email>toto@foo.com</email><name>Toto</name>
<email>tata@bar.com</email><name>Tata</name>
<email>tutu@baz.com</email><name>Tutu</name>
</xml>
EOF
$parser->parse( $xml );
print "email: @email\n";
print "name: @name\n";
Answer: Strip that XML! contributed by ramrod Here's another easy method:
$_ = <<'EOF'; # single-quoted heredoc, so @foo, @bar and @baz are not interpolated
<xml>
<email>toto@foo.com</email><name>Toto</name>
<email>tata@bar.com</email><name>Tata</name>
<email>tutu@baz.com</email><name>Tutu</name>
</xml>
EOF
my @email = /<email>(.*?)<\/email>/g;
my @name = /<name>(.*?)<\/name>/g;
(Note that it leaves XML entities undecoded, as noted in the reply.)
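To make the entity caveat concrete, here is a minimal sketch (not part of the original answer) that decodes the five entities predefined by XML after the regex match. The sample data with &amp; and &lt; is made up for illustration, and decode_predefined is a hypothetical helper name. Numeric character references such as &#233; and CDATA sections are not handled, which is one more reason to prefer a real parser for anything beyond trivial input.

```perl
use strict;
use warnings;

# Made-up sample input containing entities (single-quoted heredoc,
# so @foo and @bar are not interpolated as arrays).
my $xml = <<'EOF';
<xml>
<email>toto@foo.com</email><name>Toto &amp; Co</name>
<email>tata@bar.com</email><name>Tata &lt;jr&gt;</name>
</xml>
EOF

# The five entities predefined by the XML specification.
my %entity = ( amp => '&', lt => '<', gt => '>', quot => '"', apos => "'" );

# Hypothetical helper: decode the predefined entities in a single pass,
# so that "&amp;lt;" becomes "&lt;" and is not decoded a second time.
sub decode_predefined {
    my ($text) = @_;
    $text =~ s/&(amp|lt|gt|quot|apos);/$entity{$1}/g;
    return $text;
}

# Same regexes as above, with each captured value decoded afterwards.
my @email = map { decode_predefined($_) } $xml =~ /<email>(.*?)<\/email>/g;
my @name  = map { decode_predefined($_) } $xml =~ /<name>(.*?)<\/name>/g;

print "email: @email\n";
print "name: @name\n";
```

The substitution runs once over each captured value rather than once per entity, which avoids double-decoding when the text itself contains an escaped ampersand.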