Re: Parsing XML into a simple hash
by Kanji (Parson) on Jun 21, 2001 at 09:38 UTC
I have a "XML like" file <...>
If you don't want to write a custom parser, either from scratch or within the framework that something like Parse::RecDescent provides, you'll need to convert your file into valid XML before you can use XML::Parser or XML::Simple.
But once you do, doing what you need is a cinch ...
use XML::Simple;
my $xml = XMLin(<<__XML__);
<posts>
<post>
<jobnumber>1234</jobnumber>
<location>Somecity, NJ</location>
</post>
<post>
<jobnumber>87922</jobnumber>
<location>Othercity, AK</location>
</post>
</posts>
__XML__
foreach my $post ( @{ $xml->{'post'} } ) {
print "City: $post->{'location'}\n";
}
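A minimal sketch of that conversion step, using only core Perl. It assumes the file's only defects are the ones discussed in this thread: a `<location>` element closed by a second `<location>` instead of `</location>`, and no single root element. The helper name `to_valid_xml` and the sample data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the "XML like" text into well-formed XML.
# Assumes the only defects are (a) <location> closed by a second
# <location> instead of </location>, and (b) no single root element.
sub to_valid_xml {
    my ($text) = @_;
    # Rewrite "<location>...<location>" as "<location>...</location>".
    $text =~ s{<location>(.*?)<location>}{<location>$1</location>}gs;
    # Wrap everything in one root element so a parser will accept it.
    return "<posts>\n$text</posts>\n";
}

# Assumed sample of the malformed input.
my $raw = <<'EOF';
<post>
<jobnumber>1234</jobnumber>
<location> somecity NJ <location>
</post>
EOF

print to_valid_xml($raw);
```

The returned string can then be handed to XMLin in place of the here-document. If the real file has other defects, the substitution would need adjusting.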
--k.
Re: Parsing XML into a simple hash
by bikeNomad (Priest) on Jun 21, 2001 at 10:03 UTC
Or you could force it into a single element by attaching tags to both ends:
#!/usr/bin/perl -w
use strict;
use XML::Parser;
my %hash;
my $depth = 0;
my @tags;
sub start
{
my ($expat, $element) = @_;
push(@tags, $element);
$hash{$tags[-1]} = '';
}
sub end
{
pop(@tags);
if (@tags == 1)
{
delete $hash{posts};
delete $hash{post};
# now you have hash.
print "Job: $hash{jobnumber}\n";
print "City: $hash{location}\n";
%hash = ();
}
}
sub char
{
my ($expat, $string) = @_;
$hash{$tags[-1]} .= $string;
}
my $text = <<'EOF';
<post>
<jobnumber>1234</jobnumber>
<location> somecity NJ </location>
</post>
<post>
<jobnumber>87922</jobnumber>
<location> Othercity, AK </location>
</post>
EOF
my $p1 = new XML::Parser(Handlers => { Start => \&start,
End => \&end,
Char => \&char });
$p1->parse("<posts>$text</posts>");
Re: Parsing XML into a simple hash
by mirod (Canon) on Jun 21, 2001 at 16:00 UTC
And here is the ObXTW (the Obligatory XML::Twig Way), once you've fixed your XML by wrapping everything into a single element:
#!/bin/perl -w
use strict;
use XML::Twig;
my $t= new XML::Twig( twig_handlers => { post => \&post });
$t->parse( \*DATA);
sub post
{ my( $t, $post)= @_; # all handlers get called with those arguments
# here is the magic!
# gi is the element name and text is its... text!
my %hash= map { $_->gi, $_->text} $post->children;
# or whatever you want to do with the hash
print "City: $hash{location} \n";
# if your file is small enough you don't need to purge, otherwise
# it will free the memory used so far
$t->purge;
}
__DATA__
<posts>
<post>
<jobnumber>1234</jobnumber>
<location> somecity NJ </location>
</post>
<post>
<jobnumber>87922</jobnumber>
<location> Othercity, AK </location>
</post>
</posts>
Re: Parsing XML into a simple hash
by strredwolf (Chaplain) on Jun 21, 2001 at 10:45 UTC
If you look at my chatterbox, you'll find an SGML parser (read: SGML is the precursor to XML, so it will work here). It'll split on those tags, so you can plop those locations into separate posts (say, inside a @posts array).
I gotta put it into a module...
--
$Stalag99{"URL"}="http://stalag99.keenspace.com";
Re: Parsing XML into a simple hash
by mattr (Curate) on Jun 21, 2001 at 11:15 UTC
If you don't have strict XML (i.e. no closing </location> tag), why not just use regular expressions? This seems to work:
#!/usr/bin/perl
use strict;
open (IN,"testxml.dat") or die "Cannot open testxml.dat: $!";
my @buf = <IN>;
close IN;
for (my $i=0; $i<=$#buf; $i++) {
if ($buf[$i] =~ s/^\s*<jobnumber>(.*)<\/jobnumber>\s*$/$1/) {
$buf[$i+1] =~ s/^\s*<location>\s*(.*)\s<location>\s*$/$1/;
# if your tags are really like this
&process($buf[$i],$buf[$i+1]);
}
}
sub process {
my ($jobnumber,$location) = @_;
print "Found a job $jobnumber in $location.\n"; # do something
}
On a related note, I tried to lose the spaces inside the location tags and couldn't get this kind of regex to work, anyone?
$buf[$i+1] =~ s/^\s*<location>\s*(.?)\s*<location>\s*$/$1/;
# \s?(.*)\s works though..
|
Your solution does not work. Initially, it will appear to work against his data set, but XML start and end tags don't have to appear on the same line. If that happens, your regex will break because the dot metacharacter doesn't match the newline. Adding the /s modifier allows the dot to match, but then, because your match is greedy, it still breaks:
#!/usr/bin/perl
use strict;
my @buf = <DATA>;
for my $i ( 0 .. $#buf ) {
if ($buf[$i] =~ s/^\s*<jobnumber>(.*)<\/jobnumber>\s*$/$1/s) {
$buf[$i+1] =~ s/^\s*<location>\s*(.*)\s<location>\s*$/$1/s;
# if your tags are really like this
&process($buf[$i],$buf[$i+1]);
}
}
sub process {
my ($jobnumber,$location) = @_;
print "Found a job $jobnumber in $location.\n"; # do something
}
__DATA__
<posts>
<post>
<jobnumber>
1234
</jobnumber>
<location>Somecity, NJ</location>
</post>
<post>
<jobnumber>87922</jobnumber>
<location>Othercity, AK</location>
</post>
</posts>
See Death to Dot Star! for the explanation of why your regex fails (and for some excellent examples of how I have screwed up regexes on delimited text).
Use a parser for data like this. Regexes, while I love them, are for matching data, not parsing it.
As for your 'related note', it doesn't work because you have (.?) in your code. The dot/question mark makes you match one character and have that match optional. It's equivalent to (.{0,1}).
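A small sketch of the difference, assuming a well-formed closing `</location>` tag rather than the slash-less one in the original data. The helper name `location_text` is made up for illustration. Swapping `(.?)` for a non-greedy `(.*?)` lets the surrounding `\s*` absorb the padding:

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: strip the tags and the padding
# around the text of one well-formed <location> line.
sub location_text {
    my ($line) = @_;
    # (.*?) is non-greedy, so the trailing \s* can claim the padding.
    return $line =~ m{^\s*<location>\s*(.*?)\s*</location>\s*$} ? $1 : undef;
}

my $line = "<location> Othercity, AK </location>\n";

# (.?) allows at most one character between the tags, so this never prints.
print "(.?) matched\n" if $line =~ m{<location>\s*(.?)\s*</location>};

print "[", location_text($line), "]\n";   # prints [Othercity, AK]
```

This still only handles a tag pair that sits on a single line, so the advice to use a parser stands.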
Cheers,
Ovid
Join the Perlmonks Setiathome Group or just click on the link and check out our stats.
|
Thanks Ovid, you're right and I'll reread that article!
Matt
Re: Parsing XML into a simple hash
by andye (Curate) on Jun 21, 2001 at 15:47 UTC
For a quick-and-dirty regexp solution, how about this...
while ($text =~ m|<jobnumber>(.*?)</jobnumber>.*?<location>(.*?)</location>|sg) {
print "Found job number $1 in location $2 \n";
}
NB I'm assuming the missing slashes in the data are a typo. If not then it's easy enough to modify the above.
Of course, a regexp solution isn't the right one if you want it to work in more general cases - like if the tags are the other way round, or whatever. If there's going to be any variation in the data, then the way to go is an XML parser as described by others above.
andy.
Some Time Later: Just For Fun, I tried to see if I could write one that /would/ work with the tags either way round... came up with this...
my $regexp = '<post>';
$regexp .= "(?=.*?<$_>(.*?)</$_>)" foreach qw(jobnumber location);
$regexp .= '.*?</post>';
while ($stuff =~ m|$regexp|sog) {
print "Found job number $1 in location $2 \n";
}
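To see the lookahead version run end to end, here is a self-contained sketch using the same pattern. The sample data in `$stuff` is made up, with the tags in a different order in each post, and the `@found` array exists only so the results can be inspected:

```perl
use strict;
use warnings;

# Build one lookahead per tag; each captures that tag's text wherever it
# appears after <post>. The capture order follows the qw() list.
my $regexp = '<post>';
$regexp .= "(?=.*?<$_>(.*?)</$_>)" foreach qw(jobnumber location);
$regexp .= '.*?</post>';

# Hypothetical sample data with the tags in both orders.
my $stuff = <<'EOF';
<post><jobnumber>1234</jobnumber><location>Somecity, NJ</location></post>
<post><location>Othercity, AK</location><jobnumber>87922</jobnumber></post>
EOF

my @found;   # collected [jobnumber, location] pairs
while ($stuff =~ m|$regexp|sg) {
    push @found, [$1, $2];
    print "Found job number $1 in location $2\n";
}
```

One caveat: the lookaheads are not limited to the current post, so if a post lacks one of the tags, the pattern can pick up that tag from a later post.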
Re: Parsing XML into a simple hash
by Desdinova (Friar) on Jun 22, 2001 at 07:28 UTC
Thanks for all the help everyone. I ended up going with XML::Twig because when I looked at it, it just clicked in my brain. As a side note, the lack of the closing location tag was a typo that I didn't notice until everyone came up with how to fix that as well as do what I wanted. Thanks for all the help.