mading0 has asked for the wisdom of the Perl Monks concerning the following question:
Hi all,
I know that there are several CPAN modules for XML parsing. I don't think my needs are advanced enough to justify spending much time choosing one, and I will probably use XML::Simple, but I want to check first:
I am trying to parse an XML file for a certain element. If the value does not match my intended value, I want to delete that entire paragraph/hierarchy. I assume I need to parse the file to find what I want to keep, have Perl extract it somewhere, then convert it back to XML. Does my logic sound right? If so, is XML::Simple the easiest tool for this?
Re: XML parsing
by choroba (Cardinal) on Oct 02, 2014 at 16:24 UTC
|
Have you read the documentation of XML::Simple? Especially the section "Status of this module" is informative.
| [reply] |
Re: XML parsing
by McA (Priest) on Oct 02, 2014 at 16:40 UTC
|
Hi,
Search this site for XML::Simple and you will get the feeling that it is no longer recommended for new projects. You will also find many pointers to recommended modules on CPAN (e.g. XML::Twig).
Regards
McA
| [reply] |
Re: XML parsing
by Discipulus (Canon) on Oct 02, 2014 at 16:43 UTC
|
| [reply] [d/l] |
Re: XML parsing
by Anonymous Monk on Oct 02, 2014 at 18:23 UTC
|
XML::Simple's name is misleading: it sounds like "a simple module for complete XML handling", but it's not. It's more like "simplistic XML handling for simple XML". I find it's great for reading simple XML config files that have been designed to work with XML::Simple, but it is not an all-purpose solution. In your case it is very likely not appropriate because, among several other things, it very often doesn't maintain an XML document's structure when reading a file and writing it back.
Anyway, I'm inclined to agree with the other monks' suggestions for XML::Twig, which is great when you want to process a file piece by piece. For the case you describe, if the file isn't so big that loading it into memory is too expensive, then XML::LibXML is fine too. For example, the following deletes the <bar> element if its child <quz>'s text content is "baz".
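A minimal sketch of that approach follows. It is a reconstruction, not the poster's own code: only the <bar> and <quz> names come from the post, while the <foo> root and the sample values are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input; only the <bar>/<quz> names come from the post.
my $xml = <<'XML';
<foo>
  <bar><quz>baz</quz></bar>
  <bar><quz>keep me</quz></bar>
</foo>
XML

# Load the whole document into memory.
my $doc = XML::LibXML->load_xml(string => $xml);

# Visit every <quz> that is a direct child of a <bar>.
for my $el ($doc->findnodes('//bar/quz')) {
    if ($el->textContent eq 'baz') {
        # Detach the enclosing <bar> (and everything in it) from the document.
        $el->parentNode->unbindNode;
    }
}

print $doc->toString;
```

The node list is collected before the loop starts, so detaching parents while iterating should be safe here.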
| [reply] [d/l] [select] |
|
Thank you, this is really helpful. I have two questions about the code.
First of all:
$el->textContent eq 'baz'
Is there a way to find the content that ISN'T baz? Is there a neq, basically?
Secondly, is there any quick way through LibXML to print back into an XML file, or do I just need to use normal perl file output for that?
| [reply] [d/l] |
|
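Perl's string inequality operator is ne (the numeric one is !=), and an XML::LibXML document can write itself to disk with toFile. A minimal sketch of both points, assuming a hypothetical sample document and output filename (neither comes from the thread):

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input and output path, for illustration only.
my $xml = '<foo><bar><quz>baz</quz></bar><bar><quz>other</quz></bar></foo>';
my $output_filename = 'filtered.xml';

my $doc = XML::LibXML->load_xml(string => $xml);

for my $el ($doc->findnodes('//bar/quz')) {
    # "ne" is the string counterpart of "!=": true when the text is NOT "baz".
    $el->parentNode->unbindNode if $el->textContent ne 'baz';
}

# toFile serializes the document straight to disk; no manual open/print needed.
$doc->toFile($output_filename);
```

Here every <bar> whose <quz> is not "baz" is dropped, which is the inverse of the eq test.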
| [reply] [d/l] |
Re: XML parsing
by Anonymous Monk on Oct 02, 2014 at 23:18 UTC
|
| [reply] |
Re: XML parsing
by jellisii2 (Hermit) on Oct 03, 2014 at 11:35 UTC
|
I haven't had to do anything with XML that I couldn't get done with XML::Twig. May $DEITY bless mirod. | [reply] |
|
Perhaps you can help me with Twig, then.
What I want to do is search within a parent node for a specific text content of an element. If it doesn't match the string, then delete the entire parent node. Here's an example:
<book>
  <book1>Book1</book1>
  <title>Title of Book 1</title>
  <genre>Fantasy</genre>
</book>
What I'd like to do in this case is search within each <book> and see if the title matches "Title of Book 1". If it doesn't, I want that entire <book> deleted. Does this make sense?
| [reply] [d/l] |
|
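use XML::Twig;

# $input_filename and $output_filename are assumed to be defined elsewhere.
open my $ofh, '>', $output_filename or die $!;
XML::Twig->new(
    twig_print_outside_roots => $ofh,   # copy everything outside <book> straight to the output
    keep_spaces              => 1,      # preserve the original whitespace
    twig_roots               => {
        book => sub {
            my ($twig, $book) = @_;
            if ($book->first_child_text('title') eq 'Title of Book 1') {
                $book->flush($ofh);     # matching book: write it out
            }
            else {
                $book->purge;           # non-matching book: discard it
            }
            return 1;
        }
    },
)->parsefile($input_filename);
close $ofh;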
| [reply] [d/l] |
|
I would say that your example is probably oversimplified, so I've expanded it and cleaned it up slightly:
<library>
  <book>
    <book1>Book1</book1>
    <title>Title of Book 1</title>
    <genre>Fantasy</genre>
  </book>
  <book>
    <book2>Book2</book2>
    <title>Not the Title of Book 1</title>
    <genre>Fantasy</genre>
  </book>
</library>
Now that we have something that makes it a little easier to show things off, here's an example that does exactly what you ask.
use strict;
use warnings;
use XML::Twig;

my $DATA = '
<library>
  <book>
    <book1>Book1</book1>
    <title>Title of Book 1</title>
    <genre>Fantasy</genre>
  </book>
  <book>
    <book2>Book2</book2>
    <title>Not the Title of Book 1</title>
    <genre>Fantasy</genre>
  </book>
</library>
';

# Parse the whole document into memory; pretty_print only affects the output.
my $source_twig = XML::Twig->new('pretty_print' => 'indented');
$source_twig->safe_parse($DATA)
    or die "Could not parse XML: $@";

# Remove every <book> whose <title> is not the one we want to keep.
foreach my $book ($source_twig->root->children('book')) {
    if ($book->first_child_text('title') ne 'Title of Book 1') {
        $book->cut();
    }
}

$source_twig->print();
And its output:
<library>
  <book>
    <book1>Book1</book1>
    <title>Title of Book 1</title>
    <genre>Fantasy</genre>
  </book>
</library>
| [reply] [d/l] [select] |
Re: XML parsing
by codiac (Beadle) on Oct 03, 2014 at 10:40 UTC
|
Vote 1 XML::TreeBuilder, I hear the maintainer will do just about any coding effort for a pint or 2 of cider! | [reply] |