PerlMonks  

XML parsing

by mading0 (Initiate)
on Oct 02, 2014 at 16:00 UTC

mading0 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I know that there are several CPAN modules for XML parsing. I don't think my needs are advanced enough to justify spending much time choosing one, and I will probably use XML::Simple, but I want to check first. I am trying to parse an XML file for a certain element. If its value does not match my intended value, I want to delete that entire paragraph/hierarchy. I assume I need to parse the file to find what I want to keep, have Perl extract it somewhere, then convert it back to XML. Does my logic sound right? If so, is XML::Simple the easiest tool for this?

Replies are listed 'Best First'.
Re: XML parsing
by choroba (Cardinal) on Oct 02, 2014 at 16:24 UTC
    Have you read the documentation of XML::Simple? Especially the section "Status of this module" is informative.
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: XML parsing
by McA (Priest) on Oct 02, 2014 at 16:40 UTC

    Hi,

    Search this site for XML::Simple and you will get the feeling that it is no longer recommended for new projects. You will also find many pointers to recommended modules on CPAN (e.g. XML::Twig).

    Regards
    McA

Re: XML parsing
by Discipulus (Canon) on Oct 02, 2014 at 16:43 UTC
Re: XML parsing
by Anonymous Monk on Oct 02, 2014 at 18:23 UTC

    XML::Simple's name is misleading: it sounds like "a simple module for complete XML handling", but it's more like "simplistic XML handling for simple XML". I find it's great for reading simple XML config files that have been designed to work with XML::Simple, but it is not an all-purpose solution. In your case it is very likely not appropriate because, among several other things, it very often doesn't maintain an XML document's structure when reading a file and writing it back.

    Anyway, I'm inclined to agree with the other monks' suggestions for XML::Twig, which is great when you want to process a file piece by piece. For the case you describe, if the file isn't so big that loading it into memory is too expensive, then XML::LibXML is fine too. For example, the following deletes the <bar> element if its child <quz>'s text content is "baz".
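
    A minimal sketch of that approach, assuming a small sample document with a <foo> root (the sample data and the XPath expression are illustrative assumptions, not taken from a real file):

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document; element names follow the description above.
my $xml = <<'XML';
<foo>
  <bar><quz>baz</quz></bar>
  <bar><quz>keep me</quz></bar>
</foo>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Visit every <quz> that is a direct child of a <bar>.
for my $el ($doc->findnodes('//bar/quz')) {
    if ($el->textContent eq 'baz') {
        # Detach the whole parent <bar> subtree from the document.
        $el->parentNode->unbindNode;
    }
}

# Serialize what is left of the document.
print $doc->toString;
```

    Only the <bar> whose <quz> contains something other than "baz" should remain in the printed document.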

      Thank you, this is really helpful. I have two questions about the code. First, in $el->textContent eq 'baz', is there a way to find the content that ISN'T baz? Is there a neq, basically? Secondly, is there any quick way through LibXML to print back into an XML file, or do I just need to use normal Perl file output for that?
        Secondly, is there any quick way through LibXML to print back into an XML file, or do I just need to use normal perl file output for that?

        XML::LibXML::Document has the methods toFile and toFH, or you can use toString and print it to a file yourself.
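
        A hedged sketch covering both questions: Perl's string inequality operator is ne (the counterpart of eq), and the document can be written out either with toFile or by printing toString yourself. The sample XML and the temporary file names are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::LibXML;

# Hypothetical sample document.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<foo>
  <bar><quz>baz</quz></bar>
  <bar><quz>something else</quz></bar>
</foo>
XML

# "ne" is true when the two strings differ, so this removes every <bar>
# whose <quz> is NOT "baz" and keeps the matching one.
for my $el ($doc->findnodes('//bar/quz')) {
    $el->parentNode->unbindNode if $el->textContent ne 'baz';
}

# Write into a throwaway directory so the sketch leaves nothing behind.
my $dir = tempdir(CLEANUP => 1);

# Option 1: let the document write itself.
my $by_method = "$dir/filtered.xml";
$doc->toFile($by_method);

# Option 2: serialize to a string and print it with ordinary file output.
# toString on a document returns encoded bytes, hence the :raw layer.
my $by_hand = "$dir/filtered-by-hand.xml";
open my $fh, '>:raw', $by_hand or die "Cannot write $by_hand: $!";
print {$fh} $doc->toString;
close $fh or die "Cannot close $by_hand: $!";
```

        Both files should contain the same filtered document; toFH works like toFile but takes an already opened filehandle.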

Re: XML parsing
by Anonymous Monk on Oct 02, 2014 at 23:18 UTC
Re: XML parsing
by jellisii2 (Hermit) on Oct 03, 2014 at 11:35 UTC
    I haven't had to do anything with XML that I couldn't get done with XML::Twig. May $DEITY bless mirod.
      Perhaps you can help me with Twig, then. What I want to do is search within a parent node for a specific text content of an element. If it doesn't match the string, then delete the entire parent node. Here's an example:

      <book>
        <book1>Book1</book1>
        <title>Title of Book 1</title>
        <genre>Fantasy</genre>
      </book>

      What I'd like to do in this case is search within each <book> and see if the title matches "Title of Book 1". If it doesn't, I want that entire <book> deleted. Does this make sense?

        See the section "Building an XML filter" of XML::Twig. Here's a quick implementation:

        use XML::Twig;

        open my $ofh, '>', $output_filename or die $!;

        XML::Twig->new(
            twig_print_outside_roots => $ofh,
            keep_spaces              => 1,
            twig_roots               => {
                book => sub {
                    my ($twig, $book) = @_;
                    if ($book->first_child_text('title') eq 'Title of Book 1') {
                        $book->flush($ofh);
                    }
                    else {
                        $book->purge;
                    }
                    return 1;
                }
            },
        )->parsefile($input_filename);

        close $ofh;
        I would say that your example is probably overly simplistic, so I've expanded it and cleaned it up slightly:

        <library>
          <book>
            <book1>Book1</book1>
            <title>Title of Book 1</title>
            <genre>Fantasy</genre>
          </book>
          <book>
            <book2>Book2</book2>
            <title>Not the Title of Book 1</title>
            <genre>Fantasy</genre>
          </book>
        </library>

        Now that we have data that makes the behaviour easier to show, here's an example that does exactly what you ask.

        use strict;
        use warnings;
        use XML::Twig;

        my $DATA = '
        <library>
          <book>
            <book1>Book1</book1>
            <title>Title of Book 1</title>
            <genre>Fantasy</genre>
          </book>
          <book>
            <book2>Book2</book2>
            <title>Not the Title of Book 1</title>
            <genre>Fantasy</genre>
          </book>
        </library>
        ';

        my $source_twig = XML::Twig->new('pretty_print' => 'indented');

        # safe_parse returns false on failure and leaves the error in $@.
        $source_twig->safe_parse($DATA) or die "Parse failed: $@";

        foreach my $book ($source_twig->root->children('book')) {
            # first_child_text returns '' when there is no <title>,
            # so a book without a title is cut instead of causing an error.
            if ($book->first_child_text('title') ne 'Title of Book 1') {
                $book->cut();
            }
        }

        $source_twig->print();

        And its output:

        <library>
          <book>
            <book1>Book1</book1>
            <title>Title of Book 1</title>
            <genre>Fantasy</genre>
          </book>
        </library>
Re: XML parsing
by codiac (Beadle) on Oct 03, 2014 at 10:40 UTC
    Vote 1 XML::TreeBuilder, I hear the maintainer will do just about any coding effort for a pint or 2 of cider!

Node Type: perlquestion [id://1102642]
Approved by Athanasius
Front-paged by toolic