http://www.perlmonks.org?node_id=998674

Roboz has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks! I have an XML file similar to the following:

<booklist>
  <book>
    <author>Book 1 author 1</author>
    <author>Book 1 author 2</author>
    <title>Book 1 title</title>
    <isbn>Book1ISBN</isbn>
  </book>
  <book>
    <author>Book 2 author 1</author>
    <author>Book 2 author 2</author>
    <title>Book 2 title</title>
    <isbn>Book2ISBN</isbn>
  </book>
  <book>
    <author>Book 3 author 1</author>
    <author>Book 3 author 2</author>
    <author>Book 3 author 3</author>
    <title>Book 3 title</title>
    <isbn>Book3ISBN</isbn>
  </book>
</booklist>

I would like to loop through this file and output each book's XML code including the opening and closing book tags. I've tried parsing the XML file with XML::Simple using XMLin and that gives me a data structure like so:

$VAR1 = {
  'book' => [
    {
      'isbn' => 'Book1ISBN',
      'title' => 'Book 1 title',
      'author' => [
        'Book 1 author 1',
        'Book 1 author 2'
      ]
    },
    {
      'isbn' => 'Book2ISBN',
      'title' => 'Book 2 title',
      'author' => [
        'Book 2 author 1',
        'Book 2 author 2'
      ]
    },
    {
      'isbn' => 'Book3ISBN',
      'title' => 'Book 3 title',
      'author' => [
        'Book 3 author 1',
        'Book 3 author 2',
        'Book 3 author 3'
      ]
    }
  ]
};

I'm really at a loss as to how I would loop through this to output the XML. I think the data structure is confusing me. I need the XML chunks to be identical to the input. I'm assuming I would use XMLout. Or maybe something completely different? Thanks in advance for any pointers!

Replies are listed 'Best First'.
Re: XML::Simple XML / XMLin / XMLout? or something else?
by tobyink (Canon) on Oct 12, 2012 at 13:43 UTC

    XML::Simple is almost certainly not what you want to be using. Here's an example using XML::LibXML...

    use XML::LibXML 1.70;

    my $xml = XML::LibXML->load_xml(IO => \*DATA);

    foreach my $book ($xml->getElementsByTagName('book')) {
        print "GOT THIS: " . $book->toString . "\n";
    }

    __DATA__
    <booklist>
      <book>
        <author>Book 1 author 1</author>
        <author>Book 1 author 2</author>
        <title>Book 1 title</title>
        <isbn>Book1ISBN</isbn>
      </book>
      <book>
        <author>Book 2 author 1</author>
        <author>Book 2 author 2</author>
        <title>Book 2 title</title>
        <isbn>Book2ISBN</isbn>
      </book>
      <book>
        <author>Book 3 author 1</author>
        <author>Book 3 author 2</author>
        <author>Book 3 author 3</author>
        <title>Book 3 title</title>
        <isbn>Book3ISBN</isbn>
      </book>
    </booklist>

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      Or, using the XML::XSH2 wrapper of XML::LibXML:
      open 1.xml ; for /booklist/book ls . ;
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        Wow. That makes it simple! Thanks

      This looks like a good method to me! I plan on spitting out each <book></book> to SOAP::Lite to pass to a web service and, of course, it expects each XML chunk to have the original format... Naturally the XML I'm dealing with is much more complex but this gets me where I want to go. Thanks.

Re: XML::Simple XML / XMLin / XMLout? or something else?
by greengaroo (Hermit) on Oct 12, 2012 at 13:32 UTC

    This structure is a HashRef that contains an ArrayRef of HashRefs. When you see { } it's a HashRef; when you see [ ] it's an ArrayRef. To loop through them, you have to dereference them.

    Let's say your reference variable is called $VAR1 (obviously this name comes from Data::Dumper, so use the real variable). You can do this:

    # Below, you have two dereferences
    # The arrow dereferences the HashRef $VAR1
    # The @{ } dereferences the ArrayRef under 'book' =>
    foreach my $book ( @{ $VAR1->{'book'} } ) {

        # Then, each $book is another HashRef
        print $book->{'isbn'}, "\n";
        print $book->{'title'}, "\n";

        # But 'author' is an ArrayRef:
        foreach my $author ( @{ $book->{'author'} } ) {
            print "Author: ", $author, "\n";
        }
    }

    That does not print the XML code, of course, but nothing prevents you from printing the tags yourself around each value. Otherwise, you have to use an XML-related module.
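    Printing the tags by hand could look roughly like the sketch below. It is not from the original reply: the helper names xml_escape and book_to_xml are made up for illustration, the tag order is hard-coded to match the posted input, and the two-space indentation is an assumption. It uses only core Perl, so it will not reproduce attributes, comments, or anything else the real, more complex XML might contain.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: rebuild one <book> element from a HashRef shaped
# like the ones in the Data::Dumper output. Tag order is hard-coded
# (author, title, isbn) because a hash does not remember the input order.
sub book_to_xml {
    my ($book) = @_;

    # By default XMLin folds a single <author> into a plain string and
    # several into an ArrayRef, so normalise to a list first.
    my $authors = $book->{author};
    my @authors = ref($authors) eq 'ARRAY' ? @$authors : ($authors);

    my $xml = "<book>\n";
    $xml .= "  <author>" . xml_escape($_) . "</author>\n" for @authors;
    $xml .= "  <title>" . xml_escape( $book->{title} ) . "</title>\n";
    $xml .= "  <isbn>" . xml_escape( $book->{isbn} ) . "</isbn>\n";
    $xml .= "</book>\n";
    return $xml;
}

# Same shape as the structure in the question (one book shown).
my $data = {
    book => [
        {
            isbn   => 'Book1ISBN',
            title  => 'Book 1 title',
            author => [ 'Book 1 author 1', 'Book 1 author 2' ],
        },
    ],
};

print book_to_xml($_) for @{ $data->{book} };
```

    Every element name has to be listed by hand, which is why this only suits small, fixed layouts; for anything larger an XML module is the safer route.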

    There are no stupid questions, but there are a lot of inquisitive idiots.

      Thanks for the info! I was having a tough time looping through array refs and hash refs in the past. This is good stuff.

Re: XML::Simple XML / XMLin / XMLout? or something else?
by BrowserUk (Patriarch) on Oct 12, 2012 at 14:02 UTC

    With your posted example after the __DATA__ tag, this:

    #! perl -slw
    use strict;
    use XML::Simple;

    my $xml = XMLin( \*DATA );

    print XMLout( $_, NoAttr => 1, RootName => 'book' ) for @{ $xml->{book} };

    __DATA__

    Produces this:

    C:\test>junk
    <book>
      <author>Book 1 author 1</author>
      <author>Book 1 author 2</author>
      <isbn>Book1ISBN</isbn>
      <title>Book 1 title</title>
    </book>
    <book>
      <author>Book 2 author 1</author>
      <author>Book 2 author 2</author>
      <isbn>Book2ISBN</isbn>
      <title>Book 2 title</title>
    </book>
    <book>
      <author>Book 3 author 1</author>
      <author>Book 3 author 2</author>
      <author>Book 3 author 3</author>
      <isbn>Book3ISBN</isbn>
      <title>Book 3 title</title>
    </book>

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      This also looks like it'll do what I need. Man all you folk are awesome! Thanks for the help.

        I can't resist: How does the output of the last approach differ from mine where I said you have to just throw away the <booklist> tag from your input file? By the way: Extra newlines are also simple.

        But if you really need perl:

        perl -ne 'print unless /booklist/' < data.xml

        Best regards
        McA

Re: XML::Simple XML / XMLin / XMLout? or something else?
by dasgar (Priest) on Oct 12, 2012 at 13:31 UTC

    I don't think that XML::Simple is going to do what you're trying to do. The XMLin function takes XML input and parses that into a data structure of nested arrays and hashes. The XMLout function takes a data structure (presumably one created from XMLin) and then generates the XML output.

    I'm not familiar enough with any of the other XML parsing modules to recommend an alternative to XML::Simple that will do what you're trying to accomplish.

Re: XML::Simple XML / XMLin / XMLout? or something else?
by McA (Priest) on Oct 12, 2012 at 13:29 UTC

    Hi

    I don't know what you want to achieve. If you need the XML you already have, then there is nothing to do. Load the file into your editor, delete the lines with <booklist> and you're done.

    Or tell us what you really want to do ;-) I'm sure I misunderstood you.

    Best regards
    McA