
Hey, XML::DOM! My least favorite module!

Here is a more Perlish version of your code, which seems to be working fine:

#!/bin/perl -w
use strict;
use XML::DOM;

my $parser = new XML::DOM::Parser;
my $doc = $parser->parse( \*DATA ) or die "Unable to parse document";

# safer than just getting the first child, in case the document
# has a DTD or start with comments
my $root = $doc->getDocumentElement();

scanner($root);

sub scanner
{
    my ($rt) = @_;
    my $i=0;
    foreach my $nde ( $rt->getChildNodes())        # yes it is an array!
    {
        if ( ($nde->getNodeType() == TEXT_NODE )
             && ($nde->getData()=~ /\S/s) )        # to avoid extra white spaces
        {
            #$log->info( $i.$nde->getNodeValue());
            print $i++," TEXT /", $nde->getData(), "/\n";
        }
        if ($nde->getNodeType == ELEMENT_NODE)
        {
            #$log->info( $i.$nde->getNodeName());
            print $i++, " ELEMENT ", $nde->getNodeName(), "\n";
        }
        scanner( $nde );
    }
}

__DATA__
<methodCall>Level1 Text
  <Level2a>Text at Level2a</Level2a>
  <Level2b>Text at Level2b</Level2b>
</methodCall>

Some explanations:

  • Yes, there is plenty of whitespace in your document. The DOM states that it should be reported to the application. In your case I think you can ignore whitespace-only text nodes. If you have a choice, you should also wrap the text at level 1 in a tag: it would make it easier to ignore the trailing spaces, and it would generally be cleaner. Especially if you are dealing with data-oriented XML, it is a good idea to avoid mixed content, that is, text (such as Level1 Text) and elements (Level2a and Level2b) mixed within one element (methodCall).
  • If you want to program in Java, you should write Java code. Seriously, XML::DOM uses Perl native types such as arrays, and you should take advantage of that. The DOM is enough of a pain as it is; no need to make it even more painful. As for the oft-heard argument that this is different from the Java API, even Java programmers seem to be ditching the DOM in favour of JDOM (motto: "Say NO to DOM"), a simpler interface than the strict binding defined by the W3C, so...
  • Use $root = $doc->getDocumentElement() instead of getFirstChild(). That will save you some trouble the day your XML document comes with a DTD or a leading comment, which is perfectly legal.
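
To make these points concrete, here is a minimal sketch, not a definitive implementation. It assumes a hypothetical variant of your document in which the level 1 text is wrapped in a tag (the name Level1 is my invention) and a comment precedes the root element. It treats the child nodes as a plain Perl list, which getChildNodes returns in list context, and fetches the root with getDocumentElement.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical input: a leading comment before the root element, and the
# level 1 text wrapped in its own element (the name "Level1" is an assumption).
my $xml = <<'XML';
<!-- generated file -->
<methodCall>
  <Level1>Level1 Text</Level1>
  <Level2a>Text at Level2a</Level2a>
</methodCall>
XML

my $doc = XML::DOM::Parser->new->parse($xml);

# getDocumentElement returns the root element whatever precedes it,
# whereas getFirstChild just returns the first node under the document.
my $root  = $doc->getDocumentElement();
my $first = $doc->getFirstChild();

# In list context getChildNodes returns an ordinary Perl list, so grep and
# map work on it directly. Keeping only element nodes also drops the
# whitespace-only text nodes between the tags.
my @names = map  { $_->getNodeName() }
            grep { $_->getNodeType() == ELEMENT_NODE }
            $root->getChildNodes();

print "root element:     ", $root->getNodeName(), "\n";
print "first child type: ", $first->getNodeType(), "\n";
print "child elements:   @names\n";
```

With no mixed content left, the whitespace test on text nodes is no longer needed: filtering on ELEMENT_NODE is enough, and the text of each element can be read from its own child text node.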

In reply to Re: XML::DOM::Parser by mirod
in thread XML::DOM::Parser by spoddie
