Indenting XML

by uzzikie (Sexton)
on Feb 21, 2003 at 02:20 UTC ( #237315=perlquestion )
uzzikie has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I have to display several XML files on one page, and XML::Simple does a good job of parsing them.
However, the XML comes as one continuous string, like
As you can see, it is not very human-friendly (and that is just a small fragment!)...
I wonder what I can do to format it nicely so that it is more readable.

Replies are listed 'Best First'.
Re: Indenting XML
by Zaxo (Archbishop) on Feb 21, 2003 at 03:28 UTC

    XML::Simple already does a primitive pretty print in XMLout(). Here is how it's used with your data:

    use XML::Simple;
    my $doc = XMLin 'air.xml', forcearray => 1;
    print XMLout $doc, xmldecl => 1, rootname => 'AIRAVAALABILITY';
    __END__
    prints:
    <?xml version='1.0' standalone='yes'?>
    <AIRAVAALABILITY>
      <AIRAVL>
        <AIRAVL007V>Y</AIRAVL007V>
        <AIRAVL018B>HKG</AIRAVL018B>
        <AIRAVL012U>01MAY</AIRAVL012U>
        <AIRAVL012G></AIRAVL012G>
        <AIRAVL0145>SIN</AIRAVL0145>
      </AIRAVL>
    </AIRAVAALABILITY>

    Update: fixed up root tag

    After Compline,

Re: Indenting XML
by Mr_Person (Hermit) on Feb 21, 2003 at 03:10 UTC
Re: Indenting XML
by uzzikie (Sexton) on Feb 21, 2003 at 06:07 UTC
    Thanks, everyone, for your input.
    After trying a few solutions, I guess I shall adopt the approach below.

    use IO::File;
    use XML::Parser::PerlSAX;
    use XML::Handler::YAWriter;

    my $xmlquery = qq(<AIRAVAALABILTY> <AIRAVL> <AIRAVL012U>01MAY</AIRAVL012U> )
                 . qq(<AIRAVL0145>SIN</AIRAVL0145> <AIRAVL018B>HKG</AIRAVL018B> )
                 . qq(<AIRAVL007V>Y</AIRAVL007V> <AIRAVL012G></AIRAVL012G> )
                 . qq(</AIRAVL> </AIRAVAALABILTY>);

    # ">-" opens standard output, so the pretty-printed XML goes to the screen
    my $ya = XML::Handler::YAWriter->new(
        'Output' => IO::File->new( ">-" ),
        'Pretty' => {
            'NoComments'         => 1,
            'PrettyWhiteIndent'  => 1,
            'NoWhiteSpace'       => 1,
            'PrettyWhiteNewline' => 1,
        }
    );

    my $perlsax = XML::Parser::PerlSAX->new( 'Handler' => $ya );
    my $result  = $perlsax->parse(
        Source => { Encoding => 'ISO-8859-1', String => $xmlquery } );
    print qq($result);
    which in turn prints out
      Some XML apps will be seriously annoyed by whitespace added inside the leaf-level tags. Just a thought, but I'd test that carefully before you use the transformed output for anything more than debugging by eye.
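
      One way to run that test, as a minimal sketch: compare the text of every leaf element before and after indenting. The helper name leaf_texts and the two indented strings are made up for illustration, and the regex assumes elements without attributes.

```perl
use strict;
use warnings;

# Hypothetical helper: return "name=text" for every leaf element, i.e. an
# element whose content contains no further tags. Assumes no attributes.
sub leaf_texts {
    my ($xml) = @_;
    my @leaves;
    while ( $xml =~ m{<(\w+)>([^<]*)</\1>}g ) {
        push @leaves, "$1=$2";
    }
    return @leaves;
}

my $original = '<AIRAVL><AIRAVL0145>SIN</AIRAVL0145></AIRAVL>';

# Whitespace added only between tags: leaf text is untouched.
my $safe = "<AIRAVL>\n  <AIRAVL0145>SIN</AIRAVL0145>\n</AIRAVL>\n";

# Whitespace added inside the leaf element: its text is no longer "SIN".
my $unsafe = "<AIRAVL>\n  <AIRAVL0145>\n    SIN\n  </AIRAVL0145>\n</AIRAVL>\n";

for my $case ( [ safe => $safe ], [ unsafe => $unsafe ] ) {
    my ( $label, $xml ) = @$case;
    my $same = join( '|', leaf_texts($xml) ) eq join( '|', leaf_texts($original) );
    print "$label: ", ( $same ? "leaf text unchanged" : "leaf text changed" ), "\n";
}
```

      A check like this only covers simple documents; for anything with attributes or mixed content, comparing the parsed structures would be the safer route.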

      $you = new YOU;
      honk() if $you->love(perl)

Re: Indenting XML
by mirod (Canon) on Feb 21, 2003 at 06:14 UTC
Re: Indenting XML
by Zero_Flop (Pilgrim) on Feb 21, 2003 at 04:48 UTC
    Use the best tool for the job. The Tidy application for HTML on the W3C site was designed to do what you want.
Re: Indenting XML
by graff (Chancellor) on Feb 21, 2003 at 03:23 UTC
    I suppose there may be a module for this (but maybe not). A two-bit kluge might go something like this:
    # assume that the long string from XML::Simple is assigned to $_:
    s/></>\n</g;    # put line breaks between all adjacent tags
    my @lines  = split( /\n/ );
    my $indent = 0;
    $_ = "";
    foreach my $l ( @lines ) {
        # a line that starts with a close tag decreases indent
        $indent -= 2 if ( $l =~ /^<\// );
        $_ .= " " x $indent . "$l\n";
        # an open tag increases indent, unless the same line also closes it
        # (a leaf such as <a>Y</a>, or a self-closing <a/>)
        $indent += 2 if ( $l =~ /^<\w/ and $l !~ m{</|/>$} );
    }
    print;
Re: Indenting XML
by TStanley (Canon) on Feb 21, 2003 at 20:13 UTC
    If you do data munging on a fairly regular basis, you might want to consider picking up Data Munging with Perl, which was written by our own davorg. There is an entire chapter devoted to the various ways of munging XML, and as a whole, the book is very enlightening.

    It is God's job to forgive Osama Bin Laden. It is our job to arrange the meeting -- General Norman Schwartzkopf
Re: Indenting XML
by hardburn (Abbot) on Feb 21, 2003 at 17:30 UTC

    A lot of nodes above are pointing out solutions using various modules. I'm just thinking about how to do this without using an existing formatter (if only for the intellectual exercise).

    Keep a variable named $tab_count, initialized to 0. Each time you see a valid begin tag, increment $tab_count, and print a newline followed by "\t" x $tab_count after the begin tag. Whenever you see a valid end tag, decrement $tab_count and print the same string as above.
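
    The counter idea can be sketched roughly as follows. This is an adaptation rather than a literal transcription: the newline and tabs are emitted before each tag instead of after it, so that closing tags line up with their opening tags and leaf text stays on one line. The name indent_xml is hypothetical, and the sketch assumes no mixed content, no comments or CDATA, and no '>' inside attribute values.

```perl
use strict;
use warnings;

# Hypothetical helper: indent XML with one tab per nesting level.
sub indent_xml {
    my ($xml) = @_;
    my $tab_count = 0;
    my $out       = '';
    my $prev      = '';    # kind of the previous token: open, close or text

    # Split the input into tags and runs of text between them.
    for my $tok ( $xml =~ /(<[^>]*>|[^<]+)/g ) {
        if ( $tok =~ m{^</} ) {
            # End tag: outdent first, then stay on the same line if this
            # closes a leaf (<a>text</a> or <a></a>).
            $tab_count--;
            $out .= ( $prev eq 'open' || $prev eq 'text' )
                ? $tok
                : "\n" . "\t" x $tab_count . $tok;
            $prev = 'close';
        }
        elsif ( $tok =~ m{^<[?!]} || $tok =~ m{/>$} ) {
            # Declaration, processing instruction or self-closing tag:
            # printed on its own line, depth unchanged.
            $out .= "\n" if length $out;
            $out .= "\t" x $tab_count . $tok;
            $prev = 'close';
        }
        elsif ( $tok =~ /^</ ) {
            # Begin tag: print at the current depth, then go one deeper.
            $out .= "\n" if length $out;
            $out .= "\t" x $tab_count . $tok;
            $tab_count++;
            $prev = 'open';
        }
        elsif ( $tok =~ /\S/ ) {
            # Leaf text: keep it glued to its tags, unchanged.
            $out .= $tok;
            $prev = 'text';
        }
        # Whitespace-only text between tags is dropped.
    }
    return "$out\n";
}

print indent_xml(
    '<AIRAVL><AIRAVL007V>Y</AIRAVL007V><AIRAVL012G></AIRAVL012G></AIRAVL>');
```

    Keeping leaf text on the same line as its tags avoids changing the element content, at the cost of not handling mixed content at all.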

    Reinvent a rounder wheel.

    Note: All code is untested, unless otherwise stated

      It is a little more complex than this. What do you do with mixed content, for example? How would you indent this:

      <ul><li><b>bold</b> statement indeed <i>hardburn</i></li></ul>

      Update: Oops! Sorry, I should have read the original question before getting all excited. XML::Simple doesn't do mixed content, so there should not be any in the XML anyway. So then, yes, indenting is a little easier. You still need to parse tags properly though, which is not as easy as it might look.
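
      To illustrate why tag parsing is harder than it looks, here is a small sketch comparing a naive tag pattern with one that skips over quoted attribute values. The input string is made up for the example. Even the second pattern ignores comments, CDATA sections and processing instructions, so a real parser remains the safer choice.

```perl
use strict;
use warnings;

# Illustrative input (not from the thread): the attribute value contains '>',
# which is legal inside a quoted attribute value.
my $xml = q{<item note="a > b">text</item>};

# Naive pattern: a tag is everything from '<' up to the next '>'.
my @naive = $xml =~ /(<[^>]*>)/g;

# Quote-aware pattern: quoted attribute values are consumed as a unit,
# so a '>' inside them does not end the tag.
my @aware = $xml =~ /(<(?:[^>"']|"[^"]*"|'[^']*')*>)/g;

print "naive: $_\n" for @naive;
print "aware: $_\n" for @aware;
```

      The naive pattern cuts the first tag short at the '>' inside the attribute value, while the quote-aware one returns the whole opening tag.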

Node Type: perlquestion [id://237315]
Approved by newrisedesigns