This is a dirt-basic XML indenter. I wrote it so I could take a directory full of XML files and add a newline after each tag, plus leading whitespace on the line that follows. I made no attempt at anything comprehensive, since it handles the simple case I was looking at.
The simple (and whitespace-free) document
<message><org><cn>Some org-or-other</cn><ph>Wouldn't you like to know</ph></org><contact><fn>Pat</fn><ln>Califia</ln></contact></message>
becomes something nicer to look at:
<message>
<org>
<cn>Some org-or-other</cn>
<ph>Wouldn't you like to know</ph>
</org>
<contact>
<fn>Pat</fn>
<ln>Califia</ln>
</contact>
</message>
@files = glob "*.xml";
undef $/;
for $file (@files) {
    $indent = 0;
    open FILE, $file or die "Couldn't open $file for reading: $!";
    $_ = readline *FILE;
    close FILE or die "Couldn't close $file: $!";

    # Remove whitespace between > and < if that is the only thing
    # separating them
    s/(?<=>)\s+(?=<)//g;

    # Indent
    s{  # Capture a tag <$1$2$3>,
        #   a potential leading slash $1 (closing tag)
        #   the contents $2
        #   a potential trailing slash $3 (empty-element tag)
        <(/?)([^/>]+)(/?)>
        # Optional white space
        \s*
        # Optional tag.
        # $4 contains either undef, "<" or "</"
        (?=(</?))?
    }
    {
        # Adjust the indentation level.
        # $3: A <foo/> tag. No alteration to indentation.
        # $1: A closing </foo> tag. Drop one indentation level
        # else: An opening <foo> tag. Increase one indentation level
        $indent +=
            $3 ? 0 :
            $1 ? -1 :
            1;

        # Put the captured tag back into place
        "<$1$2$3>" .
            # Two closing tags in a row. Add a newline and indent the
            # next line. This has to be && and not "and": "and" binds
            # more loosely than ?:, so with "and" an opening tag (empty
            # $1) would short-circuit the whole expression to "".
            ($1 && ($4 eq "</") ?
                "\n" . (" " x $indent) :
            # Any other case where another tag follows. Add a newline
            # and indent the next line.
            $4 ?
                "\n" . (" " x $indent) :
            # Text follows the tag, so $4 is undef. Append nothing and
            # leave the text on the same line as its tag.
            ""
            )
        # /g repeat as necessary
        # /e Execute the block of perl code to create replacement text
        # /x Allow whitespace and comments in the regex
    }gex;

    open FILE, ">", $file or die "Couldn't open $file for writing: $!";
    print FILE or die "Couldn't write to $file: $!";
    close FILE or die "Couldn't close $file: $!";
}
__END__
This is the version I copied and pasted from the working script.
It's ugly, so I purtied it up and posted the version you see
above this text. I'm leaving this ugly version here just in case I
introduced some bug I'm not aware of.
@files = glob "*.xml";
undef $/;
$tag = "<>/";
for $file (@files) {
$indent = 0;
open FILE, $file;
$_ = <FILE>;
s/(?<=>)\s+(?=<)//g;
s(<(/?)([^/>]+)(/?)>\s*(?=(</?))?)($indent+=$3?0:$1?-1:1;"<$1$2$3>".($1&&($4 eq"</")?"\n".(" "x$indent):$4?"\n".(" "x$indent):""))ge;
open FILE, ">$file";
print FILE;
}
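The same substitution can be wrapped in a sub that works on a string, which makes it easier to try on a small document without touching any files. This is a minimal sketch rather than a drop-in replacement for the script. The name indent_xml is made up for illustration. The optional lookahead is written as an alternation with an empty branch, so no quantifier sits on a zero-width assertion. A closing tag that follows another tag is placed one level shallower so that it lines up with its opening tag, which is a choice of this sketch. It assumes Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# indent_xml is a hypothetical helper name, not part of the original script.
# It takes an XML string and returns a copy with one tag per line and one
# space of indentation per nesting level.
sub indent_xml {
    my ($xml) = @_;
    my $indent = 0;

    # Remove whitespace that is the only thing between two tags.
    $xml =~ s/(?<=>)\s+(?=<)//g;

    $xml =~ s{
        <(/?)([^/>]+)(/?)>   # $1 leading slash, $2 contents, $3 trailing slash
        \s*                  # optional white space
        (?=(</?)|)           # $4 is "<" or "</" if a tag follows, else undef
    }
    {
        my ($close, $body, $empty, $next) = ($1, $2, $3, $4 // '');

        # Empty-element tag: no change. Closing tag: one level up.
        # Opening tag: one level down.
        $indent += $empty ? 0 : $close ? -1 : 1;

        # A following closing tag belongs one level shallower than the
        # current depth, so it lines up with its opening tag.
        my $level = $next eq '</' ? $indent - 1 : $indent;

        # Break the line only when another tag follows; text stays put.
        "<$close$body$empty>" . ($next ? "\n" . (' ' x $level) : '');
    }gex;

    return $xml;
}

print indent_xml('<a><b>text</b></a>'), "\n";
```

Calling indent_xml on the sample document from the top of this post should give one element per line, with the text-bearing elements kept on a single line.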