UPDATE:
I went ahead and used the brute-force nuclear option and manually edited the declaration before passing it to XML::Twig.
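# Open read/write so the first line can be overwritten in place.
open my $fh, '+<:utf8', 'file.in.xml' or die $!;
my $line = <$fh>;
# 'utf-8" ' is exactly as long as 'utf-16"', so the rest of the file does not shift.
$line=~s/<\?xml.+encoding="\Kutf-16"/utf-8" /
    or die "didn't match line: '$line'";
seek $fh,0,0 or die $!;
print $fh $line;
close $fh;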
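If modifying the input file is undesirable, the same fix can be applied to an in-memory copy instead. The following is only a minimal sketch, assuming the document fits in memory and does not start with a byte order mark; fix_declared_encoding is a hypothetical helper of my own, not something XML::Twig provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Twig): rewrite the encoding named in
# the XML declaration of an in-memory document and return the corrected copy.
sub fix_declared_encoding {
    my ($xml, $wrong, $right) = @_;
    # Only touch the declaration at the very start of the document.
    $xml =~ s/\A(<\?xml[^>]*?\bencoding\s*=\s*["'])\Q$wrong\E(["'])/$1$right$2/i
        or die "no '$wrong' encoding declaration found\n";
    return $xml;
}

# Small stand-in document; real input would be slurped from the upstream file.
my $xml   = qq{<?xml version="1.0" encoding="utf-16"?>\n<keys><key>1</key></keys>\n};
my $fixed = fix_declared_encoding($xml, 'utf-16', 'utf-8');
print $fixed;
```

The corrected string could then be handed to $t->safe_parse($fixed) in place of the filehandle, which leaves the upstream file untouched.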
Thank you haukex for your help!
Good afternoon!
I am trying to use XML::Twig to strip out comments in a file provided by an upstream process. Unfortunately, the upstream process incorrectly declares the encoding as "utf-16" when the file is not actually UTF-16. This causes XML::Twig (and XML::Parser) to fail with the error "encoding specified in XML declaration is incorrect at line 1, column 30, byte 30".
Is there an option in XML::Twig that can be set to "relax" the parsing to ignore the incorrect encoding specified in the declaration?
Thank you for your time.
Perl info:
perl -v
This is perl 5, version 24, subversion 0 (v5.24.0) built for MSWin32-x64-multi-thread
XML::Twig info:
3.52
Sample code:
#!/usr/bin/perl
use 5.024;
use strict;
use warnings;
use XML::Twig;
open (my $OFILE, '>:utf8', 'file.out.xml') or die "$!\n$^E";
my $t = XML::Twig->new(
    twig_handlers => {
        '/keys/key' => sub { $_[0]->flush($OFILE); },
    },
    output_encoding => 'utf-8',
    pretty_print    => 'indented',
    comments        => 'drop', # remove any comments
);
$t->safe_parse(\*DATA);
if ( $@ ) {
    die "Error occurred in XML data\n\n$@";
}
close $OFILE;
__DATA__
<?xml version="1.0" encoding="utf-16"?>
<keys>
<!-- One hen -->
<key>45646fa8-32e5-494c-93ff-0f00281fc2d6</key>
<!-- Two ducks -->
<key>b6bdc46f-3275-4312-bbbd-3e375208d05f</key>
<!-- Three squawking geese -->
<key>e5a37cf0-1f69-41a8-899c-23454600894a</key>
<!-- Four limerick oysters -->
<key>b6287f3d-f70c-498d-8360-5a2d8e863ab3</key>
<!-- Five corpulent porpoises -->
<key>118be380-5e69-47d4-81c6-756c34334936</key>
<!-- Six pair of Don Alverzo's tweezers -->
<key>46f9dd5b-d0e9-4f8f-a559-f698bea561fa</key>
<!-- Seven thousand Macedonians in full battle
array -->
<key>9627058f-29f0-4263-8978-fc77ac2fe0a3</key>
<!-- Eight brass monkeys from the ancient
sacred crypts of Egypt -->
<key>6038d393-ba81-423e-8429-01406779ff9e</key>
<!-- Nine apathetic, sympathetic, diabetic old
men on roller skates, with a marked propensity
towards procrastination and sloth -->
<key>5a67c3f0-ea6f-427c-bc3a-86fdb31fd117</key>
<!-- Ten lyrical, spherical, diabolical
denizens of the deep who stalk about the
corners of the cove all at the same time. -->
<key>7ac8b1d8-ff60-4b55-8fe0-ea809d9f5b02</key>
</keys>