http://www.perlmonks.org?node_id=894047


in reply to Re^2: parsing XML fragments (xml log files) with... a regex
in thread parsing XML fragments (xml log files) with XML::Parser

For those interested, it can't handle

Up to you to decide if it fits your needs or not.

* — A post-processor could fix this if no entities were processed at all.

** — A pre-processor such as the following would fix this:

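use Encode qw( decode );

sub _predecode {
    my $enc;
    if ( $_[0] =~ /^\xEF\xBB\xBF/ ) {
        $enc = 'UTF-8';        # UTF-8 byte order mark
    } elsif ( $_[0] =~ /^\xFF\xFE/ ) {
        $enc = 'UTF-16le';     # UTF-16 little-endian BOM
    } elsif ( $_[0] =~ /^\xFE\xFF/ ) {
        $enc = 'UTF-16be';     # UTF-16 big-endian BOM
    } elsif ( substr($_[0], 0, 100) =~ /^[^>]* encoding="([^"]+)"/ ) {
        $enc = $1;             # encoding named in the XML declaration
    } else {
        $enc = 'UTF-8';        # XML's default encoding
    }

    # Die on malformed input instead of substituting characters,
    # and leave the caller's byte string ($_[0]) unmodified.
    return decode($enc, $_[0], Encode::FB_CROAK | Encode::LEAVE_SRC);
}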
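If you want to test the detection order on its own, here is a minimal sketch that only picks the encoding name, using the same sequence of checks as _predecode (BOM first, then the XML declaration in the first 100 bytes, then UTF-8). sniff_encoding is a hypothetical name, not part of any module. One deliberate difference: XML allows the encoding value in the declaration to be quoted with either single or double quotes, so this variant accepts both, whereas the regex in _predecode only matches double quotes. Like _predecode, it cannot detect UTF-16 that lacks a BOM.

```perl
use strict;
use warnings;

# Hypothetical helper: choose an encoding name for a raw XML byte string
# without decoding it, following the same order of checks as _predecode.
sub sniff_encoding {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /^\xEF\xBB\xBF/;   # UTF-8 BOM
    return 'UTF-16LE' if $bytes =~ /^\xFF\xFE/;       # UTF-16 little-endian BOM
    return 'UTF-16BE' if $bytes =~ /^\xFE\xFF/;       # UTF-16 big-endian BOM

    # Assumption of this variant: accept single- or double-quoted values,
    # as the XML declaration syntax permits either.
    return $2
        if substr($bytes, 0, 100) =~ /^[^>]*\sencoding\s*=\s*(["'])([^"']+)\1/;

    return 'UTF-8';                                   # XML's default encoding
}

for my $doc (
    qq{<?xml version="1.0" encoding="ISO-8859-1"?><log/>},
    qq{<?xml version='1.0' encoding='windows-1252'?><log/>},
    qq{\xFF\xFE<\x00?\x00},
    qq{<log/>},
) {
    print sniff_encoding($doc), "\n";
}
```

Running it prints ISO-8859-1, windows-1252, UTF-16LE and UTF-8, one per line. The returned name could then be handed to decode with the same FB_CROAK and LEAVE_SRC flags.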

*** — A post-processor could fix this, but one wasn't supplied.

Update: Added the pre-processor I had coded previously.