http://www.perlmonks.org?node_id=1013162

bulk88 has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to use Encode's lower-level API (find_encoding, then calling encode on the encoding object), and I am getting lots of crashes, warnings, and other random behavior. Am I using Encode incorrectly, or is this a bug in Encode?

edit: The failing tests are expected. I deliberately did not fix the test count or make the tests pass, because doing so changes the crash behavior and the warnings slightly; almost any change to the script changes the behavior. If I run this script under the Win32 debugging heap, it tells me allocations are corrupt (write to freed memory, trashed allocation headers, etc.) instead of crashing.

Throwing in an assert (" assert(s+ ulen + 1 == e);") shows that the problem is at http://perl5.git.perl.org/perl.git/blob/5e0a247b35271159d629ea8562732e0993ed4594:/cpan/Encode/Unicode/Unicode.xs#l321: SvPVutf8 does not return SvPVX when the string SV is flagged read-only (see http://perl5.git.perl.org/perl.git/blob/5e0a247b35271159d629ea8562732e0993ed4594:/sv.c#l3060). As a result, char * s and char * e point into separate malloc blocks a random distance apart, not to the beginning and end of the PV buffer of SV utf8, and enc_pack() writes off the end of the PV buffer of SV result for the whole distance between s and e. It eventually crashes (SEGV), either because there is unallocated VM between s and e, or because unallocated VM after the result SV's PV buffer was touched. I am not sure what to do, since Encode gets almost no maintenance (see http://perl5.git.perl.org/perl.git/blob/5e0a247b35271159d629ea8562732e0993ed4594:/cpan/Encode/Changes) and https://rt.cpan.org/Public/Dist/Display.html?Name=Encode has many C bug/crash tickets open and unanswered.
The script (n1.pl):
use strict;
use warnings;
use Test::More tests => 17;
use Encode;
my $utf32encoder = Encode::find_encoding('UTF-32LE');

sub xs_edistance {
    my $a1 = $utf32encoder->encode(shift,0);
    my $a2 = $utf32encoder->encode(shift,0);
}
is( xs_edistance('four','fxxr'), 1, 'kgjsdfjkdsafs');
is( xs_edistance('four','FOuR'), 1, 'kgjsdfjkdsafs');
is( xs_edistance('four',''), 1, 'kgjsdfjkdsafs');
is( xs_edistance('','four'), 1, 'kgjsdfjkdsafs');
is( xs_edistance('',''), 1, 'kgjsdfjkdsafs');
is( xs_edistance('four','fxxr'), 1, 'kgjsdfjkdsafs');
is( xs_edistance('four','FOuR'), 1, 'kgjsdfjkdsafs');
is( xs_edistance('four',''), 1, 'kgjsdfjkdsafs');
is( xs_edistance('','four'), 1, 'kgjsdfjkdsafs');
is( xs_edistance('',''), 1, 'kgjsdfjkdsafs');
gives
1..17
Malformed UTF-8 character (unexpected non-continuation byte 0x01, immediately after start byte 0xe8) in subroutine entry at n1.pl line 9.
Malformed UTF-8 character (unexpected non-continuation byte 0x60, immediately after start byte 0xc0) in subroutine entry at n1.pl line 9.
Malformed UTF-8 character (unexpected continuation byte 0xb0, with no preceding start byte) in subroutine entry at n1.pl line 9.
Malformed UTF-8 character (unexpected continuation byte 0xb0, with no preceding start byte) in subroutine entry at n1.pl line 9.
Malformed UTF-8 character (unexpected continuation byte 0x98, with no preceding start byte) in subroutine entry at n1.pl line 9.
Malformed UTF-8 character (unexpected continuation byte 0x88, with no preceding start byte) in subroutine entry at n1.pl line 9.
Out of memory!
Out of memory!
*CRASH*
The crash happened at " my $a2 = $utf32encoder->encode(shift,0);". If I change anything in the script, it will either not crash, crash quickly, or print an endless stream of errors to the console. An example of the endless errors:

start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa8, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xb0, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0xf7, imme +diately a ter start byte 0xf4) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x04, imme +diately a ter start byte 0xf7) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x2e, imme +diately a ter start byte 0xc4) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa8, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0x8c, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xb2, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0x80, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x2e, imme +diately a ter start byte 0xcc) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. 
Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x2e, imme +diately a ter start byte 0xfc) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa8, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa6, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x04, imme +diately a ter start byte 0xf6) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xb6, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0x9c, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0x9b, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa8, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0x80, with no +preceding start byte) in subroutine entry at n1.pl line 9. 
Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0x8c, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x04, imme +diately a ter start byte 0xf7) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa8, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0x98, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa8, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. 
Malformed UTF-8 character (unexpected continuation byte 0xa8, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0x80, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0x9c, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa8, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0xf7, imme +diately a ter start byte 0xf4) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x04, imme +diately a ter start byte 0xf7) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x2f, imme +diately a ter start byte 0xdc) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x2f, imme +diately a ter start byte 0xd4) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. 
Malformed UTF-8 character (unexpected continuation byte 0xa8, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0x84, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x2f, imme +diately a ter start byte 0xdc) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa6, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x04, imme +diately a ter start byte 0xf6) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xb6, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x2f, imme +diately a ter start byte 0xf4) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa8, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x2e, imme +diately a ter start byte 0xcc) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x2e, imme +diately a ter start byte 0xcc) in subroutine entry at n1.pl line 9. 
Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0x82, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x04, imme +diately a ter start byte 0xf7) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x2f, imme +diately a ter start byte 0xfc) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa8, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x2e, imme +diately a ter start byte 0xcc) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa6, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa8, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x2f, imme +diately a ter start byte 0xdc) in subroutine entry at n1.pl line 9. 
Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0x82, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected non-continuation byte 0x04, imme +diately a ter start byte 0xf7) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0x85, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa4, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa3, with no +preceding start byte) in subroutine entry at n1.pl line 9. Malformed UTF-8 character (unexpected continuation byte 0xa8, with no +preceding start byte) in subroutine entry at n1.pl line 9.

A call stack of the crash:
  ntdll.dll!_RtlAllocateHeap@12() + 0x26916
  msvcr71.dll!_heap_alloc(unsigned int size=124) Line 212 C
  msvcr71.dll!_nh_malloc(unsigned int size=124, int nhFlag=0) Line 113 C
  msvcr71.dll!malloc(unsigned int size=124) Line 54 + 0xf C
> perl517.dll!VMem::Malloc(unsigned int size=112) Line 151 + 0xe C
  perl517.dll!PerlMemMalloc(IPerlMem * piPerl=0x0034815c, unsigned int size=112) Line 299 + 0x14 C
  perl517.dll!Perl_safesysmalloc(unsigned int size=112) Line 92 C
  perl517.dll!Perl_av_extend_guts(interpreter * my_perl=0x00342c14, av * av=0x00ad15b4, long key=28, int * maxp=0x00a30a24, sv * * * allocp=0x00b06280, sv * * * arrayp=0x00ad15c0) Line 163 + 0x31 C
  perl517.dll!Perl_av_extend(interpreter * my_perl=0x00342c14, av * av=0x00000000, long key=12) Line 83 + 0x17 C
  perl517.dll!Perl_sv_add_backref(interpreter * my_perl=0x00342c14, sv * const tsv=0x00ad1654, sv * const sv=0x00ad6784) Line 5627 + 0xb C
  perl517.dll!Perl_gv_init_pvn(interpreter * my_perl=0x00342c14, gv * gv=0x00ad6784, hv * stash=0x00ad1654, const char * name=0x280d592c, unsigned int len=7, unsigned long flags=2) Line 382 + 0x8 C
  perl517.dll!Perl_gv_fetchmeth_pvn(interpreter * my_perl=0x00342c14, hv * stash=0x00000007, const char * name=0x280d592c, unsigned int len=7, long level=0, unsigned long flags=0) Line 692 + 0x16 C
  perl517.dll!Perl_gv_fetchmeth_pvn_autoload(interpreter * my_perl=0x00342c14, hv * stash=0x00ad1654, const char * name=0x280d592c, unsigned int len=7, long level=0, unsigned long flags=0) Line 857 C
  perl517.dll!S_curse(interpreter * my_perl=0x00b06280, sv * const sv=0x00ad0c64, const char check_refcnt='') Line 6446 + 0x12 C
  perl517.dll!Perl_sv_clear(interpreter * my_perl=0x00342c14, sv * const orig_sv=0x00ad0c64) Line 6117 + 0xb C
  perl517.dll!Perl_sv_free2(interpreter * my_perl=0x00342c14, sv * const sv=0x00ad0c64, const unsigned long rc=1) Line 6584 C
  perl517.dll!S_SvREFCNT_dec(interpreter * my_perl=0x00342c14, sv * sv=0x00000020) Line 62 + 0xb C
  perl517.dll!do_clean_objs(interpreter * my_perl=0x00342c14, sv * const ref=0x00ad0a04) Line 480 + 0x13 C
  perl517.dll!S_visit(interpreter * my_perl=0x00342c14, void (interpreter *, sv *)* f=0x28083583, const unsigned long flags=2048, const unsigned long mask=2048) Line 423 C
  perl517.dll!Perl_sv_clean_objs(interpreter * my_perl=0x00b06280) Line 581 C
  perl517.dll!perl_destruct(interpreter * my_perl=0x00342c14) Line 772 C
  perl517.dll!RunPerl(int argc=2, char * * argv=0x01345c98, char * * env=0x00343f70) Line 275 C
  perl.exe!mainCRTStartup() Line 398 + 0xe C
  kernel32.dll!_BaseProcessStart@4() + 0x23
It tried to read pointer 0x25. Another crash:
> Unicode.dll!enc_pack(interpreter * my_perl=0x00342c14, sv * result=0x00000000, unsigned int size=11343540, unsigned char endian='V', unsigned long value=86) Line 104 C
  Unicode.dll!XS_Encode__Unicode_encode_xs(interpreter * my_perl=0x00342c00, cv * cv=0x00ad0d34) Line 378 + 0x16 C
  perl517.dll!Perl_pp_entersub(interpreter * my_perl=0x00000002) Line 2877 C
  perl517.dll!Perl_runops_standard(interpreter * my_perl=0x00342c14) Line 42 + 0x4 C
  perl517.dll!S_run_body(interpreter * my_perl=0x00000004, long oldscope=1) Line 2430 + 0xa C
  perl517.dll!perl_run(interpreter * my_perl=0x00342c14) Line 2346 + 0x8 C
  perl517.dll!RunPerl(int argc=2, char * * argv=0x01345c98, char * * env=0x00343f70) Line 270 + 0x6 C
  perl.exe!mainCRTStartup() Line 398 + 0xe C
  kernel32.dll!_BaseProcessStart@4() + 0x23