Re^2: XML::Twig too many children?

by robbv (Initiate)
on Feb 21, 2012 at 20:33 UTC


in reply to Re: XML::Twig too many children?
in thread XML::Twig too many children?

Your example works fine for me too (except that it won't swallow the <<XML; ... XML construction, so I rewrote it).

However, it breaks if I have just one <lemma>-tag with many children. I find that the breaking point is at 4696/4697.

use warnings;
use strict;
use XML::Twig;

my $children = 4697;
my $found;
my $xmlStr = '<XML><lemma>'
           . join("\n", @{['<line>1</line>' x $children]})
           . '</lemma></XML>';

my $twig = XML::Twig->new(
    keep_encoding => 1,
    twig_handlers => { 'lemma' => \&ProcessLemma },
);
$twig->parse($xmlStr);
print "Expected one, found $found\n";

sub ProcessLemma {
    my ($XmlTwig, $XmlLemma) = @_;
    ++$found;
    $XmlLemma->purge;
    return 1;
}

Btw, I don't know if it matters, but I'm using Win32, ActivePerl 5.14.2.

Replies are listed 'Best First'.
Re^3: XML::Twig too many children?
by ikegami (Patriarch) on Feb 21, 2012 at 23:51 UTC

    5.14.2 (i686-linux-thread-multi), XML::Twig 3.39, XML::Parser 2.41.

    Died of a segfault with a sufficiently large number.

    Stack trace:

    Program received signal SIGSEGV, Segmentation fault.
    0x0807632d in Perl_call_sv ()
    (gdb) bt
    #0 0x0807632d in Perl_call_sv ()
    #1 0x080e959d in Perl_sv_clear ()
    #2 0x080e9c8a in Perl_sv_free2 ()
    #3 0x080d7644 in Perl_hv_free_ent ()
    #4 0x080d8bf3 in S_hfreeentries ()
    #5 0x080db12e in Perl_hv_undef_flags ()
    #6 0x080e97cb in Perl_sv_clear ()
    #7 0x080e9c8a in Perl_sv_free2 ()
    #8 0x080d7644 in Perl_hv_free_ent ()
    #9 0x080d8bf3 in S_hfreeentries ()
    #10 0x080db12e in Perl_hv_undef_flags ()
    #11 0x080e97cb in Perl_sv_clear ()
    #12 0x080e9c8a in Perl_sv_free2 ()
    ...
    #87303 0x080d7644 in Perl_hv_free_ent ()
    #87304 0x080d8bf3 in S_hfreeentries ()
    #87305 0x080db12e in Perl_hv_undef_flags ()
    #87306 0x080e97cb in Perl_sv_clear ()
    #87307 0x080e9c8a in Perl_sv_free2 ()
    #87308 0x080d7644 in Perl_hv_free_ent ()
    #87309 0x080d8bf3 in S_hfreeentries ()
    #87310 0x080db12e in Perl_hv_undef_flags ()
    #87311 0x080e97cb in Perl_sv_clear ()
    #87312 0x080e9c8a in Perl_sv_free2 ()
    #87313 0x080d7644 in Perl_hv_free_ent ()
    #87314 0x080d8bf3 in S_hfreeentries ()
    #87315 0x080db12e in Perl_hv_undef_flags ()
    #87316 0x080e97cb in Perl_sv_clear ()
    #87317 0x080e9c8a in Perl_sv_free2 ()
    #87318 0x08111ef1 in Perl_leave_scope ()
    #87319 0x081120bc in Perl_pop_scope ()
    #87320 0x0811dd60 in Perl_pp_return ()
    #87321 0x080dd748 in Perl_runops_standard ()
    #87322 0x08076475 in Perl_call_sv ()
    #87323 0xb7ac2148 in endElement () from /home/eric/usr/perlbrew/perls/5.14.2t/lib/site_perl/5.14.2/i686-linux-thread-multi/auto/XML/Parser/Expat/Expat.so
    #87324 0xb7a93a55 in ?? () from /usr/lib/../lib/libexpat.so.1
    #87325 0xb7a948a1 in ?? () from /usr/lib/../lib/libexpat.so.1
    #87326 0xb7a95db1 in ?? () from /usr/lib/../lib/libexpat.so.1
    #87327 0xb7a9696a in ?? () from /usr/lib/../lib/libexpat.so.1
    #87328 0xb7a8d64c in XML_ParseBuffer () from /usr/lib/../lib/libexpat.so.1
    #87329 0xb7a8eab5 in XML_Parse () from /usr/lib/../lib/libexpat.so.1
    #87330 0xb7ab6a78 in XS_XML__Parser__Expat_ParseString () from /home/eric/usr/perlbrew/perls/5.14.2t/lib/site_perl/5.14.2/i686-linux-thread-multi/auto/XML/Parser/Expat/Expat.so
    #87331 0x080df181 in Perl_pp_entersub ()
    #87332 0x080dd748 in Perl_runops_standard ()
    #87333 0x080770ea in perl_run ()
    #87334 0x0805fe3d in main ()

    First guess, a stack overflow from an endless(?) recursive loop. [Upd: It could be a stack overflow, but it's not from endless recursion. The pattern is clearly broken at the top. ]

    The odd thing is that the loop is in perl's code.

    Same with an older version of Perl: 5.10.1 (i686-linux-thread-multi), XML::Twig 3.39, XML::Parser 2.41.

    I'll install a debug build of Perl and see if I hit an assert.

      The odd thing is that the loop is in perl's code.

      I suspect that isn't really a loop as much as Perl free()ing some moderately deeply nested hashes.

      - tye        

      5.14.2 i686-linux-thread-multi with -Doptimize=-g didn't reveal much more.

      It wouldn't hurt to include this in your bug report, but file it with XML::Parser, not XML::Twig.

        Yup, it's a stack overflow:

        $ valgrind perl a.pl
        ==17235== Memcheck, a memory error detector
        ==17235== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
        ==17235== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
        ==17235== Command: perl a.pl
        ==17235==
        ==17235== Stack overflow in thread 1: can't grow stack to 0xbe01ef78
        ==17235== ...

        The more I think about it, the less I think it's a bug in XML::Parser. I think it's a bug in how Perl destroys deeply nested structures, which may or may not be fixable.

        There are a few places where the C code that forms perl itself recurses. Those can overflow the limited-size "C stack" (as opposed to the "Perl stack" the script uses), and this appears to be one of them.

        Updated after I remembered valgrind.
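
If recursive destruction really is the cause, one general way around it in plain Perl is to take a long chain apart by hand, so that each hash is freed on its own instead of perl walking the whole chain in nested C calls. The following is only a minimal sketch under that assumption. It is not XML::Twig's code: `build_chain`, `free_chain`, and the `next` key are made up for illustration, and the depth is kept small so the script is safe to run either way.

```perl
use strict;
use warnings;

# Build a chain of hashes in which each node holds the only reference
# to the following node. The 'next' key is a hypothetical name.
sub build_chain {
    my ($depth) = @_;
    my $head;
    for my $i (reverse 1 .. $depth) {
        $head = { id => $i, next => $head };
    }
    return $head;
}

# Walk the chain and cut each link before moving on. Once a node has been
# detached and nothing else refers to it, perl frees it by itself, without
# having to descend into the rest of the chain.
sub free_chain {
    my ($node) = @_;
    my $visited = 0;
    while ($node) {
        my $next = delete $node->{next};  # detach the remainder of the chain
        $node = $next;                    # let go of the node just processed
        ++$visited;
    }
    return $visited;
}

my $chain   = build_chain(1_000);
my $visited = free_chain($chain);
undef $chain;    # the head is now a lone hash with no 'next' entry
print "Unlinked $visited nodes iteratively\n";
```

Whether something like this helps with the script above depends on how the twig links its elements internally, which this thread does not establish.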

Re^3: XML::Twig too many children?
by choroba (Cardinal) on Feb 21, 2012 at 21:26 UTC
    For me, it works up to 20_140 (linux, i686, Perl 5.14.2). For 20_142, it usually dies of SIGSEGV, but sometimes still works.
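
Since the crash is intermittent near the threshold, it can help to run the test script as a child process several times and look at how each run ended. A minimal sketch, assuming a Unix-like system where `$?` follows the usual wait-status layout (signal number in the low 7 bits, core-dump flag in bit 7, exit code in the high byte); `decode_status` is a hypothetical helper, not part of any module mentioned here.

```perl
use strict;
use warnings;

# Split a wait status (the value of $? after system() or wait())
# into its three documented parts.
sub decode_status {
    my ($status) = @_;
    return {
        signal => $status & 127,             # signal that killed the child, or 0
        core   => ($status & 128) ? 1 : 0,   # 1 if a core dump was produced
        exit   => $status >> 8,              # exit code for a normal exit
    };
}

# Two hand-made status values: killed by signal 11, and a normal exit(1).
my $segv = decode_status(11);
my $fail = decode_status(1 << 8);

printf "signal=%d core=%d exit=%d\n", @{$segv}{qw(signal core exit)};
printf "signal=%d core=%d exit=%d\n", @{$fail}{qw(signal core exit)};
```

In real use one would call `system($^X, 'a.pl')` and pass `$?` to the helper. On Linux, SIGSEGV is signal 11. On Win32 the status is reported differently, so this sketch would not carry over unchanged.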
