http://www.perlmonks.org?node_id=759194

hsmyers has asked for the wisdom of the Perl Monks concerning the following question:

While running a script on production data I noticed that it seemed locked up. Since it had sat there at the command line for several hours beyond the usual run time I pulled the plug with a ^C (this in XP BTW) only to see:

Can't coerce UNKNOWN to string in aelem at C:\Documents and Settings\hsmyers\Desktop\backlink\backlink.pl line 68, <$fh_IN> line 52878470.


instead of the usual message about Terminating on signal SIGINT(2). I hit ^C again and made it back out to the prompt, only to see the Microsoft message box about how the application had failed and did I want to tell Microsoft about it etc. My experience to date has suggested that perl.exe does not crash easily. Clearly, however, something here says otherwise. I've attached the code and would appreciate any information anyone might be able to toss my way. The input files are typically in the gigabyte range; in the case of the crash, the file was 2,322,832,377 bytes.
#!/usr/bin/perl
# backlink.pl -- script to rewrite .ic files by adding backlink text.
use strict;
use warnings;
use English;
use Prosaix;
use Data::Dumper::Simple;

my %hrefs;
my $url;
my $re_url   = qr/^\/url: (.*)/;
my $re_end   = qr/^\/endtext/;
my $re_wc    = qr/^\/wc: (.*)/;
my $notfound = 0;
my $total    = 0;
my $success  = '_____found_____';
my $count    = 1;

my $base_file = $ARGV[0] or die "Missing input file name\n";
(my $output_file = $base_file) =~ s/\.txt/\+bl.txt/;

open( my $fh_HREF, '<', $base_file . '.links' )
  or die "Couldn't open '$base_file.links': $OS_ERROR\n";
open( my $fh_IN, '<', $base_file )
  or die "Couldn't open '$base_file': $OS_ERROR\n";
open( my $fh_OUT, '>', $output_file )
  or die "Couldn't open 'base_file.backlnked': $OS_ERROR\n";
open( my $fh_ERR, '>', $base_file . '.unlinked' )
  or die "Couldn't open 'base_file.unlinked': $OS_ERROR\n";
binmode $fh_OUT;

start();

while (<$fh_HREF>) {
    chomp;
    my ( $key, $value ) = split(/\|/);
    unless ( defined( $hrefs{$key} ) ) {
        $hrefs{$key} = [];
    }
    push( @{ $hrefs{$key} }, $value );
}

while (<$fh_IN>) {
    print $fh_OUT $_;
    if (/$re_url/) {
        $url = $1;
    }
    if (/$re_wc/) {
        print "wc $count $1\n";
        $count++;
    }
    elsif (/$re_end/) {
        print $fh_OUT "/backlinks\n";
        if ( defined( $hrefs{$url} ) ) {
            my $s = $hrefs{$url};
            if (defined($s->[0])) {
                my @text = collapse(@$s);
                print $fh_OUT join( ".\n", @text ), ".\n";
                print join( ".\n", @text ), ".\n";
            }
            push( @{ $hrefs{$url} }, $success );
        }
    }
}

while ( my ( $url, $text ) = each(%hrefs) ) {
    if ( !defined( $text->[-1] ) ) {
        print $fh_ERR "$url|_____NO_TEXT_____\n";
    }
    elsif ( $text->[-1] ne $success ) {
        $notfound++;
        if (defined($text->[0])) {
            my @text = collapse(@$text);
            print $fh_ERR "$url|", join( ",", @text ), "\n";
        }
        else {
            print $fh_ERR "$url|\n";
        }
    }
    $total++;
}

close($fh_HREF);
close($fh_IN);
close($fh_OUT);
close($fh_ERR);

print "$notfound URLs (of $total) pointing to about.com pages not\n";
print "in this IC ($base_file) written to $base_file.unlinked\n";

finish();

sub collapse {
    my @s = @_;
    my @t;
    for (@s) {
        next unless defined($_);
        push(@t,$_);
    }
    return @t;
}

--hsm

"Never try to teach a pig to sing...it wastes your time and it annoys the pig."

Replies are listed 'Best First'.
Re: New to me crash message
by graff (Chancellor) on Apr 22, 2009 at 07:46 UTC
    I gather that line 68 (where the error is reported to have occurred) is here:
    while ( my ( $url, $text ) = each(%hrefs) ) {
        if ( !defined( $text->[-1] ) ) {
            print $fh_ERR "$url|_____NO_TEXT_____\n";
        }
        elsif ( $text->[-1] ne $success ) {
            $notfound++;
            if (defined($text->[0])) {
                my @text = collapse(@$text);    ### <--- line 68
                print $fh_ERR "$url|", join( ",", @text ), "\n";
            }
            else {
                print $fh_ERR "$url|\n";
            }
        }
        $total++;
    }
    I'm stumped about the reference to "aelem" in the error message -- no clue what this might be referring to. Apart from that, when you mention "the usual runtime", do you mean that this script has "usually" worked prior to this failure?

    If so, the question becomes: what was different about this run relative to previous runs (when it worked as intended)? More input data? Corrupted input data? (I don't see much in the way of checking for bad input... what would happen if a line in your "$base_file.links" does not contain a "|" (vertical bar) character?)
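To make that last question concrete, here is a minimal sketch of what the loading loop does with a record that has no vertical bar. The helper name load_links and the example.example-style URLs are made up for illustration; only the split and push mirror the posted script.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the loop that reads "$base_file.links":
# split each line on '|' and collect the values per key.
sub load_links {
    my @lines = @_;
    my %hrefs;
    for my $line (@lines) {
        chomp $line;
        my ( $key, $value ) = split /\|/, $line;
        push @{ $hrefs{$key} }, $value;
    }
    return \%hrefs;
}

my $hrefs = load_links( "http://a.example/|some text\n", "http://b.example/\n" );

# The record without a '|' still creates a key, but its value is undef.
for my $key ( sort keys %$hrefs ) {
    my $first = $hrefs->{$key}[0];
    printf "%s => %s\n", $key, defined $first ? "'$first'" : 'undef';
}
```

So a bar-less line does not fail at load time; it quietly stores an undef in the array for that key, which the later loops then have to cope with.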

    Probably not related to your problem, but you could replace the "collapse" function call with:

    grep { defined } @$text
    Also, I see you checking for hash elements with if(defined($hash{$key})), and it might make more sense to use if(exists($hash{$key})) instead.
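A small sketch of both suggestions, using made-up sample data. The collapse sub is copied from the posted script; everything else is illustrative only.

```perl
use strict;
use warnings;

# collapse() as posted: return the defined elements of a list.
sub collapse {
    my @s = @_;
    my @t;
    for (@s) {
        next unless defined($_);
        push( @t, $_ );
    }
    return @t;
}

my @values   = ( 'a', undef, 'b', undef );
my @via_sub  = collapse(@values);
my @via_grep = grep { defined } @values;
print "same result\n" if "@via_sub" eq "@via_grep";

# exists() asks whether the key is present in the hash; defined() asks
# whether its value is defined. They differ when a key holds undef.
my %hash = ( present => undef );
print "key exists\n"        if exists $hash{present};
print "value not defined\n" if !defined $hash{present};
```

In the posted script every value stored in %hrefs is an array reference, so the two tests should agree there; exists simply states the intent ("have I seen this key?") more directly.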
      I'm stumped about the reference to "aelem" in the error message

      It is internals-speak for "array element", at the C level. It only surfaces when you grovel deeply in magic, or XS.

      • another intruder with the mooring in the heart of the Perl

        More precisely, it's one of the opcodes that implements $a[$i]. (There appears to be an aelemfast as well.)
        $ perl -MO=Concise -e'$a[$i]'
        7  <@> leave[1 ref] vKP/REFC ->(end)
        1     <0> enter ->2
        2     <;> nextstate(main 1 -e:1) v ->3
        6     <2> aelem vK/2 ->7
        4        <1> rv2av sKR/1 ->5
        3           <#> gv[*a] s ->4
        -        <1> ex-rv2sv sK/1 ->6
        5           <#> gvsv[*i] s ->6
        -e syntax OK

        But oddly, there's no array indexing at the source line number given by the message.
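The same inspection can be done from inside a script with B::Concise's compile and walk_output functions. This is only a sketch: the helper name ops_for is made up, and the index is written as $i + 1 on the assumption that newer perls (5.22 and later, as far as I know) fold a plain $a[$i] into a single multideref op, which would hide the aelem.

```perl
use strict;
use warnings;
use B::Concise ();

# Hypothetical helper: return the concise op listing for a code
# reference as a string instead of printing it to STDOUT.
sub ops_for {
    my ($code) = @_;
    my $buf = '';
    B::Concise::walk_output( \$buf );             # capture into $buf
    B::Concise::compile( '-exec', $code )->();    # walk ops in exec order
    return $buf;
}

our ( @a, $i );
my $listing = ops_for( sub { $a[ $i + 1 ] } );
print $listing;
print "found aelem\n" if $listing =~ /\baelem\b/;
```

Running ops_for over the subs of a suspect script is one way to see which source statements actually compile to an aelem, which may help when the reported line number looks wrong.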

      aelem seems to be an Opcode, example
      # l  <|> mapwhile(other->m)[t26] lK
      # m  <#> gv[*_] s
      # n  <1> rv2sv sKM/DREFAV,1
      # o  <1> rv2av[t4] sKR/1
      # p  <$> const[IV 0] s
      # q  <2> aelem sK/2
      Simpler way to generate aelem
      Improvements are also very welcome. Much of this code was written in quick and dirty mode; sensible refactoring need not apply. While anything is possible, this is the second part of two scripts, the first writes the file this one reads at an earlier point, the one with '|' as divider so at least that is unlikely. Thanks for the better design suggestions! I'm thinking that the best chance for a 'change' as smoking gun is a significant jump in data size. This is of course complicated by the time it takes to process a large file (hours; the companion script takes the longest portion).

      --hsm
        ... this is the second part of two scripts, the first writes the file this one reads at an earlier point, the one with '|' as divider so at least that is unlikely.

        In my experience, a statement like "that is unlikely" doesn't suffice for debugging the sort of problem you're having. Only testing will suffice. Test the output of that first script for the data set in question, and confirm that every record matches  /\S\|\S/.

        You can also create a small test set of data for input to the OP script, include a record that does not match that regex, and see what happens. If it blows up, that's a good cue for adding some defensive code in the OP script, to do something sensible when such data comes in (skip the record or die, with a suitable message to stderr).
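A minimal sketch of that kind of defensive code, assuming the "key|value" record layout described above. The helper name parse_link_record and the sample lines are made up; the regex is the one suggested in this reply.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one "key|value" record. Returns an empty
# list when the line does not contain non-space, bar, non-space.
sub parse_link_record {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ /\S\|\S/;
    my ( $key, $value ) = split /\|/, $line, 2;
    return ( $key, $value );
}

my %hrefs;
my $line_no = 0;
for my $line ( "http://a.example/|anchor text\n", "no divider here\n" ) {
    $line_no++;
    my ( $key, $value ) = parse_link_record($line);
    if ( !defined $key ) {
        warn "Skipping malformed record at line $line_no\n";
        next;
    }
    push @{ $hrefs{$key} }, $value;
}
print scalar( keys %hrefs ), " key(s) loaded\n";
```

Note one deliberate difference from the posted script: split is given a limit of 2, so any further bars stay in the value rather than being dropped. Whether to skip or die on a bad record is a policy choice; swapping the warn and next for a die gives the stricter behaviour.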

Re: New to me crash message
by Anonymous Monk on Apr 22, 2009 at 07:19 UTC
    perldiag

    Can't coerce %s to string in %s

    (F) Certain types of SVs, in particular real symbol table entries (typeglobs), can't be forced to stop being what they are.

    67 if (defined($text->[0])) {
    68     my @text = collapse(@$text);
    69     print $fh_ERR "$url|", join( ",", @text ), "\n";
    70 }
    Can you reproduce the error if you remove Data::Dumper::Simple? How about English, and Prosaix?
      Easy to get rid of first two; less easy to get rid of prosaix, but I'll see. Thanks for the debug direction!

      --hsm
Re: New to me crash message
by gone2015 (Deacon) on Apr 22, 2009 at 14:33 UTC

    Poking around in the source, I find that the error:

    Can't coerce UNKNOWN to string in aelem ... line 68 ...
    is generated when Perl_sv_pvn_force_flags is presented with a value which is an Array, Hash, Code or IO or some completely bogus type: (SvTYPE(sv) > SVt_PVLV && SvTYPE(sv) != SVt_PVFM).

    The lines around the reported error location are:

    67: if (defined($text->[0])) {
    68:     my @text = collapse(@$text);
    69:     print $fh_ERR "$url|", join( ",", @text ), "\n";
    70: }

    The aelem suggests that the error is in an array element look up... however, things are all a bit mysterious:

    • there is an array lookup on line 67, but not on line 68.

    • I'm not convinced I can think of a case where aelem would be doing a force to string (PV).

    • I cannot see anywhere where defined would set about forcing to PV

    • the join on line 69 looks like the best candidate for forcing to PV -- but I don't see a good reason for the error being reported on the wrong line or in the wrong operation...

    • the reported UNKNOWN type is a BIG WORRY... because it indicates that whatever the SV is that is being forced to string, it's not of any type known to Perl -- which suggests something has gone awry in an omigod-could-this-be-a-bug-in-Perl sort of a way.

    It could be time to get out the debugger and place a breakpoint on the:

    if (SvTYPE(sv) > SVt_PVLV && SvTYPE(sv) != SVt_PVFM)
        Perl_croak(aTHX_ "Can't coerce %s to string in %s",
                   sv_reftype(sv,0), OP_NAME(PL_op));
    in Perl_sv_pvn_force_flags and having a poke around in the entrails.

      the join on line 69 looks like the best candidate for forcing to PV -- but I don't see a good reason for the error being reported on the wrong line or in the wrong operation...

      He's using at least one source filter. As previously suggested, the first act should be to see if the bug occurs without loading any modules.

        Am about to fire off a 'use' free version of the code. Am anxious myself to see what happens--- hope it perturbs the dime for me.

        --hsm