hsmyers has asked for the wisdom of the Perl Monks concerning the following question:
While running a script on production data, I noticed that it seemed to be locked up. Since it had sat at the command line for several hours beyond the usual run time, I pulled the plug with ^C (this is on XP, BTW), only to see:
Can't coerce UNKNOWN to string in aelem at C:\Documents and Settings\hsmyers\Desktop\backlink\backlink.pl line 68, <$fh_IN> line 52878470.
instead of the usual message about Terminating on signal SIGINT(2). I hit ^C again and made it back to the prompt, only to see the Microsoft message box saying that the application had failed and asking whether I wanted to tell Microsoft about it. My experience to date has suggested that perl.exe does not crash easily. Clearly, however, something here says otherwise. I've attached the code and would appreciate any information anyone might be able to toss my way. The input data files are typically in the gigabyte range; the one involved in the crash was 2,322,832,377 bytes.
#!/usr/bin/perl
# backlink.pl -- script to rewrite .ic files by adding backlink text.
use strict;
use warnings;
use English;
use Prosaix;
use Data::Dumper::Simple;

my %hrefs;
my $url;
my $re_url   = qr/^\/url: (.*)/;
my $re_end   = qr/^\/endtext/;
my $re_wc    = qr/^\/wc: (.*)/;
my $notfound = 0;
my $total    = 0;
my $success  = '_____found_____';
my $count    = 1;

my $base_file = $ARGV[0] or die "Missing input file name\n";
( my $output_file = $base_file ) =~ s/\.txt/\+bl.txt/;

open( my $fh_HREF, '<', $base_file . '.links' )
    or die "Couldn't open '$base_file.links': $OS_ERROR\n";
open( my $fh_IN, '<', $base_file )
    or die "Couldn't open '$base_file': $OS_ERROR\n";
open( my $fh_OUT, '>', $output_file )
    or die "Couldn't open '$output_file': $OS_ERROR\n";
open( my $fh_ERR, '>', $base_file . '.unlinked' )
    or die "Couldn't open '$base_file.unlinked': $OS_ERROR\n";
binmode $fh_OUT;
start();

# Load the url|text pairs into a hash of arrays keyed by URL.
while (<$fh_HREF>) {
    chomp;
    my ( $key, $value ) = split(/\|/);
    unless ( defined( $hrefs{$key} ) ) {
        $hrefs{$key} = [];
    }
    push( @{ $hrefs{$key} }, $value );
}

# Copy the input through, adding a /backlinks section at each /endtext.
while (<$fh_IN>) {
    print $fh_OUT $_;
    if (/$re_url/) {
        $url = $1;
    }
    if (/$re_wc/) {
        print "wc $count $1\n";
        $count++;
    }
    elsif (/$re_end/) {
        print $fh_OUT "/backlinks\n";
        if ( defined( $hrefs{$url} ) ) {
            my $s = $hrefs{$url};
            if ( defined( $s->[0] ) ) {
                my @text = collapse(@$s);
                print $fh_OUT join( ".\n", @text ), ".\n";
                print join( ".\n", @text ), ".\n";
            }
            # Mark this URL as found in the input.
            push( @{ $hrefs{$url} }, $success );
        }
    }
}

# Report every URL that was never marked as found.
while ( my ( $url, $text ) = each(%hrefs) ) {
    if ( !defined( $text->[-1] ) ) {
        print $fh_ERR "$url|_____NO_TEXT_____\n";
    }
    elsif ( $text->[-1] ne $success ) {
        $notfound++;
        if ( defined( $text->[0] ) ) {
            my @text = collapse(@$text);
            print $fh_ERR "$url|", join( ",", @text ), "\n";
        }
        else {
            print $fh_ERR "$url|\n";
        }
    }
    $total++;
}

close($fh_HREF);
close($fh_IN);
close($fh_OUT);
close($fh_ERR);
print "$notfound URLs (of $total) pointing to about.com pages not\n";
print "in this IC ($base_file) written to $base_file.unlinked\n";
finish();

# Return only the defined elements of the argument list.
sub collapse {
    my @s = @_;
    my @t;
    for (@s) {
        next unless defined($_);
        push( @t, $_ );
    }
    return @t;
}
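As an aside on the ^C behaviour described above (this is not part of the posted script): one common pattern for long-running loops is to install a $SIG{INT} handler that only sets a flag, and to have the loop check that flag at a record boundary so it can stop and close its files in an orderly way. The sketch below is a minimal illustration under stated assumptions. The names $interrupted and copy_lines and the in-memory sample data are hypothetical, and whether this would have avoided the crash reported here on XP is untested.

```perl
use strict;
use warnings;

# Flag set by the handler; the handler itself does no other work.
my $interrupted = 0;
$SIG{INT} = sub { $interrupted = 1 };

# Copy lines from $in to $out until EOF or until an interrupt has been
# requested. Returns the number of lines copied.
# (copy_lines is a hypothetical helper, not taken from backlink.pl.)
sub copy_lines {
    my ( $in, $out ) = @_;
    my $n = 0;
    while ( my $line = <$in> ) {
        print {$out} $line;
        $n++;
        last if $interrupted;    # stop cleanly between records
    }
    return $n;
}

# Hypothetical in-memory data so the sketch runs without real files.
my $data   = join '', map { "line $_\n" } 1 .. 5;
my $copied = '';

open( my $in, '<', \$data )
    or die "Couldn't open in-memory input: $!\n";
open( my $out, '>', \$copied )
    or die "Couldn't open in-memory output: $!\n";

my $n = copy_lines( $in, $out );

close($in);
close($out);

print $interrupted
    ? "Interrupted after $n lines\n"
    : "Copied all $n lines\n";
```

In a script shaped like the one above, the same check could sit at the bottom of the main while (<$fh_IN>) loop, so that the closing code still runs after an interrupt.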
--hsm
"Never try to teach a pig to sing...it wastes your time and it annoys the pig."
Replies are listed 'Best First'.
Re: New to me crash message
by graff (Chancellor) on Apr 22, 2009 at 07:46 UTC
by grinder (Bishop) on Apr 22, 2009 at 08:03 UTC
by ikegami (Patriarch) on Apr 22, 2009 at 14:52 UTC
by Anonymous Monk on Apr 22, 2009 at 08:03 UTC
by hsmyers (Canon) on Apr 22, 2009 at 14:32 UTC
by graff (Chancellor) on Apr 22, 2009 at 18:27 UTC
by hsmyers (Canon) on Apr 22, 2009 at 21:14 UTC
Re: New to me crash message
by Anonymous Monk on Apr 22, 2009 at 07:19 UTC
by hsmyers (Canon) on Apr 22, 2009 at 14:21 UTC
Re: New to me crash message
by gone2015 (Deacon) on Apr 22, 2009 at 14:33 UTC
by ikegami (Patriarch) on Apr 22, 2009 at 15:00 UTC
by hsmyers (Canon) on Apr 22, 2009 at 20:56 UTC