PerlMonks  

While running a script on production data, I noticed that it seemed to have locked up. It had sat at the command line for several hours past its usual run time, so I stopped it with ^C (this is on Windows XP, BTW). I then saw:

Can't coerce UNKNOWN to string in aelem at C:\Documents and Settings\hsmyers\Desktop\backlink\backlink.pl line 68, <$fh_IN> line 52878470.


I expected the usual "Terminating on signal SIGINT(2)" message and got this instead. I hit ^C again and got back to the prompt. Then the Microsoft dialog appeared, saying the application had failed and asking whether I wanted to send an error report. In my experience so far, perl.exe does not crash easily, but something here clearly says otherwise. The code is below, and I would appreciate any information anyone can toss my way. The input files are typically in the gigabyte range. The one being read when this happened is 2,322,832,377 bytes.
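The failing input is 2,322,832,377 bytes, which is larger than 2**31 - 1 (2,147,483,647) bytes. One hedged first check is whether this perl.exe was built with large-file support. The sketch below is minimal and uses the core Config module. The helpers `has_large_files` and `exceeds_32bit` are hypothetical names for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Largest byte offset a signed 32-bit file position can represent.
my $LIMIT_32 = 2**31 - 1;    # 2,147,483,647

# Hypothetical helper: true if this perl reports large-file support.
sub has_large_files {
    return defined $Config{uselargefiles} && $Config{uselargefiles} eq 'define';
}

# Hypothetical helper: true if a byte count is past the 32-bit limit.
sub exceeds_32bit {
    my ($bytes) = @_;
    return $bytes > $LIMIT_32 ? 1 : 0;
}

# Size of the input that was being read when the script hung.
my $size = 2_322_832_377;

printf "large-file support: %s\n", has_large_files() ? 'yes' : 'no';
printf "lseeksize: %s\n",
    defined $Config{lseeksize} ? $Config{lseeksize} : 'undef';
printf "%s bytes %s the 32-bit limit\n",
    $size, exceeds_32bit($size) ? 'exceeds' : 'fits within';
```

You can read the same setting from the command line with `perl -V:uselargefiles`. If it is not `'define'`, a problem around the 2 GB mark becomes a plausible suspect. That would still be a hypothesis to test, not a diagnosis.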
#!/usr/bin/perl
# backlink.pl -- script to rewrite .ic files by adding backlink text.
use strict;
use warnings;
use English qw(-no_match_vars);   # avoids the $& match-variable penalty on older perls
use Prosaix;                      # local module; start() and finish() come from here
use Data::Dumper::Simple;         # not used in this script

my %hrefs;       # URL => array ref of link texts
my $url;         # most recent "/url:" value seen in the input
my $re_url = qr/^\/url: (.*)/;
my $re_end = qr/^\/endtext/;
my $re_wc  = qr/^\/wc: (.*)/;
my $notfound = 0;
my $total    = 0;
my $success  = '_____found_____';   # marker pushed once a URL has been matched
my $count    = 1;

my $base_file = $ARGV[0] or die "Missing input file name\n";

# Without ".txt" in the name, the output file would be the input file itself.
( my $output_file = $base_file ) =~ s/\.txt/+bl.txt/
    or die "Input file name '$base_file' does not contain '.txt'\n";

open( my $fh_HREF, '<', $base_file . '.links' )
    or die "Couldn't open '$base_file.links': $OS_ERROR\n";
open( my $fh_IN, '<', $base_file )
    or die "Couldn't open '$base_file': $OS_ERROR\n";
open( my $fh_OUT, '>', $output_file )
    or die "Couldn't open '$output_file': $OS_ERROR\n";
open( my $fh_ERR, '>', $base_file . '.unlinked' )
    or die "Couldn't open '$base_file.unlinked': $OS_ERROR\n";
binmode $fh_OUT;

start();

# Build the URL => [texts] table from "url|text" lines.
while (<$fh_HREF>) {
    chomp;
    my ( $key, $value ) = split(/\|/);
    unless ( defined( $hrefs{$key} ) ) {
        $hrefs{$key} = [];
    }
    push( @{ $hrefs{$key} }, $value );
}

# Copy the input, appending backlink text after each /endtext.
while (<$fh_IN>) {
    print $fh_OUT $_;
    if (/$re_url/) {
        $url = $1;
    }
    if (/$re_wc/) {
        print "wc $count $1\n";
        $count++;
    }
    elsif (/$re_end/) {
        print $fh_OUT "/backlinks\n";
        # Skip an /endtext seen before any /url: line.
        if ( defined $url && defined( $hrefs{$url} ) ) {
            my $s = $hrefs{$url};
            if ( defined( $s->[0] ) ) {
                my @text = collapse(@$s);
                print $fh_OUT join( ".\n", @text ), ".\n";
                print join( ".\n", @text ), ".\n";
            }
            push( @{ $hrefs{$url} }, $success );
        }
    }
}

# Report every URL that was never matched in the input.
while ( my ( $url, $text ) = each(%hrefs) ) {
    if ( !defined( $text->[-1] ) ) {
        print $fh_ERR "$url|_____NO_TEXT_____\n";
    }
    elsif ( $text->[-1] ne $success ) {
        $notfound++;
        if ( defined( $text->[0] ) ) {
            my @text = collapse(@$text);
            print $fh_ERR "$url|", join( ",", @text ), "\n";
        }
        else {
            print $fh_ERR "$url|\n";
        }
    }
    $total++;
}

close($fh_HREF);
close($fh_IN);
close($fh_OUT);
close($fh_ERR);

print "$notfound URLs (of $total) pointing to about.com pages not\n";
print "in this IC ($base_file) written to $base_file.unlinked\n";

finish();

# Return the arguments with undefined values dropped.
sub collapse {
    my @s = @_;
    my @t;
    for (@s) {
        next unless defined($_);
        push( @t, $_ );
    }
    return @t;
}
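A silent hang plus a garbled message on ^C leaves you with no idea how far the main loop got. An explicit SIGINT handler and a periodic heartbeat would at least report the line count and byte offset where the script stopped. The sketch below is minimal and assumes the same line-by-line read loop as `while (<$fh_IN>)` above. The helper `scan_with_heartbeat` and the interval of 1,000,000 lines are hypothetical choices. The handler only sets a flag and the loop checks it, which keeps the handler itself trivial. An offset that stops advancing or looks wrong near the 2 GB mark would be worth a closer look.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Set by the SIGINT handler; the read loop checks it once per line.
my $interrupted = 0;
$SIG{INT} = sub { $interrupted = 1 };

# Hypothetical helper: read $fh line by line, print a heartbeat every
# $every lines, and stop cleanly after ^C. Returns the number of lines read.
sub scan_with_heartbeat {
    my ( $fh, $every ) = @_;
    my $lines = 0;
    while ( my $line = <$fh> ) {
        $lines++;

        # ... the real per-line work would go here ...

        if ( $lines % $every == 0 ) {
            printf STDERR "line %d, byte offset %s\n", $lines, tell($fh);
        }
        if ($interrupted) {
            printf STDERR "Interrupted at line %d, byte offset %s\n",
                $lines, tell($fh);
            last;
        }
    }
    return $lines;
}

# Only touch the file system when a file name is given.
if (@ARGV) {
    my $file = $ARGV[0];
    open( my $fh_IN, '<', $file ) or die "Couldn't open '$file': $!\n";
    my $n = scan_with_heartbeat( $fh_IN, 1_000_000 );
    close($fh_IN);
    print "$n lines read\n";
}
```

The loop keeps its own counter for clarity. Perl's built-in `$.` holds the same value for the most recently read handle.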

--hsm

"Never try to teach a pig to sing...it wastes your time and it annoys the pig."

In reply to New to me crash message by hsmyers
