Re^3: Consumes memory then crashs

There doesn't seem to be any good reason to build up your regex a bit at a time in separate variables; then concatenate those bits into another variable; and then interpolate that into a regex:

     $re1='(gain)';
     $re2='(:)';
     $re3='(Overall)';
     $re4='(:)';
     $re5='(\\d+)';
     $re6='(:)';
     $re7='(\\d+)';
     $re=$re1.$re2.$re3.$re4.$re5.$re6.$re7;
     if ($lookup =~ m/$re/isg) {

And even less reason to repeat the same 3-stage exercise over and over again each time around a loop. Especially as it means that the regex engine needs to recompile the combined regex every time around that loop even though nothing changes.
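As a minimal sketch of the alternative, the pattern can be built once with `qr//` before the loop and reused inside it. The sample strings in `@pages` are made up for illustration and merely stand in for the fetched responses.

```perl
use strict;
use warnings;

# Build the pattern a single time, before the loop starts.
my $re = qr/gain:Overall:\d+:(\d+)/i;

# Hypothetical sample responses (an assumption, not real lookup output).
my @pages = (
    "start:1\ngain:Overall:7:12345\n",
    "ERROR: no such user\n",
);

my @gains;
for my $page (@pages) {
    # The same precompiled regex object is reused on every iteration.
    push @gains, $page =~ $re ? $1 : undef;
}
```

Nothing is assigned or concatenated inside the loop; the only per-iteration work is the match itself.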

And why capture 7 different bits of the string you are matching against:

  $re = '(gain)(:)(Overall)(:)(\\d+)(:)(\\d+)';
  ...
  "$name $7\n"

when 5 of them are constants; and you are only using one of them?
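To illustrate, here is a small sketch comparing the seven-capture pattern with one that captures only the value actually used. The input line is a made-up sample shaped to fit the pattern.

```perl
use strict;
use warnings;

my $line = 'gain:Overall:7:12345';    # made-up sample input

# Seven capture groups, of which only the last is wanted.
my ($seventh) = ( $line =~ m/(gain)(:)(Overall)(:)(\d+)(:)(\d+)/i )[6];

# One capture group: the constants are matched but not captured.
my ($only) = $line =~ m/gain:Overall:\d+:(\d+)/i;
```

Both variables end up holding the same digits, but the second form makes it obvious which part of the string matters.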

Equally, there is nothing to be gained from assigning a constant string to a variable and then interpolating it into a regex:

    $reg1='(ERROR)';
    
    ...

     elsif ($lookup =~ m/$reg1/isg)

Cleaning those up, along with a few other things; adding strict and -w; and moving the regexing into a subroutine (to make later threading easier), I get:

#! perl -slw
use strict;
use LWP::Simple;

sub lookup {
    my( $hf, $name ) = @_;
    my $lookup = get(
        "http://rscript.org/lookup.php?type=track&time=62899200&user=$name&skill=all"
    );

    print "Looking up $name...\n";

    if( $lookup =~ m/gain:Overall:\d+:(\d+)/isg ) {
        print { $fh } "$name $1\n";
    }
    elsif( $lookup =~ m/(ERROR)/isg ) {
        print { $fh } "$name doesn't exist \n"
    }
    else{
        print { $fh } "$name 0\n";
    }
}

my $names = 'zezima';

open( LOOKUP, '>>rstlookup.txt' ) or die $!;

while( my( $name ) = $names =~ m/([a-z0-9_]+)/isg ) {
    lookup( \*LOOKUP, $name );
}
close( LOOKUP );

Which will probably not run much more quickly, as you are IO-bound, but (I hope you'll agree) is easier to read and will at least consume less CPU.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?


Re^4: Consumes memory then crashs by allhellno (Novice) on Mar 24, 2012 at 15:25 UTC
Small change:

     my( $fh, $name ) = @_;

Now the problem is that it only looks up the first name, infinitely:

     while( my( $name ) = $names =~ m/([a-z0-9_]+)/isg ) {
         lookup( \*LOOKUP, $name );
     }

Nothing seems wrong to me, so I changed it slightly:

     while( $names =~ m/([a-z0-9_]+)/isg ) {
         my $name = $1;
         lookup( \*LOOKUP, $name );
     }

Now I am back to where I started: 1 response every 1 or 2 seconds, which is too slow. Is there a simple solution to thread this properly?
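For reference, the difference between the two loops comes down to context. In list context `m//g` returns every match at once, and a list assignment used as a `while` condition is true whenever the right-hand side produced anything, so the first form restarts from the beginning each time around. In scalar context `m//g` steps through the string one match per call. A small sketch, with a made-up name list and a guard counter added only so that the first loop terminates:

```perl
use strict;
use warnings;

my $names = 'zezima fred bill';    # made-up list for illustration

# List context: all matches come back in one go.
my @all = $names =~ m/([a-z0-9_]+)/ig;

# A list assignment in scalar context yields the number of matches,
# which is why it is always true as a loop condition here.
my $count = () = $names =~ m/([a-z0-9_]+)/ig;

# The list-context form: every pass sees the first name again.
my @stuck;
my $guard = 0;
while ( my ($name) = $names =~ m/([a-z0-9_]+)/ig ) {
    push @stuck, $name;
    last if ++$guard >= 3;         # guard so the demonstration stops
}

# The scalar-context form: one match per iteration, then it ends.
my @seen;
while ( $names =~ m/([a-z0-9_]+)/ig ) {
    push @seen, $1;
}
```

Here `@stuck` collects the first name three times, while `@seen` collects each name once.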
Re^5: Consumes memory then crashs by BrowserUk (Patriarch) on Mar 24, 2012 at 15:40 UTC
"Now I am back to where I started, 1 response every 1 or 2 seconds, too slow. Is there a simple solution to thread this properly?"

Yes. Try this:

     #! perl -slw
     use strict;
     use threads;
     use threads::shared;
     use Thread::Queue;
     use LWP::Simple;

     my $sem :shared;

     sub lookup {
         my( $fh, $name ) = @_;
         my $lookup = get(
             "http://rscript.org/lookup.php?type=track&time=62899200&user=$name&skill=all"
         );

         print "Looking up $name...\n";

         if( $lookup =~ m/gain:Overall:\d+:(\d+)/isg ) {
             lock $sem;
             print { $fh } "$name $1\n";
         }
         elsif( $lookup =~ m/(ERROR)/isg ) {
             lock $sem;
             print { $fh } "$name doesn't exist \n"
         }
         else{
             lock $sem;
             print { $fh } "$name 0\n";
         }
     }

     our $THREADS //= 4;

     my $names = 'zezima fred bill john jack';

     my $Q = new Thread::Queue;

     open( LOOKUP, '>>rstlookup.txt' ) or die $!;

     my @threads = map async( sub {
         while( my $name = $Q->dequeue ) {
             lookup( \*LOOKUP, $name );
         }
     } ), 1 .. $THREADS;

     while( $names =~ m/([a-z0-9_]+)/isg ) {
         $Q->enqueue( $1 );
         sleep 1 while $Q->pending > $THREADS * 2;
     }

     $Q->enqueue( (undef) x $THREADS );
     $_->join for @threads;

     close( LOOKUP );

     __END__
     [15:38:57.93] C:\test>junk39
     Looking up john...
     Looking up bill...
     Looking up fred...
     Looking up zezima...
     Looking up jack...

     [15:39:03.07] C:\test>type rstlookup.txt
     bill 0
     fred 135601422
     zezima 417155645
     john 0
     jack 8133157
Re^6: Consumes memory then crashs by mbethke (Hermit) on Mar 24, 2012 at 19:35 UTC
This is actually a good case in point of my earlier post on threads. In a way, threads are like Gatling guns: if you're The Terminator and can handle them, they can be very effective; for most people, however, they provide a million opportunities for shooting themselves in the foot. Unlike Gatlings, the holes produced may be rather subtle, though, and may appear after a long time of seemingly successful use: the typical heisenbugs that appear once in a while, but never while you look closely.

The problem here is that `print` is not atomic; in fact, most of stdio is taboo in threaded code without further protective measures. A thread may be preempted after writing a fraction of a buffer and then resume after another thread has written to the same file. In your example, which waits a lot between printing lines, the probability of this happening is really very small, but that doesn't mean it can't happen to the first two lines of output. Here's a script that provokes it:

     use strict;
     use threads;

     open my $fh, '>', 'outfile' or die $!;
     my $th = 0;
     my @threads = map {
         $th++;
         async( sub {
             sleep(1);
             for(1 .. 30_000) {
                 print $fh "Thread $th\n"
             }
         } );
     } (1 .. 500);
     $_->join foreach @threads;
     close $fh;

Sample output snippet:

     Thread 3Thread 349 Threead 85 Thread Thread 333 Thre59 ad 349 Thread 3ad 333 Thread 3Thread 359 Thre49 33 ad 295 ThreThread 333 Thread 8Thread 338 ThreThread 350

For an application like retrieving a large number of web pages, where waiting for the other side is the major cause of delays (so spreading the work over multiple cores has no significant advantage), the solution of choice is the state machine. Event-based programming may look like a lot of work to wrap one's head around, but in the end it's easier to understand than threads if you consider all the rather low-level race conditions and other synchronization issues that you have to think about to write threaded code that always works and not just most of the time.

Regarding modules to facilitate the implementation of said state machine, one I found easy to use (actually the only one I've ever used in production code) is POE::Component::Client::HTTP. (Edited: I originally wrote POE::Component::Client::UserAgent; it's been a while, but the name didn't sound quite right.) POE is rather heavyweight though (not that it mattered much here), so AnyEvent::Curl::Multi might be worth a look too.
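Staying with threads, one way to avoid several threads writing to the same handle at all is to let the workers push finished lines onto a second queue and have a single thread do the output. This is only a sketch and not something proposed in the thread: `fake_lookup` is a hypothetical stand-in for the HTTP request, the pool size is an assumption, and it needs a perl built with thread support.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $THREADS = 4;                       # assumed pool size
my $work    = Thread::Queue->new;      # names waiting to be looked up
my $results = Thread::Queue->new;      # finished output lines

# Hypothetical stand-in for the real HTTP lookup.
sub fake_lookup {
    my ($name) = @_;
    return "$name " . length($name);
}

# Workers never print; they only hand their result back.
my @workers = map {
    threads->create( sub {
        while ( defined( my $name = $work->dequeue ) ) {
            $results->enqueue( fake_lookup($name) );
        }
    } );
} 1 .. $THREADS;

my @names = qw( zezima fred bill john jack );
$work->enqueue(@names);
$work->enqueue( (undef) x $THREADS );  # one stop marker per worker

# Only the main thread collects and prints the lines.
my @lines;
push @lines, $results->dequeue for 1 .. @names;
$_->join for @workers;

print "$_\n" for sort @lines;
```

Because only one thread ever touches the output handle, no lock is needed around `print`, at the cost of routing every result through the queue.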
Re^7: Consumes memory then crashs by davido (Cardinal) on Mar 24, 2012 at 21:20 UTC
Re^7: Consumes memory then crashs by BrowserUk (Patriarch) on Mar 24, 2012 at 23:12 UTC
Re^8: Consumes memory then crashs by zwon (Abbot) on Mar 25, 2012 at 05:34 UTC
Re^8: Consumes memory then crashs by mbethke (Hermit) on Mar 25, 2012 at 07:24 UTC
Re^6: Consumes memory then crashs by allhellno (Novice) on Mar 24, 2012 at 16:00 UTC
Excellent, this has taught me a bit and is greatly appreciated!
Re^7: Consumes memory then crashs by BrowserUk (Patriarch) on Mar 24, 2012 at 23:17 UTC
Re^5: Consumes memory then crashs by Corion (Patriarch) on Mar 24, 2012 at 15:27 UTC
For quickly looking up a bunch of addresses, AnyEvent::DNS may be of help.


	PerlMonks