Why this code is so slow if run in thread?

vr has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Why this code is so slow if run in thread? by BrowserUk (Patriarch) on Dec 12, 2016 at 04:31 UTC
Sorry for my earlier misdirection. By way of recompense I have what I believe (though it is essentially untested for lack of a suitable image), that addresses both the slowness of substr on utf strings within threads (which is just weird) and the problem I thought was the cause, that of cloning the returned array. It avoids the former by doing away with the encoding, searching instead for runs of pairs of non-null characters in the unencoded pdl; and the latter by accumulating the counts in a packed binary array stored in a scalar. sub test { my $fn = shift; my $img = PDL::IO::Image-> new_from_file( $fn ) or die "Failed to +load image"; my $pdl = $img->pixels_to_pdl->short; my $s = cc8compt( $pdl != 0 ); my $str = ${ $s-> get_dataref }; my ( $w, $h ) = $s-> dims; my $bounds = pack 'n4', $w, 0, $h, 0; $bounds x= $s->max; for my $y ( 0 .. $h - 1 ) { my $s = substr( $str, 2 * $y * $w, 2 * $w ); while( $s =~ m[(?:[^\0][^\0])+]g ) { my( $l, $r ) = ( $-[0]/2, (($+[0])-1)/2 ); my $c = ord( $& ); vec( $bounds, 4$c+0, 16 ) = $l if $l < vec( $bounds, 4$c ++0, 16 ); vec( $bounds, 4$c+1, 16 ) = $r if $r > vec( $bounds, 4$c ++1, 16 ); vec( $bounds, 4$c+2, 16 ) = $y if $y < vec( $bounds, 4$c ++2, 16 ); vec( $bounds, 4$c+3, 16 ) = $y if $y > vec( $bounds, 4$c ++3, 16 ); } } return $bounds; } [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity. In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re^2: Why this code is so slow if run in thread? by vr (Curate) on Dec 12, 2016 at 07:44 UTC
Here's a link: https://drive.google.com/open?id=0Bxkg1eqq0xXxVjFhSzg0cHRkQkE, if anyone wants to run these tests. Sorry for delays with feedback. Thank you everyone for answers, and, BrowserUk, for code. I will test it later.	[reply]
Re^3: Why this code is so slow if run in thread? by BrowserUk (Patriarch) on Dec 12, 2016 at 08:45 UTC
Results of my workaround on your test image: `C:\test>1177606 vrtest.png C:\test>1177606 vrtest.png No thread --------- Took:0.818306923 Count: 145 Thread --------- Took:2.834208012 Count: 145` [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity. In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re^4: Why this code is so slow if run in thread? by vr (Curate) on Dec 12, 2016 at 11:00 UTC
Re^5: Why this code is so slow if run in thread? by BrowserUk (Patriarch) on Dec 12, 2016 at 12:01 UTC
Some notes below your chosen depth have not been shown here
Re: Why this code is so slow if run in thread? by LanX (Saint) on Dec 11, 2016 at 13:44 UTC
Did you check if the modules you use are thread safe? Cheers Rolf _{(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!}	[reply]
Re^2: Why this code is so slow if run in thread? by vr (Curate) on Dec 11, 2016 at 17:38 UTC
But then Perl's Core "Encode" is not thread safe? :( Suppose I split my program in two. First one creates 2 files, with single-byte encoded data and double-byte encoded data (and says "Equal", of course -- this image has less than 255 "objects"). `use strict; use warnings; use 5.020; use PDL; use PDL::IO::Image; use PDL::Image2D; use PDL::IO::FastRaw; use Encode qw/ decode /; my $img = PDL::IO::Image-> new_from_file( 'test.png' ); my $pdl = $img-> pixels_to_pdl-> short; my $s1 = cc8compt( $pdl == 0 )-> byte; my $s2 = cc8compt( $pdl == 0 ); say 'Equal' if decode( 'UTF16LE', ${ $s2-> get_dataref }) eq ${ $s1-> get_dataref }; writefraw( $s1, "fname-1" ); writefraw( $s2, "fname-2" );` [download] The 2nd program doesn't use PDL modules, it reads data from disk (and uses some hard-coded numbers): use strict; use warnings; use 5.020; use threads; use Encode qw/ decode /; say "No thread (byte)\n---------"; say 'Count: ', scalar @{ test( 1 )}; say "\nThread (byte)\n---------"; say 'Count: ', scalar @{ threads-> create( \&test, 1 )-> join }; say "\nNo thread (short)\n---------"; say 'Count: ', scalar @{ test( 2 )}; say "\nThread (short)\n---------"; say 'Count: ', scalar @{ threads-> create( \&test, 2 )-> join }; sub test { my $arg = shift; open my $fh, '<', "fname-$arg"; binmode $fh; my $str = do { local $/; <$fh> }; close $fh; $str = decode( 'UTF16LE', $str ) if $arg == 2; my ( $w, $h ) = ( 7616, 1200 ); my @b = map { [ [ $w, 0 ], [ $h, 0 ] ] } 0 .. 145; my $t = time; for my $y ( 0 .. $h - 1 ) { my $s = substr( $str, $y * $w, $w ); while( $s =~ m[[^\0]+]g ) { my $c = ord( $& ); $b[ $c ][ 0 ][ 0 ] = $-[0] if $-[0] < $b[ $c ][ 0 +][ 0 ]; $b[ $c ][ 0 ][ 1 ] = $+[0] - 1 if $+[0] - 1 > $b[ $c ][ 0 +][ 1 ]; $b[ $c ][ 1 ][ 0 ] = $y if $y < $b[ $c ][ 1 +][ 0 ]; $b[ $c ][ 1 ][ 1 ] = $y if $y > $b[ $c ][ 1 +][ 1 ]; } } say 'Time: ', time - $t; shift @b; return \@b; } [download] The output: `No thread (byte) --------- Time: 0 Count: 145 Thread (byte) --------- Time: 1 Count: 145 No thread (short) --------- Time: 1 Count: 145 Thread (short) --------- Time: 51 Count: 145` [download]	[reply] [d/l] [select]
Re^3: Why this code is so slow if run in thread? (create) by tye (Sage) on Dec 11, 2016 at 19:24 UTC
With Perl threads, the thing that is usually the slowest is creating a thread. This is because creating a Perl thread requires cloning every data structure you have created. So I would first determine if that is where the time is being spent. You can report that by comparing the time at the start of the routine running against the time just before you ask for a thread to be created. If that is indeed where most of the time is spent, then you should do what just about anybody who has gotten good at using Perl threads does: Create your threads very early and just ship work to them, usually via a thread queue such as Thread::Queue. Though, looking at your code, I don't see where you would have a large data structure that should not have been destroyed before you create the thread. But, that data could be something cached (perhaps by accident) in some module you have pulled in, for example. Update: Or it could be time spent destroying the second interpreter instance. - tye	[reply]
Re^3: Why this code is so slow if run in thread? by ikegami (Patriarch) on Dec 11, 2016 at 19:14 UTC
Did you look at your results? They're mostly useless! We can tell that shorts are 50x slower with threads, but we have no idea how much slower bytes are with threads. Could be there not slower at all (0.9999s vs 1.0000s). Could be billions time slower (0.000000001 vs 1.000000000). Please re-run the tests with `use Time::HiRes qw( sleep );`. Or provide the data files.	[reply] [d/l]
Re: Why this code is so slow if run in thread? (Wrong!) by BrowserUk (Patriarch) on Dec 11, 2016 at 19:41 UTC
Update:I just looked back at your code as opposed to my mock-up of it and realised this can't be the cause because you are not timing the return. Why this code is so slow if run in thread? Because you are returning a ref to an array of arrays, which in the unthreaded code means simply returning a scalar; but in the threaded code, means duplicating the entire contents of that AoAs into the calling threads memory and then returning a reference to the copy. This can be demonstrated without PDL or Encode like this: `#! perl -slw use strict; use threads; use Time::HiRes qw[ time ]; sub test { my @b = map[[0,0],[0,0]], 0 .. 255; return \@b; } my $start = time; my $nothread = @{ test() }; printf "Unthreaded took: %.6f\n", time() - $start; $start = time; my $thread = @{ threads->create( \&test )->join }; printf "Threaded took: %.6f\n", time() - $start; __END__ c:\test>junk Unthreaded took: 0.000851 Threaded took: 0.012510` [download] The simple solution (for your specific posted example, which isn't using the contents of the AoA, but only the count of elements on the outer array ) ): `say 'Count: ', scalar @{ threads-> create( \&test, 'test.png' )-> join + };` [download] would be to derive the count internal to the function and thus return a scalar rather than a compound data structure: `... shift @b; return scalar @b; }` [download] If your real code actually needs the AoAs returned to the main thread, then show an example of what you are doing with it there and I can perhaps suggest something else. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity. In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]


Perl: the Markov chain saw
	PerlMonks