vr has asked for the wisdom of the Perl Monks concerning the following question:
The question is related to PDL: Looking for efficient way to extract sub-images, by finding bounding boxes of "objects". But I don't think it has anything to do with PDL, images, etc. The $str is just a very long string. Using $1 instead of $& doesn't help.
Perl is 5.20.2 Win32 Strawberry, or 5.24 Win64.
use strict;
use warnings;
use 5.020;
use threads;
use PDL;
use PDL::IO::Image;
use PDL::Image2D;
use Encode qw/ decode /;
PDL::no_clone_skip_warning;
say "No thread\n---------";
say 'Count: ', scalar @{ test( 'test.png' )};
say "\nThread\n---------";
say 'Count: ', scalar @{ threads-> create( \&test, 'test.png' )-> join
+ };
sub test {
my $fn = shift;
my $img = PDL::IO::Image-> new_from_file( $fn );
my $pdl = $img-> pixels_to_pdl-> short;
my $s = cc8compt( $pdl == 0 );
my $str = decode 'UTF16LE', ${ $s-> get_dataref };
my ( $w, $h ) = $s-> dims;
my @b = map {
[ [ $w, 0 ], [ $h, 0 ] ]
} 0 .. $s-> max;
my $t = time;
for my $y ( 0 .. $h - 1 ) {
my $s = substr( $str, $y * $w, $w );
while( $s =~ m[[^\0]+]g ) {
my $c = ord( $& );
$b[ $c ][ 0 ][ 0 ] = $-[0] if $-[0] < $b[ $c ][ 0
+][ 0 ];
$b[ $c ][ 0 ][ 1 ] = $+[0] - 1 if $+[0] - 1 > $b[ $c ][ 0
+][ 1 ];
$b[ $c ][ 1 ][ 0 ] = $y if $y < $b[ $c ][ 1
+][ 0 ];
$b[ $c ][ 1 ][ 1 ] = $y if $y > $b[ $c ][ 1
+][ 1 ];
}
}
say 'Time: ', time - $t;
shift @b;
return \@b;
}
The output:
No thread
---------
Time: 2
Count: 145
Thread
---------
Time: 52
Count: 145
Re: Why this code is so slow if run in thread?
by BrowserUk (Patriarch) on Dec 12, 2016 at 04:31 UTC
|
Sorry for my earlier misdirection. By way of recompense I have what I believe (though it is essentially untested for lack of a suitable image), that addresses both the slowness of substr on utf strings within threads (which is just weird) and the problem I thought was the cause, that of cloning the returned array.
It avoids the former by doing away with the encoding, searching instead for runs of pairs of non-null characters in the unencoded pdl; and the latter by accumulating the counts in a packed binary array stored in a scalar.
sub test {
my $fn = shift;
my $img = PDL::IO::Image-> new_from_file( $fn ) or die "Failed to
+load image";
my $pdl = $img->pixels_to_pdl->short;
my $s = cc8compt( $pdl != 0 );
my $str = ${ $s-> get_dataref };
my ( $w, $h ) = $s-> dims;
my $bounds = pack 'n4', $w, 0, $h, 0;
$bounds x= $s->max;
for my $y ( 0 .. $h - 1 ) {
my $s = substr( $str, 2 * $y * $w, 2 * $w );
while( $s =~ m[(?:[^\0][^\0])+]g ) {
my( $l, $r ) = ( $-[0]/2, (($+[0])-1)/2 );
my $c = ord( $& );
vec( $bounds, 4*$c+0, 16 ) = $l if $l < vec( $bounds, 4*$c
++0, 16 );
vec( $bounds, 4*$c+1, 16 ) = $r if $r > vec( $bounds, 4*$c
++1, 16 );
vec( $bounds, 4*$c+2, 16 ) = $y if $y < vec( $bounds, 4*$c
++2, 16 );
vec( $bounds, 4*$c+3, 16 ) = $y if $y > vec( $bounds, 4*$c
++3, 16 );
}
}
return $bounds;
}
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
|
| [reply] |
|
C:\test>1177606 vrtest.png
C:\test>1177606 vrtest.png
No thread
---------
Took:0.818306923
Count: 145
Thread
---------
Took:2.834208012
Count: 145
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
|
|
|
Re: Why this code is so slow if run in thread?
by LanX (Saint) on Dec 11, 2016 at 13:44 UTC
|
| [reply] |
|
But then Perl's Core "Encode" is not thread safe? :(
Suppose I split my program in two. First one creates 2 files, with single-byte encoded data and double-byte encoded data (and says "Equal", of course -- this image has less than 255 "objects").
use strict;
use warnings;
use 5.020;
use PDL;
use PDL::IO::Image;
use PDL::Image2D;
use PDL::IO::FastRaw;
use Encode qw/ decode /;
my $img = PDL::IO::Image-> new_from_file( 'test.png' );
my $pdl = $img-> pixels_to_pdl-> short;
my $s1 = cc8compt( $pdl == 0 )-> byte;
my $s2 = cc8compt( $pdl == 0 );
say 'Equal' if decode( 'UTF16LE', ${ $s2-> get_dataref }) eq
${ $s1-> get_dataref };
writefraw( $s1, "fname-1" );
writefraw( $s2, "fname-2" );
The 2nd program doesn't use PDL modules, it reads data from disk (and uses some hard-coded numbers):
use strict;
use warnings;
use 5.020;
use threads;
use Encode qw/ decode /;
say "No thread (byte)\n---------";
say 'Count: ', scalar @{ test( 1 )};
say "\nThread (byte)\n---------";
say 'Count: ', scalar @{ threads-> create( \&test, 1 )-> join };
say "\nNo thread (short)\n---------";
say 'Count: ', scalar @{ test( 2 )};
say "\nThread (short)\n---------";
say 'Count: ', scalar @{ threads-> create( \&test, 2 )-> join };
sub test {
my $arg = shift;
open my $fh, '<', "fname-$arg";
binmode $fh;
my $str = do { local $/; <$fh> };
close $fh;
$str = decode( 'UTF16LE', $str ) if $arg == 2;
my ( $w, $h ) = ( 7616, 1200 );
my @b = map {
[ [ $w, 0 ], [ $h, 0 ] ]
} 0 .. 145;
my $t = time;
for my $y ( 0 .. $h - 1 ) {
my $s = substr( $str, $y * $w, $w );
while( $s =~ m[[^\0]+]g ) {
my $c = ord( $& );
$b[ $c ][ 0 ][ 0 ] = $-[0] if $-[0] < $b[ $c ][ 0
+][ 0 ];
$b[ $c ][ 0 ][ 1 ] = $+[0] - 1 if $+[0] - 1 > $b[ $c ][ 0
+][ 1 ];
$b[ $c ][ 1 ][ 0 ] = $y if $y < $b[ $c ][ 1
+][ 0 ];
$b[ $c ][ 1 ][ 1 ] = $y if $y > $b[ $c ][ 1
+][ 1 ];
}
}
say 'Time: ', time - $t;
shift @b;
return \@b;
}
The output:
No thread (byte)
---------
Time: 0
Count: 145
Thread (byte)
---------
Time: 1
Count: 145
No thread (short)
---------
Time: 1
Count: 145
Thread (short)
---------
Time: 51
Count: 145
| [reply] [d/l] [select] |
|
With Perl threads, the thing that is usually the slowest is creating a thread. This is because creating a Perl thread requires cloning every data structure you have created. So I would first determine if that is where the time is being spent. You can report that by comparing the time at the start of the routine running against the time just before you ask for a thread to be created.
If that is indeed where most of the time is spent, then you should do what just about anybody who has gotten good at using Perl threads does: Create your threads very early and just ship work to them, usually via a thread queue such as Thread::Queue.
Though, looking at your code, I don't see where you would have a large data structure that should not have been destroyed before you create the thread. But, that data could be something cached (perhaps by accident) in some module you have pulled in, for example.
Update: Or it could be time spent destroying the second interpreter instance.
| [reply] |
|
Did you look at your results? They're mostly useless!
We can tell that shorts are 50x slower with threads, but we have no idea how much slower bytes are with threads. Could be there not slower at all (0.9999s vs 1.0000s). Could be billions time slower (0.000000001 vs 1.000000000).
Please re-run the tests with use Time::HiRes qw( sleep );. Or provide the data files.
| [reply] [d/l] |
Re: Why this code is so slow if run in thread? (Wrong!)
by BrowserUk (Patriarch) on Dec 11, 2016 at 19:41 UTC
|
Update:I just looked back at your code as opposed to my mock-up of it and realised this can't be the cause because you are not timing the return.
Why this code is so slow if run in thread?
Because you are returning a ref to an array of arrays, which in the unthreaded code means simply returning a scalar; but in the threaded code, means duplicating the entire contents of that AoAs into the calling threads memory and then returning a reference to the copy.
This can be demonstrated without PDL or Encode like this: #! perl -slw
use strict;
use threads;
use Time::HiRes qw[ time ];
sub test {
my @b = map[[0,0],[0,0]], 0 .. 255;
return \@b;
}
my $start = time;
my $nothread = @{ test() };
printf "Unthreaded took: %.6f\n", time() - $start;
$start = time;
my $thread = @{ threads->create( \&test )->join };
printf "Threaded took: %.6f\n", time() - $start;
__END__
c:\test>junk
Unthreaded took: 0.000851
Threaded took: 0.012510
The simple solution (for your specific posted example, which isn't using the contents of the AoA, but only the count of elements on the outer array ) ): say 'Count: ', scalar @{ threads-> create( \&test, 'test.png' )-> join
+ };
would be to derive the count internal to the function and thus return a scalar rather than a compound data structure: ...
shift @b;
return scalar @b;
}
If your real code actually needs the AoAs returned to the main thread, then show an example of what you are doing with it there and I can perhaps suggest something else.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
|