Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Re^2: [OT] Swapping buffers in place.

by BrowserUk (Patriarch)
on Mar 01, 2015 at 20:24 UTC ( [id://1118311]=note: print w/replies, xml ) Need Help??


in reply to Re: [OT] Swapping buffers in place.
in thread [OT] Swapping buffers in place.

Okay. I jiggered and poked it (my apologies if you're sensitive to such things), and added a little trace and got this:

#! perl -slw use strict; sub show { my ( $ary, $swaps ) = @_; printf "%2d swaps => %s\n", $swaps, join ' ', @$ary; } sub swapElems { my $t = $_[0][ $_[1] ]; $_[0][ $_[1] ] = $_[0][ $_[2] ]; $_[0][ $_ +[2] ] = $t; print "[@{$_[0]}] $_[1] <=> $_[2]" } sub do_swaps { my( $ary, $x, $y, $swaps ) = @_; print "do_swaps: $x .. $y"; return $swaps if $y == @$ary; my $saved_y = $y; for( $x .. $y - 1 ) { $y = $saved_y if $y == @$ary; swapElems( $ary, $x++, $y++ ); ++$swaps; } return do_swaps( $ary, $x, $y, $swaps ); } sub swap { my ( $ary, $offset ) = @_; my $swaps = do_swaps( $ary, 0, $offset, 0 ); show( $ary, $swaps ); } #swap( [ qw( a b c 1 2 3 ) ], 3 ); # aref, offset of Y0 #swap( [ qw( a b c d 1 2 ) ], 4 ); #swap( [ qw( a b c d e 1 ) ], 5 ); swap( [ qw( a b c d e f g h i j 1 2 3 4 5 6 7 8 ) ], 10 ); #swap( [ qw( a b c d e f 1 2 3 ) ], 6 );

Which when run on the 6/3 testcase produces:

do_swaps: 0 .. 6 [1 b c d e f a 2 3] 0 <=> 6 [1 2 c d e f a b 3] 1 <=> 7 [1 2 3 d e f a b c] 2 <=> 8 [1 2 3 a e f d b c] 3 <=> 6 [1 2 3 a b f d e c] 4 <=> 7 [1 2 3 a b c d e f] 5 <=> 8 do_swaps: 6 .. 9 6 swaps => 1 2 3 a b c d e f

6 swaps is a perfect score. and the logic is very clear and very clean. So clean it is beautiful (which always feels right!):

  1. Switch the 3 from end with the 3 from the front.
  2. Then switch 3 from the middle with the 3 from the end.

With a 10/7 testcase things are little more muddy:

do_swaps: 0 .. 10 [1]b c d e f g h i j[a]2 3 4 5 6 7] 0 <=> 10 [1[2]c d e f g h i j a[b]3 4 5 6 7] 1 <=> 11 [1 2[3]d e f g h i j a b[c]4 5 6 7] 2 <=> 12 [1 2 3[4]e f g h i j a b c[d]5 6 7] 3 <=> 13 [1 2 3 4[5]f g h i j a b c d[e]6 7] 4 <=> 14 [1 2 3 4 5[6]g h i j a b c d e[f]7] 5 <=> 15 [1 2 3 4 5 6[7]h i j a b c d e f[g] 6 <=> 16 * [1 2 3 4 5 6 7 a[i]j h[b]c d e f g] 7 <=> 10 [1 2 3 4 5 6 7 a b[j]h i[c]d e f g] 8 <=> 11 [1 2 3 4 5 6 7 a b c[h]i j[d]e f g] 9 <=> 12 do_swaps: 10 .. 13 [1 2 3 4 5 6 7 a b c d[i]j h[e]f g] 10 <=> 13 [1 2 3 4 5 6 7 a b c d e[j]h i[f]g] 11 <=> 14 [1 2 3 4 5 6 7 a b c d e f[h]i j[g] 12 <=> 15 do_swaps: 13 .. 16 [1 2 3 4 5 6 7 a b c d e f g[i]j[h] 13 <=> 16 [1 2 3 4 5 6 7 a b c d e f g h[j|i] 14 <=> 16 [1 2 3 4 5 6 7 a b c d e f g h i j] 15 <=> 16 do_swaps: 16 .. 17 16 swaps => 1 2 3 4 5 6 7 a b c d e f g h i j
  1. Switch the 107 from the end, with the 107 from the front.
  2. Then switch the 3 so far unmoved with the adjacent 3 that have.
  3. The wiggle those latter 3 the rest of the way to the end in a series of interleaved swaps, in 2 passes.

I'm not sure what to think about that.

It sure looks like that, after the first 7(*) have been moved, then you could recurse to move the 3 to the end.

Ie. As your algorithm works regardless of long/short ordering, treat it as a 3/7 input.

Of course you'd have to 'lie' about the length of the array, which is easy to do in C, but not so in Perl.

I'm not sure if that is optimal -- I haven't (tried to) figured a formula, but it won't be grossly over -- but it sure looks like (that phrase again), that after a clean start, the end seems (perhaps, unnecessarily) busy?

I'll need to code yours and hdbs algorithms, (and finish my own/graff/bitingduck version, if I can), in C and run them on some big buffers with awkward ratios:

eg. 2GB left buffer and 536870909 & 536870923 (primes either side of 2GB/4) which ought to trigger pathological behaviour if it exists.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Replies are listed 'Best First'.
Re^3: [OT] Swapping buffers in place.
by Anonymous Monk on Mar 01, 2015 at 23:56 UTC
    Well the basic idea is this: suppose we have an array like
    1 2 3 4 5 6 a b c
    initially, x points to 1, y points to a:
    x y 1 2 3 4 5 6 a b c
    We want to move a b c all the way to the left,so we swap numbers and letters:
    x y a b c 4 5 6 1 2 3
    Now the rightmost elements are scrambled. Note that y at this stage points to the element one off the right end of the array. We restore its position:
    $y = $offset_y if $y == @$ary; x y a b c 4 5 6 1 2 3
    Now we basically have a smaller array 4 5 6 1 2 3 and we want to swap 4 5 6 and 1 2 3. Hence, recursion. But it's a tail call, aka 'while loop in disguise', so:
    sub do_swaps { my ( $ary, $offset_x, $offset_y, $swaps ) = @_; my ( $x, $y ) = ( $offset_x, $offset_y ); while ( $offset_y != @$ary ) { for ( $offset_x .. ( $offset_y - 1 ) ) { $y = $offset_y if $y == @$ary; my $temp = $ary->[$x]; $ary->[$x] = $ary->[$y]; $ary->[$y] = $temp; $x += 1; $y += 1; $swaps += 1; } ( $offset_x, $offset_y ) = ( $x, $y ); } return $swaps; }
    Same thing... I think :)

      And we have a (provisional) winner! (Needs more testing.)

      Your (my C version of) tail recursive algorithm beats (my C version of) roboticus' implementation of his/graff's/hdb's/bitingduck's iterative algorithm on a 3GB buffer with a 2GB offset.

      The output shows 10 elements at the beginning; either side of the offset; and at the end; before and after the swap:

      C:\test\C>bufswap 402653184 268435456 size:402653184 offset;268435456 [ 0 1 2 3 4 5 6 7 8 9 ...268435446 268435 +447 268435448 268435449 268435450 268435451 268435452 268435453 26843 +5454 268435455 268435456 268435457 268435458 268435459 268435460 2684 +35461 268435462 268435463 268435464 268435465 ...402653174 402653175 +402653176 402653177 402653178 402653179 402653180 402653181 402653182 + 402653183 ] [268435456 268435457 268435458 268435459 268435460 268435461 268435462 + 268435463 268435464 268435465 ...134217718 134217719 134217720 13421 +7721 134217722 134217723 134217724 134217725 134217726 134217727 1342 +17728 134217729 134217730 134217731 134217732 134217733 134217734 134 +217735 134217736 134217737 ...268435446 268435447 268435448 268435449 + 268435450 268435451 268435452 268435453 268435454 268435455 ] iterative took 3.217107510 [ 0 1 2 3 4 5 6 7 8 9 ...268435446 268435 +447 268435448 268435449 268435450 268435451 268435452 268435453 26843 +5454 268435455 268435456 268435457 268435458 268435459 268435460 2684 +35461 268435462 268435463 268435464 268435465 ...402653174 402653175 +402653176 402653177 402653178 402653179 402653180 402653181 402653182 + 402653183 ] [268435456 268435457 268435458 268435459 268435460 268435461 268435462 + 268435463 268435464 268435465 ...134217718 134217719 134217720 13421 +7721 134217722 134217723 134217724 134217725 134217726 134217727 1342 +17728 134217729 134217730 134217731 134217732 134217733 134217734 134 +217735 134217736 134217737 ...268435446 268435447 268435448 268435449 + 268435450 268435451 268435452 268435453 268435454 268435455 ] recursive took 2.445034532

      The C code (needs some sanity checks):

      /* Drawn from roboticus' (1118303) & hdb's(1118256) iterative implemen +tations (Algorithm also described by graff & bitingduck) */ void xlateBuffersI( register U64 *p, const U32 size, const U32 offset +) { U32 register cnt = size, dst = 0, cmp = 0; U64 register tmp = p[ dst ]; assert( offset < size ); assert( ( (U64)p & 7ull ) == 0 ); while( cnt-- ) { U32 register src = ( dst + offset ) % size; if( src == cmp ) { p[ dst ] = tmp; dst = ++cmp; tmp = p[ dst ]; } else { p[ dst ] = p[ src ]; dst = src; } } return; } __forceinline void swapElems( register U64 *p, const U32 x, const U32 +y ) { register U64 temp = p[ x ]; p[ x ] = p[ y ]; p[ y ] = temp; } /* Adapted from anonymonk's post 1118262 */ void xlateBuffersR( U64 *p, U32 size, U32 x, U32 y ) { U32 i, savedY = y; if( y == size ) return; for( i = x; i < savedY; ++i ) { if( y == size ) y = savedY; swapElems( p, x++, y++ ); } xlateBuffersR( p, size, x, y ); }

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
      In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
        Glad to help, BUK. Make sure you test it well, cause I'm not at all confident there aren't bugs in my code ;) But the idea is sound, IMO.

      Actually, it turns out its not (just) the number of swaps made.

      Even when the iterative version makes the same number (actually 1 more on a count of 536 million+); I can induce it to produce pathological timings (23.6 seconds versus 3.6 seconds):

      C:\test\C>bufswap 536870855 268435399 size:536870855 offset;268435399 [ 0 1 2 3 4 5 6 7 8 9 ...268435389 268435 +390 268435391 268435392 2684353 ... [268435399 268435400 268435401 268435402 268435403 268435404 268435405 + 268435406 268435407 268435408 ... iterative: swaps:536870855 took 23.625009728 secs. [ 0 1 2 3 4 5 6 7 8 9 ...268435389 268435 +390 268435391 268435392 2684353 ... [268435399 268435400 268435401 268435402 268435403 268435404 268435405 + 268435406 268435407 268435408 ... recursive: swaps:536870854 took 3.580721198 secs.

      Looking back at the trace output I generated from the perl versions, the only thing I can put the difference down to is cache misses.

      By choosing a left buffer of 268,435,456 U64 elements, (for a 2GB allocation) and a right buffer of 268,435,399 U64s, (the largest prime less), it induces the iterative versions modulo arithmetic to hit elements effectively randomly throughout the buffer; which cause massive cache misses.

      The recursive version tends to work through the array sequentially, minimising the cache misses and demostrating the importance of doing so very nicely.

      Took me a while to figure out the reason, but I've never seen a better demonstration of the importance of locality of reference.

      Which now is going to make me go back and look at bitingduck's triple-reverse mechanism.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
      In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

        BrowserUk:

        Yeah, I don't expect it would be a cache-friendly algorithm. I didn't expect you were swapping buffers that large, or I'd've probably gone with something that would do blocks at a time, rather than entries. I didn't check the assembler, either, but I'd the extra conditionals mine does would also jam things up.

        Anonymonk's version looks pretty nifty, so I'll have to go over it to be sure I know what it's doing. ;^)

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

      'Busier' example:
      x y a b 1 2 3 4 5 6 7 x y 1 2 a b 3 4 5 6 7 -- recurse x y 1 2 3 4 a b 5 6 7 -- recurse x y 1 2 3 4 5 6 a b 7 x y 1 2 3 4 5 6 7 b a -- first swap on this recusion level x y 1 2 3 4 5 6 7 b a -- restore y's position x y 1 2 3 4 5 6 7 a b -- second swap and recurse $y == @$ary now, so stop.
      Something like that...
        x y 1 2 3 4 5 6 7 a b -- second swap and recurse
        off by one :) should be, of course:
        x y 1 2 3 4 5 6 7 a b -- second swap and recurse

      I added back your swap count code and added some to the iterative version, and its easy to see the reason yours is faster:

      C:\test\C>bufswap 402653184 268435456 size:402653184 offset;268435456 [ 0 1 2 3 4 5 6 7 8 9 ...268435446 268435 +447 268435448 268435449 268435450 268435451 268435452 268435453 26843 +5454 268435455 268435456 268435457 268435458 268435459 268435460 2684 +35461 268435462 268435463 268435464 268435465 ...402653174 402653175 +402653176 402653177 402653178 402653179 402653180 402653181 402653182 + 402653183 ] [268435456 268435457 268435458 268435459 268435460 268435461 268435462 + 268435463 268435464 268435465 ...134217718 134217719 134217720 13421 +7721 134217722 134217723 134217724 134217725 134217726 134217727 1342 +17728 134217729 134217730 134217731 134217732 134217733 134217734 134 +217735 134217736 134217737 ...268435446 268435447 268435448 268435449 + 268435450 268435451 268435452 268435453 268435454 268435455 ] iterative: swaps:402653184 took 3.218781038 secs. [ 0 1 2 3 4 5 6 7 8 9 ...268435446 268435 +447 268435448 268435449 268435450 268435451 268435452 268435453 26843 +5454 268435455 268435456 268435457 268435458 268435459 268435460 2684 +35461 268435462 268435463 268435464 268435465 ...402653174 402653175 +402653176 402653177 402653178 402653179 402653180 402653181 402653182 + 402653183 ] [268435456 268435457 268435458 268435459 268435460 268435461 268435462 + 268435463 268435464 268435465 ...134217718 134217719 134217720 13421 +7721 134217722 134217723 134217724 134217725 134217726 134217727 1342 +17728 134217729 134217730 134217731 134217732 134217733 134217734 134 +217735 134217736 134217737 ...268435446 268435447 268435448 268435449 + 268435450 268435451 268435452 268435453 268435454 268435455 ] recursive: swaps:268435456 took 2.436202970 secs.

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
      In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      Now we basically have a smaller array 4 5 6 1 2 3 and we want to swap 4 5 6 and 1 2 3. Hence, recursion.
      (actually, in this particular example we still have 3 iterations left in the for loop. The function recurses if the size of the left part of the array is less then or equal to the right part)
        (damn, I must be tired. disregard that part about recursion :)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1118311]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (5)
As of 2024-04-23 15:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found