Non-destructive array processing

by Juerd (Abbot)
on Jan 20, 2003 at 21:40 UTC (#228501=perlmeditation)

my @array = 1..10;
my $r = sub { \@_ }->(@array);
while (my @chunk = splice @$r, 0, 2) {
    print "Chunk: @chunk\n";
}
print "Original array is still intact! (@array)\n";

...or...

my @array = 1..10;
for (my $i = 0; $i <= $#array; $i += 2) {
    my @chunk = @array[$i, $i + 1];
    print "Chunk: @chunk\n";
}

Which do you prefer? Why?

Juerd
- http://juerd.nl/
- spamcollector_perlmonks@juerd.nl (do not use).

Re: Non-destructive array processing
by pdcawley (Hermit) on Jan 20, 2003 at 21:59 UTC
    I choose neither of the above.
    my @array = 1..10;
    my @ary_copy = (@array);
    while (my @chunk = splice @ary_copy, 0, 2) {
        print "Chunk: @chunk\n";
    }
    print "Original array is still intact! (@array)\n";
    straightforward, easy to understand and not overly clever. (But the closure trick is very clever. I wonder which is faster)

      Yes, the closure trick is clever, but I don't wonder which is faster. Aside from my assumption that it must be slower due to the function overhead in Perl, the obfuscation factor alone would cause me to eschew it. However, there's a more subtle problem at work that's going to kill many programmers. Since @_ aliases the argument list, the following two lines are equivalent:

      my $r1 = sub { \@_ }->(@array);
      my $r2 = \@array;

      What that means is that any processing on $r is going to affect @array. The following snippet will clarify.

      use Data::Dumper;
      my @array = 1..10;
      my $aref = sub {\@_}->(@array);
      $_++ foreach @$aref;
      print Dumper \@array;
      my $r2 = \@array;
      print \@array,"\n",$r2;

      The way to get around that with a closure is to do this:

      my $r = sub {my @a = @_; \@a}->(@array);

      Clearly that's not going to be faster than simply copying the array.

      Cheers,
      Ovid

      New address of my CGI Course.
      Silence is Evil (feel free to copy and distribute widely - note copyright text)

        Heh. I'm so used to slinging objects around rather than simple scalars I just took it as read that it would be a shallow copy. Note that, in the original case, that's not a problem because assigning to @chunk makes a copy of the value.

        Since @_ aliases the argument list, the following two lines are equivalent:

        my $r1 = sub { \@_ }->(@array);
        my $r2 = \@array;

        No they're not :-)

        The first is a reference to an array that has every element aliased to every element of @array.

        The second is a reference to @array.

        The "trick" wouldn't work otherwise, since changing $r2 will change @array. For example.

        my @array = (1..10);
        my $r1 = sub { \@_ }->(@array);
        pop @$r1;
        print "unchanged @array\n";
        my $r2 = \@array;
        pop @$r2;
        print "changed @array\n";

        gives us

        unchanged 1 2 3 4 5 6 7 8 9 10
        changed 1 2 3 4 5 6 7 8 9
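A self-contained sketch of the distinction drawn above, assuming `\@_` behaves as this thread describes. Removing elements from the alias array leaves @array alone, while assigning to one of its elements writes through to @array. The variable names ($aliases, $same) are illustrative and not from the original posts.

```perl
use strict;
use warnings;

my @array = (1 .. 5);

# Reference to a new array whose elements are aliases of @array's elements.
my $aliases = sub { \@_ }->(@array);

# Reference to @array itself.
my $same = \@array;

# Structural change through the alias array: @array keeps all its elements.
pop @$aliases;
my $after_pop = "@array";

# Element change through the alias array: visible in @array.
$aliases->[0] = 'one';
my $after_assign = "@array";

# Structural change through the plain reference: @array shrinks.
pop @$same;
my $after_ref_pop = "@array";

print "$after_pop\n$after_assign\n$after_ref_pop\n";
```

The splice loop in the original post only makes structural changes and copies each chunk, which is why @array survives it.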
      I'm so glad I read the other replies before posting mine. This is exactly what I would have posted.

      The cleverness in both those examples from Juerd is OK for personal and even module code for CPAN, but IMO generally unusable within a work/production context. First off, they don't really look like they do what they do; second, they are confusing and error prone. Whereas yours looks exactly like what it does. No maintenance programmer is going to get confused years after I've left the company.

      ++

      --- demerphq
      my friends call me, usually because I'm late....

        No maintenance programmer is going to get confused years after I've left the company.
        Now there's a motto to live by.

        unusable within a work/production context

        I wouldn't handle large data sets in production code. This is primarily for one-time hacks, but I wondered what other people would prefer.

        No maintenance programmer is going to get confused years after I've left the company.

        Note that if code like this ever goes into production, I do of course add proper comments, including a note that you shouldn't use @$r elsewhere (re your other post).

        Juerd
        - http://juerd.nl/
        - spamcollector_perlmonks@juerd.nl (do not use).
        

      my @ary_copy = (@array); straightforward, easy to understand

      I agree, and it is exactly how I usually do this. Unfortunately, I ran into some huge data and couldn't copy without installing additional RAM :)

      But the closure trick is very clever. I wonder which is faster

      Wonder no more, unless my benchmark is wrong, of course :)

      #!/usr/bin/perl -w
      use strict;
      use Benchmark qw(cmpthese);

      our @array;

      sub pdcawley {
          my @copy = @array;
          while (my @chunk = splice @copy, 0, 2) { }
      }

      sub juerd {
          my $refs = sub { \@_ }->(@array);
          while (my @chunk = splice @$refs, 0, 2) { }
      }

      sub bench {
          printf "\n\e[1m%s\e[0m\n", shift;
          cmpthese(-10, { pdcawley => \&pdcawley, juerd => \&juerd });
      }

      @array = (1) x 32767;
      bench "Long array, tiny values";

      @array = ("x" x 32) x 32767;
      bench "Long array, small values";

      @array = (1) x 32;
      bench "Short array, tiny values";

      @array = ("x" x 32) x 32;
      bench "Short array, small values";

      @array = ("x" x (2**20)) x 32;
      bench "Short array, large values";

      @array = ("x" x (8 * 2**20)) x 32;
      bench "Short array, huge values";

      (Note: stripped)

      Long array, tiny values
      pdcawley 26.1/s    --  -17%
      juerd    31.4/s   20%    --

      Long array, small values
      pdcawley 12.9/s    --  -38%
      juerd    20.7/s   60%    --

      Short array, tiny values
      pdcawley 32909/s    --   -1%
      juerd    33197/s    1%    --

      Short array, small values
      pdcawley 19203/s    --  -17%
      juerd    23084/s   20%    --

      Short array, large values
      pdcawley 1.83/s    --  -53%
      juerd    3.89/s  112%    --

      Short array, huge values (s/iter)
      pdcawley 4.32    --  -53%
      juerd    2.04  112%    --

      I'd like to test it with an array of 32 elements of 20 MB each, but the copy doesn't fit in memory.

      Anyhow, it seems that using the array of aliases is much more efficient than using a copy, especially with large data sets.

      Juerd
      - http://juerd.nl/
      - spamcollector_perlmonks@juerd.nl (do not use).
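One way to see why the alias array avoids the copying cost is to compare scalar addresses. The sketch below is illustrative only: the names ($aliases, @copy) and the 1 KB string size are assumptions, and it makes no claim about timings on any particular perl. A reference used in numeric context yields the address of the thing it points to, so `==` tells us whether two references point at the same scalar.

```perl
use strict;
use warnings;

my @array = ('x' x 1024) x 4;

my @copy    = @array;                    # new scalars holding copied values
my $aliases = sub { \@_ }->(@array);     # new array, same scalars

# Numeric comparison of references compares the addresses of the scalars.
my $alias_shares = ( \$aliases->[0] == \$array[0] ) ? 1 : 0;
my $copy_shares  = ( \$copy[0]      == \$array[0] ) ? 1 : 0;

print "alias array holds the same scalar: $alias_shares\n";
print "copy holds the same scalar:        $copy_shares\n";
```

The alias array still needs one slot per element, so it is not free, but it does not duplicate the values themselves.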
      

        Crumbs. Clarity really costs in some cases doesn't it?

        I normally fight shy of commenting code if I can possibly help it, generally preferring to sweat over making the code as clear as possible, but if I found myself having to use that trick then I'd definitely fence it around with comments.

Re: Non-destructive array processing
by jmcnamara (Monsignor) on Jan 20, 2003 at 22:19 UTC

    I like this a little better:
    for my $i (0 .. $#array/2) {
        my @chunk = @array[2*$i, 2*$i + 1];
        print "Chunk: @chunk\n";
    }

    --
    John.

Re: Non-destructive array processing
by BrowserUk (Pope) on Jan 20, 2003 at 22:54 UTC

    Another alternative, which I think I prefer. Best thing is you don't get "Use of uninitialized value in join or string at ..." if the array size isn't an exact multiple of the chunk size.

    sub getIter (\@$;$) {
        my ($ref, $size, $next) = @_;
        $next ||= 0;
        return sub {
            $next = 0, return () unless $next <= $#$ref;
            my $start = $next;
            $next = $next+$size <= $#$ref ? $next+$size-1 : $#$ref;
            @$ref[ $start .. $next++ ]
        }
    }

    my $iter = getIter( @array, 2 );
    while( my @chunk = $iter->() ) {
        print "Chunk: @chunk";
    }

    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.
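The point about avoiding "uninitialized value" warnings comes down to never slicing past the last index. A minimal sketch of that idea, separate from the iterator above: chunks_of is a hypothetical helper name, and the clamping uses min from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: return array refs of at most $size elements each,
# reading from the referenced array without modifying it.
sub chunks_of {
    my ($ref, $size) = @_;
    my @chunks;
    for (my $i = 0; $i < @$ref; $i += $size) {
        my $last = min($i + $size - 1, $#$ref);   # never index past the end
        push @chunks, [ @{$ref}[$i .. $last] ];
    }
    return @chunks;
}

my @array  = 1 .. 5;
my @chunks = chunks_of(\@array, 2);
print "Chunk: @$_\n" for @chunks;
```

With five elements and a chunk size of two, the final chunk simply holds one element instead of an element plus undef.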

      In what context would you use the $next variable? It looks to be useful only for setting a first value to return. (Sorta like "Skip the first N values" thingy...)

      Is that what it is?

      ------
      We are the carpenters and bricklayers of the Information Age.

      Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

        That's it exactly. Skip the $first n and then give'm to me in chunks of $size after that.


        Examine what is said, not who speaks.

        The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.
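The "skip the first n, then chunks of $size" behaviour can be sketched as a plain closure. This is a simplified variant and not BrowserUk's getIter: make_chunker is a hypothetical name, it takes an explicit array reference instead of using a prototype, and it does not reset itself once exhausted.

```perl
use strict;
use warnings;

# Hypothetical variant of the iterator idea: skip the first $skip elements,
# then return chunks of up to $size elements on each call.
sub make_chunker {
    my ($ref, $size, $skip) = @_;
    my $next = $skip || 0;
    return sub {
        return if $next > $#$ref;          # exhausted: empty list ends the loop
        my $end = $next + $size - 1;
        $end = $#$ref if $end > $#$ref;    # clamp the final, shorter chunk
        my @chunk = @{$ref}[$next .. $end];
        $next = $end + 1;
        return @chunk;
    };
}

my @array = 1 .. 10;
my $iter  = make_chunker(\@array, 3, 4);   # skip 4 elements, chunks of 3
my @seen;
while (my @chunk = $iter->()) {
    push @seen, "@chunk";
    print "Chunk: @chunk\n";
}
```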

      Best thing is you don't get "Use of uninitialized value in join or string at ..." if the array size isn't an exact multiple of the chunk size.

      I *want* those warnings. If the array length is not an exact multiple of the chunk size, something is wrong, and I would like to be informed. In production code, I'd let it croak if @array % 2 even before looping.

      Juerd
      - http://juerd.nl/
      - spamcollector_perlmonks@juerd.nl (do not use).
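A sketch of the "croak before looping" check, using the core Carp module. The wrapper name each_pair and the error text are assumptions for illustration. It splices @_ directly, so the caller's array is left alone.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical wrapper: refuse odd-length input, then hand out pairs.
sub each_pair {
    my $code = shift;
    croak "Odd number of elements (" . @_ . ")" if @_ % 2;
    while (my @pair = splice @_, 0, 2) {
        $code->(@pair);
    }
    return;
}

my @array = 1 .. 4;
my @seen;
each_pair(sub { push @seen, "@_"; print "Chunk: @_\n" }, @array);

# An odd-length list is rejected before any chunk is processed.
my $ok = eval { each_pair(sub { }, 1 .. 3); 1 };
print "Rejected: $@" unless $ok;
```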
      

Re: Non-destructive array processing
by runrig (Abbot) on Jan 20, 2003 at 23:34 UTC
    Nice trick. I like it as a sort of answer to LISP-like linked lists without having to resort to 2-element array references:
    my @array = qw(a b c d);
    my $r = sub { \@_ }->(@array);
    my $a = cdr($r);
    print "arr: @array\n";
    print "a: @$a\n";
    print "r: @$r\n";
    $array[2]="hello";
    print "arr: @array\n";
    print "a: @$a\n";
    print "r: @$r\n";

    sub cdr {
        my $r = shift;
        my $a = sub { \@_ }->(@$r);
        shift @$a;
        $a;
    }
Re: Non-destructive array processing
by elusion (Curate) on Jan 20, 2003 at 23:53 UTC
    I prefer Perl 6.
    my @array = 1..10;
    for @array -> $one, $two {
        my @chunk = ($one, $two);
        print "Chunk: @chunk\n";
    }
    However, if I need to use Perl 5, I'd pick BrowserUK's method, or something like it. It appears a little overkill for the example, but the example is a bit contrived.

    elusion : http://matt.diephouse.com
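For readers on Perl 5, a rough approximation of the two-at-a-time loop can be written with pairs from List::Util. This is not something anyone in the thread proposed, and it assumes a List::Util recent enough to export pairs. Unlike the alias trick, it builds new two-element arrays, so values are copied.

```perl
use strict;
use warnings;
use List::Util qw(pairs);

my @array = 1 .. 10;

my @chunks;
for my $pair (pairs @array) {
    my ($one, $two) = @$pair;      # each pair is a two-element array ref
    push @chunks, "$one $two";
    print "Chunk: $one $two\n";
}
print "Original array is still intact! (@array)\n";
```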

      I prefer Perl 6.

      So do I, so do I...

      Wouldn't for @array -> @chunk[0, 1], &foo work? With @chunk predeclared, of course. (Not that I ever need the chunk as an array. I use it only to store whatever splice returns. I love the Perl 6 syntax.)

      Juerd
      - http://juerd.nl/
      - spamcollector_perlmonks@juerd.nl (do not use).
      

(jeffa) Re: Non-destructive array processing
by jeffa (Chancellor) on Jan 21, 2003 at 02:43 UTC
    I think the Perl 6 solution looks the best, but if i had to pick one of your original two, it would be the former. I think the predicate for the while loop looks like Perl, while the predicate for the for loop looks like C. I do love the way you copy the array, though ... that's sure to make some ears bleed. ;) But ... is there any benefit in doing so? Isn't my @r = @array; just as effective, or am i missing a scalability issue here?

    I have grown to dislike the C-style for(;;;) over the years, so much that i like to sometimes substitute a bit of speed for evilness such as:

    my @array = 1..10;
    for my $i (grep $_%2, 0..$#array) {
        my @chunk = @array[$i - 1, $i];
        print "Chunk: @chunk\n";
    }
    print "Original array is still intact! (@array)\n";
    However, my wise uncle would remind me that your second snippet is the best because it is simple and brute force. It has more potential to have a wider audience of programmers understand how it ticks than the first snippet does. I still like to make ears bleed, tho ... ;)

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      It has more potential to have a wider audience of programmers understand how it ticks than the first snippet does.

      Yeah :(

      Too bad the average Perl coder is stupid :(
      No offense...

      Juerd
      - http://juerd.nl/
      - spamcollector_perlmonks@juerd.nl (do not use).
      

        Too bad the average Perl coder is stupid :(

        Yeah, stupid people shouldn't be allowed on the Internet.

        In fact, they shouldn't be allowed to even program in Perl. Actually, why stop there? They shouldn't even be allowed to run Perl programs. Perl should just be for us 31337 h4x0rs. All these n00bs just hurt our image. I say we make Perl as hard as possible to use, that way these people won't even be able to figure it out. Fsck the maintenance programmers, they should be fired if they can't figure it out. Fsck people trying to learn from the code, if they can't figure it out they shouldn't be a programmer. I am great, everyone else is a fool.

        Software isn't about elitism, it's about empowering people. The more diverse and accessible the tools, the better off society will be. Programming languages can be designed to allow maximum usability, both by the expert and by the novice. It needn't be a choice between marginalizing one group or the other. Perl is an excellent example of this, let's keep it that way.

        Peace :-)



Re: Non-destructive array processing
by Gilimanjaro (Hermit) on Jan 21, 2003 at 11:02 UTC
    Isn't this what local is for? Something like the following;

    my @array = 1..10;
    {
        local @array = @array;
        while (my @chunk = splice @array, 0, 2) {
            print "Chunk: @chunk\n";
        }
    }
    print "Original array is still intact! (@array)\n";
      Nope, because local() doesn't work on my variables. Nice thought, though.

      jdporter
      The 6th Rule of Perl Club is -- There is no Rule #6.
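For contrast, the same idea does run when @array is a package variable. A minimal sketch, assuming the array can be declared with our instead of my. Note that local @array = @array still copies the values, so it does not help with the memory concern raised earlier in the thread.

```perl
use strict;
use warnings;

our @array = 1 .. 10;      # package variable, so local() is allowed

my @seen;
{
    local @array = @array; # temporary copy, restored when the block exits
    while (my @chunk = splice @array, 0, 2) {
        push @seen, "@chunk";
        print "Chunk: @chunk\n";
    }
}
print "Original array is still intact! (@array)\n";
```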

      Nearly.
      local *array = \@array;
      This will work with lexically scoped variables too.

      Makeshifts last the longest.

        But that will simply alias the lexical @array with the dynamic @array. So I don't see what you've achieved by doing this.

        One problem with this that you probably didn't foresee is that lexicals are resolved before dynamic variables. Example:

        my @foo = 1..4;
        local *foo = ['a'..'d'];
        print @foo; # 1234

        The problem is solved through our() since that creates an aliased lexical:

        my @foo = 1..4;
        our @foo = 'a'..'d';
        print @foo; # abcd

        ihb
      No, but you can use my:
      my @array = 1..10;
      {
          my @array = @array;
          while (my @chunk = splice @array, 0, 2) {
              print "Chunk: @chunk\n";
          }
      }
      print "Original array is still intact! (@array)\n";
      Note though that modifications of an @array element (i.e. via $array[$n]) will disappear when the scope is left. This is because the inner @array is simply another variable with the values copied. The idea of Juerd's routine was that the elements would be aliased but the array different.

      Hope I've helped,
      ihb
Re: Non-destructive array processing
by jdporter (Canon) on Jan 21, 2003 at 22:28 UTC
    Hey, that's a neat trick! But how about:
    my @array = 1..10;
    sub {
        while ( my @chunk = splice @_, 0, 2 ) {
            print "Chunk: @chunk\n";
        }
    }->( @array );
    print "Original array is still intact! (@array)\n";

    jdporter
    The 6th Rule of Perl Club is -- There is no Rule #6.

Re: Non-destructive array processing
by Aristotle (Chancellor) on Jan 22, 2003 at 16:16 UTC
    Life is not a JAPH contest.
    sub make_get_pairs {
        my $alias = \@_;
        sub { splice @$alias, 0, 2 }
    }

    my @foo = 1..10;
    my $get_foo_pair = make_get_pairs(@foo);
    while (my @chunk = $get_foo_pair->()) {
        print "Chunk: @chunk\n";
    }
    print "\@foo is still intact: @foo\n";

    Makeshifts last the longest.

Node Type: perlmeditation [id://228501]
Approved by gmax
Front-paged by jarich