http://www.perlmonks.org?node_id=590847


in reply to Re^2: secret code generator
in thread secret code generator

Why bother with a function

Because it allows for code reuse:

my $lower = gen( 97 .. 122 ); my $nums = gen( 48 .. 57 );
an anonymous function

Because it allows me to run multiple independent generators:

#! perl -slw use strict; sub gen { my @set = map chr,@_; my @is = 0; return sub { my $di = $#is; { if( ( $is[ $di ] ||= 0 ) == $#set ) { $is[ $di ] = 0; unshift @is, $di = 0 if --$di < 0; redo; } ++$is[ $di ]; return [ @set[ @is ] ]; } }; } my $lower = gen( 97 .. 122 ); my $nums = gen( 48 .. 57 ); print join'', @{ $lower->() }, '-', @{ $nums->() } for 1 .. 100;
and all those nasty indices?

Ignoring that [-1] is an index, one reason is efficiency.

I ran a benchmark of your original code above against my code above and it came out to be 7000% slower!

Now, I realise that either there is a bug in the code (which can be fixed) or my attempt to tweak it so that it would self-terminate after producing a set number of iterations may have broken it. But, having spent an hour trying to figure out how it is meant to work, I failed. So I could not make the correction.

And that bring me to the second reason. Those "nasty indices", make for a eminently understandable algorithm. The generating statement, return [ @set[ @is ] ]; is obviously an array slice using the array of indices @is to select characters from the set. The rest of the code is a simple loop incrementing the indices in sequences, and 'rolling over', when each one reaches the number of characters in the set. This is what Limbic~Region dubbed the odometer pattern.

In your first attempt, you have

  1. a loop;
  2. calling a sub which generates an anonymous sub and a list of array references, and passes them to ...
  3. another sub which calls another sub passing the anonymous sub and the list of array references;
  4. which creates another anonymous sub that calls the first anonymous sub in a loop,
  5. and then recurses on itself passing the second anonymous sub to loop over the first anonymous sub.

It took me nearly an hour to try and type that description, and I'm still not sure it's right. I can see no benefit in that level of complexity for any task, let alone this one. Even if it worked. And to think I get accused of posting unmaintainable code.

When I wrapped your second attempt into a sub in order to benchmark it, it produced reams of warnings. There appears to be an out-by-one error when it is run multiple times. Possible due to my attempt to make it stop after a set number of iterations? But since I (again) couldn't work out how to correct the error, attempting to trace the contents of your @stack which seems to accumulate a lot of content, all but the first element of which you discard, defeated me. Again, unwarranted complexity.

To my critique of the use of OO

Some of the benefits I've attributed to my solution--multiplicity and code reuse--could also be cited as reason for using OO in the Python solution posted by the OP, but I would like to counter them.

The class contains a constructor, 2 methods--value and next and an iterator. In order for the object state to make a transition, both the methods have to be called in sequence, which is a useless split of functionality, and is probably why it is seen as necessary to add the iterator method. But, unintuitive, the value method has to be used to get the current value before the next method is called, which means that the value method has to temporarily cache the value.

Finally, as there is no possibility of resetting the generator, it becomes a 'use once and discard' instantiation. The only instance variables it has are those used entirely for it's own internal workings. And the only methods it has, other than the instantiator, could equally effectively be combined into a single method. This makes it (IMO, but also that of others), a "useless use of OO". OO for the sake of OO. OO dogma.

So each of these objects is a single use object, with only internal use state and a single method for getting the results. That is an unusual, but accurate description of an iterator subroutine. And exactly what my solution provided. I don't have Python--I threw it away about a week after I installed it because I really disliked the significant whitespace aspect of it--so I couldn't benchmark that. Instead, I converted it verbatim to Perl and compared it against my solution. Even without going through the hoops of adding performance sapping accessor/mutator methods, it's about half the speed.

More importantly, it's considerably more complex. The individual methods are simple enough, and easy enough to construct and maintain, but the complexity lies in the recursive nature of the code and data structure with one instance methods calling those of it's nested instances. There is 'hidden' state maintained in the depth of the structure--the length of the passwords being generated--along with the current state of the overall result being maintained across those multiple, nested instances. This makes it very difficult to debug or trace. Harder even that a normal recursive function.

Imagine for instance if it was decided to limit the length of the passwords to say 8 characters so that the iterator will eventually self terminate. It becomes necessary to add a test to see how deep the chain of instances is, by chaining down the self.Outer chain until it bottoms out counting as you go. But that would also require that the depth test was only initiated from the top level instance, which is another test to be run at every iteration.

In summary

I know 'efficiency' doesn't rate too highly around here, but since to generate a list of all the passwords to a maximum of 8 characters would require 5.6e15 iterations, each microsecond saved will save 177 years on the overall task! With over 50% savings relative to the (lightweight OO) version, and 7000% compared to your first attempt, these are not insignificant savings.

Even if you ignore efficiency on the basis that no one would attempt to generate such a large number of passwords--though I see no other use for this code--(IMO), my solution wins over the others on the grounds of usability and maintainability also.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^4: secret code generator
by tilly (Archbishop) on Dec 21, 2006 at 03:58 UTC
    Are you finished the self-congratulation?

    The fact is that if I did not know what your index-based code was supposed to be doing, I would have had a dickens of a time figuring out what that code does. Perhaps in your coding style names like $di and @is make sense to you, but they are meaningless to me. And presumably would be meaningless to any other maintainance programmer.

    Now about my first attempt. How it works doesn't matter. That it does, does. The key piece is

    my $x = 0; while (++$x) { nested_for( sub {print join "", @_, "\n";} , map \@chars, 1..$x ); }
    which means for every length of string, print all combinations. How does that work? Well it depends on nested_for, which should be pushed somewhere else and documented. In fact it is useful enough that I'd want to see something like that on CPAN. Lo and behold, it has been put on CPAN! See Algorithm::Loops' NestedLoops function. (Thanks, tye.) I avoided using the module largely because I wanted to give the complete solution rather than hiding any bits.

    Why did I pick that solution? Well I knew I had the code for that lying around, easy to pick up, so it was very fast to code. And secondly the Python programmer was obviously trying to snow the (admittedly novice) Perl programmer with techniques the Perl guy didn't know, so I was coming back with techniques the Python guy probably wouldn't know. In other words I was being cute. And was definitely not trying to be maintainable. (Though once the complicated bit is pushed to a CPAN module, it actually is maintainable as well.)

    Now the solution that you're deriding did have a bug. I just fixed that. Guess I should have tested it first, but I was just trying to use even fewer features than you did. (And the reason why I did that is that you were criticizing me for using so many features.) As for how it works, it is the mirror image of yours, but rather than keep an array and then move an index, I'm just keeping a partial array of what is left to go through. It is slower because I'm constantly creating and destroying arrays of scalars while you aren't.

    About performance, that was obviously not a goal of mine. But even so, if every microsecond it takes to generate each answer adds 177 years on the overall task, then neither of our solutions are going to get through all of the 8 character passwords any time soon.

      How does that work? Well it depends on nested_for, which should be pushed somewhere else and documented. In fact it is useful enough that I'd want to see something like that on CPAN. Lo and behold, it has been put on CPAN! See Algorithm::Loops' NestedLoops function.

      Oh no. That's utter *(*^%&. The relationship between the code you posted (attempt 1), and tye's NestedLoops() is as that between a Ferrari 599 and an Ox cart. tye's is highly tuned and blazingly fast. Yours--dog slow.

      I fully understand how NestedLoops() works, I just cannot wrap my brain around the interface.

      And the reason why I did that is that you were criticizing me for using so many features.

      Hmm. The funny bit is, I don't think I had even seen your code when I posted. Either way, my critisisms were aimed solely at the Python code.

      About performance, that was obviously not a goal of mine. But even so, if every microsecond it takes to generate each answer adds 177 years on the overall task, then neither of our solutions are going to get through all of the 8 character passwords any time soon.

      Maybe not with a 93 character set, but maybe for a 26 char set. Then each microsecond is only 58 hours. A long long time, but doable--if you don't waste time with overly complicated algorithms.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        I wasn't particularly impressed with the readability of any of the previously presented solutions, so I have added a few more solutions below (two of my own: #2 and #3, and one based on NestedLoops: #4). For readability, I like #4 (NestedLoops) best and #3 next best. #2 is an optimized version of #3 which is still much more readable IMO than the other solutions. I have tried to keep the solutions close in spirit to the original #1 (by Tilly). The technique of passing an accumulator is used by a lot of Lisp idioms.

        As far as benchmarking goes, #1 is faster than #3, but #2 is the fastest by a substantial margin. #4 (NestedLoops) is dead last. I have included the benchmarking code for you to try yourself.

        All the solutions except #1 (the original solution) work by permuting the rightmost "digit" fastest. #1 permutes the leftmost "digit" fastest. Inserting the two calls to reverse (which I have commented out below) slows it down a bit but not enough to affect it's ranking.

        #! /usr/bin/perl -w use strict; use Benchmark; use Algorithm::Loops qw(NestedLoops); use vars qw(@chars $loop_cnt $printit); $loop_cnt = 3; @chars = map chr, 0x21 .. 0x7e; # @chars = ("a" .. "c"); sub printit { print @_ if $printit; } #------------------------------------------------ sub doit4 { my $x = 1; while ($x <= $loop_cnt) { NestedLoops( [ map \@chars, 1 .. $x++ ], sub { printit join "", @_, "\n" } ); } } #------------------------------------------------ sub ret_iter3 { my $accum = shift; my $range = shift; if ($range) { ret_iter3( [@$accum, $_], @_ ) for @$range } else { printit join "", @$accum, "\n" } } sub nested_for3 { ret_iter3 [], @_; } sub doit3 { my $x = 1; while ($x <= $loop_cnt) { nested_for3 map \@chars, 1 .. $x++; } } #------------------------------------------------ sub ret_iter2 { my $accum = shift; my $range = shift; if (@_) { for (@$range) { push @$accum, $_; ret_iter2($accum, @_); pop @$accum; } } else { printit join "", @$accum, $_, "\n" for @$range; } } sub nested_for2 { ret_iter2 [], @_; } sub doit2 { my $x = 1; while ($x <= $loop_cnt) { nested_for2 map \@chars, 1 .. $x++; } } #------------------------------------------------ sub doit1 { my $x = 0; while (++$x <= $loop_cnt) { nested_for1( sub {printit join "", @_, "\n";} , map \@chars, 1..$x ); # nested_for1( # sub {printit join "", reverse(@_), "\n";} # , reverse(map \@chars, 1..$x) # ); } } sub nested_for1 { ret_iter(@_)->(); } sub ret_iter { my $fn = shift; my $range = shift; my $sub = sub {$fn->($_, @_) for @$range}; return @_ ? ret_iter($sub, @_) : $sub; } #------------------------------------------------ # $printit = 1; # doit1; # doit2; # doit3; # doit4; # exit; timethese (-10, { doit1 => 'doit1', doit2 => 'doit2', doit3 => 'doit3', doit4 => 'doit4', });