Thanks again to everyone for the feedback.
First, the easy one. I'm not adding a prototype. The existing routines don't have prototypes and I only use prototypes in very specific situations and this isn't one of those.
While a LoL isn't very deeply nested, it does open up the possiblity for
creating a very large number of very small anonymous arrays. Each array
adds a chunk of overhead. For smalls arrays, this overhead becomes a
significant percentage. So I like splitting up the return value some but
would make it optional.
As for adding more fields for convenience, I'm reluctant for two reasons.
The first is the same as above; it reduces the size of problem you can
address with the same resources. The second is that it makes it harder
to remember which index gives you back which bit of information.
In fact, this has made me realize that much of what I was returning is
redundant. I could reduce it from 5 down to just 2 like so:
# @a @b @a @b
# same? range range values values
( 0, 0=>0, 0=>-1, # a !
1, 1=>2, 0=> 1, # b c = b c
0, 3=>2, 2=> 2, # ! d
1, 3=>3, 3=> 3, # e = e
0, 4=>4, 4=> 4, # f ! h
1, 5=>5, 5=> 5, # j = j
0, 6=>5, 6=> 6, # ! k
1, 6=>7, 7=> 8, # l m = l m
0, 8=>9, 9=>11, # n p ! r s t
);
# @a @b @a @b
# start start values values
( 0, 0, # =
0, 0, # a !
1, 0, # b c = b c
3, 2, # ! d
3, 3, # e = e
4, 4, # f ! h
5, 5, # j = j
6, 6, # ! k
6, 7, # l m = l m
8, 9, # n p ! r s t
10, 12, #
);
by just making the loop logic a bit more complex like so:
# old loop code
########################################
my $diff= easy( \@a, \@b );
while( @$diff ) {
my( $same, $aMin, $aMax, $bMin, $bMax )=
splice @$diff, 0, 5;
# ...
}
# new loop code
########################################
my $diff= easy( \@a, \@b );
my $same= 0;
while( @$diff ) {
my( $aMin, $bMin, $aMax, $bMax )= @$diff;
--$aMax; --$bMax;
$same= 1 - $same;
splice @$diff, 0, 2;
# ...
}
which gives me a neat idea!
I'll store the data very efficiently (a single list with 2 entries per
'hunk') but provide routines for traversing the list in a convenient
manner (which also solves my 'naming' problem).
my $diff= Algorithm::Diff->new( \@a, \@b );
$diff->base( 1 ); # Return line numbers, not indices
while( $diff->next() ) {
next if $diff->same();
my $sep= '';
if( ! $diff->bRange() ) {
sprintf "%d,%dd%d\n",
$diff->get(qw( aMin aMax bMax ));
} elsif( ! $diff->aRange() ) {
sprint "%da%d,%d\n",
$diff->get(qw( aMax bMin bMax ));
} else {
$sep= "---\n";
sprintf "%d,%dc%d%d\n",
$diff->get(qw( aMin aMax bMin bMax ));
}
print "< $_" for $diff->aRange();
print $sep;
print "> $_" for $diff->bRange();
}
Note how $diff->aRange() returns the lines falling
into that range but, in a scalar context, returns the number
of lines in that range (the $aLen you wanted) which also
happens to be "false" only if there are no lines in that range
(like the bits of $diff you asked for).
Other methods would include
- $diff->prev() to move along the chunks in the other
direction
- $diff->end() to reset the iterator prematurely so
subsequently next() would go to the top while prev() would got to the bottom.
- $diff->aMin() would be the same as
$diff->get("aMin"), etc.
Takes about 60% less RAM but is much more flexible. Thanks!
Note, in case you plan to complain about me optimizing for RAM size
by resorting to OO methods, which run a bit slower than regular
subroutines (or direct manipulation of a data structure). It is
a good trade-off, IMO, because if your problem requires
30% more RAM than what you have available, then either it just
fails or it runs really, really slowly. While, it running
10% slower is a minor problem in comparison.
-
tye (but my friends call me "Tye")
|