Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re: Algorithm::Diff icult to use

by Aristotle (Chancellor)
on Aug 21, 2002 at 19:18 UTC ( [id://191842]=note: print w/replies, xml ) Need Help??


in reply to Algorithm::Diff icult to use

I strongly second the LoL notion. Also, I think it would be nice if each subarray contained $same, $aMin, $aMax, $aLen, $bMin, $bMax, $bLen seeing as $aLen = $aMax - $aMin + 1, which is clumsy if you need it often. Also, it might useful if $same was rather $diff, which would be 0 if no differences, 1 if $bLen == 0, 2 if $aLen == 0 and 3 if we have two different line runs. A (\@\@) prototype on the function might be nice; I'm not sure how often one will want to pass anonymous arrays constructed on the fly into it.

Makeshifts last the longest.

Replies are listed 'Best First'.
(tye)Re: Algorithm::Diff icult to use
by tye (Sage) on Aug 22, 2002 at 07:42 UTC

    Thanks again to everyone for the feedback.

    First, the easy one. I'm not adding a prototype. The existing routines don't have prototypes and I only use prototypes in very specific situations and this isn't one of those.

    While a LoL isn't very deeply nested, it does open up the possiblity for creating a very large number of very small anonymous arrays. Each array adds a chunk of overhead. For smalls arrays, this overhead becomes a significant percentage. So I like splitting up the return value some but would make it optional.

    As for adding more fields for convenience, I'm reluctant for two reasons. The first is the same as above; it reduces the size of problem you can address with the same resources. The second is that it makes it harder to remember which index gives you back which bit of information.

    In fact, this has made me realize that much of what I was returning is redundant. I could reduce it from 5 down to just 2 like so:

    # @a @b @a @b # same? range range values values ( 0, 0=>0, 0=>-1, # a ! 1, 1=>2, 0=> 1, # b c = b c 0, 3=>2, 2=> 2, # ! d 1, 3=>3, 3=> 3, # e = e 0, 4=>4, 4=> 4, # f ! h 1, 5=>5, 5=> 5, # j = j 0, 6=>5, 6=> 6, # ! k 1, 6=>7, 7=> 8, # l m = l m 0, 8=>9, 9=>11, # n p ! r s t ); # @a @b @a @b # start start values values ( 0, 0, # = 0, 0, # a ! 1, 0, # b c = b c 3, 2, # ! d 3, 3, # e = e 4, 4, # f ! h 5, 5, # j = j 6, 6, # ! k 6, 7, # l m = l m 8, 9, # n p ! r s t 10, 12, # );

    by just making the loop logic a bit more complex like so:

    # old loop code ######################################## my $diff= easy( \@a, \@b ); while( @$diff ) { my( $same, $aMin, $aMax, $bMin, $bMax )= splice @$diff, 0, 5; # ... } # new loop code ######################################## my $diff= easy( \@a, \@b ); my $same= 0; while( @$diff ) { my( $aMin, $bMin, $aMax, $bMax )= @$diff; --$aMax; --$bMax; $same= 1 - $same; splice @$diff, 0, 2; # ... }

    which gives me a neat idea!

    I'll store the data very efficiently (a single list with 2 entries per 'hunk') but provide routines for traversing the list in a convenient manner (which also solves my 'naming' problem).

    my $diff= Algorithm::Diff->new( \@a, \@b ); $diff->base( 1 ); # Return line numbers, not indices while( $diff->next() ) { next if $diff->same(); my $sep= ''; if( ! $diff->bRange() ) { sprintf "%d,%dd%d\n", $diff->get(qw( aMin aMax bMax )); } elsif( ! $diff->aRange() ) { sprint "%da%d,%d\n", $diff->get(qw( aMax bMin bMax )); } else { $sep= "---\n"; sprintf "%d,%dc%d%d\n", $diff->get(qw( aMin aMax bMin bMax )); } print "< $_" for $diff->aRange(); print $sep; print "> $_" for $diff->bRange(); }

    Note how $diff->aRange() returns the lines falling into that range but, in a scalar context, returns the number of lines in that range (the $aLen you wanted) which also happens to be "false" only if there are no lines in that range (like the bits of $diff you asked for).

    Other methods would include

    • $diff->prev() to move along the chunks in the other direction
    • $diff->end() to reset the iterator prematurely so subsequently next() would go to the top while prev() would got to the bottom.
    • $diff->aMin() would be the same as $diff->get("aMin"), etc.
    Takes about 60% less RAM but is much more flexible. Thanks!

    Note, in case you plan to complain about me optimizing for RAM size by resorting to OO methods, which run a bit slower than regular subroutines (or direct manipulation of a data structure). It is a good trade-off, IMO, because if your problem requires 30% more RAM than what you have available, then either it just fails or it runs really, really slowly. While, it running 10% slower is a minor problem in comparison.

            - tye (but my friends call me "Tye")
      # old loop code ######################################## my $diff= easy( \@a, \@b ); while( @$diff ) { my( $same, $aMin, $aMax, $bMin, $bMax )= splice @$diff, 0, 5; # ... } # new loop code ######################################## my $diff= easy( \@a, \@b ); my $same= 0; while( @$diff ) { my( $aMin, $bMin, $aMax, $bMax )= @$diff; # ... splice @$diff, 0, 2; # ... }

      Ouch! Precisely what I wanted to avoid: a modification in the number of items. Well, you could export a constant for this number, whether it be 2 or 5.

      BTW if memory efficiency is what you're worried about, why don't you just use 1 scalar for each item? After all, these are all just integers, so messing with pack()/unpack() or vec() would allow you to combine them, in one scalar. Only an 8 (or 20) byte string per item! The joy! The savings! (The pain!)

      You are right that the changes in the structure would have cost more overhead, but I was proposing them for the convenience and readability they'd provide. I was going to interject some points in that vein as I was reading; until you presented the OO interface. That's perfect as far as I'm concerned.

      Makeshifts last the longest.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://191842]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (4)
As of 2024-03-19 06:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found