Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re: string diff and compare

by jwkrahn (Abbot)
on May 04, 2012 at 08:47 UTC ( [id://968890]=note: print w/replies, xml ) Need Help??


in reply to string diff and compare

you have an error at:

my @new = split //, $new; my $longer = (@cold > @new) ? $#cold : $#new;

Which should be:

my @cnew = split //, $new; my $longer = (@cold > @cnew) ? $#cold : $#cnew;

And at:

substr ($old, subscr, 0) = $span; substr ($new, subscr, 0) = $span;

Which should be:

substr ($old, $subscr, 0) = $span; substr ($new, $subscr, 0) = $span;


Also this:

my $losubscr; #two subscripts because differ len of new and old my $lnsubscr; @cold = reverse @cold; @cnew = reverse @cnew; #we're gonna search these backwards for (0..$longer) { if ($cold[$_] ne $cnew[$_]) { $losubscr = @cold - $_; $lnsubscr = @cnew - $_; last; } } #got pertinent subscripts, now insert span tags substr ($old, $losubscr, 0) = $espan; substr ($new, $lnsubscr, 0) = $espan; substr ($old, subscr, 0) = $span; substr ($new, subscr, 0) = $span;

Could be shortened to:

my $lsubscr; @cold = reverse @cold; @cnew = reverse @cnew; #we're gonna search these backwards for (0..$longer) { if ($cold[$_] ne $cnew[$_]) { $lsubscr = -$_; last; } } #got pertinent subscripts, now insert span tags substr $old, $lsubscr, 0, $espan; substr $new, $lsubscr, 0, $espan; substr $old, $subscr, 0, $span; substr $new, $subscr, 0, $span;


Another way to write that would be:

while ( defined( my $first = <DATA> ) && defined( my $second = <DATA> +) ) { print spandiffstr( $first, $second ); } sub spandiffstr { my ( $old, $new ) = @_; my $span = '<span>'; my $espan = '</span>'; ( $old ^ $new ) =~ /^\0*/ and my $subscr = $+[0]; my ( $rold, $rnew ) = map scalar reverse, $old, $new; ( $rold ^ $rnew ) =~ /^\0*/ and my $lsubscr = -$+[0]; # got pertinent subscripts, now insert span tags substr $old, $lsubscr, 0, $espan; substr $new, $lsubscr, 0, $espan; substr $old, $subscr, 0, $span; substr $new, $subscr, 0, $span; return $old, $new; }

Replies are listed 'Best First'.
Re^2: string diff and compare
by raybies (Chaplain) on May 04, 2012 at 15:36 UTC

    So I've gone through your example now. I understand everything in it, except...

     my ( $rold, $rnew ) = map scalar reverse, $old, $new;

    What's up with the map scalar reverse syntax? What does the scalar do for example? I mean, I get that you're reversing two strings and putting them into two new strings, but I am really having difficulty parsing how Perl reads this to do what you've done here.

    Anyone care to explain it, decompress it (for us mortals)?

      From perlod ...

      reverse LIST
      In list context, returns a list value consisting of the elements of LIST in the opposite order. In scalar context, concatenates the elements of LIST and returns a string value with all characters in the opposite order.
      ...

      And, read map chain as: map { scalar( reverse $_ ) } ( $old , $new ).

        "perlod"? Ha! :-) sounds like "perllord" was misspelled.
      my ( $rold, $rnew ) = map scalar reverse, $old, $new;

      Is just a different way of saying:

      my $rold = reverse $old; my $rnew = reverse $new;

      You need to use scalar because reverse is in list context in the map expression.

        There are other ways of enforcing scalar context which are a lot more fun. :-)

        use 5.010; use strict; use warnings all => 'FATAL'; my ( $old, $new ) = qw( old new ); my ( $rold, $rnew ) = map~~reverse, $old, $new; say for "$old => $rold", "$new => $rnew";
        perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re^2: string diff and compare
by raybies (Chaplain) on May 04, 2012 at 12:03 UTC

    Thanks jw!

    I fixed the errors (my internet machine, and my Perl are on two different systems, and I copied it by hand...).

    The optimizations are really interesting (still pondering it in awe...), I didn't know about @+ til this example. That makes matching strings so much better than having to pull them appart into seperate character arrays. The trick of using negative subscripts is really cool too. Thanks for taking the time to respond, it's little things like this that makes Perl a pure expression of algorithmic joy... :-D

      I fixed the errors

      Except for this one:

      my $longer = (@cold > @new) ? $#cold : $#new;

      Which should be:

      my $longer = (@cold > @cnew) ? $#cold : $#cnew;
Re^2: string diff and compare
by raybies (Chaplain) on May 07, 2012 at 19:55 UTC
    So, fwiw, I had to do the following to your code to get your method to work the way I wanted and cover all corner cases I've tested. (If the change spans to the very last character for example, then your first condition fails and you never assign $+[0] to $lsubscr ... So I had to do the following.
    . . . ($rold ^ $rnew) =~ /^\0*/ and my $lsubscr = (-$+[0] + 1 && $+[0]) ? +-$+[0] + 1 : length $new; my $losubscr = ($lsubscr == length $new) ? length $old : $lsubscr; substr $old, $losubscr, 0, '</span>'; . . .

    This was to fix if the difference was the very last string char and one character is missing from another. Like in the case of data like:

    All sentences end with what All sentences end with what?

    In this case the difference is -1, but because I had to add one to it, it was set to 0. But 0 is the beginning of the string, not the character just past the end of the string.

      So for the sake of completeness, and posterity... and not because anyone cares, here's the final version that I decided fixed things up the way I wanted. Turns out by reordering the substr I was able to remove the space ambiguity I was seeing in one of my example data failures where I was getting nested span tags.
      sub spandiffstr { my ( $old, $new ) = @_; my $span = '<span>'; my $espan = '</span>'; ( $old ^ $new ) =~ /^\0*/ and my $subscr = $+[0]; substr $old, $subscr, 0, $span; substr $new, $subscr, 0, $span; my ( $rold, $rnew ) = map scalar reverse, $old, $new; ( $rold ^ $rnew ) =~ /^\0*/ and my $lsubscr = -$+[0]; my $losubscr = $lsubscr; unless ($lsubscr) { $losubscr = length $old; $lsubscr = length $new; } substr $old, $losubscr, 0, $espan; substr $new, $lsubscr, 0, $espan; return $old, $new; }
      Thanks for all the fish!

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://968890]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (5)
As of 2024-04-19 04:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found