PerlMonks  

Re^2: Quantitative Change instead of Boolean

by titivillus (Beadle)
on Aug 11, 2006 at 17:40 UTC (#566897=note)


in reply to Re: Quantitative Change instead of Boolean
in thread Quantitative Change instead of Boolean

I've taken a look at Digest::Nilsimsa. It is a cool thing. However, I've found a problem.

use strict;
use warnings;
use Digest::Nilsimsa;

my $nil = Digest::Nilsimsa->new;

# Compare 'aaa...ab' against 'aaa...aa' at lengths 31 through 37.
for my $d ( 30 .. 36 ) {
    my $this = $nil->text2digest( ( 'a' x $d ) . 'b' );
    my $that = $nil->text2digest( 'a' x ( $d + 1 ) );
    print nilcomp( $this, $that );
}

# Report how many hex digits differ between two digests, plus the sum of
# the absolute differences between those digits (labelled "bits" below,
# though it is not a true bit count).
sub nilcomp {
    my $diff  = 0;
    my $diff2 = 0;
    my @this  = split //, shift;
    my @that  = split //, shift;
    for my $i ( 0 .. $#this ) {
        $diff++ if $this[$i] ne $that[$i];
        $diff2 += abs( hex( $this[$i] ) - hex( $that[$i] ) );
    }
    return join( '', @this ) . "\n"
         . join( '', @that ) . "\n"
         . "$diff characters different\n"
         . "$diff2 bits different\n\n";
}
gives you
000000000000900000010021000008105000080010000004000c400000000008
0000000000009000000000200000080040000000000000040008400000000000
8 characters different
25 bits different

000000000000900000010021000008105000080010000004000c400000000008
0000000000009000000000200000080040000000000000040008400000000000
8 characters different
25 bits different

000000000000900000010021000008105000080010000004000c400000000008
0000000000009000000000200000080040000000000000040008400000000000
8 characters different
25 bits different

000000000000900000010021000008105000080010000004000c400000000008
0000000000009000000000200000080040000000000000040008400000000000
8 characters different
25 bits different

000000000000900000010021000008105000080010000004000c400000000008
0000000000009000000000200000080040000000000000040008400000000000
8 characters different
25 bits different

0000000000009000000000200000080040000000000000040008400000000000
0000000000009000000000200000080040000000000000040008400000000000
0 characters different
0 bits different

0000000000009000000000200000080040000000000000040008400000000000
0000000000009000000000200000080040000000000000040008400000000000
0 characters different
0 bits different
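A side note on the "bits different" figure: nilcomp adds up the absolute differences between hex digits, which is not the same as counting the bits that differ. If a true bit count is wanted, something like the following might do. It is only a sketch: hamming_bits is a name I made up, not part of Digest::Nilsimsa, and it assumes both digests are hex strings of equal length. The two digests are copied from the output above.

```perl
use strict;
use warnings;

# Count the bits that differ between two equal-length hex digests.
# hamming_bits is a hypothetical helper, not part of Digest::Nilsimsa.
sub hamming_bits {
    my ( $hex_a, $hex_b ) = @_;
    die "digests differ in length\n"
        if length($hex_a) != length($hex_b);

    # Turn each hex string into raw bytes, then XOR the byte strings.
    my $xor = pack( 'H*', $hex_a ) ^ pack( 'H*', $hex_b );

    # '%32b*' makes unpack sum the bits, i.e. count the set bits.
    return unpack( '%32b*', $xor );
}

my $this = '000000000000900000010021000008105000080010000004000c400000000008';
my $that = '0000000000009000000000200000080040000000000000040008400000000000';

print hamming_bits( $this, $that ), " bits differ\n";
```

For this pair it reports 8 bits rather than 25, because each of the 8 differing hex digits differs in exactly one bit.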
If all I were trying to use this on were 35-character data sets, that'd be cool, but I'm trying to run it on whole web pages. Even if I pull out all the markup and whitespace, I'll still be in the headers by the time the 35th character rolls around. I love it in theory, but in practice the data's too big for the module.

So, I could do this:

$output =~ s[(.{35})][$nilsimsa->text2digest($1)]ge ;

or something of the sort, but that seems ... goofy. It does point out, though, that comparing length $output with its value from the last run should indicate a small change if the two are only a few characters apart.
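To make those two ideas concrete, here is a rough sketch of both the chunk-at-a-time digest and the length comparison. The names chunk_digests and length_change are mine, and the digest is passed in as a coderef so the snippet runs without the module installed. With Digest::Nilsimsa loaded, the callback would presumably be sub { $nilsimsa->text2digest( $_[0] ) }. Unlike the s///ge version, the final short chunk gets digested too.

```perl
use strict;
use warnings;

# Apply a digest callback to each fixed-size chunk of the text.
# $digest_of is a placeholder coderef standing in for text2digest.
sub chunk_digests {
    my ( $text, $size, $digest_of ) = @_;
    my @chunks = unpack "(a$size)*", $text;    # last chunk may be shorter
    return map { $digest_of->($_) } @chunks;
}

# Crude change measure: how far apart two versions are in length.
sub length_change {
    my ( $old, $new ) = @_;
    return abs( length($new) - length($old) );
}

# Stand-in "digest" that just reports each chunk's length.
my @sizes = chunk_digests( 'a' x 80, 35, sub { length $_[0] } );
print "@sizes\n";

print length_change( 'a' x 80, 'a' x 83 ), " characters apart\n";
```

One caveat with fixed-size chunks: a single inserted character shifts every later chunk, so a small edit near the top can make every chunk after it look changed. The length comparison has the opposite weakness, since an edit that replaces text with something of the same length shows up as no change at all.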

.sig goes here

