Re: Tell whether two strings differ at only one position
by BrowserUk (Patriarch) on Aug 04, 2005 at 17:32 UTC
|
sub compare{
return ( length $_[0] == length $_[1] )
&& ( length( $_[0] ) - 1 <= ( $_[0] ^ $_[1] ) =~ tr[\0][\0] )
}
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
| [reply] [Watch: Dir/Any] [d/l] |
Re: Tell whether two strings differ at only one position
by japhy (Canon) on Aug 04, 2005 at 18:50 UTC
|
Here's another XOR solution that I think is far more to the point:
sub compare {
(($_[0] ^ $_[1]) =~ tr/\0//c) < 2;
}
It counts the number of non-NULLs produced by XORing the strings together. If that number is less than 2, then the strings were either identical or differed by only one character. This could also be written as:
sub compare {
($_[0] ^ $_[1]) =~ /^\0*(?:[^\0]\0*)?$/
}
The regex ensures the XORed string is either entirely NULLs or has just one non-NULL in it.
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
sub compare {
(($_[0] ^ $_[1]) =~ tr/\0//c) < 2;
}
While I liked each and every solution other wise monks have been so kind to come up with so far, when I looked at the solution of japhy, I immediately saw that this is the one I have been searching for! It is clean and concise and perlish, and, quite amazingly, it is just one statement, so I can eliminate the compare sub alltogether which, as I hope, will make my script even faster!
I am beginning to see that after a certain point, Perl programming becomes a matter of pure aesthetics...
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
sub compare {
(($_[0] ^ $_[1]) =~ tr/\0//c) < 2;
}
print((compare('abc', 'abcd') ? 'ok' : 'not ok'), "\n");
It should be
sub compare {
length $_[0] == length $_[1] && (($_[0] ^ $_[1]) =~ tr/\0//c) < 2;
}
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
|
This is what I wanted to do myself in the first place. Only that... I plainly didn't remember about the /c switch! And I tried with the /d one (without assignment, that is) but it failed like this:
Can't modify bitwise xor (^) in transliteration (tr///) at foo.pl line
+ 7, near "tr/\0//d) "
So I adopted a temp variable. Incidentally I wonder why I don't get the same error if I omit /d (with or without /c)... in both cases I'm attempting at modifying it, am I not? | [reply] [Watch: Dir/Any] [d/l] [select] |
|
With /d and /c, you are modifying the string.
Without either, and with source & destination tables being the same, is a special case designed for counting only.
There would be no point in modifying the string just to replace like with like.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
| [reply] [Watch: Dir/Any] |
Re: Tell whether two strings differ at only one position
by blazar (Canon) on Aug 04, 2005 at 17:26 UTC
|
sub compare {
(my $xor=$_[0]^$_[1]) =~ tr/\0//d;
length $xor < 2;
}
Untested. And not benchmarked. But that would be my first try at doing what you need. | [reply] [Watch: Dir/Any] [d/l] |
Re: Tell whether two strings differ at only one position
by Albannach (Monsignor) on Aug 04, 2005 at 18:13 UTC
|
I make a lot of use of Text::Levenshtein which has an XS version that might be competitive, so here is a quick test. Fair warning: I'm far from a Benchmark guru (comments welcome), and the test set you provided might have a different distribution of types than you actually encounter, so if a particular method takes longer for a particular result, there will be some bias.
Rate rg0now ikegami levenshtein blazar
+BrowserUk
rg0now 8801/s -- -57% -82% -87%
+ -87%
ikegami 20414/s 132% -- -57% -69%
+ -70%
levenshtein 47786/s 443% 134% -- -28%
+ -30%
blazar 66258/s 653% 225% 39% --
+ -4%
BrowserUk 68683/s 680% 236% 44% 4%
+ --
The code follows.
--
I'd like to be able to assign to an luser
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Tell whether two strings differ at only one position
by ikegami (Patriarch) on Aug 04, 2005 at 17:33 UTC
|
sub compare {
return unless length $_[0] == length $_[1];
# Omit if expecting differences almost all the time.
return 1 if $_[0] eq $_[1];
my $limit = 1;
foreach (split('', $_[0]^$_[1])) {
next unless ord;
return 0 unless $limit--;
}
return 1;
}
Doesn't work with Unicode.
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Tell whether two strings differ at only one position
by Not_a_Number (Prior) on Aug 04, 2005 at 20:27 UTC
|
use strict;
use warnings;
# @data contains the (chomped, lower-cased) entries
# from a 70K dictionary file (2076 words beginning with 'j')
my $count; # Or whatever: this can check algorithm validity
foreach my $item ( @data ) {
compare( $item, $_ ) and $count++ for @data;
}
print $count;
sub compare {
return 0 unless length $_[0] == length $_[1];
my $diff = 0;
for ( 0 .. length $_[0] ) {
$diff++ if substr( $_[0], $_, 1 ) ne substr( $_[1], $_, 1 );
return 0 if $diff > 1;
}
return 1;
}
is very much faster than your original code.
Surprisingly(?), initial testing seems to show that it is also considerably faster than any of the replies you have hitherto received using bitwise XOR.
I haven't got the time to do any precise benchmarking at the moment: however, since you need the "fastest solution on Earth", I shall leave that up to you | [reply] [Watch: Dir/Any] [d/l] |
|
I adopted Albannach's nice benchmark (thanks Albannach) and reBenchmarked all the solutions again. But now, for data, I used a random set of nice Hungarian words. Here are the results:
Rate rg0now ikegami Not_a_Number BrowserUk blazar japhy
rg0now 288/s -- -66% -82% -95% -96% -97%
ikegami 849/s 195% -- -45% -87% -88% -90%
Not_a_Number 1557/s 441% 83% -- -76% -78% -83%
BrowserUk 6382/s 2118% 652% 310% -- -10% -28%
blazar 7084/s 2362% 735% 355% 11% -- -20%
japhy 8900/s 2993% 949% 472% 39% 26% --
It seems that japhy's solution is the clear winner, even if it is degraded to a sub call. I think that your results might be attributed your specific choice of data, but I do not know enough about the intrinsics of Perl string handling to say the decisive words (in fact, I do not know anything about it, so...)
| [reply] [Watch: Dir/Any] [d/l] |
|
Both blazar's and japhy's return true when passed 'fred', 'freda' which obviously isn't right.
When you add the code to detect that error, you get:
Rate rg0now ikegami Not_a_Number blazar japhy
+ BrowserUk
rg0now 412/s -- -69% -85% -96% -96%
+ -96%
ikegami 1311/s 218% -- -54% -86% -88%
+ -89%
Not_a_Number 2842/s 589% 117% -- -70% -74%
+ -75%
blazar 9628/s 2236% 634% 239% -- -13%
+ -17%
japhy 11086/s 2590% 745% 290% 15% --
+ -4%
BrowserUk 11580/s 2709% 783% 307% 20% 4%
+ --
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
| [reply] [Watch: Dir/Any] [d/l] |
|
sub compare_blazar_mod {
( $_ = $_[0] ^ $_[1] ) =~ tr/\0//d;
2 < length;
}
| [reply] [Watch: Dir/Any] [d/l] |
Re: Tell whether two strings differ at only one position
by demerphq (Chancellor) on Aug 05, 2005 at 08:05 UTC
|
Its not clear to me if it would help or not, but you may want to look at Algorithm Showdown: Fuzzy Matching which has some high speed solutions for doing this type of problem, albeit for finding all of the fuzzy matches of a list of a words of the same length in gene strings. Warning: some of the discussion in that thread is somewhat acrimonious.
---
$world=~s/war/peace/g
| [reply] [Watch: Dir/Any] |
Re: Tell whether two strings differ at only one position
by NiJo (Friar) on Aug 05, 2005 at 18:07 UTC
|
The speed race for the most generic case seems to be over. But looking at the real data and considering hit probabilities should provide the largest speed up. Assuming the language dictionary comparison, hits are quite unlikely. Most comparisons can be optimized away completely.
Bloom::Filter (or an Bit::Vector based version of it) can tell you very quickly if there is a hit in the database, but can not tell you where it matched. It produces some false positives. You have to weight the creation of the hash table against getting a nearly linear scalability.
Playing some tricks with an 'any' char seems to be neccessary. 'ab#d' would be an extended version of lowercasing. | [reply] [Watch: Dir/Any] |