Unless I'm completely missing your point, it looks like your sample strings do not contain the original phone numbers. The phone numbers are 10 digits each, while the strings are only 27 characters long. I'm going to assume that's a typo, and that the actual strings you're dealing with are concatenations of the three 10-digit numbers you listed, i.e.:

512567000151256700025125670003
512567000251256700015125670003
512567000351256700015125670002

If I'm misunderstanding you in some weird way, please let me know.

By "sameness check", I'm guessing you want a hashing function that will hash the above three 30-character strings identically. That is, if the 10-digit numbers are $a, $b, and $c, the following 30-character strings should hash equivalently:

abc, acb, bac, bca, cab, cba

Finally, no other 30-character strings should hash to the same value.
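
To make that requirement concrete, here is a minimal sketch, assuming each string is a concatenation of exactly three 10-digit numbers. The helper name `canonical_key` and the variables `$x`, `$y`, `$z` are hypothetical, chosen only for illustration. The helper sorts the 10-character chunks, so every ordering of the same numbers should yield the same key:

```perl
use strict;
use warnings;

# Hypothetical helper: split into 10-character chunks, sort them,
# and re-join, so any ordering of the same numbers gives one key.
sub canonical_key {
    my ($s) = @_;
    return join '', sort unpack '(A10)*', $s;
}

my ($x, $y, $z) = qw/5125670001 5125670002 5125670003/;
my @orderings = ("$x$y$z", "$x$z$y", "$y$x$z", "$y$z$x", "$z$x$y", "$z$y$x");

# Collect the distinct keys produced by the six orderings.
my %seen;
$seen{ canonical_key($_) }++ for @orderings;

printf "%d orderings, %d distinct key(s)\n", scalar @orderings, scalar keys %seen;
```

Because the key is just the sorted list of chunks, two strings share a key only when they contain the same three numbers, which is what the "no other strings" condition asks for.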

If my interpretation of your requirements is correct, there's certainly more than one way to do it:

```
#!/usr/bin/env perl

use 5.014;
use warnings;
use Time::HiRes qw/time/;
use Benchmark qw/cmpthese timethese/;
use Inline 'C';

# Each hash_* function splits the string into 10-digit numbers and
# returns them sorted and re-joined.
sub hash_pack($)   { join '', sort unpack '(A10)*', shift }
sub hash_re($)     { join '', sort $_[0] =~ /(\d{10})/g   }
sub hash_substr($) {
    my @nums; my $s = shift;
    while (length $s) {    # test length, so a leftover "0" is not treated as false
        push @nums, substr($s,0,10);
        $s = substr($s,10);
    }
    join '',sort @nums;
}
# Only considers first 3 numbers
sub hash_substr2($) {
    join '', sort substr($_[0],0,10),substr($_[0],10,10),substr($_[0],20,10);
}

my @funcs = map { "hash_$_" } qw/pack re substr substr2 c/;

my @strings = qw/512567000151256700025125670003
                 512567000251256700015125670003
                 512567000351256700015125670002/;

# Note: hash_c sorts its argument in place, so on its output line the
# "input" shown in parentheses has already been sorted.
for my $s (@strings) {
    printf "%12s(%s) => %s\n", $_, $s, eval "$_(\$s)" for @funcs;
}

my $s = $strings[0];
cmpthese timethese(-5, { map { $_ => "$_('$s')" } @funcs });

__END__
__C__

/* Try our own splitter sort. This swaps the numbers in-place
 * as necessary to obtain a sorted order. */
#include <string.h>
#define SIZE    10
/* XOR-swap `size` bytes between two non-overlapping buffers */
#define strswap(s1,s2,size) {           \
    int i;                              \
    for (i = 0; i < size; i++) {        \
        s1[i] = s1[i] ^ s2[i];          \
        s2[i] = s1[i] ^ s2[i];          \
        s1[i] = s1[i] ^ s2[i];          \
    }                                   \
}

char * hash_c(char *str) {
    char *n0 = str;
    char *n1 = str + SIZE;
    char *n2 = str + SIZE + SIZE;

    /* Three compare-and-swap steps are enough to sort three items */
    if (strncmp(n0, n1, SIZE) > 0)
        strswap(n0, n1, SIZE);

    if (strncmp(n1, n2, SIZE) > 0)
        strswap(n1, n2, SIZE);

    if (strncmp(n0, n1, SIZE) > 0)
        strswap(n0, n1, SIZE);

    return str;
}
```

## Output

```
   hash_pack(512567000151256700025125670003) => 512567000151256700025125670003
     hash_re(512567000151256700025125670003) => 512567000151256700025125670003
 hash_substr(512567000151256700025125670003) => 512567000151256700025125670003
hash_substr2(512567000151256700025125670003) => 512567000151256700025125670003
      hash_c(512567000151256700025125670003) => 512567000151256700025125670003
   hash_pack(512567000251256700015125670003) => 512567000151256700025125670003
     hash_re(512567000251256700015125670003) => 512567000151256700025125670003
 hash_substr(512567000251256700015125670003) => 512567000151256700025125670003
hash_substr2(512567000251256700015125670003) => 512567000151256700025125670003
      hash_c(512567000151256700025125670003) => 512567000151256700025125670003
   hash_pack(512567000351256700015125670002) => 512567000151256700025125670003
     hash_re(512567000351256700015125670002) => 512567000151256700025125670003
 hash_substr(512567000351256700015125670002) => 512567000151256700025125670003
hash_substr2(512567000351256700015125670002) => 512567000151256700025125670003
      hash_c(512567000151256700025125670003) => 512567000151256700025125670003
Benchmark: running hash_c, hash_pack, hash_re, hash_substr, hash_substr2 for at least 5 CPU seconds...
      hash_c:  6 wallclock secs ( 5.71 usr +  0.00 sys =  5.71 CPU) @ 4606276.36/s (n=26301838)
   hash_pack:  6 wallclock secs ( 5.07 usr +  0.00 sys =  5.07 CPU) @ 646938.07/s (n=3279976)
     hash_re:  6 wallclock secs ( 5.03 usr +  0.00 sys =  5.03 CPU) @ 422000.20/s (n=2122661)
 hash_substr:  5 wallclock secs ( 5.04 usr +  0.00 sys =  5.04 CPU) @ 328204.96/s (n=1654153)
hash_substr2:  4 wallclock secs ( 5.14 usr +  0.00 sys =  5.14 CPU) @ 965458.95/s (n=4962459)
                  Rate hash_substr    hash_re  hash_pack hash_substr2    hash_c
hash_substr   328205/s          --       -22%       -49%         -66%      -93%
hash_re       422000/s         29%         --       -35%         -56%      -91%
hash_pack     646938/s         97%        53%         --         -33%      -86%
hash_substr2  965459/s        194%       129%        49%           --      -79%
hash_c       4606276/s       1303%       992%       612%         377%        --
```

You'll need to decide for yourself which is more appealing, and how much performance you need to squeeze out of this function. The C solution might be overkill, or its roughly 4.8x throughput (377% faster than hash_substr2, the fastest pure-Perl version) might be just what you need.
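
If you want to measure this on your own data, note that the benchmark in the post times a single string that is already in sorted order. Here is a minimal sketch of a variation that loops over all three sample orderings. It covers only two of the pure-Perl variants (reproduced without prototypes), and the iteration count of 50,000 is an arbitrary assumption meant to keep the run short:

```perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;

# Pure-Perl variants reproduced from the post, minus the prototypes.
sub hash_pack { join '', sort unpack '(A10)*', $_[0] }
sub hash_re   { join '', sort $_[0] =~ /(\d{10})/g }

# All three sample orderings, so sorted and unsorted inputs are both timed.
my @inputs = qw/512567000151256700025125670003
                512567000251256700015125670003
                512567000351256700015125670002/;

# Fixed iteration count (an arbitrary choice); pass a negative number
# such as -5 instead to run each variant for at least 5 CPU seconds.
cmpthese(50_000, {
    hash_pack => sub { hash_pack($_) for @inputs },
    hash_re   => sub { hash_re($_)   for @inputs },
});
```

The absolute rates will differ from machine to machine, so treat the numbers as relative comparisons only.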

Input validation is left as an exercise to the reader.
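
As one possible starting point for that exercise, here is a minimal sketch. It assumes valid input is exactly three 10-digit numbers (30 ASCII digits). The name `validated_key` and the choice to `die` on bad input are illustrative assumptions, not part of the solutions above:

```perl
use strict;
use warnings;

# Hypothetical wrapper: reject anything that is not exactly 30 ASCII
# digits, then return the three 10-digit numbers sorted and re-joined.
sub validated_key {
    my ($s) = @_;
    die "expected a defined string\n" unless defined $s;
    die "expected exactly 30 digits, got '$s'\n"
        unless $s =~ /\A[0-9]{30}\z/;
    return join '', sort unpack '(A10)*', $s;
}

# One well-formed string and one 27-character string that should be rejected.
for my $input ('512567000351256700015125670002', '512567000151256700025125670') {
    my $key = eval { validated_key($input) };
    print defined($key) ? "$input => $key\n" : "$input rejected: $@";
}
```

Using `\A` and `\z` instead of `^` and `$` keeps a trailing newline from slipping through, and `[0-9]` avoids matching non-ASCII digits that `\d` can accept.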
