 Pathologically Eclectic Rubbish Lister PerlMonks

### Re: Comparing a value to a list of numbers

by jcb (Parson)
 on Jan 31, 2021 at 04:50 UTC Need Help??

in reply to Comparing a value to a list of numbers

Other monks have suggested a binary search solution, but did not provide an example. I took writing an optimized binary search for this as a challenge and wrote this:

```#!/usr/bin/perl
# -*- CPerl -*-

use strict;
use warnings;

# sparse range structure:
#  array of arrayref/number
#   single integers represented as number
#   contigous ranges as arrayref [LOW,HIGH] inclusive

# given: string "A,B,C,X-Y,Z"
# return: sorted array [A, B, C, [X, Y], Z]
sub parse (\$) {
my @elements = split /,/, shift;

foreach (@elements) {
s/\s+//g;                # prune whitespace
next unless m/\d+-\d+/;        # skip conversion if single integer
\$_ = [split /-/, \$_];        # convert ranges to arrayrefs
}

# sort range set
@elements = sort {(ref(\$a)?\$a->:\$a) <=> (ref(\$b)?\$b->:\$b)} @el
+ements;

# merge overlapping loose elements into preceding ranges
for (my \$i = 0; \$i < \$#elements; \$i++) {
next unless ref \$elements[\$i];    # skip single integers
while (\$i+1 <= \$#elements and \$elements[\$i+1] <= \$elements[\$i])
{ splice @elements, \$i+1, 1 }    # remove elements included in r
+ange
}

# coalesce contiguous integers into ranges
for (my \$i = 0; \$i < \$#elements; \$i++) {
next if ref \$elements[\$i];        # skip ranges
if (\$elements[\$i]+1 == \$elements[\$i+1]) {
my \$j = 1+\$i;
\$j++ while !ref(\$elements[\$j]) && \$elements[\$j]+1 == \$elements[\$
+j+1];
splice @elements, \$i, 1+\$j-\$i, [\$elements[\$i], \$elements[\$j]];
}
}

# merge adjacent loose elements into succeeding ranges
for (my \$i = 0; \$i < \$#elements; \$i++) {
next if ref \$elements[\$i];        # skip ranges
next unless ref \$elements[\$i+1];    # but next element is a range
# There can be at most one such element, since contiguous integers
+ were
# coalesced into ranges above.
if (\$elements[\$i]+1 == \$elements[\$i+1])
{ splice @elements, \$i, 2, [\$elements[\$i], \$elements[\$i+1]] }
}

# merge adjacent loose elements into preceding ranges
for (my \$i = 0; \$i < \$#elements; \$i++) {
next unless ref \$elements[\$i];    # skip single integers
next if ref \$elements[\$i+1];    # but next element is a single int
+eger
# There can be at most one such element, since contiguous integers
+ were
# coalesced into ranges above.
if (\$elements[\$i]+1 == \$elements[\$i+1])
{ splice @elements, \$i, 2, [\$elements[\$i], \$elements[\$i+1]] }
}

# merge overlapping ranges
for (my \$i = 0; \$i < \$#elements; \$i++) {
next unless ref \$elements[\$i] and ref \$elements[\$i+1];
splice @elements, \$i, 2, [\$elements[\$i], \$elements[\$i+1]]
if \$elements[\$i] >= \$elements[\$i+1];
}

# merge adjacent ranges
for (my \$i = 0; \$i < \$#elements; \$i++) {
next unless ref \$elements[\$i] and ref \$elements[\$i+1];
splice @elements, \$i, 2, [\$elements[\$i], \$elements[\$i+1]]
if \$elements[\$i]+1 == \$elements[\$i+1];
}

return \@elements;
}

# given: sorted array from sub parse and integer
# return true if the integer is in the sorted array
sub search (\$\$) {
my \$set = shift;
my \$num = shift;

my \$left = 0; my \$right = \$#\$set;
my \$i = \$#\$set >> 1;            # bitshift for integer /2
while (\$left < \$right) {
if (ref(\$set->[\$i])) {        # evaluate a range
return 1        # number within this range
if \$num >= \$set->[\$i] && \$num <= \$set->[\$i];
if (\$num > \$set->[\$i])    { \$left  = \$i+1 }
else            { \$right = \$i-1 }
} else {                # evaluate a single integer
return 1        # number matched
if \$num == \$set->[\$i];
if (\$num > \$set->[\$i])    { \$left  = \$i+1 }
else            { \$right = \$i-1 }
}
\$i = (\$left + \$right) >> 1;        # bitshift for integer /2
}

# last check
if (ref(\$set->[\$i])) {
return \$num >= \$set->[\$i] && \$num <= \$set->[\$i]
} else {
return \$num == \$set->[\$i]
}
}

my \$Set = parse shift;

use Data::Dumper;
print Data::Dumper->new([\$Set],[qw(Set)])->Indent(0)->Dump,"\n";

foreach (@ARGV) {
print \$_, search(\$Set, \$_) ? ' is' : ' is not', " in the set\n";
}
[download]```

The script expects a set of numbers as its first command line argument, and then various numbers to test against the set as additional arguments. Examples:

```\$ ./bsearch.pl 1,2,5,6,9,10,41-56 1 4 42 17
\$Set = [[1,2],[5,6],[9,10],[41,'56']];
1 is in the set
4 is not in the set
42 is in the set
17 is not in the set
[download]```
```\$ ./bsearch.pl 1,2,11-16,6,7,19,9,5-8,13,14,15,4 1 2 3 4 5 8 9 10 11 1
+2 16 17 18 19 20
\$Set = [[1,2],[4,9],[11,16],19];
1 is in the set
2 is in the set
3 is not in the set
4 is in the set
5 is in the set
8 is in the set
9 is in the set
10 is not in the set
11 is in the set
12 is in the set
16 is in the set
17 is not in the set
18 is not in the set
19 is in the set
20 is not in the set
[download]```

This latter example was used for developing the set-optimization logic.

Replies are listed 'Best First'.
Re^2: Comparing a value to a list of numbers
by haukex (Bishop) on Jan 31, 2021 at 09:09 UTC
Other monks have suggested a binary search solution, but did not provide an example.

Set::IntSpan uses a binary search.

Interesting, so why is it restricted to Integers?

Likely because that's simply its purpose, plus I imagine certain set operations would be problematic due to floating-point inaccuracies.

Re^2: Comparing a value to a list of numbers
by LanX (Cardinal) on Jan 31, 2021 at 19:55 UTC
> Other monks have suggested a binary search solution, but did not provide an example.

FWIW I did, but didn't publish, since it didn't fit into davido's benchmark (which has also it flaws by only testing 42)

```  DB<47> sub tst { if (\$_>=9) { if (\$_>=41 and \$_<=56) {1} elsif (\$_<=
+10) {1} } elsif (\$_<=6) { if (\$_>=5){1} elsif \
(\$_<=2 and \$_>=1) {1} }}

DB<48> %in = map { \$_ => 1 } 1,2,5,6,9,10,41..56

DB<49> tst != \$in{\$_} and print "error \$_" for 1..190

DB<50>
[download]```

I was also pondering about how to optimize a solution (time and memory) for all cases.

Like potentially

• many points
• many ranges
• floats instead of integers
• multiple dimensions (like rectangular areas in a plane)
An "optimal" solution would involve combining different techniques ...

And some techniques which were prominent in similar older threads haven't been mentioned yet.

Anyway all of this is far beyond what the OP wanted.

But yes: again "I'm suggesting but not providing an example" ;-)

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

I assumed that the set would vary (although using eval to convert the arrayref returned from parse into a closure like your tst could be interesting even if compile could end up longer than parse) and tried to optimize for the "many points" and "many ranges" cases; thus all those extra passes in the parse sub to simplify the result. I excluded floating point because exact comparisons are not reliable and it is easier to restrict the domain to integers. Multiple dimensions start to get complex enough that I suspect a general optimal solution is impossible — optimizing that case will depend on the actual data used.

The "suggesting but not providing an example" remark was an excuse for posting my solution — usually these questions have been "done to death" before I get to them. ;-)

(although using eval to convert the arrayref returned from parse into a closure like your tst could be interesting even if compile could end up longer than parse)

Ah, heck with it — compile turned out to be only slightly longer than search...

```#!/usr/bin/perl
# -*- CPerl -*-

use strict;
use warnings;

# sparse range structure:
#  array of arrayref/number
#   single integers represented as number
#   contigous ranges as arrayref [LOW,HIGH] inclusive

# given: string "A,B,C,X-Y,Z"
# return: sorted array [A, B, C, [X, Y], Z]
sub parse (\$) {
my @elements = split /,/, shift;

foreach (@elements) {
s/\s+//g;                # prune whitespace
next unless m/\d+-\d+/;        # skip conversion if single integer
\$_ = [split /-/, \$_];        # convert ranges to arrayrefs
}

# sort range set
@elements = sort {(ref(\$a)?\$a->:\$a) <=> (ref(\$b)?\$b->:\$b)} @el
+ements;

# merge overlapping loose elements into preceding ranges
for (my \$i = 0; \$i < \$#elements; \$i++) {
next unless ref \$elements[\$i];    # skip single integers
while (\$i+1 <= \$#elements and \$elements[\$i+1] <= \$elements[\$i])
{ splice @elements, \$i+1, 1 }    # remove elements included in r
+ange
}

# coalesce contiguous integers into ranges
for (my \$i = 0; \$i < \$#elements; \$i++) {
next if ref \$elements[\$i];        # skip ranges
if (\$elements[\$i]+1 == \$elements[\$i+1]) {
my \$j = 1+\$i;
\$j++ while !ref(\$elements[\$j]) && \$elements[\$j]+1 == \$elements[\$
+j+1];
splice @elements, \$i, 1+\$j-\$i, [\$elements[\$i], \$elements[\$j]];
}
}

# merge adjacent loose elements into succeeding ranges
for (my \$i = 0; \$i < \$#elements; \$i++) {
next if ref \$elements[\$i];        # skip ranges
next unless ref \$elements[\$i+1];    # but next element is a range
# There can be at most one such element, since contiguous integers
+ were
# coalesced into ranges above.
if (\$elements[\$i]+1 == \$elements[\$i+1])
{ splice @elements, \$i, 2, [\$elements[\$i], \$elements[\$i+1]] }
}

# merge adjacent loose elements into preceding ranges
for (my \$i = 0; \$i < \$#elements; \$i++) {
next unless ref \$elements[\$i];    # skip single integers
next if ref \$elements[\$i+1];    # but next element is a single int
+eger
# There can be at most one such element, since contiguous integers
+ were
# coalesced into ranges above.
if (\$elements[\$i]+1 == \$elements[\$i+1])
{ splice @elements, \$i, 2, [\$elements[\$i], \$elements[\$i+1]] }
}

# merge overlapping ranges
for (my \$i = 0; \$i < \$#elements; \$i++) {
next unless ref \$elements[\$i] and ref \$elements[\$i+1];
splice @elements, \$i, 2, [\$elements[\$i], \$elements[\$i+1]]
if \$elements[\$i] >= \$elements[\$i+1];
}

# merge adjacent ranges
for (my \$i = 0; \$i < \$#elements; \$i++) {
next unless ref \$elements[\$i] and ref \$elements[\$i+1];
splice @elements, \$i, 2, [\$elements[\$i], \$elements[\$i+1]]
if \$elements[\$i]+1 == \$elements[\$i+1];
}

return \@elements;
}

# given: sorted array from sub parse and integer
# return true if the integer is in the sorted array
sub search (\$\$) {
my \$set = shift;
my \$num = shift;

my \$left = 0; my \$right = \$#\$set;
my \$i = \$#\$set >> 1;            # bitshift for integer /2
while (\$left < \$right) {
if (ref(\$set->[\$i])) {        # evaluate a range
return 1        # number within this range
if \$num >= \$set->[\$i] && \$num <= \$set->[\$i];
if (\$num > \$set->[\$i])    { \$left  = \$i+1 }
else            { \$right = \$i-1 }
} else {                # evaluate a single integer
return 1        # number matched
if \$num == \$set->[\$i];
if (\$num > \$set->[\$i])    { \$left  = \$i+1 }
else            { \$right = \$i-1 }
}
\$i = (\$left + \$right) >> 1;        # bitshift for integer /2
}

# last check
if (ref(\$set->[\$i])) {
return \$num >= \$set->[\$i] && \$num <= \$set->[\$i]
} else {
return \$num == \$set->[\$i]
}
}

# given: sorted array from sub parse
# return: closure accepting an integer and returning true iff it is in
+ the set
sub wrap (\$) {
my \$set = shift;
return sub { return search \$set, shift };
}

# given: sorted array from sub parse
# return: closure same as from wrap but using unrolled binary search
sub compile (\$) {
my \$set = shift;

my \$expand; \$expand = sub {
my \$mid = \$#_ >> 1;
my \$branchcond; my \$matchcond;
if (ref \$_[\$mid]) {            # pivot on range
\$branchcond = "\\$X > \$_[\$mid]";
\$matchcond = "\\$X >= \$_[\$mid] && \\$X <= \$_[\$mid]";
} else {                # pivot on single integer
\$branchcond = "\\$X > \$_[\$mid]";
\$matchcond = "\\$X == \$_[\$mid]";
}
my \$fragment = "  return \$matchcond;\n";
if (\$mid+1 <= \$#_) {
\$fragment =
("  return 1 if \$matchcond;\n".
"  if (\$branchcond) {\n".
\$expand->(@_[\$mid+1 .. \$#_]).
"  }");
if (\$mid-1 >= 0) {
\$fragment .=
(" else {\n".
\$expand->(@_[0 .. \$mid-1]).
"  }\n");
} else {
\$fragment .= "\n";
}
}
return \$fragment
};

my \$tree = \$expand->(@\$set);
return eval <<CODE
sub {
my \\$X = shift;
\$tree
return !1; # false value for no match if end reached
}
CODE
}

my \$Set = parse shift;
my \$Test = wrap \$Set;
my \$Clos = compile \$Set;

use Data::Dumper;
print Data::Dumper->new([\$Set],[qw(Set)])->Indent(0)->Dump,"\n";
print Data::Dumper->new([\$Clos], [qw(Closure)])->Deparse(1)->Dump;

sub sym (\$) { \$_ ? 'X' : ' ' }
printf "%4s | basic | wrapped | compile\n", '';
foreach (@ARGV) {
printf "%4d |   %1s   |    %1s    |    %1s\n",
\$_, sym(search \$Set, \$_), sym(\$Test->(\$_)), sym(\$Clos->(\$_));
}
[download]```

Edit 2021-02-02 by jcb: Use B::Deparse with Data::Dumper to also output the compiled closure. (line 169 inserted)

Log In?
 Username: Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://11127703]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (6)
As of 2021-06-15 09:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
What does the "s" stand for in "perls"? (Whence perls)

Results (69 votes). Check out past polls.

Notices?