Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re: Comparing a value to a list of numbers

by jcb (Parson)
on Jan 31, 2021 at 04:50 UTC ( #11127703=note: print w/replies, xml ) Need Help??


in reply to Comparing a value to a list of numbers

Other monks have suggested a binary search solution, but did not provide an example. I took writing an optimized binary search for this as a challenge and wrote this:

#!/usr/bin/perl # -*- CPerl -*- use strict; use warnings; # sparse range structure: # array of arrayref/number # single integers represented as number # contigous ranges as arrayref [LOW,HIGH] inclusive # given: string "A,B,C,X-Y,Z" # return: sorted array [A, B, C, [X, Y], Z] sub parse ($) { my @elements = split /,/, shift; foreach (@elements) { s/\s+//g; # prune whitespace next unless m/\d+-\d+/; # skip conversion if single integer $_ = [split /-/, $_]; # convert ranges to arrayrefs } # sort range set @elements = sort {(ref($a)?$a->[0]:$a) <=> (ref($b)?$b->[0]:$b)} @el +ements; # merge overlapping loose elements into preceding ranges for (my $i = 0; $i < $#elements; $i++) { next unless ref $elements[$i]; # skip single integers while ($i+1 <= $#elements and $elements[$i+1] <= $elements[$i][1]) { splice @elements, $i+1, 1 } # remove elements included in r +ange } # coalesce contiguous integers into ranges for (my $i = 0; $i < $#elements; $i++) { next if ref $elements[$i]; # skip ranges if ($elements[$i]+1 == $elements[$i+1]) { my $j = 1+$i; $j++ while !ref($elements[$j]) && $elements[$j]+1 == $elements[$ +j+1]; splice @elements, $i, 1+$j-$i, [$elements[$i], $elements[$j]]; } } # merge adjacent loose elements into succeeding ranges for (my $i = 0; $i < $#elements; $i++) { next if ref $elements[$i]; # skip ranges next unless ref $elements[$i+1]; # but next element is a range # There can be at most one such element, since contiguous integers + were # coalesced into ranges above. if ($elements[$i]+1 == $elements[$i+1][0]) { splice @elements, $i, 2, [$elements[$i], $elements[$i+1][1]] } } # merge adjacent loose elements into preceding ranges for (my $i = 0; $i < $#elements; $i++) { next unless ref $elements[$i]; # skip single integers next if ref $elements[$i+1]; # but next element is a single int +eger # There can be at most one such element, since contiguous integers + were # coalesced into ranges above. if ($elements[$i][1]+1 == $elements[$i+1]) { splice @elements, $i, 2, [$elements[$i][0], $elements[$i+1]] } } # merge overlapping ranges for (my $i = 0; $i < $#elements; $i++) { next unless ref $elements[$i] and ref $elements[$i+1]; splice @elements, $i, 2, [$elements[$i][0], $elements[$i+1][1]] if $elements[$i][1] >= $elements[$i+1][0]; } # merge adjacent ranges for (my $i = 0; $i < $#elements; $i++) { next unless ref $elements[$i] and ref $elements[$i+1]; splice @elements, $i, 2, [$elements[$i][0], $elements[$i+1][1]] if $elements[$i][1]+1 == $elements[$i+1][0]; } return \@elements; } # given: sorted array from sub parse and integer # return true if the integer is in the sorted array sub search ($$) { my $set = shift; my $num = shift; my $left = 0; my $right = $#$set; my $i = $#$set >> 1; # bitshift for integer /2 while ($left < $right) { if (ref($set->[$i])) { # evaluate a range return 1 # number within this range if $num >= $set->[$i][0] && $num <= $set->[$i][1]; if ($num > $set->[$i][0]) { $left = $i+1 } else { $right = $i-1 } } else { # evaluate a single integer return 1 # number matched if $num == $set->[$i]; if ($num > $set->[$i]) { $left = $i+1 } else { $right = $i-1 } } $i = ($left + $right) >> 1; # bitshift for integer /2 } # last check if (ref($set->[$i])) { return $num >= $set->[$i][0] && $num <= $set->[$i][1] } else { return $num == $set->[$i] } } my $Set = parse shift; use Data::Dumper; print Data::Dumper->new([$Set],[qw(Set)])->Indent(0)->Dump,"\n"; foreach (@ARGV) { print $_, search($Set, $_) ? ' is' : ' is not', " in the set\n"; }

The script expects a set of numbers as its first command line argument, and then various numbers to test against the set as additional arguments. Examples:

$ ./bsearch.pl 1,2,5,6,9,10,41-56 1 4 42 17 $Set = [[1,2],[5,6],[9,10],[41,'56']]; 1 is in the set 4 is not in the set 42 is in the set 17 is not in the set
$ ./bsearch.pl 1,2,11-16,6,7,19,9,5-8,13,14,15,4 1 2 3 4 5 8 9 10 11 1 +2 16 17 18 19 20 $Set = [[1,2],[4,9],[11,16],19]; 1 is in the set 2 is in the set 3 is not in the set 4 is in the set 5 is in the set 8 is in the set 9 is in the set 10 is not in the set 11 is in the set 12 is in the set 16 is in the set 17 is not in the set 18 is not in the set 19 is in the set 20 is not in the set

This latter example was used for developing the set-optimization logic.

Replies are listed 'Best First'.
Re^2: Comparing a value to a list of numbers
by haukex (Bishop) on Jan 31, 2021 at 09:09 UTC
    Other monks have suggested a binary search solution, but did not provide an example.

    Set::IntSpan uses a binary search.

        Interesting, so why is it restricted to Integers?

        Likely because that's simply its purpose, plus I imagine certain set operations would be problematic due to floating-point inaccuracies.

Re^2: Comparing a value to a list of numbers
by LanX (Cardinal) on Jan 31, 2021 at 19:55 UTC
    > Other monks have suggested a binary search solution, but did not provide an example.

    FWIW I did, but didn't publish, since it didn't fit into davido's benchmark (which has also it flaws by only testing 42)

    DB<47> sub tst { if ($_>=9) { if ($_>=41 and $_<=56) {1} elsif ($_<= +10) {1} } elsif ($_<=6) { if ($_>=5){1} elsif \ ($_<=2 and $_>=1) {1} }} DB<48> %in = map { $_ => 1 } 1,2,5,6,9,10,41..56 DB<49> tst != $in{$_} and print "error $_" for 1..190 DB<50>

    I was also pondering about how to optimize a solution (time and memory) for all cases.

    Like potentially

    • many points
    • many ranges
    • floats instead of integers
    • multiple dimensions (like rectangular areas in a plane)
    An "optimal" solution would involve combining different techniques ...

    And some techniques which were prominent in similar older threads haven't been mentioned yet.

    Anyway all of this is far beyond what the OP wanted.

    But yes: again "I'm suggesting but not providing an example" ;-)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      I assumed that the set would vary (although using eval to convert the arrayref returned from parse into a closure like your tst could be interesting even if compile could end up longer than parse) and tried to optimize for the "many points" and "many ranges" cases; thus all those extra passes in the parse sub to simplify the result. I excluded floating point because exact comparisons are not reliable and it is easier to restrict the domain to integers. Multiple dimensions start to get complex enough that I suspect a general optimal solution is impossible — optimizing that case will depend on the actual data used.

      The "suggesting but not providing an example" remark was an excuse for posting my solution — usually these questions have been "done to death" before I get to them. ;-)

        (although using eval to convert the arrayref returned from parse into a closure like your tst could be interesting even if compile could end up longer than parse)

        Ah, heck with it — compile turned out to be only slightly longer than search...

        #!/usr/bin/perl # -*- CPerl -*- use strict; use warnings; # sparse range structure: # array of arrayref/number # single integers represented as number # contigous ranges as arrayref [LOW,HIGH] inclusive # given: string "A,B,C,X-Y,Z" # return: sorted array [A, B, C, [X, Y], Z] sub parse ($) { my @elements = split /,/, shift; foreach (@elements) { s/\s+//g; # prune whitespace next unless m/\d+-\d+/; # skip conversion if single integer $_ = [split /-/, $_]; # convert ranges to arrayrefs } # sort range set @elements = sort {(ref($a)?$a->[0]:$a) <=> (ref($b)?$b->[0]:$b)} @el +ements; # merge overlapping loose elements into preceding ranges for (my $i = 0; $i < $#elements; $i++) { next unless ref $elements[$i]; # skip single integers while ($i+1 <= $#elements and $elements[$i+1] <= $elements[$i][1]) { splice @elements, $i+1, 1 } # remove elements included in r +ange } # coalesce contiguous integers into ranges for (my $i = 0; $i < $#elements; $i++) { next if ref $elements[$i]; # skip ranges if ($elements[$i]+1 == $elements[$i+1]) { my $j = 1+$i; $j++ while !ref($elements[$j]) && $elements[$j]+1 == $elements[$ +j+1]; splice @elements, $i, 1+$j-$i, [$elements[$i], $elements[$j]]; } } # merge adjacent loose elements into succeeding ranges for (my $i = 0; $i < $#elements; $i++) { next if ref $elements[$i]; # skip ranges next unless ref $elements[$i+1]; # but next element is a range # There can be at most one such element, since contiguous integers + were # coalesced into ranges above. if ($elements[$i]+1 == $elements[$i+1][0]) { splice @elements, $i, 2, [$elements[$i], $elements[$i+1][1]] } } # merge adjacent loose elements into preceding ranges for (my $i = 0; $i < $#elements; $i++) { next unless ref $elements[$i]; # skip single integers next if ref $elements[$i+1]; # but next element is a single int +eger # There can be at most one such element, since contiguous integers + were # coalesced into ranges above. if ($elements[$i][1]+1 == $elements[$i+1]) { splice @elements, $i, 2, [$elements[$i][0], $elements[$i+1]] } } # merge overlapping ranges for (my $i = 0; $i < $#elements; $i++) { next unless ref $elements[$i] and ref $elements[$i+1]; splice @elements, $i, 2, [$elements[$i][0], $elements[$i+1][1]] if $elements[$i][1] >= $elements[$i+1][0]; } # merge adjacent ranges for (my $i = 0; $i < $#elements; $i++) { next unless ref $elements[$i] and ref $elements[$i+1]; splice @elements, $i, 2, [$elements[$i][0], $elements[$i+1][1]] if $elements[$i][1]+1 == $elements[$i+1][0]; } return \@elements; } # given: sorted array from sub parse and integer # return true if the integer is in the sorted array sub search ($$) { my $set = shift; my $num = shift; my $left = 0; my $right = $#$set; my $i = $#$set >> 1; # bitshift for integer /2 while ($left < $right) { if (ref($set->[$i])) { # evaluate a range return 1 # number within this range if $num >= $set->[$i][0] && $num <= $set->[$i][1]; if ($num > $set->[$i][0]) { $left = $i+1 } else { $right = $i-1 } } else { # evaluate a single integer return 1 # number matched if $num == $set->[$i]; if ($num > $set->[$i]) { $left = $i+1 } else { $right = $i-1 } } $i = ($left + $right) >> 1; # bitshift for integer /2 } # last check if (ref($set->[$i])) { return $num >= $set->[$i][0] && $num <= $set->[$i][1] } else { return $num == $set->[$i] } } # given: sorted array from sub parse # return: closure accepting an integer and returning true iff it is in + the set sub wrap ($) { my $set = shift; return sub { return search $set, shift }; } # given: sorted array from sub parse # return: closure same as from wrap but using unrolled binary search sub compile ($) { my $set = shift; my $expand; $expand = sub { my $mid = $#_ >> 1; my $branchcond; my $matchcond; if (ref $_[$mid]) { # pivot on range $branchcond = "\$X > $_[$mid][0]"; $matchcond = "\$X >= $_[$mid][0] && \$X <= $_[$mid][1]"; } else { # pivot on single integer $branchcond = "\$X > $_[$mid]"; $matchcond = "\$X == $_[$mid]"; } my $fragment = " return $matchcond;\n"; if ($mid+1 <= $#_) { $fragment = (" return 1 if $matchcond;\n". " if ($branchcond) {\n". $expand->(@_[$mid+1 .. $#_]). " }"); if ($mid-1 >= 0) { $fragment .= (" else {\n". $expand->(@_[0 .. $mid-1]). " }\n"); } else { $fragment .= "\n"; } } return $fragment }; my $tree = $expand->(@$set); return eval <<CODE sub { my \$X = shift; $tree return !1; # false value for no match if end reached } CODE } my $Set = parse shift; my $Test = wrap $Set; my $Clos = compile $Set; use Data::Dumper; print Data::Dumper->new([$Set],[qw(Set)])->Indent(0)->Dump,"\n"; print Data::Dumper->new([$Clos], [qw(Closure)])->Deparse(1)->Dump; sub sym ($) { $_[0] ? 'X' : ' ' } printf "%4s | basic | wrapped | compile\n", ''; foreach (@ARGV) { printf "%4d | %1s | %1s | %1s\n", $_, sym(search $Set, $_), sym($Test->($_)), sym($Clos->($_)); }

        Edit 2021-02-02 by jcb: Use B::Deparse with Data::Dumper to also output the compiled closure. (line 169 inserted)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://11127703]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (6)
As of 2021-06-15 09:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What does the "s" stand for in "perls"? (Whence perls)












    Results (69 votes). Check out past polls.

    Notices?