Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

bit-vector > global minimum

by baxy77bax (Deacon)
on Oct 03, 2011 at 16:16 UTC ( #929375=perlquestion: print w/replies, xml ) Need Help??
baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks ,

i have a question for you. It is more of a theoretical question, question requiring a tip rather then the actual code. so the problem is i have this bit-vector of size N:

Example: A: 0 1 2 3 4 5 6 7 8 9 10 11 1 0 0 1 0 1 0 0 0 1 0 0
and what i need to figure out is the position of an overall minimum given two indexes i,j. so if this is the vector and 1 marks the point of increase and 0 marks the point of decrease in some not known value,then the vector can be converted into :

if i start counting from 10 then B[0] = A[0]+10 = 11 B: 0 1 2 3 4 5 6 7 8 9 10 11 11 10 9 10 9 10 9 8 7 8 7 6
Essentially what i have here is a Cartesian tree. and this means that there are only certain types of indexes that can be picked:

i,j = B[1],B[7] (A[1],A[7]) i,j != B[6],B[10] -> then there are two min positions (B[1] and B[7]) +and this cannot be....
so given two indexes B[1],B[7] I wolud like to figure out the min value between them which in my example is 8 on the position B[7].

current solutions all include dealing with this problem by precomputing the Sparse table and then do constant time picks-> and this is really fast, but in my case i cannot create an array type B nor a classical Sparse table since i'm bound by dealing only with bits, meaning, all i have and can work with are the arrays of type A (bit-vector). so my current solution is to divide the A into chunks and then evaluate if the overall min of the chunk is greater or smaller then the overall min of the previous chunk and then create another bit-vector reflecting the relative overall growth or decay and then do this in a tree fashion until i reach the symmetrical min (here what i have is a binary tree). problem with this approach is that it runs rather slow since i cannot do constant picks like in the Sparse table and construction complexity increases from n -> n log n. since i'm running this on big datasets, putting one such structure in memory (using 64 bit OS) memory requirements grow up to 300 GB of ram (i tested it the other day on the university cluster :))

so my question is , does anyone have an idea on how to do this type of searches without building the actual tree atop the initial bit-vector (A) like i'm doing now.



Replies are listed 'Best First'.
Re: bit-vector > global minimum
by BrowserUk (Pope) on Oct 03, 2011 at 17:23 UTC

    This seems too easy, (which usually means I didn't understand the question but...):

    #! perl -slw use strict; sub minpos { my( $v, $s, $e ) = @_; my( $n, $min, $o ) = ( 0, 1e30, 0 ); for my $p ( $s .. $e ) { $n += vec( $v, $p, 1 ) ? 1 : -1; ( $min, $o ) = ( $n, $p ) if $n < $min; } return $o; } my $vec = pack 'b*', '100101000100'; my( $s, $e ) = ( 1 , 7 ); printf "The minima between %d - %d is at %d\n", $s, $e, minpos( $vec, $s, $e ); __END__ C:\test>junk5 The minima between 1 - 7 is at 7

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      ok yes this is the first level that i probably didn't explain as well as i shpuld. but that is what i was talking about and complaining that this runs in O(X) time where X= |A[i,j]|. Now if you would preprocess this one more time to get from :
      100101000100 A[0] A[7] 1 | 001010 | 0
      1 | 0 | 0 > in this array you need to do 3 jumps (for +my $p ( $s .. $e )) #the reduction follows from the Cartesian tree that follows from the b +it-vector 100101000100#
      you would need to do only 3 jumps and so on and so on. and this iterative preprocessing will in the end place a tree on top of my vector. and searching such binary tree is faster. So what i need is a better preprocessig of my initial 100101000100. so i can save it in a Sparse table and in just 2 comparisons (steps) do what you achieved here in 7 (X)steps. Or if something totally crazy is suggested that will blow my mind straight down-under :) I will not complain :)

      Thank You !!


        I think that this may actually be a live sighting of the mythical "PM XY problem".

        Is it fair to sum up your question as:

        You don't want to use the straight forward linear calculation because it will be too slow, but you don't want to build the obvious tree structure because it takes too much memory? Is there some magical third way?

        You've describe the query you need to satisfy as given a range of positions (n,m), where along it lies the minima. How many of those queries do you have to satisfy?

        Are they effectively random queries. Ie. random start position and random length?

        Or are you calculating one (or a few) length(s) for all start positions?

        Or all lengths for all start positions?

        You mentioned 300GB. Is that a single huge bit-vector, or many short vectors?

        Basically what I'm getting at here is a clearer description of what this data is; how big it is; the nature of the required processing; etc. rather than just your current approach to this very specific problem, might trigger a different or more innovative approach to the overall problem.

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: bit-vector > global minimum
by choroba (Bishop) on Oct 03, 2011 at 16:44 UTC
    Just an idea: If a zero follows a 1, it cannot be the minimum. So you can ommit all the zeros preceded by ones (remove them also), recursively. At the end, there are just zeros followed by ones. The last zero is the minimum (or the first 1 if there are no zeros). Does it help?
    Update: re-worded (underscored).

      This algorithm you suggest would fail on a bit vector of 15 1s followed by 5 zeroes.

        I updated the algorithm, an important part was missing from the post (silly me). It should work now:
        1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 - - 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 - - - - 0 0 0 1 1 1 1 1 1 1 1 1 1 1 - - - - - - 0 0 1 1 1 1 1 1 1 1 1 1 - - - - - - - - 0 1 1 1 1 1 1 1 1 1 - - - - - - - - - - ^ minimum
      but this implies that i would need to do this for every given pair of indexes which would lead to a:

      If X = |A[i,j]|, O(X) time algorithm which is higher then my tree solution that runs in O(logX) if i understood you correctly

      But never the less, this is a nice trick that didn't cross my mind. Thank you !!!!

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://929375]
Approved by Corion
Front-paged by davido
and the monastery is silent...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (7)
As of 2018-06-22 23:39 GMT
Find Nodes?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?

    Results (124 votes). Check out past polls.