Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re^2: bit-vector > global minimum

by baxy77bax (Chaplain)
on Oct 03, 2011 at 18:05 UTC ( #929394=note: print w/ replies, xml ) Need Help??


in reply to Re: bit-vector > global minimum
in thread bit-vector > global minimum

ok yes this is the first level that i probably didn't explain as well as i shpuld. but that is what i was talking about and complaining that this runs in O(X) time where X= |A[i,j]|. Now if you would preprocess this one more time to get from :

100101000100 A[0] A[7] 1 | 001010 | 0
to
1 | 0 | 0 > in this array you need to do 3 jumps (for +my $p ( $s .. $e )) #the reduction follows from the Cartesian tree that follows from the b +it-vector 100101000100#
you would need to do only 3 jumps and so on and so on. and this iterative preprocessing will in the end place a tree on top of my vector. and searching such binary tree is faster. So what i need is a better preprocessig of my initial 100101000100. so i can save it in a Sparse table and in just 2 comparisons (steps) do what you achieved here in 7 (X)steps. Or if something totally crazy is suggested that will blow my mind straight down-under :) I will not complain :)

Thank You !!

baxy


Comment on Re^2: bit-vector > global minimum
Select or Download Code
Re^3: bit-vector > global minimum
by BrowserUk (Pope) on Oct 03, 2011 at 19:04 UTC

    I think that this may actually be a live sighting of the mythical "PM XY problem".

    Is it fair to sum up your question as:

    You don't want to use the straight forward linear calculation because it will be too slow, but you don't want to build the obvious tree structure because it takes too much memory? Is there some magical third way?

    You've describe the query you need to satisfy as given a range of positions (n,m), where along it lies the minima. How many of those queries do you have to satisfy?

    Are they effectively random queries. Ie. random start position and random length?

    Or are you calculating one (or a few) length(s) for all start positions?

    Or all lengths for all start positions?

    You mentioned 300GB. Is that a single huge bit-vector, or many short vectors?

    Basically what I'm getting at here is a clearer description of what this data is; how big it is; the nature of the required processing; etc. rather than just your current approach to this very specific problem, might trigger a different or more innovative approach to the overall problem.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Fair enough !

      "You don't want to use the straight forward linear calculation because it will be too slow, but you don't want to build the obvious tree structure because it takes too much memory? Is there some magical third way?"

      A:Correct, and there is some magical third way by doing O(1) checkups in the Spares table. The problem is to reduce ST entries to bits (0,1).

      "You've describe the query you need to satisfy as given a range of positions (n,m), where along it lies the minima. How many of those queries do you have to satisfy?

      Are they effectively random queries. Ie. random start position and random length?

      Or are you calculating one (or a few) length(s) for all start positions?

      Or all lengths for all start positions?"

      A: All possible queries and all possible lengths!

      "You mentioned 300GB. Is that a single huge bit-vector, or many short vectors?"

      A:One big and many short vectors due to the reduction that I have described in the earlier post.

      So this is a really big string of 0's and 1's and i have no idea what is it for i only know that it is used in string matching and alignment. the thing is, this is a classical LCA problem because you have an ordered structure in the background (that you cannot change because then the whole concept of the algorithm falls apart) and you need to solve it.

      solving it means converting it to an array of integers. And if this is done then memory requiremants jump through the roof. The reason why i have tried to simplify things is because when you give an extensive description of the problem people, me being the first, loose interest in messing with it because ,if you are not in the field then there is a bunch of strange stuff going on that needs to be accounted for and then it all comes down to google , Sedgewick, Gusfield, Fischer ..... which is what i am trying to avoid, been there done that and now i'm looking for a person to give an opinion on this particular sub-problem that i feel could do the trick. If not, then back to the drawing board for me:). So I avoided the higher purpose and just focused on the small bit which may not be well explained but i feel it is simple enough to at least receive couple of suggestions (like the ones you guys made ), and this is what i prefer more, and find more informative, then referencing me to a google site.

      So thank you and any suggestion comment or even saying that the problem as given cannot be solved is more then welcome!

      Cheers

      baxy

        I don't think that I understand the question. Help me out...because this just seems too simple. Successive table look-ups that are combined together - no input vector bit is used more than once.

        I will proffer that O(n) notation may not be the best for describing what is efficient or not. In an abstract sense, the number of operations is paramount. In an implementation, how fast/slow each operation is also matters. Sometimes using more really fast operations is "better" than fewer slow operations.

        Re: bit-vector > global minimum sounded pretty good to me. It appears to me that every bit in the bit vector will need to be examined. If this is not done bit by bit, then why not do it in groups of bits as a look-up table?

        A binary search tree that takes into account all possible variations of a really, really long vector could be huge!

        If I do this 8 bits at a time, then there is a table of 512 possible results and I think the result can be calculated one look-up at time.
        01234567
        10010010
        bits 2,5 are lowest min values, that whole vector '10010010' could be an index into an array of struct that lists the values of 2,5. I think there needs to be an extra test of '0' at the end of the 8 bits and a way to propagate results to the next 8 bits to keep track of the "best" so far. The implementation may work better if bits are inverted before the table look-up. And I think that probably one more bit needs to be added to the look-up table to help with a "stitch" of the previous result together with current result is required.

        I am thinking that a static table would suffice. A dynamic table would perhaps be better (shorter)?

        Hm. Whilst looking up some of your terminology -- LCA; Sparse Table -- I ran across this. Maybe you've seen it already, but it sounds vaguely related.

        Maybe you'll understand it well enough to work out how to implement it. Assuming it is actually useful.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://929394]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (11)
As of 2014-07-22 22:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (129 votes), past polls