Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Update: I now have a working algorithm in Re^2: decomposing binary matrices (2) below; the rest is a SMOO.

After getting a variety of useful leads in the discussion at decomposing binary matrices, I've thought about this a bit more. I think I can simplify the problem a bit by splitting into two parts - first cleaning the matrix up to remove all the useless '1'-bits, and then (if that still seems useful) to split the matrix into discrete submatrices.

To recap and amplify the problem: I have a matrix of booleans in which rows represent variables and columns represent values, in which a value of TRUE (1) means "this variable may take this value", and FALSE (0) means "this variable cannot take this value". (Note: the rows/columns were transposed in my original post.) We have the additional constraint of uniqueness - the value of each variable is distinct from that of every other value.

The task is a) "cleanup" - to gain additional information from the uniqueness constraint, by identifying TRUE values for variable/value combinations that cannot form a part of any solution, and setting them to FALSE; b) "splitting" - to simplify the matrix (if possible) by splitting it into two or more disjoint submatrices.

For example:
12345
A11000
B11000
C11111
D11111
(by cleanup) becomes
12345
A11000
B11000
C00111
D00111
12345
A11000
B11000
C00111
D00111
(by splitting) becomes
12
A11
B11
345
C111
D111

Both of these examples represent the same gleaned information: due to the uniqueness constraint, A and B must between them consume the values 1 and 2, so C and D are constrained between them to the values 3, 4 and 5.

In my original formulation I reversed the order of these tasks - I hoped to find an easy way to split the matrix, and have the additional information to be gained from the uniqueness constraint "fall out" as a result. I suspect now that splitting is a hard problem, and may be made easier by doing the cleanup first.

Here is some pseudocode for a recursive procedure to solve the cleanup problem:

define clean [ matrix m ] matrix m2 = matrix_of_zeroes(m.dimensions) for point p (m.allpoints) if m.value(p) and not m2.value(p) if not try_solution(m, m2, [ p ]) m.value(p) = 0 # by now m and m2 are the same define try_solution [ matrix m, matrix m2, array of points plist ] int row = first_row_not_already_tried(plist) if not defined row # all rows tried, so this is a solution for point p (@plist) m2.value(p) = 1 return true for int col (grep column_not_in_plist(plist, $_), m.allcols) point p = point_at(row, col) if m.value(p) if try_solution(m, m2, [ @plist, p ]) return true return false

The idea here is, for each TRUE value in the original matrix, see if there is any consistent solution that assigns a unique value to every variable; if not, mark it as FALSE instead. Short-circuit some of the work by recording every assignment used whenever a consistent solution is found: none of those needs to be retried by the top level.

I also plan an additional optimisation (not shown) such that if in try_solution() we pick a row that has no TRUE values in the available columns, we signal that fact to the caller - assuming that the caller is also try_solution(), it can then switch to trying that row instead (or propagate further up the call chain).

The biggest savings seem to come from getting a value set in m2 before a point is ever tried in clean(), and there are a couple of thigs I can do to try and maximise the number of times that happens - maybe by selecting a random row in try_solution(), or by preferring not-yet-seen columns, or by seeding it with a batch of random solutions.

There may also be benefits to be gained from an initial transformation of the matrix, to order both the rows and the columns in increasing order of the number of TRUE values.

As to how to implement this, I'm not sure. I suspect PDL might be a good route to go, but I've never used it, and am not sure where to start with it; I also can't see that is has support for a datatype any smaller than "byte", so it may not be appropriate to use in this case. Failing that, I think I might be able to achieve reasonable efficiency encoding the matrix as a list of bit vectors, and making the 'plist' a structure that caches the list of rows and the list of columns used so far.

Even so, the combinatorial explosion may make the whole thing senseless - my initial target to try this out on (after testing) is a 26 x 300 matrix, with a potential number of solutions around comb(300, 26) =~= 2E37. I also hope therefore to identify characteristics of the starting matrix that mean no gain is possible, so that I can skip the effort entirely in the worst cases.

If I can get this working at all usefully, I'll look at the splitting problem again.

Any suggestions for implementation or a better algorithm would be gratefully received.

Hugo


In reply to decomposing binary matrices (2) by hv

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (5)
As of 2024-03-28 16:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found