Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
Hi

I realize this is a long shoot but maybe , just maybe someone willing to share can help me. The problem that I am trying to solve involves two integer arrays. One is static and one changes. Let x[] be the array whoes content does not change and y[] the one that changes. both arrays are 15000 int's long and they only contain values of 0 and 1. Example:

x[0 0 0 0 1 1 0 0 0 0 1 0 ...] y[1 0 0 0 1 0 1 0 1 0 0 0 ...]
there are about 5000000 y-arrays that i need to compare with 100000 x arrays. What I could do is to start from 0 and pairwise compare each entry(x1<=>y1, x1<=>y2 ... x2<=>y1, ...xn<=>yn). Though fast this is just not fast enough (by my calculations that could last several days on a cluster machine i have at my disposal and i need to repeat this app. 1000 times each month).
the other thing i could do is to preprocess y arrays so that i only store positions where 1's are. The for each y just check if x[y[i]] == 1 since there are about 500 times less 1's then 0's this is going to run much faster (and will save a lot of memory). And this is where i stopped. However, with this approach it takes me 1 sec to compare one x with all y arrays. When extrapolated to 100000 x arrays this turns out to be a lot of time, especially when considered that i need to repeat this 1000 times. Aditional information. My end result is not to know, for each (x,y) array how many 1's they share just to know what are the top 10 y arrays that share the most 1' with each x array. Distribution od 1's in both y and x arrays is random (by the looks of it they maybe uniformly distributed) My goal is to speed this comparison up at least 3 times because then i can do something with it. Obviously to speed this up i need to do as less comparisons (checkups) as possible. I was thinking about computing a hash key (a checksum of come sort) but then i would be able to identify only identical (x,y) pairs. I was also thinking of converting x array to a bit array but am not sure how will this help me speed-wise. I did rewrite this in c and i gained a lot of speed but still not enough. As you can see I am stuck. So if anyone has any input for me please do not hesitate to post, no matter how crazy or slow the idea is. And i apologize if this is not strictly speaking a Perl (syntax, code -wise) related issue.

Maybe there should be a subforum for this type of questions/discussions as Perl is being used more and more for theoretical/conceptual problem solving tasks , because you don't have to worry about technicalities and since a lot of people that do this type of work are coming from c there is no syntax leap.

Anyway, thank you

baxy


In reply to Comparing two arrays by baxy77bax

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (3)
As of 2024-03-29 04:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found