
Re: Table shuffling challenge

by zork42 (Monk)
on Aug 24, 2013 at 05:00 UTC

in reply to Table shuffling challenge

Let me see if I understand. Please say whether each of the assumptions below is correct...

Q1: You have 110,000 biomarkers, represented by the 110,000 rows?
Q2: and 10 different cancerous tissue samples, represented by the 10 columns (ignoring the first column)?

Q3: The first table represents the results of the tests?
I'll call this the Test Results Table, or TRT.

Q4: Are the other 999,999 tables derived from the TRT in some way?

Q5: So at the moment you're going:
TRT --> (some function) --> extra 999,999 tables --> process 1,000,000 tables?

Q6: Would it be quicker to go:
TRT --> process 1 table, using (some function) internally?

I have a tab-delimited table with 11 columns and approximately 110,000 rows. It has column headings and the first column is merely a count of the rows (1, 2, 3, 4, 5 etc.). Each entry in the table is either a 1 or a 0. I need to randomly select an entry from columns 2-11, sum up their values and record the sum (will be a number between 0 and 10). I need to do this until all values in the table are gone and no values are used more than once per table. Aaaand here's the kicker: I need to do this 1,000,000 times (ie, repeat for 1,000,000 tables).
Q7: What do you mean by "I need to randomly select an entry from columns 2-11, sum up their values and record the sum (will be a number between 0 and 10)."?

Q8: Do you mean "I need to randomly select a row, sum up the row's 10 columns and record the sum (will be a number between 0 and 10)."?

I have two working programs that take about the same amount of time. Both start by saving all variables in each column to separate arrays (excluding headers). One program then shuffles each array and sums up all the nth-numbered values in the array. The other one uses the following subroutine to select values from each array until the arrays are empty:
Q9: What does "sums up all the nth-numbered values in the array" mean?

Q10: Could you explain each stage in the process by using a short (maybe 10 row) example table?
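
To make the questions concrete, here is one possible reading of the "shuffle each array and sum the nth values" description, sketched on a tiny made-up table. Everything in it is my assumption rather than your actual code: the 5x3 table, the sub name shuffled_row_sums, and the repetition count are all hypothetical. It also shows the idea behind Q6: repeat the shuffle in memory and keep only a tally of the sums, instead of producing a table per repetition.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical 5-row, 3-column table of 0/1 values
# (the real one is roughly 110,000 rows by 10 columns).
my @table = (
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
);

# Shuffle every column independently, then sum across each row.
# Returns a reference to the list of row sums, one per row.
sub shuffled_row_sums {
    my ($rows) = @_;
    my $ncols = @{ $rows->[0] };
    my @sums  = (0) x @$rows;
    for my $c (0 .. $ncols - 1) {
        my @col = shuffle map { $_->[$c] } @$rows;
        $sums[$_] += $col[$_] for 0 .. $#col;
    }
    return \@sums;
}

# Repeat in memory and tally how often each sum occurs,
# rather than writing out one table per repetition.
my $reps  = 1000;    # assumption; the original post asks for 1,000,000
my @tally = (0) x (@{ $table[0] } + 1);
for (1 .. $reps) {
    $tally[$_]++ for @{ shuffled_row_sums(\@table) };
}

printf "sum %d: %d times\n", $_, $tally[$_] for 0 .. $#tally;
```

Because every value is used exactly once per repetition, the row sums of each shuffled table always add up to the number of 1s in the original table (7 in this example). If this is not what you are doing, a correction against this small example would answer most of the questions above.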

Replies are listed 'Best First'.
Re^2: Table shuffling challenge
by BrowserUk (Pope) on Aug 24, 2013 at 05:20 UTC

    Well asked++. The description of the requirements has so far left me completely bewildered as well.

    I suspect there are vast improvements -- 2, 3, even 4 orders of magnitude -- in processing performance to be had for this problem; but we need to understand the problem first.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
