Re: Table shuffling challenge
by zork42 (Monk) on Aug 24, 2013 at 05:00 UTC
Let me see if I understand. Please say whether each of the following is correct or not...
Q1: You have 110,000 biomarkers, represented by the 110,000 rows?
Q2: and 10 different cancerous tissue samples, represented by the 10 columns (ignoring the first column)?
Q3: The first table represents the results of the tests?
I'll call this the Test Results Table, or TRT.
Q4: Are the other 999,999 tables derived from the TRT in some way?
Q5: So at the moment you're going:
TRT --> (some function) --> extra 999,999 tables --> process 1,000,000 tables?
Q6: Would it be quicker to go:
TRT --> process 1 table, using (some function) internally?
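To make Q6 concrete, here is a minimal sketch of the second pipeline, written in Python rather than the OP's Perl. Only the TRT is held in memory, and each derived table exists just long enough to be tallied. It assumes that (some function) means shuffling each column independently, which is only my guess from the description. `tally_shuffles` and its parameters are hypothetical names.

```python
import random
from collections import Counter

def tally_shuffles(columns, n_tables, seed=None):
    """Tally row sums over n_tables shuffled copies of the table.

    columns: list of columns, each a list of 0/1 values (ASSUMED layout).
    Only one shuffled table exists at a time; nothing is written to disk.
    """
    rng = random.Random(seed)
    cols = [list(c) for c in columns]  # working copy, the TRT stays untouched
    counts = Counter()
    for _ in range(n_tables):
        for c in cols:
            rng.shuffle(c)  # ASSUMPTION: (some function) = per-column shuffle
        # zip(*cols) walks the shuffled table row by row
        counts.update(sum(row) for row in zip(*cols))
    return counts
```

The result maps each possible sum (0 to the number of columns) to how often it occurred across all tables, so the 999,999 intermediate tables never need to be stored.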
I have a tab-delimited table with 11 columns and approximately 110,000 rows. It has column headings and the first column is merely a count of the rows (1, 2, 3, 4, 5 etc.). Each entry in the table is either a 1 or a 0. I need to randomly select an entry from columns 2-11, sum up their values and record the sum (will be a number between 0 and 10). I need to do this until all values in the table are gone and no values are used more than once per table. Aaaand here's the kicker: I need to do this 1,000,000 times (ie, repeat for 1,000,000 tables).
Q7: What do you mean by "I need to randomly select an entry from columns 2-11, sum up their values and record the sum (will be a number between 0 and 10)."?
Q8: Do you mean "I need to randomly select a row, sum up the row's 10 columns and record the sum (will be a number between 0 and 10)."?
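The two readings in Q7 and Q8 behave differently, which is why I ask. Here is a small Python sketch on a made-up table with 4 rows and 3 columns (the data and function names are hypothetical, not taken from the OP's code). Picking whole rows at random only reorders the same row sums, whereas taking one unused entry from every column can produce new sums.

```python
import random

def sums_by_row(rows, rng):
    """Q8 reading: visit the rows in random order and sum each row."""
    order = list(range(len(rows)))
    rng.shuffle(order)
    return [sum(rows[i]) for i in order]

def sums_by_entry(rows, rng):
    """Q7 reading: each draw takes one unused entry from every column."""
    cols = [list(c) for c in zip(*rows)]  # transpose rows into columns
    for c in cols:
        rng.shuffle(c)                    # each column shuffled on its own
    return [sum(r) for r in zip(*cols)]   # sum the nth entry of every column

# Made-up example table: two all-ones rows, two all-zeros rows.
rows = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
rng = random.Random(0)
a = sums_by_row(rows, rng)
b = sums_by_entry(rows, rng)
```

Under the row reading, the sorted sums for this table are always [0, 0, 3, 3]. Under the entry reading only the grand total (6) is fixed, and the individual sums can take any value from 0 to 3.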
I have two working programs that take about the same amount of time. Both start by saving all variables in each column to separate arrays (excluding headers). One program then shuffles each array and sums up all the nth-numbered values in the array. The other one uses the following subroutine to select values from each array until the arrays are empty:
Q9: What does "sums up all the nth-numbered values in the array" mean?
Q10: Could you explain each stage in the process by using a short (maybe 10 row) example table?