PerlMonks

### Re^3: How to generate test data?

by roboticus (Chancellor)
 on Nov 25, 2012 at 02:33 UTC (#1005455=note)

in reply to Re^2: How to generate test data?
in thread How to generate test data?

Regarding how to choose a dataset size that makes a run take 15 minutes: if I wanted to do that, I'd start by timing progressively larger datasets to see how the run time changes with dataset size. For example, look at these run times for three subroutines on four dataset sizes:

| Dataset size | Subroutine A (s) | Subroutine B (s) | Subroutine C (s) |
|---|---|---|---|
| 1000 | 6 | 1 | 30 |
| 2000 | 11 | 4 | 40 |
| 3000 | 17 | 8 | 48 |
| 4000 | 22 | 16 | 55 |
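Here is a minimal Python sketch of the "time progressively larger datasets" step (the thread is about Perl; Python is used only for illustration). `time_run`, `time_sizes` and `make_data` are hypothetical names, and sorting a list of random floats merely stands in for whatever code you are actually measuring.

```python
import random
import time


def time_run(func, data):
    """Return the wall-clock seconds taken by a single call func(data)."""
    start = time.perf_counter()
    func(data)
    return time.perf_counter() - start


def time_sizes(func, make_data, sizes):
    """Time func on a freshly built dataset of each size.

    Returns a list of (size, seconds) pairs. The dataset is built before
    the clock starts, so only the code under test is measured.
    """
    results = []
    for n in sizes:
        data = make_data(n)
        results.append((n, time_run(func, data)))
    return results


if __name__ == "__main__":
    # Hypothetical stand-in workload: sort n random floats.
    make_data = lambda n: [random.random() for _ in range(n)]
    for n, secs in time_sizes(sorted, make_data, [1000, 2000, 3000, 4000]):
        print(f"{n:>6} items: {secs:.6f} s")
```

Building the data outside the timed region keeps the cost of generating test data from polluting the measurements; if generation is the slow part you care about, time it separately.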

Once I have a few samples, I'd try to predict the next dataset size. Subroutine A looks like a simple linear progression: it handles roughly 170-180 items per second at all four dataset sizes. So if I wanted it to run for 15 minutes, I'd expect to need about 15*60*175, or roughly 160,000, data items.

Subroutine B, however, isn't linear. It gets slower and slower as the dataset grows; in this case it takes roughly T = (X/1000)^2 seconds for a dataset of X items. Solving for X when T = 15*60 = 900 seconds gives X = 1000*sqrt(900) = 30,000 items, which would be a reasonable prediction.

Subroutine C starts out pretty slow, but each additional 1000 items adds less time than the last (10, then 8, then 7 seconds). (I was shooting for a logarithmic progression, but I don't feel like doing the math, so that one's left as an exercise for the reader!)
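The extrapolations for subroutines A and B can be sketched as follows, assuming A follows T = X/rate and B follows T = a*X^2. The timings are copied from the table; `linear_size` and `quadratic_size` are hypothetical helpers, and the quadratic coefficient is a least-squares estimate rather than the by-eye T = (X/1000)^2 used above. Subroutine C is left out, as in the post.

```python
import math

# Timings from the table: (dataset size, seconds).
SUB_A = [(1000, 6), (2000, 11), (3000, 17), (4000, 22)]
SUB_B = [(1000, 1), (2000, 4), (3000, 8), (4000, 16)]


def linear_size(samples, target_seconds):
    """Assume T = X / rate: average the items-per-second rates, solve for X."""
    rate = sum(x / t for x, t in samples) / len(samples)
    return rate * target_seconds


def quadratic_size(samples, target_seconds):
    """Assume T = a * X**2: least-squares fit of a, then X = sqrt(T / a)."""
    a = sum(t * x ** 2 for x, t in samples) / sum(x ** 4 for x, _ in samples)
    return math.sqrt(target_seconds / a)


if __name__ == "__main__":
    target = 15 * 60  # 15 minutes, in seconds
    print(f"A: about {linear_size(SUB_A, target):,.0f} items")
    print(f"B: about {quadratic_size(SUB_B, target):,.0f} items")
```

With these numbers A averages about 177 items per second, so its estimate lands near 159,000 items, and B lands near 30,400, close to the hand-derived 30,000. Treat either figure as a starting guess to verify, not a guarantee.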

*HOWEVER*, these predictions assume that everything else stays the same as the dataset grows. At some dataset size, an algorithm's run time may jump suddenly and drastically (for example, you might exhaust main memory and the OS may start swapping). So rather than going straight for 15 minutes, you might first predict a dataset size that should take less time, say one or two minutes, and see how far off you are. I frequently approach a final value by doubling the size each time (unless I'm using something like subroutine B, where doubling the size roughly quadruples the time).
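The doubling approach might look like the sketch below: grow the dataset until a single run reaches a modest trial target (say 60 seconds) instead of the full 15 minutes. `find_size`, `run`, `factor` and `max_size` are hypothetical names; `run(n)` is assumed to build a dataset of n items and run the code under test on it.

```python
import time


def find_size(run, target_seconds, start_size=1000, factor=2.0, max_size=10**9):
    """Grow the dataset size geometrically until one run takes target_seconds.

    run(n) is a hypothetical callback that builds and processes n items.
    Returns (size, elapsed_seconds) for the first size that reaches the
    target, or for the last size tried if the next one would exceed max_size.
    """
    n = start_size
    while True:
        start = time.perf_counter()
        run(n)
        elapsed = time.perf_counter() - start
        print(f"{n:>12} items: {elapsed:.2f} s")
        # Always grow by at least one item, even for factors close to 1.
        next_n = max(n + 1, int(n * factor))
        if elapsed >= target_seconds or next_n > max_size:
            return n, elapsed
        n = next_n
```

With the default factor of 2, a linear routine like subroutine A takes about twice as long on each step. For a quadratic routine like subroutine B, a factor of about 1.4 (roughly the square root of 2) keeps each step at about double the previous time instead of four times, which makes it less likely to overshoot badly. The `max_size` cap is a crude guard against running out of memory while searching.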

I hope this is somewhat helpful...

Modern computers are so fast, though, that I expect it'll take a pretty large dataset to consume 15 minutes. (That, or a sufficiently horrible sort algorithm.)

...roboticus

When your only tool is a hammer, all problems look like your thumb.
