http://www.perlmonks.org?node_id=1053961

jdlev has asked for the wisdom of the Perl Monks concerning the following question:

Well, the fantasy football optimization algorithm is up and running. Like all software, it's still got a few bugs, but I've tested it on smaller data sets, and it seems to hum along with the smaller data sets (usually less than 5 players per position).

So here's the rub, I've got it running on an older dell poweredge 2600 server with the 2.4gHz xeon processor running your typical sata hard drives. The program runs, and runs, and runs...and runs.

I extrapolated that with 10 positions to fill, and narrowing down the choices to about 150 options, the program has to run roughly 1.5 Quadrillion iterations to find the optimal solution.

I read up on the buffering issues, but confess, I have no idea what buffering is other than a place to store data (I assume in the memory) for quick access?

Is there a way to figure out how long a program should theoretically take to run through the full program? The whole I/O, pipes, tunnels, etc on computer communications is somewhat intimidating

I'm looking forward to the weekend, so I can load the program on my home server. I'm sure it will run through the program like a ferrari as opposed to the donkey cart the program currently runs on. I think the biggest difference will be made by the dual ssds drives it has. Those things move! The i7-3820 processor and 64gb of Ram should help as well :) I added a few lines to keep track of the start & completion time of the program to see how they compare. I will say if you guys don't have ssds - they're worth every penny. It completes a full check disk in under 5 minutes. My application servers in the office....3 friggin hours. Anyways...I'll stop ramblin now. Have a good weekend everyone :)

I love it when a program comes together - jdhannibal
  • Comment on CPU Boundries, Buffering, & Speed Discussion

Replies are listed 'Best First'.
Re: CPU Boundries, Buffering, & Speed Discussion
by Laurent_R (Canon) on Sep 13, 2013 at 21:54 UTC

    I think that you have to think about strategies to prune the combinations early, in order to cut down drastically the work to be done. Sometimes it can be done, sometimes it can't.

    In February of this year, I was asked to analyze a game where an individual plays against a computer (or a smart phone). This game had 2^16 (65,536) possible starting positions, 46 possible moves each time and usually 5 moves, leading in theory to 1.35 x 10^13 possible games, each consisting of 1 to 5 moves. Not a quadrillion, but not that far either. A test on a small subset of the problem showed that a brute-force approach would probably last about 5,400 hours on my computer (225 days). Besides I would run out of memory and possibly even disk space long before I could store all these possible games. But that would be mostly duplicate work. I made a backward analysis of the game: there is only one winning position, and only 46 moves can lead to this winning position, so that there are 46 positions one move away from winning the game. Each of these positions can in term be reached by 46 moves, and so on. That approach reduces the number of actual games to be potentially considered to 206 million, a much more manageable number. But it also turned out that many of the positions found at win minus two moves had already been found at win minus one move, so that if I was playing the game and found such a position, I would obviously choose the win in 1 move rather than win in 2 moves, so that all these non optimal strategies could be eliminated early, further reducing considerably the number of moves and positions to be examined. In the end my Perl program actually studied only 2,335,926 moves and lasted about 2 seconds. The program also proved that all possible initial positions can be solved in 5 moves or less.

    This problem was relatively easy to simplify, it is often much harder. But if you are talking about a quadrillion iterations, forget about throwing better hardware at it, try to find a powerful pruning strategy.

      Did you do an analysis for offshore gambling company?

        This was just a game to be played for pleasure or passing time, no money involved. ;-)

Re: CPU Boundries, Buffering, & Speed Discussion
by Laurent_R (Canon) on Sep 13, 2013 at 18:57 UTC

    I read up on the buffering issues, but confess, I have no idea what buffering is other than a place to store data (I assume in the memory) for quick access?

    Buffering is, for example, about reading more data from the disk then you currently need, based on the assumption you will probably also need the next data chunks. The idea is that it doesn't take much more to read a full memory page than to read just a few bytes. So, you read the whole page from the disk and buffer what you don't need immediately. But, in general, Perl is doing this kind of buffering without you having to do something.

    Another keyword which might be of interest to you is caching. For example, storing in memory data that are costly to calculate, to avoid calculating it again. You might want to have a look to the Memoize module, which automatize some caching for you.

Re: CPU Boundries, Buffering, & Speed Discussion
by CountZero (Bishop) on Sep 13, 2013 at 19:05 UTC
    Assuming you are using the "short" scale then 1.5 quadrillion is 1.5 x 10^15, which means that if you want to finish the job in a weekend of 48 hours, your computer will have to crunch almost 9 BILLION (9 x 10^9) iterations PER SECOND. Even with your superlative hardware I guess that is too much.

    CountZero

    A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: CPU Boundries, Buffering, & Speed Discussion
by Albannach (Monsignor) on Sep 14, 2013 at 01:18 UTC
    Laurent_R has the key I think. When brute force takes a few seconds, or some tolerable time, thinking over the method may not be worth it, but when brute force makes you wait too long, it needs to be at least helped along, if not abandoned for a better method. In this case, not knowing how you constructed your player value function, can you think of a way to prune out the players that you can't conceive of being in a winning team? If not obvious up front, you could get your code to do something like:
    • calculate the best possible team score, and discard players that can't deliver at least a substantial fraction of that total
    • tag players that are consistently below their team and/or position average (or higher percentile), and favour them less in subsequent selections
    • start by building sub teams and sort them by costs and values, then you can try assembling them in a second stage
    • if your value function incorporates combinations of players (i.e. some players with some characteristics play better/worse when on a team with other players with other certain characteristics) genetics may be your best option to arrive at a good solution, if not provably optimal, in a satisfactory time

    --
    I'd like to be able to assign to an luser

Re: CPU Boundries, Buffering, & Speed Discussion
by karlgoethebier (Abbot) on Sep 13, 2013 at 18:25 UTC
    "...1.5 Quadrillion iterations..."

    Isn't a quadrillion 1x10^15 (1,000,000,000,000,000)?

    What (the hell) are you doing?

    Update:

    "..how long a program should theoretically take to run..."

    ...it might take a while ;-)

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

Re: CPU Boundries, Buffering, & Speed Discussion
by Happy-the-monk (Canon) on Sep 13, 2013 at 18:35 UTC

    Is there a way to figure out how long a program should theoretically take to run through the full program? The whole I/O, pipes, tunnels, etc on computer communications is somewhat intimidating

    You will need a computer to do that. It is called "running it". :-)

    scnr. Actually, it can quickly become impossible to foresee (and test) all the eventualities a programme can theoretically reach, especially if it is already too complex for the programmers to fully understand it.

    Cheers, Sören

    Créateur des bugs mobiles - let loose once, run everywhere.
    (hooked on the Perl Programming language)

Re: CPU Boundries, Buffering, & Speed Discussion
by jakeease (Friar) on Sep 15, 2013 at 08:11 UTC

    Laurent_R has made some good suggestions about pruning combinations early, and Albannach has expanded on it. If you are using an algorithm to solve the knapsack problem, keep in mind that it is a notoriously difficult problem (NP Complete) and could easily put you into quadrillions if there are too many combinations.G;ive some thought to heuristic solutions; after all, in solving a knapsack problem, each item has a value and a weight. In Fantasy Football, value is subjective, or heuristic; I guess weight just means you need to choose 11 players, including certain positions. Knapsack seeks a mathematically optimal solution; fantasy seeks a heuristically satisfying lineup with a good probability of winning a game--quite different from weighing and valuing a knapsack.