PerlMonks  

Baseball line up (best rotation)

by LeGo (Chaplain)
on May 17, 2001 at 19:07 UTC ( [id://81264]=perlquestion )

LeGo has asked for the wisdom of the Perl Monks concerning the following question:

Okay, a quick history of a program idea. I have an 8-year-old brother who plays little league baseball. He hits maybe 8th on his team of about 14 players, who all hit. He usually gets a good hit and gets on base, but he never gets to score because the players behind him never hit him in. They have not gone past the 4th inning this year because they get a skunk rule applied (too many points down to catch up).

I would like to talk to his coach about the lineup, but I would like to do so with some statistics behind my reasoning. I want to come up with the best batting lineup based on the hitting record/out record of the previous games.

Any help would be appreciated. I have in my mind a way that I would like to set it up but don't feel that it is the best and I would like some help.

Just some quick facts about his league and the games.
---Every kid on the team gets to hit whether they are in the game or not. So x number of kids hit consecutively.
---You can only score 6 runs an inning max.
---There are 3 outs to an inning.

I can compile the data in any format (for coding purposes); all I see needed is the player's name/number and what base they landed on.

If there is any more info needed or if this is unclear please let me know.

LeGo

Replies are listed 'Best First'.
Re (tilly) 1: Baseball line up (best rotation)
by tilly (Archbishop) on May 17, 2001 at 21:30 UTC
    First of all don't worry about this looking like homework. Homework is closed ended and well specified. Your problem is neither.

    Your basic problem is a modelling issue. How detailed do you want your model to be of what goes into baseball, and how much data do you have to defend that model? After you have your model you have a completely separate analysis issue of figuring out the expected performance.

    So what goes into a model? Well, you can try to model everything, but I would say that you should go for simple. A person goes up to bat. One of several things happens. They get out. A hit advances people x bases (0, 1, 2, 3, homerun) and the top y people get out. Does your data look something like this? Ignore details like, "He runs really well" and assume it does.

    Your next step is to fit the model to the people on the team. You have a number of outcomes when foo goes up to bat. Estimate the relative probabilities. The simpler your model, the fewer possibilities, the more data, the more comfortable you will be with your fit. But conversely the simpler, the less that is taken into account, the worse your model.

    For the analysis I suggest Monte Carlo. You have your model. You have your numbers. Play Ball! There are only 87178291200 possible line-ups, a computer can crank through that in abou...

    Oh shoot. That will take a while.

    What you will need to do is take your players and rank them into a few roughly equal groups. Rather than try each lineup you want to try every way of scattering your fixed groups around the lineup. For instance if your groups are the star, 2 more good players, 6 more OK ones, and the 5 who demonstrate why it is little league, then you have about a half-million possible lineups to consider.
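A quick check of those two counts, sketched in Python. The function name `grouped_lineups` is made up for this example; the group sizes are the ones from the paragraph above, and players within a group are treated as interchangeable.

```python
from math import factorial

def grouped_lineups(group_sizes):
    """Distinct batting orders when players in the same group are interchangeable."""
    total = factorial(sum(group_sizes))
    for size in group_sizes:
        total //= factorial(size)   # divide out the orderings within each group
    return total

print(factorial(14))                  # every ordering of 14 distinct players: 87178291200
print(grouped_lineups([1, 2, 6, 5]))  # star, 2 good, 6 OK, 5 weak: 504504
```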

    So now play ball. Play each of these lineups for 100 innings. (By play a lineup I mean randomly line up the players within the lineup, generate random numbers, and play.) That is about 50 million simulated innings, so it will take a while. Drop the worst-scoring 2/3 of the lineups. Try that again. Keep on doing that until you get down to a hundred or so grouped lineups. Then take your groupings and split your groups in half. That will get you a lot more lineups again. Wash, rinse, and repeat until you have (by your numbers) the top few lineups.
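A minimal sketch of "playing" a lineup, in Python, under the simple model described above: each batter is a table of outcome probabilities, an outcome of 0 is an out, and any other outcome advances the batter and every runner that many bases. The "top y people get out" part of the model is left out here, and the names `play_inning` and `average_runs` are invented for this example. The 6-run cap and 3 outs come from the original question.

```python
import random

MAX_RUNS = 6   # league cap on runs per inning
OUTS = 3

def play_inning(lineup, start, rng):
    """Simulate one inning; lineup is a list of {bases: probability} dicts.
    Returns (runs scored, index of the batter who leads off next inning)."""
    outs, runs, i = 0, 0, start
    runners = []                      # base (1..3) occupied by each runner
    while outs < OUTS and runs < MAX_RUNS:
        probs = lineup[i % len(lineup)]
        bases = rng.choices(list(probs), weights=list(probs.values()))[0]
        i += 1
        if bases == 0:                # batter is out, runners hold
            outs += 1
            continue
        moved = [b + bases for b in runners] + [bases]
        runs += sum(1 for b in moved if b >= 4)
        runners = [b for b in moved if b < 4]
    return min(runs, MAX_RUNS), i % len(lineup)

def average_runs(lineup, innings, seed=0):
    """Average runs per inning, batting through the order continuously."""
    rng = random.Random(seed)
    total, nxt = 0, 0
    for _ in range(innings):
        runs, nxt = play_inning(lineup, nxt, rng)
        total += runs
    return total / innings
```

Comparing `average_runs` for two orderings of the same players is the basic experiment; the per-player probabilities would have to be estimated from the recorded games.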

    If your kid brother doesn't move up in the batting numbers, don't tell anyone. If he does, then good luck convincing the coach...

    Either way you will learn something about statistics, programming, and exactly how hard it is to come up with a decent model of anything in the real world.

      Setting it up as Tilly suggests, this also might be a good problem for a genetic algorithm. First, create and fill a hundred or so random lineups from all available teams. Each step of the iteration will require playing each lineup as randomly as tilly suggests for a game, assigning the total number of runs scored as the 'value' of that lineup. For each lineup, you could run the game multiple times, the value being the total sum of all scores. When all lineups have been done, sort this set based on scores; remove those lineups that did not perform well (say, less than 2*number of games played*number of innings), and for those that did perform well (say, better than 4*number of games played*number of innings), copy and mutate them. The mutation would be one of two things: either randomly switch the order of two consecutive players on the team, or switch a random player with a random player not currently on the lineup. Clear out all the current 'values' and rerun. After several iterations of this, the top 10 or so lineups should have outstanding results, which you can then compare with the current lineup.
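One step of that loop might look roughly like the Python sketch below. It is a simplification, not Masem's exact scheme: it keeps the best-scoring fraction of lineups instead of using fixed score thresholds, and it implements only the first mutation (swapping two consecutive batters). `fitness` stands for whatever scoring function you use, for instance total simulated runs; all names here are hypothetical.

```python
import random

def mutate(lineup, rng):
    """Return a copy of the lineup with two consecutive batters swapped."""
    child = list(lineup)
    i = rng.randrange(len(child) - 1)
    child[i], child[i + 1] = child[i + 1], child[i]
    return child

def next_generation(population, fitness, rng, keep=0.5):
    """Keep the best-scoring fraction, refill with mutated copies of survivors."""
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[:max(1, int(len(ranked) * keep))]
    children = [mutate(rng.choice(survivors), rng)
                for _ in range(len(population) - len(survivors))]
    return survivors + children
```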

      To make this work better, you should generate a large array of random numbers that would be regenerated on each step, but within each step, each tested lineup would use the same random numbers in the same order, if only to remove a potential bias.
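A small Python sketch of the same idea. Instead of storing a large array, it re-seeds a generator with the same per-step seed for every lineup, which should have the same effect: each lineup in a step sees an identical stream of draws. `score` is a hypothetical function taking a lineup and a random generator.

```python
import random

def score_with_shared_draws(lineups, score, step_seed):
    """Score every lineup in this step against the same random stream."""
    return [score(lineup, random.Random(step_seed)) for lineup in lineups]
```

Changing `step_seed` between steps gives fresh numbers per step while keeping the comparison within a step fair.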

      (Yes, I know this is serious overkill, but it's an interesting thought ...)


      Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
        Just to supplement Masem's reply...

        First, you might consider doing a crossover in addition to a mutation to ensure that you properly explore the line-up space. You'll need to do a partially matched crossover (PMX) to ensure that you don't duplicate the team members. You could do a mutation as suggested by Masem either on each lineup or on each member of a lineup 1-5% of the time.
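For reference, a compact Python sketch of partially matched crossover on two batting orders. The child copies the slice `parent_a[lo:hi]` and takes everything else from `parent_b`, relocating any player that would otherwise appear twice. The function name and the explicit `lo`/`hi` cut points are choices made for this example; in a real run the cut points would be picked at random.

```python
def pmx(parent_a, parent_b, lo, hi):
    """Partially matched crossover: a valid permutation built from both parents."""
    child = [None] * len(parent_a)
    child[lo:hi] = parent_a[lo:hi]
    segment = set(parent_a[lo:hi])
    pos_in_b = {player: i for i, player in enumerate(parent_b)}
    for i in range(lo, hi):
        player = parent_b[i]
        if player in segment:
            continue                  # already placed via the copied slice
        j = i
        while lo <= j < hi:           # follow the mapping out of the slice
            j = pos_in_b[parent_a[j]]
        child[j] = player
    for i, player in enumerate(parent_b):
        if child[i] is None:          # remaining slots come straight from parent_b
            child[i] = player
    return child

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [9, 3, 7, 8, 2, 6, 5, 1, 4]
print(pmx(a, b, 3, 7))   # [9, 3, 2, 4, 5, 6, 7, 1, 8]
```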

        Secondly you could experiment with various types of selection algorithms for determining who breeds and who dies. Masem suggested the Percentage model, but there's also the Roulette and Tournament models.
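The two alternatives can be sketched in a few lines of Python. These are generic textbook versions with made-up function names: roulette picks a lineup with probability proportional to its score (so scores must be non-negative and not all zero), while tournament draws a few lineups at random and keeps the best of them.

```python
import random

def roulette_select(population, scores, rng):
    """Pick one lineup with probability proportional to its score."""
    return rng.choices(population, weights=scores, k=1)[0]

def tournament_select(population, scores, rng, size=3):
    """Draw `size` distinct lineups at random and return the best-scoring one."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: scores[i])]
```

Tournament selection only needs the ranking of scores, so it is less sensitive to how run totals are scaled.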

        Third, it would be really neat to take your top lineup from one run and stick it into a new run of the GA to see how it fares.

        Check out Genetic Algorithms with Perl for more guidance and inspiration.

        Enjoy!

        Addendum: I just realized the page didn't actually link to the code.

Re: Baseball line up (best rotation)
by Dominus (Parson) on May 18, 2001 at 00:51 UTC
    This problem has been well-studied, as you might imagine. The optimal lineup varies depending on the game situation. (For example, if the home team is down by one run in the bottom of the ninth inning, they are no longer interested in the strategy which gives them the largest expected number of runs over the long term, but in the strategy that gives them the greatest probability of scoring two runs in the rest of the inning.) But in general, and over the long term, the answer is simple: To maximize the expected number of runs scored, the batters should appear in decreasing order of their on-base average. The player with the highest on-base average should appear first. I will explain why this is a little further down.

    On-base average is like batting average, but it is calculated differently---it's simpler. When a batter comes to the plate, that is a 'plate appearance'. On-base average is the fraction of a batter's plate appearances in which he reaches base on a hit, a walk, or a hit-by-pitch. (Your brother's league probably doesn't play with hit-by-pitch, so you can ignore this; they may not play with walks either, in which case you can ignore that too.)

    Batting average is different: It's hits divided by at-bats, and not every plate appearance is an at-bat. Certain plate appearances do not count as at-bats. In particular, if the batter walks, that is not an at-bat. The batter did not get a hit, but his batting average is unchanged, because he wasn't offered a fair chance to get a hit.

    For on-base average, you count all plate appearances, and hits and walks both count positively.
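To make the difference concrete, here is a small Python sketch using the definitions above (on-base average counted over all plate appearances). The player names and statistics are invented for illustration.

```python
def batting_average(hits, at_bats):
    return hits / at_bats

def on_base_average(hits, walks, hit_by_pitch, plate_appearances):
    return (hits + walks + hit_by_pitch) / plate_appearances

# 20 plate appearances, 4 of them walks, so 16 at-bats with 6 hits.
print(batting_average(6, 16))         # 0.375
print(on_base_average(6, 4, 0, 20))   # 0.5

# (hits, walks, hit_by_pitch, plate_appearances) per player: made-up numbers
stats = {"Ann": (6, 4, 0, 20), "Bo": (5, 0, 0, 20), "Cy": (3, 6, 0, 20)}
order = sorted(stats, key=lambda name: on_base_average(*stats[name]), reverse=True)
print(order)                          # ['Ann', 'Cy', 'Bo']
```

Note that Cy has fewer hits than Bo but bats ahead of him, because the walks count toward getting on base.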

    The reason on-base average is important is this: The team gets only 27 outs in the game. The outs are like a clock that is ticking away. Once they use up their 27 outs, the game is over. The more batters your team can send to the plate in 27 outs, the more likely the team is to score and win. On-base average is precisely the chance that a batter will go to the plate without producing an out. If the batters on the team have on-base averages of .250, the team sends an average of 4 batters to the plate each inning. If the batters on the team have on-base averages of .350, the team sends an average of 4.6 batters to the plate each inning. Compared with the .250 team, they are getting a free inning!
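The 4 and 4.6 figures follow from a simple formula, sketched here in Python. It assumes every batter has the same on-base average and that outs are only made at the plate (no runners thrown out, no double plays); the function name is made up.

```python
def batters_per_inning(on_base_avg, outs=3):
    """Expected plate appearances needed to make `outs` outs."""
    return outs / (1 - on_base_avg)

for oba in (0.250, 0.350):
    per_inning = batters_per_inning(oba)
    print(f"OBA {oba:.3f}: {per_inning:.2f} per inning, {9 * per_inning:.1f} per nine innings")
```

Under these assumptions the .350 team sends roughly 5.5 more batters to the plate over nine innings, which is more than a full inning's worth for the .250 team.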

    If only two consecutive batters get on base in an inning, they will probably not score. If four consecutive batters get on base in an inning, at least one will certainly score. Clustering together the batters with a high on-base average maximizes the likelihood of a long inning and therefore a high-scoring inning.

    By the way, you don't want to use the word 'rotation' here. That refers to pitching staffs. The word you want is either 'lineup' or 'batting order'.

    --
    Mark Dominus
    Perl Paraphernalia

(arturo) Re: Baseball line up (best rotation)
by arturo (Vicar) on May 17, 2001 at 20:39 UTC

    To do this I would use a DBMS. Yes, you can do it in Perl data structures but if you start out even on this simple project with a hand-rolled solution, you'll end up tearing your hair out when you add features later. (I can easily see any app you'd develop for this specific purpose growing into a general statistical database for the league. Really!)

    Fear not, MySQL and PostgreSQL are available for the cost of a download and if you're running a free *nix (linux, bsd), you've probably got it installed already. Learn SQL. Get a copy of Programming the Perl DBI and write a DBI application that manipulates a relational database.

    That's short on specifics. I'd actually written something about the likely table structure, but my browser crashed. Let's just say you'd want a table for teams, one for players, and one for at bats. In the 'at bats' table, you store the game, the inning, the batter, the pitcher, the result (walk, strikeout, single, etc.) and the highest base reached. Then you could find out the number of singles Joe's hit against Sue with SQL like

    SELECT count(*) FROM at_bat WHERE pitcher='Sue' AND batter='Joe' AND result = 'single';

    The SQL for averages would be more complex, but I don't have time to think it out.
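For what it's worth, here is one way the averages query could look, sketched with Python's built-in `sqlite3` module so it runs without a database server. The column names and sample rows are assumptions based on the table described above, and "on base" is taken to mean any row where the highest base reached is greater than zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE at_bat (
    game INTEGER, inning INTEGER, batter TEXT, pitcher TEXT,
    result TEXT, base_reached INTEGER)""")
rows = [  # made-up sample data
    (1, 1, "Joe", "Sue", "single", 1),
    (1, 3, "Joe", "Sue", "strikeout", 0),
    (1, 5, "Joe", "Sue", "single", 3),
    (2, 1, "Joe", "Sue", "walk", 1),
    (1, 1, "Amy", "Sue", "strikeout", 0),
    (1, 3, "Amy", "Sue", "double", 2),
]
conn.executemany("INSERT INTO at_bat VALUES (?, ?, ?, ?, ?, ?)", rows)

# Singles Joe has hit against Sue.
singles = conn.execute(
    "SELECT count(*) FROM at_bat "
    "WHERE pitcher = 'Sue' AND batter = 'Joe' AND result = 'single'"
).fetchone()[0]

# On-base average per batter: fraction of rows where the batter reached base.
on_base = conn.execute("""
    SELECT batter, AVG(CASE WHEN base_reached > 0 THEN 1.0 ELSE 0.0 END) AS oba
    FROM at_bat GROUP BY batter ORDER BY oba DESC
""").fetchall()

print(singles)   # 2
print(on_base)   # [('Joe', 0.75), ('Amy', 0.5)]
```

The same two SELECT statements should work unchanged in MySQL or PostgreSQL.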

    This could be a *really fun* project, IMO.

    perl -e 'print "How sweet does a rose smell? "; chomp ($n = <STDIN>); $rose = "smells sweet to degree $n"; *other_name = *rose; print "$other_name\n"'
Re: Baseball line up (best rotation)
by dze27 (Pilgrim) on May 17, 2001 at 22:21 UTC

    Check the Internet! Effect of Batting Order on Runs Scored. That guy recommends ordering players by on-base average in descending order, so the players who get on base more often hit more. I don't think the setup for your league would change this conclusion much.

    The reality is, lineup order has surprisingly little effect on the total number of runs scored.

    Note for non-baseball fans: on-base average is obtained by dividing the number of times a player reaches base through a hit, a walk, or being hit by a pitch by his number of plate appearances.

    Here's another interesting link: Protection - Hit or Myth?.

Re: Baseball line up (best rotation)
by atcroft (Abbot) on May 18, 2001 at 17:05 UTC

    I wish you well in finding a reasonable solution.

    I may be off-base (no pun intended), but my thinking is that this might be related to the knapsack problem, and that you might want to take a look at code solving that problem as a starting point. The knapsack problem involves a number of objects, each with a weight/value, where the goal is to find as good an arrangement of objects as possible that fits in a container with a maximum capacity. In your case, this is finding a better arrangement of players in a lineup, where the value is the number of bases they reach per hit (or something similar). The knapsack problem is also a well-known NP-complete problem in its decision form (so finding references to it is not difficult).

    Hope that helps, and good luck on your search.

Re: Baseball line up (best rotation)
by Hero Zzyzzx (Curate) on May 17, 2001 at 19:16 UTC

    Edit: Sorry, Abbot Lego. You've got to admit that it has the look of homework, though.


    Original message: Homework anyone? (though it is a neat question)

      update: It's cool. It does look much like HW.

      I apologize if this looks like HW but it isn't. Not even remotely. I have been around long enough to know to not post HW.

      My schedule for the summer is at the link below; if you don't believe me, I don't know what to say. I am an MIS BS student at NCSU taking no classes now and only business and psychology later this summer.

      http://www4.ncsu.edu/~zltunsta/schedule.html

      LeGo

        It looks like a very noble use for perl.
