Markov Models/Batting Order Optimization

These are mathematical models based on the probability of moving from one combination of base runners and outs to another. These models can be applied to strategy analysis, but I have concentrated on using them to a) estimate how many runs a given batting order will average, and b) determine the best batting order for a given nine players. You should probably print and read the introduction and then select from the other articles that interest you.

Introduction: A non-technical description of the Markov chain models of baseball, with numerical examples of some of the applications. The data used to develop the values in the examples are too small a sample for the results shown to be meaningful. However, the examples illustrate some of the techniques I have used. This is a minor revision of an article I wrote for the first edition of The Great American Baseball Stat Book, which was published in 1987.

Each of the following articles is available in three formats: HTML for your browser, a Microsoft Word for Windows (.doc) file, and an Adobe Acrobat Reader (.pdf) file. The equations, tables, and graphics in the HTML versions may not display correctly, depending on the browser you are using. Netscape seems to work best, but it is not without flaws. If you have Word or a program that can read Word files, you may be better off downloading the files and viewing them in a word processing program. Note that the WordPad program that comes with Windows 95/98 can read Word files, but some of the tables may not appear correctly. The Acrobat file should be readable and printable on virtually all Windows and Macintosh systems, but to read or print it you will need the Adobe Acrobat Reader program. It may already be on your system, but if it is not, you can download it free from: Download Acrobat Reader.

Theory: The mathematical theory and equations, which involve matrix arithmetic, behind the models. Among the topics discussed are the basic model, models for real lineups, and how to compute the expected runs after each combination of base runners and outs. There are no results based on real data in this article. Unless you have a mathematical bent, you will probably want to skip this one.
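
For readers who would like a concrete picture before opening the Theory article, the following is a minimal sketch of the general idea, not the model from the articles. It treats each combination of outs (0 to 2) and occupied bases as a state and computes the expected runs for the rest of the inning from each state. Everything specific here is an assumption made for illustration: the event probabilities in EVENT_PROBS are made up, every batter is identical, runners hold on outs, and on a hit every runner advances exactly as many bases as the batter. The names walk, hit, and expected_runs are hypothetical helpers, and the linear equations are solved by repeated sweeps rather than by the matrix arithmetic the article uses.

```python
from itertools import product

# Made-up probabilities for one plate appearance (illustration only; sum to 1).
EVENT_PROBS = {"out": 0.68, "walk": 0.08, "single": 0.16,
               "double": 0.05, "triple": 0.005, "homer": 0.025}
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "homer": 4}

# Bases are a 3-bit mask: bit 0 = first, bit 1 = second, bit 2 = third.


def walk(bases):
    """Return (new_bases, runs) after a walk; only forced runners advance."""
    if bases == 0b111:
        return 0b111, 1          # bases loaded: the runner on third is forced home
    for bit in (0b001, 0b010, 0b100):
        if not bases & bit:      # the chain of forced runners ends at the first gap
            return bases | bit, 0
    raise ValueError("unreachable: some base must be empty")


def hit(bases, n):
    """Return (new_bases, runs) when the batter and all runners advance n bases."""
    lineup = (bases << 1) | 1    # add the batter at home plate as bit 0
    moved = lineup << n          # everyone moves n bases
    runs = bin(moved >> 4).count("1")   # anyone past third has scored
    return (moved >> 1) & 0b111, runs


def expected_runs(probs, sweeps=500):
    """Expected runs to the end of the inning from each (outs, bases) state."""
    states = list(product(range(3), range(8)))
    value = {s: 0.0 for s in states}
    for _ in range(sweeps):      # repeated sweeps converge because outs end the inning
        for outs, bases in states:
            total = 0.0
            for event, p in probs.items():
                if event == "out":
                    nxt, runs = (outs + 1, bases), 0
                elif event == "walk":
                    new_bases, runs = walk(bases)
                    nxt = (outs, new_bases)
                else:
                    new_bases, runs = hit(bases, HIT_BASES[event])
                    nxt = (outs, new_bases)
                # A state with three outs is absent from `value` and is worth 0 runs.
                total += p * (runs + value.get(nxt, 0.0))
            value[(outs, bases)] = total
    return value


if __name__ == "__main__":
    table = expected_runs(EVENT_PROBS)
    for outs in range(3):
        print(outs, "out, bases empty:", round(table[(outs, 0b000)], 3))
```

As a sanity check on the sketch, a toy game in which every plate appearance is either an out or a home run with probability 0.5 each gives exactly one expected run per remaining out with the bases empty, which the code reproduces. A realistic model would replace the advancement rules and probabilities with values estimated from play-by-play data.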

Batting Order Optimization: This article, which appeared in the December 1991 By The Numbers, the newsletter of the Society for American Baseball Research's Statistical Analysis Committee, explains how I used the Markov model and regression analysis to devise a method that produces batting orders that should score more runs than traditional ones. Some general rules are formulated, and there are specific examples. Some of the optimized batting orders will seem strange to baseball traditionalists.

The following two articles are actually presentations that I gave at national meetings of the Society for American Baseball Research (SABR; there is a link to their web site on my baseball page). Each article includes the overheads that I used during the presentation. In the first one, these are followed by a section of notes and comments that are keyed to the overheads. You should probably print the entire article so you can read the corresponding parts of the notes as you look at each overhead. The notes contain additional data. The second one does not have notes.

Both of these presentations are based on a newer version of the Markov model. This version incorporates base running probabilities (advancement on hits and outs, double plays, reaching on errors) that are based on runner speed, batter speed, and whether the batter is right-handed, left-handed, or a switch hitter. The earlier model used major league averages for these factors.

Subtle Aspects of the Game: From the 1993 SABR convention in San Diego. Presents some of the data on the base running and reaching on error effects described above. There are also results on whether or not a batter's performance is affected by the strength of the hitter who follows him in the lineup.

Speed, Strikeouts, and Scoring: From the 1994 SABR convention in Arlington, Texas. Investigates how much fast runners add to team scoring and whether teams that strike out a lot hurt their overall scoring. Also estimates the value of moving a runner from second to third when there are no outs by hitting behind the runner and making an out.
