Given the nine starting players,
in what order should they bat? Traditional guidelines such as
"the leadoff man should be a good base stealer",
"number two should be a contact hitter who can hit behind
the runner", "bat your best hitter third" abound.
Due to computational complexities, there have been few studies
that analyze the batting order question from a quantitative viewpoint.
This article discusses what I believe is the most comprehensive
mathematical and statistical approach to lineup determination.
The models and the methods used to develop them are described,
and some resulting principles of batting order construction are
presented. Finally, the models are applied to the 1991 AL division
winners and compared to the batting orders employed by the teams'
managers.
The material presented here is an
expanded version of the talk I gave at SABR XXI in New York during
July, 1991. I have written several pieces on using Markov models
applied to baseball; readers wanting more information may write
to me [1018 N. Cleveland St., Arlington, VA 22201].
The study utilizes two mathematical/statistical
models: 1) a Markov process model that calculates the long-term
average (often called expected) runs per game that a given lineup
will score, and 2) a statistically derived model that quantitatively
evaluates the suitability of each of the nine players in each
of the nine batting order positions. Data for the second model
were generated by numerous runs of the Markov model. Hence, we
see that the Markov model underlies the entire analysis.
THE MARKOV PROCESS MODEL
The Markov process model is based
on the probabilities of moving from one runners and outs situation
to another, possibly the same, situation. These probabilities,
which depend on who is batting, are called transition probabilities.
For example, one such transition is from no one on and no outs
to a runner on first and no outs; and the transition probability
is that of a single, walk, hit batsman, safe at first on an error,
catcher interference, or striking out and reaching first on a
wild pitch or passed ball. The Markov model employs matrix algebra
to perform the complex calculations. However, once all the requisite
probabilities have been determined, the matrix formulation enables
the remaining calculations to be carried out without much difficulty.
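To make the idea of transitions between runners-and-outs situations concrete, here is a minimal Python sketch of a one-batter chain over the 24 base-out states. It is not the model used in this study. The event probabilities are hypothetical, every runner is assumed to advance exactly as many bases as the batter, there are no steals, errors, or double plays, and the expected runs are found by repeated sweeps over the states rather than by the matrix calculation described above.

```python
from itertools import product

# Hypothetical per-plate-appearance probabilities for one "average" batter.
EVENTS = {"out": 0.68, "walk": 0.09, "single": 0.15,
          "double": 0.045, "triple": 0.005, "homer": 0.03}
BASES_GAINED = {"single": 1, "double": 2, "triple": 3, "homer": 4}

def advance(bases, n):
    """Move the batter and every runner n bases (simplifying assumption).
    bases is a bitmask: 1 = first, 2 = second, 4 = third."""
    runs, new_bases = 0, 0
    for start in (0, 1, 2, 3):          # 0 stands for the batter
        if start == 0 or bases & (1 << (start - 1)):
            end = start + n
            if end >= 4:
                runs += 1
            else:
                new_bases |= 1 << (end - 1)
    return new_bases, runs

def walk(bases):
    """Batter takes first; runners move up only when forced."""
    if not bases & 1:
        return bases | 1, 0
    if not bases & 2:
        return bases | 3, 0
    if not bases & 4:
        return 7, 0
    return 7, 1                          # bases loaded: a run is forced in

def step(outs, bases, event):
    """Return (new_outs, new_bases, runs_scored) for one plate appearance."""
    if event == "out":
        return outs + 1, bases, 0
    if event == "walk":
        new_bases, runs = walk(bases)
    else:
        new_bases, runs = advance(bases, BASES_GAINED[event])
    return outs, new_bases, runs

def expected_runs_per_inning(events, sweeps=2000):
    """Expected runs from no outs, bases empty, by iterating V = r + P V."""
    states = list(product(range(3), range(8)))   # 3 out counts x 8 base patterns
    value = {s: 0.0 for s in states}
    for _ in range(sweeps):
        new_value = {}
        for outs, bases in states:
            total = 0.0
            for event, prob in events.items():
                o, b, r = step(outs, bases, event)
                future = 0.0 if o == 3 else value[(o, b)]
                total += prob * (r + future)
            new_value[(outs, bases)] = total
        value = new_value
    return value[(0, 0)]

print(round(expected_runs_per_inning(EVENTS), 3))
```

A lineup version of the same idea would add the identity of the current batter to the state, so that the transition probabilities change from one plate appearance to the next.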
It is important to note that assumptions
made in determining the transition probabilities have an enormous
influence on the batting order results presented later. The
goal is to choose a realistic set of assumptions, but, as always,
some simplifying assumptions are quite helpful. Moreover, some
of the assumptions are open to alternatives, the particular ones
employed being a matter of judgment or study objectives. The
key assumptions for the current analysis are:
1) Players bat the same in all situations. For this study, each player's 1990 full season data was used to determine how he would bat.
2) All base advancement, outs on the bases (including double plays), wild pitches, passed balls, balks, etc. occur according to major league average probabilities.
3) Stolen base attempts are permitted with a runner on first only.
4) Only pitchers attempt sacrifice bunts.
5) Overall 1990 pitcher batting is used for all pitchers.
6) Small adjustments to hit and walk
frequencies are made in certain situations. In particular, there
are more walks and fewer hits when there are runners on base and
first base is not occupied.
Data for 2) and 6) are derived from
combined AL and NL data for the 1986 season. I used this season
because I had extracted the needed data from the Project Scoresheet
database for a prior study. Since this is a time consuming operation,
I decided not to repeat it using 1990 data. Comparable data for
several seasons would be better, and I may do the computer work
on the entire Project Scoresheet database covering 1984-91. However,
I doubt that the essential results and lineup optimization models
derived would be affected very much.
The first assumption is the most
critical and most controversial. One of its consequences is that
the differences in expected runs between batting orders tend to
be relatively small. A previous, less extensive, study that incorporated
situational performance assumptions (e.g. certain players hit
better with runners on) showed much larger differences in expected
scoring. I plan to explore various alternative assumptions about
performance levels in future batting order studies.
Base advancement on hits certainly
is not uniform since it depends on runner speed and where the
particular batter tends to get his hits (e.g. the percentage of
singles to left, center, or right). However, I did not have the
data needed to incorporate such effects. Data availability also
prevented batter specific double play modeling.
The stolen base try restriction does
not have a large effect because over 80% of steal attempts occur
with a runner on first only. The restriction to this case greatly
simplifies the computations and is not likely to affect comparisons
between batting orders. Sacrifice bunt tries are not included
for non-pitchers because they are game situation specific and
reduce overall scoring, contrary to the study objective of finding
the highest scoring lineups.
DATA FOR THE STATISTICAL MODELS
The Markov model was used for two
primary purposes. One purpose is to evaluate a specific batting
order by calculating its expected runs per game. In this way,
alternative lineups can be compared. The second purpose is the
generation of data for use in the statistical models. For each
of the 26 major league teams in 1990, 200 "batting rotations"
were chosen at random. A batting rotation consists in specifying
the order in which the players will bat by establishing who follows
whom, but a rotation does not become a lineup or batting order
until the leadoff hitter in the first inning is specified. Each
batting rotation corresponds to nine lineups, one for each possible
leadoff batter. The Markov calculations have the property that
the computations needed for one lineup are also sufficient for
the other eight lineups corresponding to the same batting rotation.
There is nothing special about the choice of 200; it was a function
of the computing power available to me and the amount of time
I could spend on this phase of the study. More rotations, as is usually
the case for statistical analyses, would have been better.
Thus, the Markov model computed the
expected runs per game for 1800 "semi-randomly" (a
made up concept since only the batting rotations are chosen at
random) generated batting orders incorporating the nine most frequent
players, one for each position. One property of the 1800 lineups
is that each of the nine players hits in each batting position
exactly 200 times.
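The relationship between rotations and lineups can be sketched in a few lines of Python. This is only an illustration under assumed names (lineups_from_rotation and semi_random_lineups are hypothetical helpers, and the player labels are placeholders). It shuffles the nine players to get a rotation, then produces the nine lineups that share it by starting with each player in turn.

```python
import random
from collections import Counter

def lineups_from_rotation(rotation):
    """The nine lineups that share one rotation: one per possible leadoff man."""
    return [rotation[k:] + rotation[:k] for k in range(len(rotation))]

def semi_random_lineups(players, n_rotations, seed=0):
    """Pick rotations at random, then expand each into all of its lineups."""
    rng = random.Random(seed)
    lineups = []
    for _ in range(n_rotations):
        rotation = list(players)
        rng.shuffle(rotation)
        lineups.extend(lineups_from_rotation(rotation))
    return lineups

players = list("ABCDEFGHI")            # placeholder names for nine starters
lineups = semi_random_lineups(players, 200)

# Each player appears in each batting position once per rotation.
counts = Counter((slot, p) for lu in lineups for slot, p in enumerate(lu, start=1))
print(len(lineups), counts[(1, "A")], counts[(9, "I")])
```

Because every rotation places each player in each slot exactly once, the 200 appearances per player per position hold no matter which rotations happen to be drawn. This sketch does not prevent the same rotation from being drawn twice.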
The next step was to select the best
lineups for each team from the 1800 tested. I used two definitions
of best. The first is obvious: select the ones with the highest
expected runs per game. The second definition is more subtle.
Each batting rotation will have one lineup that scores the best,
and this lineup may or may not be one of the highest scoring lineups
out of the 1800. Call the highest scoring lineup for each rotation
a maximal lineup. The reason a maximal lineup, which
may not be a particularly high scoring lineup overall, is of interest
is that it can reveal advantages to batting certain players in
certain positions although the overall scoring is held down by
the batting positions of other players. Since there were 200
maximal lineups, one for each rotation, I decided to use them
and the 200 highest scoring lineups as the basis for the statistical
analysis. I did not determine how many of the maximal lineups
were also in the 200 highest scoring.
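The two definitions of best can be written down compactly. The following Python sketch assumes each tested lineup is stored as a (rotation id, lineup, expected runs) tuple; that layout and the function name best_sets are assumptions made for illustration, and the sample numbers are invented.

```python
def best_sets(scored, top_n):
    """scored: list of (rotation_id, lineup, expected_runs) tuples.
    Returns (highest scoring lineups, maximal lineups)."""
    # Definition 1: the top_n lineups by expected runs, regardless of rotation.
    highest = sorted(scored, key=lambda t: t[2], reverse=True)[:top_n]
    # Definition 2: the single best lineup within each rotation.
    maximal = {}
    for rot, lineup, runs in scored:
        if rot not in maximal or runs > maximal[rot][2]:
            maximal[rot] = (rot, lineup, runs)
    return highest, list(maximal.values())

# Invented example: two rotations with two lineups each.
scored = [(0, "a", 4.5), (0, "b", 4.7), (1, "c", 4.1), (1, "d", 4.2)]
highest, maximal = best_sets(scored, top_n=2)
print([t[1] for t in highest], sorted(t[1] for t in maximal))
```

In the invented data, lineup "d" is maximal for its rotation even though it is not among the two highest scoring lineups overall, which is the situation the second definition is meant to capture.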
Within each set of 200 best lineups,
I computed how often each player hit in each batting position.
For example, Wade Boggs leads off in 21% of Boston's highest
scoring lineups. (This value, the highest on the team, means
that Boggs is a good first hitter since the average is 100%/9
= 11.1%.) In this way, each player has a rating for his suitability
for each batting order position.
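The rating calculation amounts to a frequency count. A minimal Python sketch, with a hypothetical function name and a toy three-man example rather than real data:

```python
def position_ratings(lineups):
    """Percentage of the given lineups in which each player bats in each slot.
    Returns {player: [pct in slot 1, pct in slot 2, ...]}."""
    share = 100.0 / len(lineups)
    ratings = {}
    for lineup in lineups:
        for slot, player in enumerate(lineup):
            ratings.setdefault(player, [0.0] * len(lineup))[slot] += share
    return ratings

# Toy example with three players and two "best" lineups.
ratings = position_ratings([("A", "B", "C"), ("A", "C", "B")])
print(ratings["A"], ratings["B"])
```

Each player's ratings sum to 100% across the batting positions, which is the constraint mentioned later in connection with the regression estimates.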
For each player, I computed scores
in 21 offensive measures relative to the group of nine
starting players on his team. The offensive measures are batting
average; on base average; slugging average; slugging average modified
by counting walks as singles and SF as AB (which is the relationship
of on base percentage to batting average); extra base average
[=SA-BA, also called isolated power]; runs created per game; frequency
per plate appearance of each type of hit, walks (including hit
by pitch), and strikeouts; relative frequency of each type of
hit (i.e. the percentage of a player's hits that are singles, doubles,
etc.); percentage of plate appearances that are not walks or
strikeouts (which measures putting the ball in play); secondary
average [ = (TB-H+BB+SB-CS)/AB, a Bill James idea]; run element
ratio [ = (BB+SB)/(TB-H), another Bill James idea];
steal attempt frequency [ = (SB+CS)/(1B+BB)]; and
stolen base success percentage [ = SB/(SB+CS)]. No claim is
made that the set of measures chosen is complete or perfect, just
that it covers all the significant aspects of offensive performance.
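Several of the less familiar measures can be computed directly from a season stat line using the bracketed formulas above. The Python sketch below does so for a hypothetical batter. The stat line is invented, the dictionary keys are naming assumptions, and walks here do not include hit by pitch.

```python
def offensive_measures(s):
    """s: dict of counting stats with keys AB, H, 2B, 3B, HR, BB, SB, CS."""
    singles = s["H"] - s["2B"] - s["3B"] - s["HR"]
    total_bases = singles + 2 * s["2B"] + 3 * s["3B"] + 4 * s["HR"]
    extra_bases = total_bases - s["H"]                  # TB - H
    return {
        "BA": s["H"] / s["AB"],
        "SA": total_bases / s["AB"],
        "EBA": extra_bases / s["AB"],                   # SA - BA, isolated power
        "SecA": (extra_bases + s["BB"] + s["SB"] - s["CS"]) / s["AB"],
        "RER": (s["BB"] + s["SB"]) / extra_bases,       # run element ratio
        "SBTRY": (s["SB"] + s["CS"]) / (singles + s["BB"]),
        "SBPCT": s["SB"] / (s["SB"] + s["CS"]),
    }

# Invented stat line, for illustration only.
line = {"AB": 500, "H": 150, "2B": 30, "3B": 5, "HR": 15, "BB": 60, "SB": 20, "CS": 5}
m = offensive_measures(line)
print({k: round(v, 3) for k, v in m.items()})
```

A player with no extra base hits or no steal attempts would cause a division by zero in this sketch, so real data would need a guard for those cases.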
I used two measures of player performance
relative to the team: 1) percentage above or below the team mean
in the category, and 2) the z-score, which is the number
of standard deviations above or below the mean. By using z-scores,
I am not claiming any of these distributions is normal (given
that there are only nine values for a team in each offensive category,
the distributions are almost certainly not even approximately
normal); I am just using z-scores as a measure of relative
performance.
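Both relative measures are simple to compute. The Python sketch below applies them to a list of values for one offensive category. The function name and sample values are hypothetical, and it assumes the population standard deviation, since the text does not say whether the population or sample form was used.

```python
from statistics import mean, pstdev

def relative_scores(values):
    """Return (percent above or below the mean, z-scores) for one category.
    Assumption: population standard deviation (pstdev)."""
    m = mean(values)
    sd = pstdev(values)
    pct = [100.0 * (v - m) / m for v in values]
    z = [(v - m) / sd for v in values]
    return pct, z

# Invented on base averages for three players (a real team would have nine).
pct, z = relative_scores([0.250, 0.300, 0.350])
print([round(x, 1) for x in pct], [round(x, 3) for x in z])
```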
REGRESSION ANALYSIS
In the next phase, I applied regression
analysis using the players' batting position ratings (e.g.
Wade Boggs 21% batting first) as the dependent variable and their
relative scores for the various offensive measures as the candidate
independent variables. For each batting position there are 234
data points, one for each of the nine players on the 26
teams, used in the regression estimates. Because there
were two measures for batting position ratings (one based
on the highest scoring lineups and one based on the maximal lineups) and
two measures of relative offensive performance (percentages
above or below the team mean and z-scores), there are four
possible categories of models that can be derived. I tested all
four, as described below, decided on the one that seemed to yield
the models with the best statistical properties, and focused on
that one. The best combination from the first round of testing
was highest scoring rather than maximal lineups as the basis of
the dependent variable and z-scores for the independent variables.
To do the regressions, I used the
stepwise regression procedure in the SHAZAM statistical package
with a 10% significance level required for variables to enter
or leave the equations. One equation is estimated for each batting
order position, and the estimates are done independently. Since
the nine batting position values for a given player must add to
100%, I experimented with some joint estimation techniques. However,
they did not yield significantly different models from the independent
estimates, so I used the independent estimates throughout this
study. After performing stepwise regressions for each of the
four categories of models described in the previous paragraph,
I restricted further investigation to the highest scoring/z-scores
category.
For this first set of regressions
for highest scoring/z-scores models, the r²
values range from a high of 0.914 (#9 position) to a low of 0.580
(#6). It is no surprise that the best fit is obtained for the
#9 position because of the inclusion of NL teams with pitchers
that bat. The number of independent variables in these equations
ranges from a low of 4 (#2, #4) to a high of 12 (#9). Overall, I judged this
to be a good and workable set of models. Three candidate variables (home
runs per plate appearance, run element ratio, and stolen base
success percentage, which is highly correlated with steal attempt
frequency) did not enter any of the nine model equations.
The variables most frequently in the equations were runs created
per game (in 7 equations, all but #4 and #5) and modified slugging
average including walks (in 6, all but #2, #5, #7).
The offensive performance measures
that are the basis of the independent variables are not truly
independent, and several measure similar player performance characteristics.
Since the models usually included several such variables, often
with opposite signs, I decided to see if a smaller set of independent
variables could yield models with r²
values almost as high, but which lend themselves to more sensible
interpretations. After examining the equations and the correlation
matrix of the candidate independent variables, I restricted the
candidates to the following nine: on base average (OBA), slugging
average (SA), extra base average (EBA), BB/PA, K/PA, 1B/H, HR/H,
ball in play percentage (INPLAY), steal attempt frequency (SBTRY).
The resulting set of models had r²
values from 0.885 (#9) down to 0.607 (#5) and 0.434 (#6). With
the exception of #6, the decline in r²
is not a major concern. In order to improve the model for the
sixth position, I added RC/G to the set of candidate independent variables
for that equation only, which improved its r²
to 0.557. The number of independent
variables ranges from 3 (#3,#4,#7) to 7 (#9). Each candidate
variable appeared in at least one of the model equations. The
table that follows summarizes the models; a plus sign before the
variable means high scores are best for the particular batting
order position, and a minus sign indicates the opposite. There
are numerical values, the model equation parameters, which are
not shown, associated with each variable in the table. These
values determine the relative importance of the variables.
I also did some regression analyses
using each of the leagues separately because I wanted to see if
the DH rule affected the models. In general, the statistical
properties (goodness of fit and significance levels of the
parameters) were poorer for the models based on the separate
leagues. Also, I was not able to interpret the models in a way
that could answer the DH question. I suspect that I need more
and better data to do this analysis. More in that teams from
seasons other than 1990 should be included, and better in that
more than 200 batting rotations should be calculated to determine
the player/batting position scores. Additional candidate independent
variables should also be considered. Due to time constraints,
I did not pursue these models further, but this is a topic worth
further investigation if for no other reason than the feeling
of some AL managers that the number nine hitter should be considered
as a second leadoff hitter.
GENERATING LINEUPS BASED ON THE BATTING POSITION MODELS
Once the batting position model equations
are in hand, for a given team, we can compute a value in each
of the nine batting order positions for each player. These values
can be positive, meaning the player is better than average for
the particular lineup position, or negative, which has the opposite
meaning. These scores serve to rank the nine players for each
lineup position and also to identify the best position for each
player. The next step is using those values to find one or more
high scoring lineups. Things would be easy if the best position
for each player was the highest rating for that position on the
entire team. This occurs, for example, if Wade Boggs's best spot
is leadoff and the highest scoring leadoff man on the Red Sox
is Boggs; Jody Reed's best spot is #2 and the Sox'
best #2 is Reed; etc. However, such is rarely the case. Due
to the nature of the models, it is common for the player with
the best leadoff score to also have the best #2 score and a high
#3 score. Also, the scores on the ends of the lineup (#1, #2,
#8, #9) tend to be more extreme, both on the high and low sides,
than the scores in the middle. This reflects the models'
emphasis on the importance of having high on base average hitters
at the top of the order, which is discussed later.
What we need is a method of assigning
players to lineup positions so that the total model score from the
assignments is high. This is a well known Operations Research
topic called the assignment problem. Fortunately, this
type of problem can be solved using several methods, some of which
are easy to implement on computers and run quickly. I chose an
algorithm that not only finds the best possible assignment, but
also finds the top n assignments, where n can be specified. For
the purposes of this study, I set n equal to five. For each set
of batting position models (one based on the full set of
independent variables and one based on the reduced set), I
found the five highest assignments for a team, which were always
quite close in total batting position values. These lineups were
fed into the Markov model to find the expected runs per game.
The lineup with the highest expected scoring was usually one
of the top three solutions to the assignment problem, but the
best solution did not seem to have an advantage over the next
two. In some cases, a comparison of the expected scoring and
the batting order differences among lineups led me to formulate
a lineup with even better expected runs per game that was not
in the five solutions to the assignment problem.
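For readers who want to experiment, the top n assignments can be found for a nine-man lineup by simply trying every permutation, since there are only 9! = 362,880 of them. The Python sketch below takes that brute force route. It is not the algorithm used in this study, the function name is hypothetical, and the small score matrix is invented to keep the example readable.

```python
import heapq
from itertools import permutations

def top_assignments(scores, n=5):
    """scores[player][position] is the model value of batting that player there.
    Returns the n best (total, assignment) pairs, where assignment[position]
    is the index of the player batting in that position."""
    size = len(scores)

    def total(perm):
        return sum(scores[player][pos] for pos, player in enumerate(perm))

    # Exhaustive search: fine for nine players, hopeless for much larger problems.
    return heapq.nlargest(n, ((total(p), p) for p in permutations(range(size))))

# Invented 3 x 3 example: rows are players, columns are batting positions.
scores = [[3, 1, 0],
          [1, 3, 0],
          [0, 0, 3]]
best = top_assignments(scores, n=2)
print(best)
```

Each assignment returned this way would then be evaluated for expected runs, as described above, since the assignment total is only a proxy for scoring.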
For each of the 1990 major league
teams, I compared the expected runs of the best lineups found
using the models described in the table with the best found using
the models based on the full set of candidate independent variables.
For 3 AL and 6 NL teams, the full variable models had a slight
advantage (about 1-2 runs a season), and for 4 AL and 2 NL teams,
the reduced variable set models had a similar advantage. For
the rest of the teams, the two sets of models were virtually the
same. Because the smaller variable set models are easier to comprehend,
the discussion in the next section is based on those models.
INTERPRETING THE MODELS
Due to the nature of the regression
process, it can be misleading to draw conclusions about individual
variables without considering the context provided by the entire
set of variables. One example is the -SBTRY for the leadoff position.
This is the fifth most important variable (its weight is about
10% that of OBA, which is far and away the most important characteristic
of a good leadoff hitter). Even so, does it mean that other things
being equal, which they never are, it is better to have a leadoff
hitter who doesn't try to steal? It might, but it also
may just be the regression distinguishing certain slow effective
leadoff hitters based on the Markov model, Wade Boggs for example.
Additional statistical analysis, which I have not yet gotten
to, could determine if one or two specific players are the cause
of the -SBTRY.
The less important explanatory variables
often play a role of emphasizing or modifying the more important
ones. For example, the -INPLAY in the #1 and #2 positions
serves to emphasize BB/PA. If I wanted to try to find the best
set of variables for each position, I would try to build these
two models without INPLAY. To illustrate the idea of modification,
the -EBA in #2 balances the +SLUG and +OBA. Often players with
high OBA have high BA and above average SLUG since slugging average
incorporates batting average. The negative EBA in effect puts
more weight on the OBA and less weight on power. A more interesting
instance is the -HR/H in the model for #4. Does this mean that
the clean up hitter shouldn't hit homers? No, what it
means is that among players with high slugging averages, it is
better to have one who does not get his slugging average mainly
from home runs (a Dave Kingman) but instead has a
good batting average and hits a fair number of doubles (an
Eddie Murray).
The model equations, which are not
shown, can be interpreted to characterize the desirable abilities
for each batting order position:
1) Getting on base is everything. To a much lesser extent, home run hitters should not lead off. Stolen base ability is irrelevant.
2) Similar to the leadoff hitter, but not quite as crucial to get on base; some power is also desirable.
3) Should have fair power, be able to draw walks, and not strike out much.
4) Highest slugging average; also has a good on base percentage and is not necessarily the best home run hitter.
5) Good power; secondarily puts ball in play (i.e. does not walk or strike out a lot).
6) Hardest spot to characterize and probably least critical. Probably want to use a player who doesn't fit well in other positions. Base stealing ability is a small plus.
7-9) Decreasing overall abilities
as hitters as characterized by on base percentage and measures
of power hitting.
One clear result from this and prior
studies is the importance of having the right batters at the top
of the order. This follows from the finding that most of the
difference in expected runs between high and low scoring lineups
using the same players occurs in the first inning. In particular,
the leadoff batter must have a high on base percentage. Also,
the second hitter must be good. The practice of leading off a
fast runner who can steal bases, but doesn't get on base
much, and putting a weak hitter "with good bat control
who can bunt or hit behind the runner" second is a perfect
prescription for a lower scoring batting order.
APPLYING THE MODELS
To see them in action, consider what
these models say about the 1991 ALCS teams, Toronto and Minnesota.
Batter performance is based on full season 1991 data, and no
righty-lefty splits are used. The lineups used by the teams were
against right handed starting pitchers. Before Joe Carter was
hurt in game three, Cito Gaston used the batting order:
1) D. White, 2) R. Alomar, 3) J.
Carter, 4) J. Olerud, 5) K. Gruber, 6) C. Maldonado, 7) L. Mulliniks,
8) P. Borders, 9) M. Lee.
The Markov model expected runs per
game for this lineup is 4.739. This value is about 0.5 higher
than Toronto's 1991 actual of 4.222 runs per game. That
the Markov values are higher than the actuals is to be expected
for several reasons. The most important are: 1) the players listed
are generally better than the substitutes who play for various
reasons; 2) sacrifice bunt attempts, which decrease overall scoring,
are not included in the Markov model; 3) relief pitchers brought
in with men on base or to face particular hitters can reduce late
inning scoring; and 4) a good team usually loses more innings
in games won at home than it gains in extra inning games, but
the Markov value is based on nine complete innings per game.
The highest scoring lineup found
by the models is:
1) Mulliniks, 2) Olerud, 3) Maldonado,
4) White, 5) Alomar, 6) Carter, 7) Gruber, 8) Borders, 9) Lee
The Markov value for the above lineup
is 4.795 runs per game, which is about 9 runs per 162 game season
more than Gaston's, a difference that should be worth one
extra win. (Keep in mind that differences in expected runs between
lineups are small due to the assumption that each player's
batting is the same in all situations.)
Mulliniks should lead off because
he has an on-base average (OBA) of .364, the highest in this group,
and little power. White, in contrast, has an OBA of .342 and
the second best slugging average (.455; Carter's is .503),
so he should not lead off despite his stolen base ability. The
major surprise is that Carter bats sixth. The batting position
equations score him as best on the team in the third, fourth,
and fifth spots, but Maldonado, White, and Alomar rate so low
at sixth that Carter is put there instead. Tests using the Markov
model showed it makes virtually no difference if Carter bats
fourth and White and Alomar fill the five and six slots in either
order.
Minnesota's Tom Kelly employed
the following order in the four games against right handed starters:
1) D. Gladden, 2) C. Knoblauch, 3)
K. Puckett, 4) K. Hrbek, 5) C. Davis, 6) B. Harper, 7) S. Mack,
8) M. Pagliarulo, 9) G. Gagne.
The Markov process expected runs
per game is 5.383 for this lineup, which is higher than the Twins'
1991 average of 4.790 for the reasons given previously.
The best model generated lineup is:
1) Hrbek, 2) Davis, 3) Mack, 4) Puckett,
5) Harper, 6) Gagne, 7) Gladden, 8) Pagliarulo, 9) Knoblauch.
The Markov value of the model lineup
is 5.431, about 8 runs per season higher than Kelly's, which might
yield one more victory. Clearly, the model result flies in the
face of "conventional wisdom", but one reason for
building models is to gain new knowledge. Perhaps the best thing
is getting Gladden out of the lead off spot because his 1991 OBA
of .306 is by far the worst among the nine players. I never
cease to be amazed by managers who are so fascinated by speed
that they forget players can't steal first base! Davis
and Hrbek have the two highest OBAs, and the model takes advantage
of this by loading the top part of the order. One reason Davis
with a slugging average of .507 can bat second is that Mack's
at .529 is even better. Knoblauch is an interesting case because
the model values him highest at either the top or bottom of the
order. However, on this team, he is best suited to the bottom
because his OBA is far from the best.
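The season totals quoted for both teams follow from the per-game Markov values. The short Python sketch below reproduces the arithmetic, using the rule of thumb of about 10 runs per win. The function names are hypothetical.

```python
def extra_runs_per_season(model_rpg, manager_rpg, games=162):
    """Difference in expected runs per game, scaled to a full season."""
    return (model_rpg - manager_rpg) * games

def extra_wins(extra_runs, runs_per_win=10.0):
    """Rule of thumb: roughly 10 additional runs per additional win."""
    return extra_runs / runs_per_win

toronto = extra_runs_per_season(4.795, 4.739)     # model lineup vs Gaston's
minnesota = extra_runs_per_season(5.431, 5.383)   # model lineup vs Kelly's
print(round(toronto, 1), round(extra_wins(toronto), 2))
print(round(minnesota, 1), round(extra_wins(minnesota), 2))
```

Both differences come out a little under 10 runs, which is why each is described as worth about one win rather than a full win.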
One important factor not considered
is what assumptions, if any, the managers make about batting performance
by their players. If I knew such, those levels could be put into
the models, and then we could judge better how well the managers
constructed their batting orders.
Those with computer baseball games
that will automatically play hundreds or thousands of games may
find it interesting to enter the 1991 data for these two teams
and then compare the scoring of the lineups shown above for a
large number of games. I would be interested in seeing how the
results of the simulations compare with the Markov calculations.
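Readers without such a game can run a crude simulation of their own. The Python sketch below plays complete nine-inning games for a given batting order by drawing one random event per plate appearance. It is a toy under loud assumptions: the per-batter probabilities are invented, every runner advances exactly as many bases as the batter, and there are no steals, errors, double plays, or sacrifices. It therefore will not reproduce the Markov values quoted above.

```python
import random

BASES_GAINED = {"single": 1, "double": 2, "triple": 3, "homer": 4}

def play_event(bases, event):
    """bases: [first, second, third] as booleans. Returns (new_bases, runs)."""
    if event == "walk":                      # runners move up only if forced
        if not bases[0]:
            return [True, bases[1], bases[2]], 0
        if not bases[1]:
            return [True, True, bases[2]], 0
        if not bases[2]:
            return [True, True, True], 0
        return [True, True, True], 1
    n = BASES_GAINED[event]
    runs, new_bases = 0, [False, False, False]
    for start, occupied in enumerate([True] + bases):   # index 0 is the batter
        if occupied:
            end = start + n
            if end >= 4:
                runs += 1
            else:
                new_bases[end - 1] = True
    return new_bases, runs

def simulate_game(lineup, rng, innings=9):
    """lineup: list of {event: probability} dicts in batting order."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]
        while outs < 3:
            probs = lineup[batter]
            event = rng.choices(list(probs), weights=list(probs.values()))[0]
            batter = (batter + 1) % len(lineup)   # order carries over innings
            if event == "out":
                outs += 1
            else:
                bases, scored = play_event(bases, event)
                runs += scored
    return runs

def mean_runs(lineup, games=2000, seed=1):
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games

# Invented batter: the same hypothetical probabilities for all nine slots.
average = {"out": 0.68, "walk": 0.09, "single": 0.15,
           "double": 0.045, "triple": 0.005, "homer": 0.03}
print(mean_runs([average] * 9))
```

To compare two batting orders, give each player his own probability dictionary and call mean_runs on each ordering with a large number of games. Because the differences between lineups are small, many thousands of games are needed before the sampling noise falls below them.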
As a test of how well the models
work, I compared lineups found by the models with lineups used
by the teams in 1990. For each team, I tabulated the number of
times each player started a game in each batting order position.
From this information, I constructed one or more typical lineups
for each team. Some teams did not really have anything close
to a set lineup, and others platooned certain fielding and batting
order positions. In all cases, I developed batting orders that
were typical of those used by the managers and that reflect their
thinking. Using the Markov process expected runs calculations,
I compared the best team lineup with the best lineup found by
either of the two sets of models, one using all the candidate
independent variables and one using the reduced variable set described
above. The table shows the extent to which the model did better
than the major league managers.
ADVANTAGE IN EXPECTED RUNS OF MODEL OVER MANAGERS

                   Approx.      Number of teams
Runs/Game          Runs/162G    AL               NL
 .095 to .105      16           1 (White Sox)    -
 .085 to .095      -            -                -
 .075 to .085      13           -                1 (Phillies)
 .065 to .075      11           1                2
 .055 to .065      9.5          2                1
 .045 to .055      8            4                2
 .035 to .045      6.5          1                2
 .025 to .035      5            1                3
 .015 to .025      3.25         2                -
 .005 to .015      -            -                -
-.005 to .005      0            2 (Bos, Milw)    1 (SF)
A general rule of thumb is that an
additional 10 runs a season leads to one more win. We see that
the model lineups were better than the managers' in 23
of 26 cases with the other three being virtually equal. These
comparisons are far from definitive because the models are based
on the assumptions listed previously. Also, managers consider
many factors when deciding on batting orders, some of which can't
be modeled. For example, although Barry Bonds would be an outstanding
leadoff hitter because he gets on base so much, according to an
article in the August 12, 1991 Sporting News, he prefers to bat 5th
where he can get more RBIs and hence more attention and presumably
a higher salary. Even if he has faith in my models, Jim Leyland
might figure that a happy Bonds hitting fifth can help his team
more than an unhappy Bonds leading off. Moreover, Bonds might
not draw so many walks if he were batting first.
CONCLUSION
Although I believe this study is
a major advance in our knowledge about batting orders, the models
discussed are not intended to be the final word on this subject.
In particular, incorporation of some situational batting effects
should be considered. One of particular interest is how the
strength or weakness of the next hitter(s) affects a player's
batting performance. For example, is there really a tendency
to "pitch around" a strong hitter if he is followed
by a weak one? The primary problem is obtaining relevant data.
Also, there is room for improvement in the statistical (regression)
modeling process; additional candidate independent variables should
be studied.
I hope that this article has convinced readers that mathematical and statistical techniques can be useful tools for designing higher scoring batting orders. For those who are interested in actually using the models described, if all goes according to plan, they should be part of the 1992 edition of the APBA computer baseball game (contact the publisher, Miller Associates, 11 Burtis Ave., New Canaan, CT 06840 for details).