Machine Learning to Predict Player Decline

This past offseason was an interesting one for MLB. Mike Moustakas never signed a big contract. All-star players like Eric Hosmer and Jake Arrieta didn’t ink deals until spring training. Many people offered potential explanations: owners became miserly, teams wanted to save resources for next offseason’s free agent class, and clubs started to realize the dangers of large, long-term contracts.

For the most part, teams want to give large contracts to players before their peak. However, many players only attract major deals after they have performed well, sometimes after hitting their peak. In that vein, can teams accurately predict whether a player is before or after their peak? I don’t know if teams can, but machine learning can do it pretty well.

I built several machine learning models to predict whether an offensive player is before or after their peak (no pitchers were analyzed). I defined “peak” as the year the player achieved their highest OPS (OPS = on-base percentage + slugging percentage). The following variables were used in the model: most recent season’s OPS, cumulative games played, most recent season’s salary, position, age, decade, weight, height, batting hand, and throwing hand. The training data consisted of all players who debuted after 1970 and who have retired. Only seasons where players appeared in more than 80 games were considered.
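To make the peak definition concrete, here is a minimal sketch of how seasons could be labeled. It is not the code behind the model. The field names (`player`, `year`, `games`, `ops`) and the `label_seasons` helper are hypothetical, and two choices are my assumptions rather than anything stated above: the peak season itself gets its own "peak" label, and ties in OPS go to the earliest season.

```python
from collections import defaultdict


def label_seasons(seasons, min_games=80):
    """Label each qualifying season as 'before', 'peak', or 'after'.

    `seasons` is a list of dicts with the hypothetical keys
    'player', 'year', 'games', and 'ops'.
    """
    by_player = defaultdict(list)
    for season in seasons:
        # Only seasons with more than `min_games` games are considered.
        if season["games"] > min_games:
            by_player[season["player"]].append(season)

    labeled = []
    for rows in by_player.values():
        rows.sort(key=lambda r: r["year"])
        # Peak = season with the highest OPS. max() returns the first
        # maximum, so ties go to the earliest season (an assumption).
        peak_year = max(rows, key=lambda r: r["ops"])["year"]
        for row in rows:
            if row["year"] < peak_year:
                label = "before"
            elif row["year"] > peak_year:
                label = "after"
            else:
                label = "peak"  # assumption: the peak season is kept separate
            labeled.append({**row, "label": label})
    return labeled


# Made-up seasons for one fictional player.
example = [
    {"player": "a", "year": 2001, "games": 120, "ops": 0.700},
    {"player": "a", "year": 2002, "games": 150, "ops": 0.850},
    {"player": "a", "year": 2003, "games": 60, "ops": 0.900},  # too few games
    {"player": "a", "year": 2004, "games": 140, "ops": 0.780},
]
for row in label_seasons(example):
    print(row["year"], row["label"])
```

Note that the 2003 season has the highest OPS but is dropped by the games filter, so 2002 becomes the peak in this example.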

The best model was a voting classifier, which combined the following models: logistic regression, adaptive boosting, gradient boosting, random forest, and extra trees. On a holdout set of data, the model achieved 83% predictive accuracy.
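For readers curious what such an ensemble looks like in code, below is a rough sketch using scikit-learn's `VotingClassifier`. It runs on synthetic data from `make_classification`, not the baseball data, so the accuracy it prints says nothing about the 83% figure. The hyperparameters, the soft-voting setting, and the scaling step in front of the logistic regression are all my assumptions; the actual model may have been configured differently.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the player features (before/after peak = 0/1).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Keep a holdout set that the models never see during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

ensemble = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",  # assumption: average predicted probabilities
)
ensemble.fit(X_train, y_train)

holdout_accuracy = accuracy_score(y_test, ensemble.predict(X_test))
print(f"Holdout accuracy on synthetic data: {holdout_accuracy:.2f}")
```

With real player data, categorical variables such as position and batting hand would need to be encoded as numbers (for example, one-hot encoded) before being passed to these models.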

Use the Model

The web app below allows you to enter a player’s info and receive a prediction about whether they are before or after their peak. I entered a few current players and received the following predictions.

  • Eric Hosmer: after
  • Mike Trout: before
  • Jose Altuve: after
  • Giancarlo Stanton: after
  • Charlie Blackmon: before

For your own exploration, you can get the info needed from Baseball Reference.

You must enter a valid scenario. Blank entries will throw an error. If you receive an error, simply refresh the web page. Likewise, please note the app may take a few seconds to load.

Explore the Data

The app below allows you to explore the data used to train the predictive model. The drop-down menus enable you to segment players based on height, weight, batting arm, throwing arm, position, and decade. In turn, you will see averages for age, games played, salary, and OPS for players before and after their peak.
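The segmenting and averaging the app performs can be sketched in a few lines of plain Python. This is an illustration only, not the app's code: the `segment_averages` helper and the field names are hypothetical, and skipping missing salaries when averaging is my assumption.

```python
from statistics import mean


def segment_averages(rows, **filters):
    """Average age, games, salary, and OPS for 'before' and 'after' rows
    that match every filter (e.g. position="1B", bats="L")."""
    selected = [
        r for r in rows if all(r.get(key) == value for key, value in filters.items())
    ]
    summary = {}
    for label in ("before", "after"):
        group = [r for r in selected if r["label"] == label]
        if not group:
            continue  # this segment has no seasons with this label
        summary[label] = {}
        for field in ("age", "games", "salary", "ops"):
            # Assumption: missing values (None) are left out of the average.
            values = [r[field] for r in group if r.get(field) is not None]
            summary[label][field] = mean(values) if values else None
    return summary


# Made-up rows for illustration.
rows = [
    {"position": "1B", "bats": "L", "label": "before",
     "age": 24, "games": 300, "salary": 500000, "ops": 0.760},
    {"position": "1B", "bats": "L", "label": "before",
     "age": 26, "games": 600, "salary": None, "ops": 0.800},
    {"position": "1B", "bats": "L", "label": "after",
     "age": 31, "games": 1200, "salary": 9000000, "ops": 0.820},
    {"position": "2B", "bats": "R", "label": "after",
     "age": 33, "games": 1500, "salary": 4000000, "ops": 0.700},
]
print(segment_averages(rows, position="1B"))
print(segment_averages(rows, position="2B", bats="L"))  # no matching rows
```

A filter combination that matches no rows returns an empty result, which is the code equivalent of a scenario that yields nothing in the app.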

You will notice that players sometimes have higher OPS numbers after their peak. Intuitively, this makes sense. Players often struggle early in their careers, dragging down their aggregate numbers. Many players are also still effective after their peak. The question remains: is the player on an upward or downward swing?

Keep in mind that some scenarios may yield no results. For example, second basemen rarely throw left-handed. Likewise, some scenarios may not have a “before peak” group, since players might achieve their highest OPS in their first season. Further, some highly specific scenarios (e.g. skinny first basemen) will yield small sample sizes. Lastly, salary information is sometimes not available. Please note the app may take a few seconds to load.