On Wednesday, October 27th, 2004, the Curse of the Bambino was finally lifted from the city of Boston and its long-suffering baseball fans (see Appendix A for more on the Curse). For the first time in 86 years, the Boston Red Sox were the world champions of baseball. There is no arguing that the 2004 Red Sox were a good team that played excellent baseball throughout the season. The team was led not by talent cultivated through the Red Sox' farm system but by high-priced veterans acquired through free agency and trades, such as Pedro Martinez, Manny Ramirez, Keith Foulke, Curt Schilling and David Ortiz.

The average age of a Red Sox player was 31.1 years, the oldest team average in the league. Additionally, the 2004 Red Sox payroll was the second highest in Major League Baseball at $125,208,542, or $4,173,618 per player. These two statistics describe some of the off-field makeup of the 2004 Red Sox. In addition to being a veteran, well-paid ball club, the Red Sox performed well on the field. Their team batting average (number of hits divided by number of official at-bats) was tied for the highest of the 30 Major League teams at 0.282.
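Both figures in this paragraph are simple ratios. As a minimal sketch, they could be written in Python as shown below. The function names are hypothetical, the 30-player divisor is an assumption implied by the per-player figure, and the hits and at-bats in the last line are made-up numbers rather than 2004 data.

```python
# Sketch of the two ratios described above; function names are hypothetical.


def payroll_per_player(total_payroll: float, roster_size: int) -> float:
    """Average salary: total team payroll divided by the number of players."""
    return total_payroll / roster_size


def batting_average(hits: int, at_bats: int) -> float:
    """Team batting average: hits divided by official at-bats."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


# 2004 Red Sox payroll from the text, assuming a 30-player divisor
print(f"${payroll_per_player(125_208_542, 30):,.0f} per player")  # $4,173,618 per player

# Made-up team totals: 1,500 hits in 5,400 official at-bats
print(f"{batting_average(1500, 5400):.3f}")  # 0.278
```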

In terms of pitching statistics, the Red Sox ranked in the top third of the league in earned run average (E.R.A.: the number of earned runs allowed per nine innings pitched). Fielding average (the number of successful fielding attempts divided by the total number of fielding attempts) is the only major statistic in which the Red Sox were significantly below the mean, ranking in the bottom quartile. I am interested in analyzing Major League Baseball data from the 2004 season to determine which factors best predict success, measured by the number of team wins. I am especially interested in the relationship between wins and payroll, because payroll is the factor most directly under the control of the ball club's management. On-field performance is harder for management to control because it depends more heavily on 'human performance'.
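The E.R.A. and fielding average definitions in the paragraph above can also be written out explicitly. The sketch below assumes the standard box-score conventions: fielding percentage as (putouts + assists) divided by (putouts + assists + errors), and innings pitched written as "1450.1" to mean 1450 and one-third innings. The function names and the sample totals are illustrative only, not 2004 data.

```python
from fractions import Fraction


def innings_from_notation(ip: str) -> Fraction:
    """Convert box-score innings ('1450.1' = 1450 and 1/3 innings) to exact innings."""
    whole, _, outs = ip.partition(".")
    extra_outs = int(outs) if outs else 0
    if extra_outs not in (0, 1, 2):
        raise ValueError("digit after the point counts outs and must be 0, 1 or 2")
    return int(whole) + Fraction(extra_outs, 3)


def earned_run_average(earned_runs: int, innings: Fraction) -> float:
    """E.R.A.: earned runs allowed per nine innings pitched."""
    return float(9 * earned_runs / innings)


def fielding_percentage(putouts: int, assists: int, errors: int) -> float:
    """Successful chances (putouts + assists) divided by total chances."""
    successful = putouts + assists
    return successful / (successful + errors)


# Illustrative season totals (hypothetical, not 2004 data)
innings = innings_from_notation("1450.1")
print(f"{earned_run_average(700, innings):.2f}")      # 4.34
print(f"{fielding_percentage(4300, 1700, 100):.3f}")  # 0.984
```

Keeping innings as an exact fraction avoids treating "1450.1" as 1450.1 innings, a common slip when computing E.R.A. from raw totals.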

I will also estimate a simple linear regression equation for each variable; its slope gives the additional number of wins associated with a one-unit increase in the independent variable. In addition to payroll, I will analyze the relationship between wins and other major statistical categories: team age, team batting average, team earned run average and team fielding percentage. From this analysis, I hope to determine which variables were most strongly correlated with winning baseball games in the 2004 season.

Unlike the National Football League and the National Basketball Association, Major League Baseball does not have a salary cap (a maximum on the total payroll a team can pay its players). In 2004, the NFL's salary cap was about $75 million and the NBA's was approximately $44 million. There are multiple reasons for a salary cap, but two of the underlying ones are parity (equality among teams) and competitiveness. The concern is that without a salary cap, large-market teams such as New York, Los Angeles and Chicago will be able to 'buy up' all the good players, leaving small-market cities such as Minneapolis, Cincinnati and Phoenix with the less-talented leftovers. Additionally, teams that win more games and reach the playoffs and World Series receive extra revenue from television, which widens the gap even further if large-market teams have an advantage in winning games and reaching the postseason.

In 1998, MLB Commissioner Bud Selig formed a panel to report on the economic conditions within baseball. One of the panel's findings was that team payrolls have become increasingly disparate: the gap between "rich" and "poor" teams is not only wide but growing. The effect, according to the panel, is a dramatic decline in the parity and competitiveness of MLB.

For this report, most of the data analyzed was gathered from the Major League Baseball website or the ESPN website (espn.go.com). The dependent variable for each analysis is a team's total number of victories in the 2004 season (including the playoffs and World Series). The independent variables for these regressions are 1. team payroll, 2. average team age, 3. team batting average, 4. team earned run average and 5. team fielding percentage.

Regression results were obtained using the least squares method in the Data Analysis tool in Microsoft Excel. Regressing team wins on payroll gives a correlation coefficient of 0.56, a coefficient of determination of 0.31 and an estimated simple linear regression equation of:

Total Wins = 64 + 0.26 × (payroll in millions of dollars)

A correlation coefficient of 0.56 indicates a moderate, positive relationship between payroll and wins. This relationship can also be seen in the scatter plot of the two variables (see the attached chart entitled 2004 MLB Total Team Wins and Team Payroll). The coefficient of determination of 0.31 means that about 31% of the variation in team wins is explained by variation in payroll.
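Excel's regression tool fits the line by ordinary least squares. For readers without Excel, the following minimal sketch computes the same quantities for a single predictor (intercept, slope, correlation coefficient and coefficient of determination). It assumes the two columns are supplied as equal-length lists, such as payroll in millions and total wins per team; the function name and the five-team toy data are hypothetical. On the same data it should agree with Excel's output up to rounding.

```python
from math import sqrt


def simple_regression(x, y):
    """Ordinary least squares fit of y = a + b*x.

    Returns (a, b, r, r_squared): intercept, slope, Pearson correlation
    coefficient and coefficient of determination.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope: change in wins per unit of x
    a = mean_y - b * mean_x    # intercept: the fitted line passes through the means
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return a, b, r, r * r      # in simple regression, R^2 = r^2


# Toy data (hypothetical): payroll in $ millions and total wins for five teams
payroll = [50, 70, 90, 110, 130]
wins = [72, 80, 78, 90, 95]
a, b, r, r2 = simple_regression(payroll, wins)
print(f"Total Wins = {a:.1f} + {b:.2f} x payroll   (r = {r:.2f}, R^2 = {r2:.2f})")
```

Because R² equals r² in a simple regression, the two figures reported above are consistent (0.56² ≈ 0.31). On Python 3.10 or later, the standard library's `statistics.linear_regression` and `statistics.correlation` return the slope, intercept and r directly.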

As expected, the regression equation has a positive slope: every additional $1,000,000 of payroll is associated with approximately 0.26 additional wins. The p-value for payroll tests whether the payroll variable significantly predicts the number of wins. The null hypothesis is that payroll is NOT a significant predictor of wins. With a p-value of 0.001, I reject the null hypothesis at the 5% significance level and accept the alternative hypothesis that payroll is a significant predictor of wins.
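For a simple regression, the t-statistic for the slope can be recovered from the correlation coefficient alone, t = r·√(n − 2)/√(1 − r²). The sketch below applies that formula to the correlation coefficients reported for the five simple regressions in this report. It assumes n = 30 (one observation per Major League team) and uses 2.048, the two-tailed 5% critical value of Student's t with 28 degrees of freedom. Because the inputs are rounded to two decimals, the t values are approximations of Excel's exact figures; the dictionary and function names are hypothetical.

```python
from math import sqrt

# Two-tailed 5% critical value of Student's t with n - 2 = 28 degrees of freedom
T_CRIT_DF28 = 2.048


def slope_t_statistic(r: float, n: int) -> float:
    """t statistic for H0: slope = 0 in a simple regression with n observations."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Correlation coefficients reported in this report; n = 30 teams (assumed)
reported_r = {
    "payroll": 0.56,
    "average age": 0.70,
    "batting average": 0.57,
    "E.R.A.": -0.62,
    "fielding percentage": 0.44,
}

for name, r in reported_r.items():
    t = slope_t_statistic(r, 30)
    verdict = "reject H0" if abs(t) > T_CRIT_DF28 else "fail to reject H0"
    print(f"{name:<20} t = {t:6.2f}  {verdict}")
```

The E.R.A. value comes out at about -4.18, matching the t-statistic reported for earned run average, and all five values exceed the critical value in absolute terms, in line with the rejected null hypotheses.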

Regressing team wins on average player age gives a correlation coefficient of 0.70, a coefficient of determination of 0.49 and an estimated simple linear regression equation of:

Total Wins = -156.7 + 8.6 × (average team age in years)

For 2004, there appears to be a strong, positive relationship between team wins and the average age of the team; further analysis of this is provided later in the report. Once again, at the 5% significance level we reject the null hypothesis (p-value: 0.000016).

Regressing team wins on the offensive statistic of team batting average once again shows a moderate, positive relationship: a correlation coefficient of 0.57, a coefficient of determination of 0.32 and an estimated simple linear regression equation of:

Total Wins = -153.6 + 885.7 × (team batting average)

Because batting average is a proportion, each additional point of batting average (0.001) is associated with about 0.89 additional wins. As with the two previous simple regressions, the null hypothesis that batting average is not a significant predictor of wins is rejected at the 5% significance level, since the p-value is 0.001.

Turning to the pitching side of the game, the regression results are a correlation coefficient of -0.62, a coefficient of determination of 0.38 and an estimated simple linear regression equation of:

Total Wins = 172.5 - 20.3 × (team E.R.A.)

Earned run average is a statistic where a lower number is better (a lower E.R.A. means the pitching staff allows fewer runs per nine innings), so it is logical that the correlation is negative: each additional earned run per nine innings is associated with about 20 fewer wins. This negative relationship is also visible in the scatter diagram for the two variables. The p-value shows clearly that earned run average is a significant predictor of team wins; its t-statistic of -4.18 falls well within the rejection region.

Taking defense into account, the regression results are a correlation coefficient of 0.44, a coefficient of determination of 0.19 and an estimated simple linear regression equation of:

Total Wins = -2171.1 + 2292.5 × (team fielding percentage)

That is to say, for every increase of 0.001 in fielding percentage (one-tenth of a percentage point), a team can expect to win about 2 more games. As with the other four statistics, the null hypothesis that fielding percentage is not a significant predictor of wins is rejected (p-value: 0.015).
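Because batting average, E.R.A. and fielding percentage are measured on very different scales, the slopes of the five simple regressions are easier to compare when converted to a change in predicted wins for a realistic change in each statistic. The sketch below plugs the reported intercepts and slopes into Python; the dictionary and function names are hypothetical, and the .983 fielding percentage in the last line is an illustrative value, not a reported figure.

```python
# Intercepts and slopes of the five simple regressions reported above
EQUATIONS = {
    "payroll": (64.0, 0.26),                   # x in $ millions
    "average age": (-156.7, 8.6),              # x in years
    "batting average": (-153.6, 885.7),        # x as a proportion, e.g. 0.282
    "E.R.A.": (172.5, -20.3),                  # x in earned runs per nine innings
    "fielding percentage": (-2171.1, 2292.5),  # x as a proportion, e.g. 0.983
}


def predicted_wins(variable: str, x: float) -> float:
    """Total wins predicted by one simple regression equation."""
    intercept, slope = EQUATIONS[variable]
    return intercept + slope * x


def wins_per_change(variable: str, delta: float) -> float:
    """Change in predicted wins when the variable rises by delta."""
    _, slope = EQUATIONS[variable]
    return slope * delta


# One "point" of batting average or fielding percentage is 0.001
print(f"{wins_per_change('batting average', 0.001):.2f}")      # 0.89
print(f"{wins_per_change('fielding percentage', 0.001):.2f}")  # 2.29
print(f"{wins_per_change('E.R.A.', 0.50):.2f}")                # -10.15
print(f"{predicted_wins('fielding percentage', 0.983):.1f}")   # 82.4
```

With a positive fielding slope, a fielding percentage near .983 predicts roughly 82 wins, a plausible total for an average team, which is a quick sanity check on the sign of that equation.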

Finally, a multiple regression of wins on all five independent variables gives a multiple correlation coefficient of 0.87, a coefficient of determination of 0.76 and an estimated multiple regression equation of:

Total Wins = -826.9 + 0.07 × (payroll in $ millions) + 3.0 × (age) + 447.9 × (batting avg.) - 13.5 × (E.R.A.) + 765.5 × (fielding pct.)

The high multiple correlation coefficient indicates that, together, the five independent variables have strong predictive power for the number of wins: they explain about 76% of the variation in team wins. When evaluating the p-values for the individual variables, three of them (payroll, age and fielding percentage) fall outside the rejection region at the 5% significance level, so we fail to reject the null hypothesis for each. Therefore, once the other variables are held constant, these three variables are not significant predictors of the number of wins. In contrast, the p-values for batting average and earned run average fall within the rejection region at the 5% level, so these two variables are significant predictors of the number of wins.

Ranked by the strength (absolute value) of the correlation coefficient in the simple linear regressions, the independent variables were: 1. average age, 2. earned run average, 3. batting average, 4. payroll, 5. fielding percentage.
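The five-variable model is also an ordinary least squares fit. A minimal pure-Python sketch of how such a fit could be computed, by solving the normal equations (XᵀX)b = Xᵀy with Gaussian elimination, is shown below. The function names and the toy data are hypothetical, and the team-level data used in this report is not reproduced here. The sketch returns the coefficients and R² but not the p-values, which would also require the standard errors of the coefficients.

```python
def multiple_regression(rows, y):
    """Least squares fit with an intercept.

    rows: list of observations, each a list of predictor values.
    y: list of responses (e.g. total wins).
    Returns [b0, b1, ..., bk] by solving the normal equations (X'X) b = X'y.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; X'X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, k):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, k + 1):
                aug[i][j] -= factor * aug[col][j]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        known = sum(aug[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (aug[i][k] - known) / aug[i][i]
    return b


def r_squared(rows, y, b):
    """Coefficient of determination: 1 - SSE / SST."""
    fitted = [b[0] + sum(bj * v for bj, v in zip(b[1:], row)) for row in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


# Toy data (hypothetical): two predictors, response built as 2 + 3*x1 - x2
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [2, 5, 1, 4, 5]
b = multiple_regression(rows, y)
print([round(c, 6) for c in b], round(r_squared(rows, y, b), 6))
```

With real team data, predictors on very different scales (payroll in millions next to a fielding percentage near 1) can make the normal equations poorly conditioned, so a spreadsheet or statistics package remains the safer choice; this sketch only illustrates what the fit computes.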

This tells me that in 2004, older, more experienced teams tended to win more games. Good pitching was also more strongly associated with wins than good hitting. Payroll had only a moderate positive association with wins, and fielding percentage had the weakest correlation with wins. In the multiple regression with all five variables, three of them (payroll, age and fielding percentage) turned out not to have a significant association with wins once the others were taken into account. Earned run average and batting average were the only significant predictors of the number of wins.

Originally, the theory I most wanted to test was how strongly payroll was related to wins in the 2004 season. Payroll was a significant predictor on its own, but once the other variables were taken into account in the multiple regression, it was no longer a significant variable for predicting the number of wins. Even though the Boston Red Sox had a high payroll, I conclude that their success in 2004 was much more closely related to their superior hitting and pitching than to their high payroll.

Appendix A: The Legend of the Curse

In 1918 the Red Sox won their fifth World Series, the most by any club at that time. One of the stars of the Boston championship franchise was a young pitcher named George Herman Ruth, a.k.a. The Babe or The Bambino. In 1920, however, Red Sox owner Harry Frazee needed money to finance his girlfriend's play, so he sold Babe Ruth's contract to Colonel Jacob Ruppert's New York Yankees for $100,000 (plus a loan collateralized by Fenway Park).

Since then, the Yankees, who had never won a World Championship before acquiring Ruth, have gone on to win 26 and are arguably one of the greatest success stories in the history of sport. Meanwhile, before 2004 the Boston Red Sox had appeared in only four World Series since 1918, losing each one in game seven. Many attribute Boston's performance after the departure of Babe Ruth to 'The Curse of the Bambino.'