Mid-term Project
Analysis technique: ANOVA to analyze if there is a difference in means among 3 groups
Dataset: mlb_players_18.csv
Variable               Description
name                   Name of baseball player
team                   Team of baseball player
position               Position of baseball player. This variable will be used to create the 3 groups
games                  Games played (can be used for sample size)
AB                     Number of Times player Batted (can be used for sample size)
R                      Runs
H                      Hits
doubles                Doubles
triples                Triples
HR                     Home Runs
RBI                    Runs Batted In
walks                  Walks
strike_outs            Strike Outs
stolen_bases           Stolen Bases
caught_stealing_base   Caught Stealing a Base
AVG                    Average (Hits divided by At Bats)
OBP                    On Base Percentage. This variable will be used in the analysis to describe batter performance.
SLG                    Slugging percentage
OPS                    On Base plus Slugging percentage
Scenario: You are an analyst for a baseball organization and the team management wants to know if there are real differences between the batting performance of baseball players according to these 3 positions: outfielder, infielder, and catcher (C). For this study, batting performance is equivalent to the variable OBP in the dataset.
Variable of Interest: Batting Performance, represented by on base percentage shown in variable OBP
Groups: Using the position variable, create a new variable with 3 groups according to the following definitions:
Outfield: If position variable = any one of the following: CF, LF, RF
Infield: If position variable = any one of the following: 1B, 2B, 3B, SS
Catcher: If position variable = C
Note that any other position values can be ignored for this analysis
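The group definitions above can be sketched as a small lookup. This is only an illustrative sketch: the function name position_group and the choice to trim whitespace and uppercase the code are assumptions, not part of the assignment, and positions outside the three groups return None so they can be dropped.

```python
from typing import Optional

# Position codes taken from the group definitions in this assignment.
OUTFIELD = {"CF", "LF", "RF"}
INFIELD = {"1B", "2B", "3B", "SS"}


def position_group(position: str) -> Optional[str]:
    """Map a position code to Outfield, Infield, or Catcher.

    Returns None for any other position so the row can be ignored.
    Trimming and uppercasing the code is an assumption about how the
    position column might be formatted.
    """
    code = position.strip().upper()
    if code in OUTFIELD:
        return "Outfield"
    if code in INFIELD:
        return "Infield"
    if code == "C":
        return "Catcher"
    return None


# Hypothetical position values, not rows from mlb_players_18.csv.
for pos in ["CF", "SS", "C", "DH"]:
    print(pos, "->", position_group(pos))
```

Applied to each row of the position variable, the result can be stored as the new group variable, and rows where it is None can be filtered out before the analysis.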
Write a report to answer the management’s question. The report need not be long (no more than 2 pages) but should at least answer the following questions. Submit your report in Microsoft Word or another word processing format.
1. Lead Statement: Write just one sentence, as if it is a “lead sentence” in a news story, which succinctly answers the management’s question. Sometimes this is referred to as a BLUF statement, which stands for “bottom line up front” and can be thought of as a conclusion that is presented first. (10 points)
2. Problem Statement: Make sure to state the problem you were asked to solve. (10 points)
3. Assumptions: Clearly define any assumptions you made. For instance, did you remove any outliers in your data due to small sample size? (Sample size can be determined by either the games or AB [at bats] variables.) (10 points)
4. Key charts: Create at least one chart for your dataset that should accompany any insights you have discovered for this dataset. Make sure to give your chart a clear and descriptive title. (Possible Idea: Box and Whisker Plot of OBP by each of the 3 groups). (10 points)
5. Analysis Technique: ANOVA results should include the null hypothesis under consideration, the F-statistic, p-value, degrees of freedom, and an evaluation of the null hypothesis based on the p-value. (50 points)
6. Your Name: Make sure to include your name and contact information in case management has further questions. (10 points)
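As a rough illustration of the quantities requested in item 5, the sketch below computes a one-way ANOVA F-statistic and its degrees of freedom by hand, using only the Python standard library. The function name one_way_anova and the toy values are assumptions made up for illustration; they are not taken from mlb_players_18.csv, and this is not the required method. The null hypothesis being tested is that the mean OBP is the same in all 3 groups.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up OBP-like values for illustration only, not real player data.
toy = {
    "Outfield": [0.31, 0.32, 0.33],
    "Infield": [0.32, 0.33, 0.34],
    "Catcher": [0.33, 0.34, 0.35],
}
f_stat, df_b, df_w = one_way_anova(list(toy.values()))
print(f"F = {f_stat:.3f}, df = ({df_b}, {df_w})")
```

The p-value is the upper-tail probability of the F distribution with these degrees of freedom at the observed F. If SciPy is available, scipy.stats.f_oneway returns both the F-statistic and the p-value directly; the null hypothesis is rejected when the p-value falls below the chosen significance level (commonly 0.05).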