Big data and MLB

Published on December 2, 2016

Author: ChrisRamirez15



2. How is Baseball using Big Data Analytics?
Sabermetrics, popularized as “Moneyball” (the term comes from the book written by Michael Lewis), is the use of advanced statistics to determine how to build a team, what strategies to use, and more.
“Our ability to know what’s going to happen, when it’s going to happen, how much cash we’re going to generate on the revenue side, allows us to plan accordingly. That’s a tremendous value proposition to ownership and executives.”

3. What is there to analyze?
• 162 games per team per season (not including preseason, playoffs, or minor league games). 162 × 30 MLB teams = 4,860 team-games, which is 2,430 regular-season games because each game involves two teams.
• There are 85 common player stats measured (BA, OBP, HR… ERA, W, L…), and each of those stats is broken down by opponent, lefty vs. righty, day vs. night, home vs. away…
• All of these stats are kept in data banks for every single pitch thrown: 700,000+ pitches in 2014.
• Teams also keep data on fans, food, drinks, promotions…
• The first 135 years of baseball are a combined 2 GB of data. Today each game has almost 1 TB of data collected. That’s a 10-million-fold increase in data collected.
• It’s predicted that upwards of 7 TB per game will be collected.
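
The schedule and data-volume arithmetic on this slide can be checked with a short sketch. Multiplying 162 by 30 counts each game once per participating team, so the number of distinct games is half of 4,860. The per-game figures (1 TB today, 7 TB predicted) come from the slide; the function and constant names are hypothetical, and the season totals are simple multiplication rather than reported figures.

```python
# Back-of-the-envelope check of the figures on the slide.
TEAMS = 30
GAMES_PER_TEAM = 162
TEAMS_PER_GAME = 2  # every game appears on two teams' schedules

team_games = TEAMS * GAMES_PER_TEAM          # games counted once per team
league_games = team_games // TEAMS_PER_GAME  # distinct regular-season games


def season_volume_tb(tb_per_game: float, games: int = league_games) -> float:
    """Total data for one regular season, in terabytes."""
    return tb_per_game * games


print(team_games, league_games)  # 4860 2430
print(season_volume_tb(1.0))     # 2430.0 TB at roughly 1 TB per game
print(season_volume_tb(7.0))     # 17010.0 TB at the predicted 7 TB per game
```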

4. What do you do with the data?
• Win at an unfair game!
• 2002 Oakland A’s: 103-59
• 2014-15 Astros: shifts
• Predict injuries
• 3D snapshots: 10-15 GB of data
• $1.4 billion in knee injuries for MLB in 2014

5. Question
• There are 4 types of analytics that companies can use to aid their business:
  1. Prescriptive: takes data and reveals what actions should be taken
  2. Predictive: takes data and gives an analysis of likely scenarios of what might happen
  3. Diagnostic: a look at past performance to determine what happened and why
  4. Descriptive: what is happening now based on incoming data
• MLB uses all 4. What type of analytics does your company use that helps it gain a competitive advantage?

6. Tools
• Statcast by MLBAM
  • Uses Amazon Web Services
  • Captures on-field data
  • Quickly analyzes and codes
• PITCHf/x
  • Tracks every pitch
  • Uses camera triangulation
• FIELDf/x
  • Records all field plays using camera feeds and object-recognition software
• BaseRuns estimator
  • Estimates the number of runs a team should score given its offensive statistics, and the number of runs a hitter or pitcher creates or allows
Source: Whitman School of Business, Syracuse University
Figure: Data from the Player Tracking System (Statcast) overlaid on video of the Panik-Hosmer play. The red section on the right shows that if Hosmer had maintained his speed instead of diving to the bag, he would have been safe by about a foot.
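
The BaseRuns estimator listed on the Tools slide can be illustrated with a minimal sketch. The slides do not give a formula; the version below is one commonly cited simple variant of the form A × B / (B + C) + D, and both the coefficients and the sample stat line are assumptions for illustration, not the deck's own numbers.

```python
def base_runs(ab: int, h: int, tb: int, hr: int, bb: int) -> float:
    """Estimate runs scored from basic offensive totals.

    Assumed simple BaseRuns variant (not taken from the slides):
      A = baserunners, B = advancement factor, C = outs, D = home runs.
    """
    a = h + bb - hr                                       # runners who reach base
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02   # advancement factor
    c = ab - h                                            # outs made at bat
    d = hr                                                # home runs always score
    if b + c == 0:
        return float(d)  # no plate activity: avoid dividing by zero
    return a * b / (b + c) + d


# Hypothetical team-season totals, chosen only to exercise the function.
estimate = base_runs(ab=5500, h=1400, tb=2200, hr=160, bb=500)
print(round(estimate, 1))
```

The structure reflects the idea in the slide: runs are baserunners multiplied by the share of them who score, plus home runs, which score automatically.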

7. Tools in use
• Amazon EC2: compute power behind the solution
• Amazon S3: storage, 7 TB per game
• Amazon ElastiCache: temporary memory, fast retrieval
• AWS Lambda
  • Used for raw data manipulation to create “on the fly” metrics
  • Creates more insight into plays
• Amazon DynamoDB
  • Allows for powerful queries
  • Supports fast retrieval of information
• Dedicated connection

8. Discussion Question Given the emergence of complex technological tools, how can companies with smaller budgets stay competitive with companies that have deep pockets?

9. 3 Key Differences in Data
• Volume: more data crosses the internet every second than was stored on the entire internet 20 years ago
• Velocity: real-time data, e.g. cell phone location data
• Variety: large amounts of data being created on every topic of business

10. Data Types
Structured Data
• G = games
  • Number of games a player participated in (out of 162 games in a season)
• AB = at bats
  • Number of plate appearances that end in a hit, an out, or reaching base on an error or fielder’s choice (walks, hit-by-pitches, and sacrifices are not counted)
• R = runs
  • Number of runs the player scored
• H = hits
  • Number of times a batter reached base safely on a batted ball (sum of 1B, 2B, 3B, HR)
Unstructured Data
• Social media updates tied to baseball games/players
• Video
• Photos
• Open-ended surveys
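
The structured fields on this slide map naturally onto a fixed record. The sketch below is a hypothetical layout (the class name, field names, and sample numbers are assumptions) showing how H is derived as the sum of 1B, 2B, 3B, and HR, and how batting average (the BA stat mentioned earlier) follows as H divided by AB.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """One player's season totals, using the stats named on the slide."""
    games: int       # G
    at_bats: int     # AB
    runs: int        # R
    singles: int     # 1B
    doubles: int     # 2B
    triples: int     # 3B
    home_runs: int   # HR

    @property
    def hits(self) -> int:
        # H is the sum of all hit types.
        return self.singles + self.doubles + self.triples + self.home_runs

    @property
    def batting_average(self) -> float:
        # BA = H / AB; return 0.0 for a player with no at bats.
        return self.hits / self.at_bats if self.at_bats else 0.0


# Made-up season line for illustration only.
line = BattingLine(games=150, at_bats=550, runs=90,
                   singles=100, doubles=30, triples=5, home_runs=30)
print(line.hits)                      # 165
print(f"{line.batting_average:.3f}")  # 0.300
```

Unstructured items such as video or social media posts have no such fixed schema, which is the distinction the slide draws.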

11. What are some examples of how Unstructured Data is used in your company?
• Online reviews
• Facebook “Likes”

12. Structured data and unstructured data can be combined to gain insights into new categories. (Infographic: /infographic-major-league-baseballs-top-social-media-performers.html)

13. Big Data Acquisition:
• Data harvest from meticulous record-keeping
  • (on-base percentage, batting average, slugging/fastball percentages, RBIs, stolen bases, etc.)
• Employ analytics experts: utilize their skill set to build the team, field, and manage players
• Expand use: ticketing, promotions, fan-team relationships, concessions and products
  • Milwaukee Brewers analyze each email the team receives to better understand fans
  • Analyze who the occasional attendees are and how to get them to buy tickets more often
  • Boston Red Sox developing a concessions heat map (geo-locating fans’ proximity to hot dog stands)
  • Tracks type, quantity, frequency, and locations of concession purchases
  • 2014 app “IdealSeat” allows fans to choose seats based upon likelihood of catching foul balls
• Adjust and re-target focus of data sets (player field positions, t-shirt prices) as needed
• Q: What other venues or industries could benefit from a similar depth of big-data analysis?

14. Big Data Governance: Organization
• Effective governance is equal parts organization and security
• Historic Organization (Waterfall Model): Garbage-In / Garbage-Out
  • Integration of data as it arrives into repository for use
  • Indiscriminate harvest; lack of profiling/prioritizing data lengthens time to organize/use
  • Without organizing, data mismatches can damage customer relations
  • (e.g. coupons for women’s shoes sent to male customers)
  • Understanding data before employing it is key
• Effective Governance: beyond scrubbing and deletion, focus on ensuring accuracy
  • Identify custodians (who is accountable for data consistency, accuracy, and archival)
  • Develop criteria policies (standards and procedures for use, purpose, and by whom)
  • Enact policy controls and audit (enforcement of policies and accountability for custodians)

15. Big Data Governance: Security
• Security Issues:
  • Financial and reputational
  • Too much data with too many vulnerabilities can be catastrophic
• 2015 Breach at Office of Personnel Management:
  • Personal records, PII (names, addresses, etc.), security clearance details of 21M citizens
  • 5.6M sets of fingerprints stolen
• 2014 Breach at Home Depot:
  • 56M credit cards hacked
• Big Data poses Big Risks:
  • Big gains can be realized IF security risks are properly mitigated AND the data harvest is properly organized

16. Conclusion
• Big data is utilized to make teams better and organizations more profitable
• 4 Types of Analytics
  1. Prescriptive
  2. Predictive
  3. Diagnostic
  4. Descriptive
• Many tools available to analyze data
  • Statcast by MLBAM, PITCHf/x, FIELDf/x
• Data Types
  • Structured & Unstructured
• Effective governance can ensure accuracy

17. Questions
