Ronit Ray


Getting Your Feet (and mine) Wet in the World of Sports Analytics

2021-11-05 • 7-minute read

Personal note: I love football. I didn't get into the game very early in my life, but since age 12 or so, I have been absolutely obsessed. Earlier this year, I got a subscription to The Athletic and was exposed to some of the best football coverage I've ever read. While I cannot afford the subscription anymore, I still have access to a lot of their content through podcasts and partner YouTube channels like the excellent Tifo Football and Tifo IRL. Tom Worville, their data analytics writer, has consistently put out superb work that is informative and interesting. More recently, he has featured in a series of videos by Tifo, comparing players in Europe's top leagues across several dimensions using scattergrams.

Very recently, Tom signed as a Data Scientist for RB Leipzig, one of the hottest projects in Europe (Congratulations, Tom!). I think it's fair to say his work has impressed me and maybe even inspired me to dip my toes in the field. Thankfully, Worville himself and a number of awesome people on Twitter have compiled more resources than I can get through in quite some time. This post is an attempt to document those resources. 99% of the effort is from the people who actually created the materials and compiled them. All I have done is remove Twitter's stupid tracking links and clean up the formatting a little so it's actually readable. I will hopefully be visiting some of these materials in the future and writing about what I learn from them.


Thread by Tom Worville

To start, here’s a great thread of intro blogs for working with and plotting @StatsBomb data in R by @biscuitchaser

  1. Plotting passes and getting started in R
  2. Adding further information to pass plots
  3. Comparing WSL creators
  4. Shot maps for WSL
  5. Using Wyscout data in R

Piece by @GregorydSam: Getting into Sports Analytics 2.0

Some good advice from @smarterscout in this podcast ep about getting into the industry, along with a much-needed dose of realism about the relatively few jobs on offer.

Another plug for @rweeklyorg. This week's issue is by the great @Rby_Ryo, including: creating an xG model with R, and @thecomeonman's arsenal of code examples.

This is a gentle how-to guide for those who are approaching R from being Excel users first and foremost.

This is a great #Rstats guide to making the most of recent changes to the {tidyverse}. Big one for me is relocating columns in a dataframe, which dplyr::relocate() can help with now. Thanks @dr_keithmcnulty!
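For anyone who hasn't tried dplyr::relocate() yet, here is a minimal sketch of what it does. The `shots` table and its column names are made up purely for illustration (they are not from the linked guide); only relocate(), its `.before`/`.after` arguments, and last_col() come from dplyr itself.

```r
library(dplyr)

# Hypothetical toy data: a few shots with an expected-goals value each.
shots <- tibble(
  xg     = c(0.12, 0.45, 0.03),
  player = c("A", "B", "C"),
  minute = c(12, 48, 77),
  team   = c("Home", "Away", "Home")
)

# With no .before/.after, the named columns move to the front.
front <- relocate(shots, player, team)

# Move a column to the end of the table.
xg_last <- relocate(shots, xg, .after = last_col())

# Place one column directly before another.
team_first <- relocate(shots, team, .before = player)

names(front)
names(xg_last)
names(team_first)
```

The columns you don't mention keep their relative order, which is what makes this nicer than spelling out every column in select().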

The great @thomas_mock has been on a TEAR recently with some nice blogs on building better tables using {gt}, but more relevant is this well thought out piece on heatmap design.

Shout out to @icymi_r which is a great #rstats content aggregator.

Not posted here for a while, but @JanVanHaaren's excellent summary of football analytics in 2020 is well worth a read:

Feel it’s quite common that people have “coding scrapbooks” of functions/tips they return to over and over again. Sam’s here is neat.

Had a couple of DMs asking about coding tutorials. Have to say that @mckayjohns is the GOAT, and a great place to get started.

Also, @Soccermatics' Friends of Tracking channel has tons too.

Just stumbled across this Github repo of links and various resources by @eddwebster, unbelievable effort to keep on top of all of this.

Great work by @PfaffCatherine pulling a list together of female-identifying sports analysts, hopefully serves as inspiration to others + good to spotlight those succeeding in such a male-dominated industry.

Thread by @tylermorganwall

Reminder: there are hundreds of great, FREE learning resources for #rstats out there. There's no need to sign up to take courses with a disgusting, ethically bankrupt company with sniveling, feckless leadership.

I'm completely self-taught in R. Here's a list of the FREE, OPEN materials I've used on my journey: For data wrangling and visualization, nothing beats Hadley's "R for Data Science"

Want to learn about data visualization? Check out @ClausWilke's "Fundamentals of Data Visualization". While not a book about R specifically, it's a great resource for learning what makes a good, interpretable viz. Plus, the book's code is available!

If you want to learn about ggplot2 plotting in particular, it's best to get it straight from the source: read Hadley's ggplot2: Elegant Graphics for Data Analysis. I can't imagine why you'd want to learn it from anyone else.

For diving into the internals of R, Hadley's "Advanced R" is a wonderful resource. I've gone back to it several times over the years and found new nuggets of info. Confused about environments or closures, or how to debug C code or NSE? Check it out. The second edition of "Advanced R" is available at a slightly different URL

Want to learn about maps, GIS, and spatial analysis and visualization? Check out @robinlovelace & @jakub_nowosad's "Geocomputation with R", a great free resource on geographic data analysis, visualization and modeling.

Want to learn how to spin up your own data analysis cluster and do work on "big data" in R? Check out @javierluraschi's "Mastering Spark with R." He even shows you how to fire up a rayrender render farm for complex 3D visualizations!

If you like the "in person" learning experience video provides and you're interested in learning more about making maps in R with rayshader, check out my open and free masterclass I taught with the @PennMUSA program:

Finally, the R community is the best—if you ever run into seemingly intractable problems on your journey, tweet your issue with the #rstats hashtag & almost certainly you'll get help. And if that fails, try the community forums hosted by RStudio

I'd add Kieran Healy's Data Visualization, available in a free online format as well as a textbook.

There is an awesome Slack group to help work through the book (and help on all other R questions as well). People are really helpful and friendly!

Learn how to solve text mining problems with "Text Mining with R"! Great introduction to the tidytext package and working with text data by @juliasilge & @drob

Another resource - I like to tell complete beginners to check out Swirl.

Tweet by Lucy D’Agostino McGowan

Have you checked out @dcossyle's teacup giraffes and stats? I think it is definitely high schooler friendly! https://tinystats.github.io/teacups-giraffes-and-statistics/

Tweet by Dr. Erin Buchanan

Thread by Amelia McNamara

I’d love to gather open source interactive R lessons. I’ll start a thread here, but please chime in! #rstats #tidyverse

There are introductory primers on @rstudio.

Noam Ross's GAM Course

@tladeras and @datapointier’s intro to the tidyverse R bootcamp

Ted’s R package to convert DC courses

Grant McDermott courses:

Tweet by Rebecca Barter

Tidymodels: tidy machine learning in R. A concise introduction to machine learning in R using the new(ish) tidymodels pipeline (essentially caret's successor). Thanks to @topepos and team!

