Learning R plot with NFL Data: wins vs. yards allowed


R plot data doesn’t need to be boring. Being an NFL football fan has made creating lessons on learning R a lot more interesting. There is so much NFL data that you could test endless theories about cause and effect. For this lesson we will look at the base R plot.

The basic R plot is truly a blank slate: it is essentially text, lines, points and axes. Starting with a simple plot is often helpful because you may not have a fully formed idea of how you want to look at the data and just want to start testing some theories. Over time you tweak it, and interesting correlations start to emerge.

For this R plot lesson we are looking at the correlation between the number of yards a defense gives up and the number of wins for its team. Our hypothesis is that the more yards a defense allows, the more likely its team is to lose.

First you need some NFL defensive data. We took total defensive statistics, added each team’s wins and losses, saved the result as a CSV file, and then imported that CSV into R.
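Before plotting, it can help to confirm that the imported data actually has the columns the plot uses. The helper below is a hypothetical sketch, not part of the original lesson, and it assumes the columns are named Yds and Wins as in the plot code. One common trap it catches: if yardage totals in the CSV are written with thousands separators (for example "5,432"), read.csv() reads that column as text rather than numbers.

```r
# Hypothetical helper (not from the original post): check that an imported
# data frame has the numeric columns the plot relies on.
check_defense_data <- function(df, required = c("Yds", "Wins")) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # vapply() returns one TRUE/FALSE per required column
  not_numeric <- required[!vapply(df[required], is.numeric, logical(1))]
  if (length(not_numeric) > 0) {
    stop("Non-numeric columns: ", paste(not_numeric, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage, right after read.csv():
# check_defense_data(nflDefenseData)
```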


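# read.csv() looks in R's working directory; check where that is with getwd()
nflDefenseData <- read.csv("nflTeamDfense.csv", header = TRUE)

# with() lets plot() and lines() use the data frame's columns by name
with(nflDefenseData, {
  plot(Yds, Wins)                  # scatter plot: yards allowed (x) vs. wins (y)
  lines(loess.smooth(Yds, Wins))   # overlay a LOESS smoother to show the trend
})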

What is happening here? First, with() evaluates an expression in the context of a data set, here nflDefenseData, so we can refer to its columns by name (Yds instead of nflDefenseData$Yds). The general form is with(data, expression). In our case, the expression is the block containing plot() and lines().
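To see what with() saves you, here is the same calculation written both ways. It uses the built-in mtcars data set only because it ships with base R, so the snippet runs without the NFL CSV; the mpg and wt columns have nothing to do with football.

```r
# with() evaluates the expression using the data frame's columns as variables
avg_with   <- with(mtcars, mean(mpg / wt))

# The equivalent without with(): every column is spelled out with $
avg_dollar <- mean(mtcars$mpg / mtcars$wt)

identical(avg_with, avg_dollar)  # TRUE
```

The payoff grows with the length of the expression: inside with(), a block that calls plot(), lines() and more never has to repeat the data frame name.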

Second, plot(Yds, Wins) creates a scatter plot with Yds on the x axis and Wins on the y axis. Those are the two team statistics we want to check for a correlation. It could have been TDs or Ints instead; we just chose these to start testing our hypothesis.
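If you want to try TDs or Ints instead of yards, one option is to pass the column name as a string. This is a hypothetical helper, not the author’s code; the names "TD" and "Int" in the usage lines are assumptions, so substitute whatever headers your CSV actually uses.

```r
# Hypothetical helper: plot any defensive stat column against Wins.
plot_stat_vs_wins <- function(df, stat = "Yds") {
  x <- df[[stat]]                       # NULL if the column doesn't exist
  if (is.null(x)) stop("No column named ", stat)
  plot(x, df$Wins, xlab = stat, ylab = "Wins",
       main = paste("Wins vs.", stat, "allowed"))
  lines(loess.smooth(x, df$Wins))       # same LOESS smoother as above
  invisible(NULL)
}

# Usage (column names other than Yds are assumptions):
# plot_stat_vs_wins(nflDefenseData, "Yds")
# plot_stat_vs_wins(nflDefenseData, "TD")
# plot_stat_vs_wins(nflDefenseData, "Int")
```

Using df[[stat]] rather than with() here is deliberate: the column is chosen at run time from a string, which with() cannot do directly.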

Finally, lines(loess.smooth(Yds, Wins)) adds a LOESS smoother, a locally weighted curve rather than a straight linear regression line, to highlight the trend.
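If you also want the straight linear regression line for comparison, a minimal sketch is to fit it with lm() and draw it with abline() on the same plot. The function name and line styles below are illustrative choices, and the code assumes the same Yds and Wins columns.

```r
# Hypothetical sketch: draw a linear fit and a LOESS curve on one scatter plot.
plot_with_trends <- function(df) {
  fit <- lm(Wins ~ Yds, data = df)          # least-squares line: Wins = a + b * Yds
  with(df, {
    plot(Yds, Wins, xlab = "Yards allowed", ylab = "Wins")
    abline(fit, lty = 2)                    # dashed straight regression line
    lines(loess.smooth(Yds, Wins))          # solid LOESS curve
  })
  legend("topright", legend = c("Linear fit", "LOESS"), lty = c(2, 1))
  invisible(fit)                            # return the model for inspection
}

# Usage:
# fit <- plot_with_trends(nflDefenseData)
# coef(fit)["Yds"]   # estimated change in wins per extra yard allowed
```

If the LOESS curve stays close to the dashed line, a straight line is a fair summary; a clear bend suggests the relationship is not linear across the whole range.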

And this is what it looks like:

[Figure: base R scatter plot of NFL team wins vs. yards allowed, with a LOESS smoother]

What do you think? Did this prove anything about yards vs. wins? What other tests would you run based on the data?
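One way to put a number on what the plot suggests is a correlation test. This is a hedged sketch, not part of the original lesson: cor.test() computes Pearson’s r by default, and a negative r would be consistent with the hypothesis. Keep in mind that one season gives only 32 teams, and correlation alone does not show that yards allowed cause losses.

```r
# Hypothetical sketch: quantify the Yds vs. Wins relationship.
yds_wins_correlation <- function(df) {
  ct <- cor.test(df$Yds, df$Wins)   # Pearson correlation test by default
  c(r = unname(ct$estimate), p_value = ct$p.value)
}

# Usage:
# yds_wins_correlation(nflDefenseData)
```

Passing method = "spearman" to cor.test() is an alternative if you only care whether wins tend to fall as yards rise, not whether the relationship is linear.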
