Note: I am not a statistician. I am sure actual statisticians could do much more complex testing, but through my studies I was able to put together a test that has the potential to be feasible. While there are plenty of other approaches to the analysis, I am using a test that relates directly to my studies. I cannot guarantee that these results will apply to your specific school; they are specific to mine. Please comment below if you would like me to conduct a different test, keeping in mind that these tests take a great amount of work. Thank you and enjoy!
Reason for Testing
As I drove up to Raymond James Stadium for the USF-Tulsa game, I saw an empty parking lot. An absolute tragedy for a team that is having such a successful season, with NFL talent on the roster. I got inside the stadium and saw empty seats all around as I easily made my way to the front of the student section. I wondered what could be leading to this poor attendance. Could it be playing the game on a Thursday or Friday? Good weather or bad weather? An off-campus stadium? My point is that a variety of factors can lead to poor attendance.
Of course, if we all knew what the solution was to building attendance we would get it fixed. As I stated earlier, we could probably go over multiple reasons, beyond the ones listed above, for why attendance numbers are down. So I decided I wanted to figure out, to some degree, why these issues are so prevalent. I designed a statistical test, although not perfect and lacking many of the factors listed, that would try to answer my question.
My test, in the stats world, is called a regression analysis. Bear with me through the nerdy part here, so I can explain what this stuff is. The test required me to have one dependent variable (Y) and two independent variables (X1 being quantitative and X2 being qualitative). Since a qualitative variable cannot be measured directly with numbers, it presents a challenge in analysis. Thus, I made X2 a dummy variable, assigning the number 1 to one of my qualitative options and the number 0 to the other. The goal of this test is to find the best equation possible and figure out whether the model would end up being useful in real life. Without further ado, here is what I will base my model on:
Y = Attendance (in thousands) of a USF home football game
X1 = Betting line (in point spread form)
X2 = Type of game
- 1 if out-of-conference game
- 0 if in-conference game
While the test lacks other factors, this was the test I decided to do to see if X1, X2, or both are practical predictors of Y. I thought a betting line might influence fans who are on the fence about going, depending on the likelihood of a win or loss, and I chose the type of game to see if a non-conference game may be more appealing. Also, you will have to trust me that the data meets all the assumptions needed to conduct the test. It does, and I don’t think anyone wants to read continuous printouts, so take my word on it.
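For anyone curious how a model like this gets fit, here is a minimal sketch in plain Python. It assumes the simplest form of the model (an intercept plus one coefficient each for X1 and X2, no interaction term), which may not match the exact model used in this article. The function names, the "conference"/"non-conference" labels, and every number in `games` are made up for illustration; none of it is real USF data.

```python
def encode_game_type(game_type):
    """Dummy-code the qualitative variable: 1 = out-of-conference, 0 = in-conference."""
    return 1 if game_type == "non-conference" else 0


def solve(a, b):
    """Solve the linear system a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Zero out this column in every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, b2] from the normal equations (X'X)b = X'y."""
    X = [[1.0, x1, float(x2)] for x1, x2 in rows]  # leading 1.0 is the intercept column
    k = 3
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# HYPOTHETICAL games: (point spread, type of game). Not real USF data.
games = [
    (-7.0, "conference"),
    (3.5, "non-conference"),
    (-14.0, "non-conference"),
    (10.0, "conference"),
    (-3.0, "conference"),
    (6.0, "non-conference"),
]
rows = [(spread, encode_game_type(kind)) for spread, kind in games]

# Made-up "true" coefficients, used only to generate attendance (in thousands)
# so we can check that the fit recovers them.
TRUE_B = (35.0, -0.4, 6.0)
attendance = [TRUE_B[0] + TRUE_B[1] * x1 + TRUE_B[2] * x2 for x1, x2 in rows]

b0, b1, b2 = fit_ols(rows, attendance)
print(f"Y = {b0:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")
```

Because the made-up attendance figures were generated with no noise, the fit hands back the coefficients that went in. Real attendance data is much messier, which is exactly what the adjusted R^2 and standard deviation measure.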
Here is the scatter plot of the data I came up with:
You can see that there is no real pattern to the data. What was interesting to me was seeing zero USF conference games reaching 50,000 people in the stands, even during the Big East days. Even non-conference games like Stony Brook and Florida A&M drew more attendance than some conference games.
To keep the article brief, and get to the actual findings, I’ll just give you the information for my best and most practical model. I will point out the key numbers in the analysis portion.
So far I have only thrown out a scatter plot and a printout of a spreadsheet. What does all of this mean? The key numbers to look at in this global F-test are the p-value in the regression line, the adjusted R^2, and the standard deviation.
To interpret the p-value, we must choose an error level, or alpha. With an alpha of .05, for example, I am essentially testing with 95% confidence. I chose an alpha of .05, which is greater than my p-value of .0270. So there is sufficient evidence to say the model with betting line and type of game is “statistically” useful for predicting attendance. Thus, this is a meaningful test in the statistics world.
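For readers who want the decision rule spelled out, the global F-test can be summarized roughly as follows. This assumes the simplest first-order form of the model; the symbols \beta_1 and \beta_2 (the coefficients on betting line and type of game) are notation introduced here for illustration and do not come from the printout.

```latex
\begin{aligned}
H_0 &: \beta_1 = \beta_2 = 0 \quad \text{(neither predictor helps explain attendance)} \\
H_a &: \text{at least one of } \beta_1,\ \beta_2 \text{ is nonzero} \\
&\text{Reject } H_0 \text{ if } p\text{-value} < \alpha: \qquad 0.0270 < 0.05 \;\Rightarrow\; \text{reject } H_0
\end{aligned}
```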
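But after examining the adjusted R^2 and the standard deviation, the model proved not to be practical. The adjusted R^2 needs to be high and the standard deviation needs to be low. An adjusted R^2 of 11.8% is super low: the model explains only about 12% of the variation in attendance. And with a standard deviation of about nine (in thousands, since attendance is measured in thousands), two times the standard deviation means predictions could miss by roughly 18,000 fans, which is quite high. Sadly, after putting in all the effort, the test is not practical for actual use. A more complex model or different variables will most likely produce better results.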
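To make those two numbers concrete, here is a small sketch. The standard deviation of about 9 (in thousands) comes from the analysis above. The `r2`, `n`, and `k` values passed to `adjusted_r2` are hypothetical placeholders, since the raw R^2 and the number of games are not shown in this article, and the function names are made up for illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def two_s_margin(s):
    """Rough 95% margin for a prediction: about two standard deviations either way."""
    return 2 * s


S_THOUSANDS = 9  # standard deviation from the article, in thousands of fans
margin = two_s_margin(S_THOUSANDS)
print(f"Predictions could miss by roughly {margin} thousand fans either way")

# HYPOTHETICAL inputs, only to show how the adjustment penalizes extra predictors.
example = adjusted_r2(r2=0.5, n=11, k=2)
print(f"Adjusted R^2 for the hypothetical inputs: {example:.3f}")
```

A margin of roughly 18,000 fans in either direction is enormous for a stadium crowd, which is why a "statistically useful" model can still be of little use in practice.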
So after a failed test, what can we take away? The biggest, most obvious takeaway is that the betting line and the type of game have no practical use when they are in the same model. We would have to go back to the drawing board to see if there are other ways to statistically measure attendance data. This doesn’t mean that these factors couldn’t work for a different test.
Both factors may still work in a more complex test. They just don’t work together in this particular one. Honestly though, there are so many factors that go into attendance that we may never be able to predict it properly.
Overall, I would not consider the test a waste. In fact, it was very fun to see if what I was testing would be meaningful with regard to attendance. If you made it this far, through all the nerdiness, I thank you. I hope you enjoyed it. I would love to hear your comments below. If you have any questions about the test or would like to see other printouts, let me know!
What is the biggest factor of attendance in your opinion?