03-28-2014, 12:59 PM #1
Greg Geis
Join Date: Jan 2014
Location: Houston, TX
Posts: 23
HQ: Who's the Fittest Region?

I think it would be interesting to compare the statistical distributions of scores for each region across 14.1-14.5. I would like to see how the 50th, 90th, and 99th percentiles compare from region to region. It would also be cool to see whether there are "stronger" regions (14.3) or "motor" regions (14.5), or whether the regions even out once you have a large enough dataset of CrossFitters.

The analysis is fairly simple to put together. You could pull out most of the information with a cumulative distribution function (CDF) for each region, with all the regions plotted on the same axes so they can be compared directly. Example: http://en.wikipedia.org/wiki/File:Chi-square_cdf.svg (WFS).
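A rough sketch of the per-region CDF in Python, assuming the scores are already sitting in a dict keyed by region name. The region scores below are made up for illustration, not real Open results:

```python
from bisect import bisect_right

def empirical_cdf(scores):
    """Return F(x) = fraction of scores that are <= x."""
    ordered = sorted(scores)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted scores are <= x
        return bisect_right(ordered, x) / n

    return F

# Made-up 14.1 scores (total reps) for two regions, for illustration only.
scores_by_region = {
    "Northeast": [150, 180, 210, 240, 300],
    "South Central": [140, 170, 200, 230, 280],
}

cdfs = {region: empirical_cdf(s) for region, s in scores_by_region.items()}
for region, F in cdfs.items():
    print(region, F(200))  # fraction of the region scoring 200 reps or fewer
```

Evaluating each region's F over the same range of scores gives the overlaid curves, like the linked chi-square example.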

You could also pull the P10, P50, P90, and P99 scores (the 10th, 50th, 90th, and 99th percentiles) for each region to see whether they are consistent or vary from one region to another. The Northeast is stacked at the elite level; does that hold for the top 10%, or for the median athlete?
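A minimal sketch of the percentile table, again assuming scores grouped by region (the numbers are invented). It uses the nearest-rank definition of a percentile because it is the simplest; Excel's PERCENTILE interpolates between neighbouring scores, so the two can differ slightly on small samples:

```python
import math

def percentile(scores, p):
    """Nearest-rank percentile: smallest score with at least p% of scores at or below it."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def percentile_table(scores_by_region, ps=(10, 50, 90, 99)):
    """Return {region: {p: score}} so the regions can be lined up side by side."""
    return {region: {p: percentile(scores, p) for p in ps}
            for region, scores in scores_by_region.items()}

# Made-up example data, not real Open scores.
example = {
    "Northeast": [150, 180, 210, 240, 300],
    "South Central": [140, 170, 200, 230, 280],
}
for region, row in percentile_table(example).items():
    print(region, row)
```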

The biggest hurdle is the size of the data. With a small dataset I could do this in an hour in Excel, but 200,000 competitors x 5 workouts = 1,000,000 data points, which is a fairly large dataset. Also, the Games site only shows 50 competitors at a time, so you'd need a way to download all of the data at once; copy-and-paste isn't going to cut it.

If I had the data, Excel should technically be able to handle it (a worksheet holds 1,048,576 rows), but I don't know whether it could graph datasets that large without crashing. It may be able to handle it; it may not.
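One way around the graphing problem is not to graph every point: summarize each region's CDF at 101 percentiles (0 through 100) and chart only that. A sketch, again assuming scores grouped by region and using the same nearest-rank percentile; the CSV it produces has about 100 rows instead of 1,000,000, which Excel should chart easily:

```python
import csv
import io
import math

def percentile(ordered, p):
    """Nearest-rank percentile of an already-sorted list (p=0 gives the min, p=100 the max)."""
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def cdf_summary_csv(scores_by_region, step=1):
    """One row per percentile (0..100), one column per region, returned as CSV text."""
    regions = sorted(scores_by_region)
    ordered = {r: sorted(scores_by_region[r]) for r in regions}
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["percentile"] + regions)
    for p in range(0, 101, step):
        writer.writerow([p] + [percentile(ordered[r], p) for r in regions])
    return out.getvalue()

# Made-up regions "A" and "B", for illustration only.
example = {"A": list(range(1, 101)), "B": list(range(101, 201))}
print(cdf_summary_csv(example).splitlines()[:3])
```

Writing the returned text to a file opened with `open("cdf_summary.csv", "w", newline="")` gives something Excel can open and chart directly.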

So, in summary:
Is there any interest in seeing this kind of analysis?
Does anyone know a better way to get the data?
If the answers to those are yes and no, is there a way I can get a copy of the data?