CrossFit Discussion Board  

03-28-2014, 12:59 PM   #1
Greg Geis
Join Date: Jan 2014
Location: Houston, TX
Posts: 23
HQ: Who's the Fittest Region?

I think it would be interesting to play around with the statistical distributions comparing each region for 14.1-14.5. I would like to see how the 50th, 90th, and 99th percentiles compare for each region. It would also be cool to see whether there are "stronger" regions (14.3) or "motor" regions (14.5), or whether they get pretty even once you have a large enough dataset of CrossFitters.

The analysis is really easy to put together. You could pull out most of the information with a cumulative probability distribution comparing the different regions. Example: http://en.wikipedia.org/wiki/File:Chi-square_cdf.svg (WFS).
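
A minimal sketch of that comparison, assuming the scores are already grouped by region. The region names and rep counts below are made up for illustration, and `ecdf` is a hypothetical helper, not something from the Games site. It builds an empirical cumulative distribution for each region so the curves can be evaluated at the same score cutoffs.

```python
from bisect import bisect_right


def ecdf(scores):
    """Return a function F where F(x) is the fraction of scores <= x."""
    ordered = sorted(scores)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted scores are <= x
        return bisect_right(ordered, x) / n

    return F


# Made-up rep counts for two hypothetical regions (illustration only).
region_scores = {
    "Region A": [120, 150, 180, 200, 240],
    "Region B": [100, 150, 160, 210, 260],
}

cdfs = {name: ecdf(scores) for name, scores in region_scores.items()}

# Evaluate every region's curve at the same cutoffs to compare them.
for name, F in cdfs.items():
    print(name, [F(x) for x in (150, 200, 250)])
```

Plotting each region's curve on one chart would give a picture like the linked example, with one line per region.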

You could also pull the P(10), P(50), P(90), and P(99) values to compare the percentiles of each region and see whether they are consistent or vary from one region to another. The Northeast is stacked at the highest level; does that carry over to the top 10%, or to the average?
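
As a rough sketch of the percentile table, here is one way to compute P(10), P(50), P(90), and P(99) per region using the nearest-rank definition. The function names and the sample scores are assumptions for illustration only, and other percentile definitions (for example, interpolated ones) would give slightly different values.

```python
import math


def percentile(scores, p):
    """Nearest-rank percentile: the smallest score with at least p% of scores at or below it."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def region_percentiles(scores_by_region, levels=(10, 50, 90, 99)):
    """Build {region: {level: score}} so regions can be compared level by level."""
    return {
        region: {p: percentile(scores, p) for p in levels}
        for region, scores in scores_by_region.items()
    }


# Made-up scores for two hypothetical regions (illustration only).
scores_by_region = {
    "Region A": list(range(1, 101)),    # 1, 2, ..., 100
    "Region B": list(range(51, 151)),   # 51, 52, ..., 150
}

table = region_percentiles(scores_by_region)
for region, row in table.items():
    print(region, row)
```

This treats a higher score as better, which fits rep-based workouts; a timed workout would need the ordering flipped before comparing regions.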

The biggest hurdle is the data size. If it were a small data set, I could do it in an hour in Excel. But 200,000 competitors x 5 workouts = 1,000,000 data points, which is starting to be a fairly large dataset. Also, you can only pull up 50 competitors at a time on the Games site, so you'd need to be able to download the data all at once; copy and paste isn't going to cut it.
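
To put numbers on that, here is a quick back-of-the-envelope calculation using only the figures from this post. It assumes one leaderboard row per competitor, which is a guess about how the pages are laid out.

```python
import math

competitors = 200_000   # approximate figure quoted in the post
workouts = 5            # 14.1 through 14.5
page_size = 50          # competitors visible per page, per the post

data_points = competitors * workouts
pages = math.ceil(competitors / page_size)  # assumes one row per competitor

print(data_points)  # total scores to analyze
print(pages)        # page loads needed to see every competitor
```

Under those assumptions that is 4,000 separate page loads, which is why manual copying is impractical.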

If I had the data, Excel should technically be able to handle it (just over 1,000,000 rows available), but I don't know how well it would handle graphing datasets that large without crashing. It may be able to handle it; it may not.

So in summary:
Is there an interest in seeing this kind of analysis?
Does anyone know a better way to get it?
If previous answers are yes and no, is there a way I can get a copy of the data?

03-29-2014, 03:48 PM   #2
Chuck Golden
Join Date: Jan 2014
Location: Fort Worth, TX
Posts: 221
Re: HQ: Who's the Fittest Region?

I'd love to see this kind of data. I've wanted to do some similar analysis, but pulling the data from the Games site is so tedious. I wish I knew of a faster way.
__________________
Worry leaves when faith arrives My BtWB

03-29-2014, 10:46 PM   #3
Christopher Morris
Join Date: Jan 2013
Location: Highlands Ranch, CO
Posts: 1,301
Re: HQ: Who's the Fittest Region?

We have an idea of which regions are fittest when considering the winners (e.g. the men in Central East). I take it you want to look at the data considering all athletes in each region, or at least all athletes that submitted scores for every Open workout? You want to know which region has the fittest average.

I would definitely find that analysis interesting.
__________________
Chris
http://www.drchristophermorris.com/ wfs

03-30-2014, 08:16 AM   #4
Greg Geis
Re: HQ: Who's the Fittest Region?

My title was more of an attention grabber than my actual intent. The average is one of the easy primary statistics you might look at, but I think there is more to the story than the average alone would tell you. That's why I want to build and compare the distributions. Some regions may have a larger elite segment. You might see a more ballsy region with a clearly larger "beginner" segment that signed up for the Open. I really don't know what you'll see, but I bet you could pull out some interesting trends.

It would also be cool to compare the 11.1 distribution to 14.1. CrossFit has an awesome data set to play with, and Castro has mentioned on several occasions how they like to gather that kind of data and play around with it. I've just never seen the data analyzed at the next level. I am hoping someone at HQ has an interest.

03-30-2014, 11:33 AM   #5
Chuck Golden
Re: HQ: Who's the Fittest Region?

I agree it would be a very interesting breakdown based on the huge amount of data available. I just have no idea how you'd gather it without a ton of manual copying and pasting. Pulling the top 60 from each region (basically the group that would go to Regionals) wouldn't be too terribly difficult. I've thought about doing that once the leaderboards are finalized. That would at least give some indication as to which regions are the hardest to qualify from for Regionals/Games. Having that middle group, say places 100-800, would be pretty good too.
__________________
Worry leaves when faith arrives My BtWB

03-31-2014, 06:29 AM   #6
Greg Geis
Re: HQ: Who's the Fittest Region?

It would take someone with access to the database to create an export of the relevant information and send it to me via a file transfer site. I don't know if HQ views the entire data set as proprietary or would have issues sending it out. It would be very easy to do for whoever manages their database. Everything is accessible, just not usable as is. Or HQ could do it themselves pretty easily if anyone with access to the database has a basic statistics background and the curiosity to know what to look for.

03-31-2014, 07:14 AM   #7
Mark E. Wallace
Join Date: Apr 2008
Location: Cedar Park, TX
Posts: 3,869
Re: HQ: Who's the Fittest Region?

HQ almost certainly has -- or could easily acquire -- the resources to do this analysis if they care to have it done. Pretty unlikely that they're going to just turn the raw data over to some random dude on the Internet.

- Mark
__________________
"Ima champ Still pushin Strong, Remember You only get what you train FOR>"
Snarky answers -- Free of charge.

03-31-2014, 09:18 AM   #8
Luke Sirakos
Join Date: Sep 2009
Location: Dallas, TX
Posts: 859
Re: HQ: Who's the Fittest Region?

If all you wanted to do was find the fittest region, I think a simple median of everyone who submitted a score for each workout would be your best bet. In my head, the idea of the fittest region works like this: give each region a huge dartboard with every participant on it in an equal amount of space, throw a dart randomly at each board, and ask which region's athlete has the highest probability of winning.
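
Here is a small sketch of both ideas, assuming a higher score is better (as in a rep-based workout). The scores are made up and `prob_a_beats_b` is a hypothetical helper. It reports each region's median and the chance that a randomly drawn athlete from one region outscores a randomly drawn athlete from the other, with ties counted as half a win.

```python
from itertools import product
from statistics import median


def prob_a_beats_b(a_scores, b_scores):
    """Probability that a random athlete from A outscores a random athlete from B."""
    wins = 0
    ties = 0
    for a, b in product(a_scores, b_scores):
        if a > b:
            wins += 1
        elif a == b:
            ties += 1
    # Count a tie as half a win so the two directions sum to 1.
    return (wins + 0.5 * ties) / (len(a_scores) * len(b_scores))


# Made-up scores for two hypothetical regions (illustration only).
region_a = [100, 150, 200, 250]
region_b = [120, 150, 180, 210]

print(median(region_a), median(region_b))
print(prob_a_beats_b(region_a, region_b))
```

Checking every pair is fine for a toy example, but it grows with the product of the two region sizes, so a full Open dataset would call for a sort-based approach instead.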

I think the analysis could be far more interesting if you could get additional data on the participants, such as age, weight, benchmark lifts/WOD times, level of competitive seriousness, etc. That would make for some really interesting data.

What could be more feasible is to take the Games athletes from prior years, get their basic stats at the time of the Games, and build a model to predict what place they would finish. Then take this year's pool once it is determined and run it against the model. It could tell you some pretty interesting information, such as which attributes are the strongest predictors of success in the Games. I think that could be pretty fascinating.
__________________
Deadlift: 475 | Back Squat: 405 | Front Squat: 320

03-31-2014, 08:36 PM   #9
Greg Geis
Re: HQ: Who's the Fittest Region?

Don't get me wrong, I am not holding my breath for a response. I figured it couldn't hurt to ask. At the very least it would get the conversation going and possibly spark an interest in someone who would want to write a journal article somewhat along those lines. The raw data isn't really proprietary or anything, but not many people are going to spend 6 hours or so copying all the data manually.

It's only worth doing the work if there is enough interest in seeing it. I wanted to throw the idea out there right at the end of the Open, when interest would be highest. On the off chance someone in the right position found it intriguing, I'd be happy to play with the numbers and throw together some graphics. It would be even better if someone with better access to the data (the entire data set, like Luke mentioned) wanted to tinker with it. Anyway, if they don't have enough interest, it should probably die right here, and in all likelihood it will.