Backstory: I recently switched away from Excel for building models that predict the chance of being diagnosed with a particular cancer. The model lived in a single Excel file that grew in both size and complexity, and I used Excel's Solver to iterate through simulations. The file eventually passed 500 MB; essentially, I was starting to cross over into the realm of "big data".

My question to the Stack Overflow community is: what is the best methodology for continuing this research? My hunch is that storing the data in a database and pulling out each parameter for individual analysis is one possibility. My old Excel methodology fitted a non-linear regression to each parameter (from historic data), which gave a percentage chance of developing said cancer specific to that individual parameter. The algorithm then weighted each parameter to produce a final score, on which I performed a logistic regression to calculate the chance of a person developing said cancer.
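A minimal sketch of the database-backed idea, using only the Python standard library. Everything specific here is an assumption made for illustration: the `patients` table, the `age` and `exposure` columns, and the toy rows are hypothetical, not the real data. It also swaps in a simpler technique than the one described above: instead of a non-linear regression per parameter followed by manual weighting, it fits a single logistic regression on the standardised parameters, so the fitted weights play the role of the parameter weights.

```python
import math
import sqlite3

# Hypothetical schema and toy data: two made-up risk parameters and a
# 0/1 diagnosis flag. None of these names come from the original model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (age REAL, exposure REAL, diagnosed INTEGER)")
rows = [
    (35, 0.1, 0), (42, 0.3, 0), (50, 0.2, 0), (55, 0.6, 1),
    (61, 0.4, 0), (64, 0.8, 1), (70, 0.7, 1), (76, 0.9, 1),
]
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", rows)


def fetch_column(name):
    """Pull one parameter from the database for individual analysis."""
    # `name` is interpolated into the SQL, so only pass trusted column names.
    query = f"SELECT {name} FROM patients ORDER BY rowid"
    return [row[0] for row in conn.execute(query)]


def standardise(values):
    """Rescale a parameter to zero mean and unit standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, lr=0.1, steps=2000):
    """Fit one weight per parameter plus a bias by batch gradient descent."""
    n, k = len(labels), len(features)
    weights, bias = [0.0] * k, 0.0
    for _ in range(steps):
        errors = []
        for i in range(n):
            score = bias + sum(weights[j] * features[j][i] for j in range(k))
            errors.append(sigmoid(score) - labels[i])
        for j in range(k):
            weights[j] -= lr * sum(e * features[j][i] for i, e in enumerate(errors)) / n
        bias -= lr * sum(errors) / n
    return weights, bias


age = standardise(fetch_column("age"))
exposure = standardise(fetch_column("exposure"))
diagnosed = fetch_column("diagnosed")

weights, bias = fit_logistic([age, exposure], diagnosed)

# Predicted chance of diagnosis for each person in the toy table.
probabilities = [
    sigmoid(bias + weights[0] * a + weights[1] * e)
    for a, e in zip(age, exposure)
]
```

With real data, the hand-written gradient descent would normally be replaced by a statistics library; it is spelled out here only to keep the sketch self-contained.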

Any suggestions, comments, pointers and constructive criticism would be greatly appreciated. I have recently made the switch from Excel to Python to continue this work. Kind regards, AEA

  • First off, "big data" is really only relative to the tools available. That said, although 500 MB might be "big data" for Excel, in general people don't say you're approaching "big data" until you're at least reaching the limits of your computer's memory capacity, which on conventional hardware is already about 4-8 GB. There is also a data analysis toolkit for Python you should look into: pandas.pydata.org. Be sure to check the sidebar. Commented Jun 11, 2013 at 1:54
  • @DavidMarx Yep, the answer is pandas. Many people have 64/94 GB RAM setups; IMO data that outgrows those could be called "big". :) Commented Jun 11, 2013 at 2:02
  • 500 MB was the point at which I had to stop; I actually have about 12 times this amount (of course, the size of the raw data won't be the same as the size of the Excel file). But I can tell it has outgrown Excel modelling. I was recommended Python by someone studying computer science. I will check out pandas.pydata.org, many thanks. Commented Jun 11, 2013 at 2:02
  • Apologies for my embarrassing misinterpretation that I was nearing big data (no sarcasm, I really am embarrassed). Commented Jun 11, 2013 at 2:06
  • For using pandas, do you make calls from a database? Or do you analyse data on the fly? Commented Jun 11, 2013 at 2:35
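On the last question in the comments: pandas can work either way, on data already loaded into memory or on the result of a database query. A minimal sketch of the database route, assuming a SQLite file and a hypothetical `patients` table (the table, column names and values are made up for illustration). `pandas.read_sql_query` with `chunksize` returns the result a few rows at a time, so the full table never has to fit in memory at once.

```python
import sqlite3

import pandas as pd

# Hypothetical toy table standing in for the real data.
conn = sqlite3.connect(":memory:")
toy = pd.DataFrame(
    {"age": [35, 42, 50, 55, 61, 64], "diagnosed": [0, 0, 0, 1, 0, 1]}
)
toy.to_sql("patients", conn, index=False)

# Stream the query result in chunks and keep only running totals.
total_rows = 0
total_cases = 0
query = "SELECT age, diagnosed FROM patients"
for chunk in pd.read_sql_query(query, conn, chunksize=2):
    total_rows += len(chunk)
    total_cases += int(chunk["diagnosed"].sum())

# Overall diagnosis rate computed without holding the whole table in memory.
rate = total_cases / total_rows
```

For a table that does fit in memory, dropping `chunksize` returns a single DataFrame that can be analysed directly.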
