Machine learning for Project Cognoma
This repository hosts machine learning code and discussion (see Issues) for Project Cognoma.
NOTE: This repository is no longer up-to-date with the web application
The production notebook that is served to website users can be found in the ml-workers repository. This repository will be used for continued data exploration and new modeling approaches.
Notebooks
The following notebooks implement the primary machine learning workflow for Cognoma:
1.download.ipynb: downloads the cancer datasets.
2.mutation-classifier.ipynb: builds a classifier for mutation in a given gene.
3.pathway-classifier.ipynb: builds a classifier for mutation in any gene for a given pathway.
If you've modified a notebook and are submitting a pull request, then export the notebooks to scripts:
jupyter nbconvert --to=script --FilesWriter.build_directory=scripts *.ipynb
Environment
This repository uses conda to manage its environment and install packages.
If you don't have conda installed on your system, you can download it here.
You can install the Python 2 or 3 version of Miniconda (or Anaconda), which determines the Python version of your root environment.
Because this project uses a dedicated environment named cognoma-machine-learning, whose explicit dependencies are specified in environment.yml, the Python version of your root environment is not relevant.
With conda, you can create the cognoma-machine-learning environment by running the following from the root directory of this repository:
# Create or overwrite the cognoma-machine-learning conda environment
conda env create --file environment.yml
If environment.yml has changed since you created the environment, run the following update command:
conda env update --file environment.yml
Activate the environment by running source activate cognoma-machine-learning on Linux or OS X and activate cognoma-machine-learning on Windows.
Once this environment is active in a terminal, run jupyter notebook to start a notebook server.
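The platform-specific activation step above can be sketched as a small shell helper. This is only an illustration, not part of the repository: the function name activation_command is hypothetical, and treating every OS name other than Linux or Darwin as Windows is an assumption made for brevity.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the activation
# command described above for a given OS name.
ENV_NAME="cognoma-machine-learning"

activation_command() {
    # $1: OS name as reported by `uname -s`; defaults to the current system.
    os_name="${1:-$(uname -s)}"
    case "$os_name" in
        Linux|Darwin)
            # Linux and OS X use `source activate`.
            echo "source activate $ENV_NAME" ;;
        *)
            # Assumption: any other OS name is treated as Windows.
            echo "activate $ENV_NAME" ;;
    esac
}

# Print the command for the current system.
activation_command
```

The helper only prints the command rather than running it, so it is safe to try before conda is installed.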

