I am developing a simple recommendation system and trying to do some computation like SVD, RBM, etc.
To make the evaluation more convincing, I am going to use the Movielens or Netflix dataset to measure the performance of the system. However, both datasets have more than 1 million users and more than 10 thousand items, so it's impossible to hold all the data in memory as a dense matrix. I have to use a specialized module to handle such a large matrix.
I know there are some tools in SciPy that can handle this, and divisi2, which is used by python-recsys, also seems like a good choice. Or maybe there are better tools I don't know about?
Which module should I use? Any suggestion?
scipy.sparse is the standard implementation, and many third-party libraries build on it. I haven't used divisi2, though, so I can't compare their features.
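To give an idea of what this looks like, here is a minimal sketch that builds a sparse user-item rating matrix with scipy.sparse and computes a truncated SVD with `scipy.sparse.linalg.svds`. The rating triples, the matrix shape, and the rank `k` are made-up toy values for illustration, not taken from Movielens or Netflix; with a real dataset you would read the (user, item, rating) triples from the rating file instead.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Toy (user, item, rating) triples; hypothetical values for illustration only.
users = np.array([0, 0, 1, 2, 3, 3])
items = np.array([0, 2, 1, 3, 0, 4])
ratings = np.array([5.0, 3.0, 4.0, 2.0, 1.0, 4.0])

n_users, n_items = 4, 5  # assumed dimensions of the toy matrix

# Only the non-zero ratings are stored, so memory grows with the number
# of ratings rather than with n_users * n_items.
R = csr_matrix((ratings, (users, items)), shape=(n_users, n_items))

# Truncated SVD: keep only the k largest singular values.
# svds requires k to be smaller than min(R.shape).
k = 2
U, s, Vt = svds(R, k=k)

# Rank-k approximation of the rating matrix. This is dense, so for a
# large dataset you would compute only the rows you need.
approx = U @ np.diag(s) @ Vt
print(R.nnz, U.shape, s.shape, Vt.shape)
```

Note that `svds` treats missing ratings as zeros, which is a simplification; whether that is acceptable depends on how you evaluate the recommender.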