
I have trained a scikit-learn model, and now I want to use it in my Python code. Is there a way I can reuse the same model instance? The simple approach is to load the model again whenever I need it, but since I need it frequently, I want to load the model once and reuse it.

Is there a way I can achieve this in Python?

Here is the code for one thread in prediction.py:

import joblib
clf = joblib.load('trainedsgdhuberclassifier.pkl')
clf.predict(userid)

Now, for another user, I don't want to run prediction.py again and spend time loading the model. Is there a way I can simply write:

new_recommendations = prediction(userid)

Should I be using multiprocessing here? I am not sure.

2 Answers


As per the scikit-learn documentation, the following code may help you:

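import pickle
from sklearn import datasets, svm
# Train a support vector classifier on the iris dataset
clf = svm.SVC()
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)
# Serialize the fitted model to a bytes string, then restore it
s = pickle.dumps(clf)
clf2 = pickle.loads(s)
# predict() expects a 2D array, so pass a one-row slice rather than X[0]
clf2.predict(X[0:1])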

In the specific case of scikit-learn, it may be more interesting to use joblib's replacement of pickle (joblib.dump & joblib.load), which is more efficient on objects that carry large NumPy arrays internally, as is often the case for fitted scikit-learn estimators. Older joblib versions, however, can only pickle to disk and not to a string:

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
joblib.dump(clf, 'filename.pkl')

Later, you can load the pickled model back (possibly in another Python process) with:

clf = joblib.load('filename.pkl') 
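Recent joblib releases also accept a file object in place of a filename for both `joblib.dump` and `joblib.load`, so an in-memory round trip is possible. A minimal sketch, assuming a current joblib version; the `io.BytesIO` buffer and the final equality check are illustrative additions, not part of the quoted documentation:

```python
import io

import joblib
from sklearn import datasets, svm

# Fit the same kind of model as above
X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC().fit(X, y)

# Dump into an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
joblib.dump(clf, buffer)

# Rewind the buffer and load the estimator back
buffer.seek(0)
clf_restored = joblib.load(buffer)

# The restored model should make the same predictions as the original
same = bool((clf_restored.predict(X) == clf.predict(X)).all())
print(same)
```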

Once you have loaded the model, you can reuse it without retraining it:

clf.predict(X[0:1])  # predict() expects a 2D array, so pass a one-row slice
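To load the model only once per process and then reuse the same instance on every call (which is what the question asks for), the load can be cached. A minimal sketch using `functools.lru_cache`; `MODEL_PATH`, `get_model`, and `prediction` are hypothetical names (the file name mirrors the question), and the demo trains and saves a stand-in `SGDClassifier` so the snippet runs on its own, since the question's exact loss setting and features are unknown:

```python
import functools
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

# Hypothetical location of the saved model (file name mirrors the question)
MODEL_PATH = os.path.join(tempfile.gettempdir(), "trainedsgdhuberclassifier.pkl")


@functools.lru_cache(maxsize=None)
def get_model():
    """Load the model on the first call; later calls return the cached instance."""
    return joblib.load(MODEL_PATH)


def prediction(features):
    """Hypothetical helper: predict for one feature vector with the cached model."""
    return get_model().predict([features])[0]


# Demo only: train and save a stand-in classifier (loss choice is an assumption)
X, y = load_iris(return_X_y=True)
joblib.dump(SGDClassifier(loss="modified_huber", random_state=0).fit(X, y), MODEL_PATH)

first = prediction(X[0])
second = prediction(X[50])
# Only the first call read the file; the second reused the same object
print(get_model.cache_info())
```

Within one long-running Python process (a worker loop, a web server) the file is read once; a fresh process still pays the load cost once. A plain module-level global set at import time would work just as well.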

Source: http://scikit-learn.org/stable/modules/model_persistence.html


7 Comments

Thanks for your answer. I am aware of joblib.load. What I want is to reuse the clf returned by joblib.load('filename.pkl'). How can I do that? I don't want to load it separately for every user, as it takes time!
OK, if I understand you correctly, you can just call the clf.predict() or clf.transform() method, depending on the type of estimator you used or what you want to achieve. You don't have to fit the model again. It is difficult to help without any code examples.
@ashu, please provide the code so that @Oq01 can help you out. pickle is the right way to go.
@alvas, I just did that in the question!
@ashu, thanks, but it's confusing. You don't seem to be using scikit-learn. That looks like GraphLab, so I am not sure the same methods will apply.

First, you should check how much of a bottleneck this is and whether it is really worth avoiding the I/O. An SGDClassifier is usually quite small. You can easily reuse the model; I would say the question is not really how to reuse the model, but how to get new user instances to the classifier.
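One way to check whether loading is actually the bottleneck is to time `joblib.load` against a single `predict` call. A rough sketch with `time.perf_counter`, using a stand-in `SGDClassifier` trained on synthetic data (the data shape is an arbitrary assumption, not the asker's); absolute numbers will vary by machine:

```python
import os
import tempfile
import time

import joblib
import numpy as np
from sklearn.linear_model import SGDClassifier

# Stand-in model on synthetic data (shape chosen arbitrarily for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
y = (X[:, 0] > 0).astype(int)
clf = SGDClassifier(random_state=0).fit(X, y)

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump(clf, path)

# Time loading the model from disk
start = time.perf_counter()
loaded = joblib.load(path)
load_seconds = time.perf_counter() - start

# Time a single prediction with the already-loaded model
start = time.perf_counter()
loaded.predict(X[:1])
predict_seconds = time.perf_counter() - start

print(f"load: {load_seconds * 1000:.2f} ms, predict: {predict_seconds * 1000:.2f} ms")
print(f"file size: {os.path.getsize(path)} bytes")
```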

I would imagine userid is a feature vector, not an ID, right?
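For context, scikit-learn's `predict` expects a 2D array of feature vectors (one row per sample), not a raw ID. If users are identified by ID, one option is to look up each user's feature vector first. A hypothetical sketch; the `user_features` table, its values, the training data, and `predict_for_user` are all made up for illustration:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical per-user feature vectors keyed by user ID
user_features = {
    "alice": [0.9, 0.1, 0.4],
    "bob": [0.1, 0.8, 0.3],
}

# Stand-in model trained on made-up data with the same three features
X_train = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.9, 0.2, 0.4], [0.2, 0.9, 0.6]])
y_train = np.array([1, 0, 1, 0])
clf = SGDClassifier(random_state=0).fit(X_train, y_train)


def predict_for_user(user_id):
    """Look up the user's features and predict on them as one row of a 2D array."""
    features = np.asarray(user_features[user_id]).reshape(1, -1)
    return clf.predict(features)[0]


labels = {uid: predict_for_user(uid) for uid in user_features}
print(labels)
```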

To make the model predict on new data, you need some kind of event-based processing that calls the model when new input arrives. I am by no means an expert here, but one simple solution might be to use an HTTP interface with a lightweight server such as Flask.
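A minimal sketch of that idea, assuming Flask is installed: the model is created once when the module is imported, and every request reuses it. The `/predict` route, the JSON request format, and the stand-in `SGDClassifier` trained on iris (used here in place of loading the question's `.pkl` file) are hypothetical choices for illustration:

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

app = Flask(__name__)

# Create the model ONCE when the server process starts (stand-in training here).
# In practice this would be something like joblib.load("trainedsgdhuberclassifier.pkl").
X, y = load_iris(return_X_y=True)
MODEL = SGDClassifier(random_state=0).fit(X, y)


@app.route("/predict", methods=["POST"])
def predict():
    # Hypothetical request format: {"features": [5.1, 3.5, 1.4, 0.2]}
    features = np.asarray(request.get_json()["features"], dtype=float).reshape(1, -1)
    label = MODEL.predict(features)[0]
    return jsonify({"prediction": int(label)})
```

Run it with Flask's development server (`flask run`) for testing. Because `MODEL` lives at module level, it is loaded once per server process; a multi-worker WSGI server such as Gunicorn holds one copy per worker process.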
