Python - Pickling and unpickling in different modules
I know this has been covered in a number of other questions (for example, "Unable to load files using pickle and multiple modules"), but I can't see how the solutions apply to my situation.
This is the project structure (as minimal as possible):
    classify-updater/
    ├── main.py
    └── updater
        ├── __init__.py
        └── updater.py
    classify
    └── main.py
In classify-updater/main.py:

    import sys

    from sklearn.feature_extraction.text import CountVectorizer

    from updater.updater import Updater


    def main(argv):
        vectorizer = CountVectorizer(stop_words='english')
        updater = Updater(vectorizer)
        updater.update()


    if __name__ == "__main__":
        main(sys.argv)
In classify-updater/updater/updater.py:

    import dill


    class Updater:
        def __init__(self, vectorizer):
            vectorizer.preprocessor = lambda doc: doc.text.encode('ascii', 'ignore')
            self.vectorizer = vectorizer

        def update(self):
            pickled_vectorizer = dill.dumps(self.vectorizer)
            # Save to Google Cloud Storage
In classify/main.py:

    import dill
    import sys


    def main(argv):
        # Load vectorizer_blob from Google Cloud Storage (omitted here)
        vectorizer = dill.loads(vectorizer_blob)


    if __name__ == "__main__":
        main(sys.argv)
This results in an ImportError:
    Traceback (most recent call last):
      File "classify.py", line 102, in <module>
        app.main(sys.argv)
      File "classify.py", line 50, in main
        vectorizer = self.fetch_vectorizer()
      File "classify.py", line 86, in fetch_vectorizer
        vectorizer = dill.loads(vectorizer_blob.download_as_string())
      File "/usr/local/lib/python2.7/site-packages/dill/dill.py", line 299, in loads
        return load(file)
      File "/usr/local/lib/python2.7/site-packages/dill/dill.py", line 288, in load
        obj = pik.load()
      File "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 864, in load
        dispatch[key](self)
      File "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1096, in load_global
        klass = self.find_class(module, name)
      File "/usr/local/lib/python2.7/site-packages/dill/dill.py", line 445, in find_class
        return StockUnpickler.find_class(self, module, name)
      File "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1130, in find_class
        __import__(module)
    ImportError: No module named updater.updater
It has been explained elsewhere that pickle needs the class definition to load an object, but I can't see where the reference to the updater module comes from, since I'm only pickling an instance of the vectorizer.
I've simplified the example heavily. The two packages sit quite far apart in our codebase, and importing one module from the other might not be feasible. Is there a way to work around this?
The issue here is the lambda (anonymous function).
It is possible to pickle a self-contained object such as the vectorizer. However, the preprocessing function used in the example is scoped to the Updater class (it is defined in updater.updater), so the Updater class is required in order to unpickle it.
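As a rough illustration of why the defining module matters, the sketch below uses only the standard library. It pickles `json.dumps` (chosen purely as a convenient example function, not something from the question) and shows that the payload stores the module and function name rather than the function's code. dill can serialize more than pickle can, but the traceback in the question shows it still resolving `updater.updater` through pickle's `find_class`, which is the same import-by-name mechanism.

```python
import json
import pickle

# Functions are pickled by reference: the payload records where to import
# the function from, not the function body itself.
payload = pickle.dumps(json.dumps)

print(payload)  # a short byte string that names the module and the function

# Unpickling imports the module by name and looks the function up again,
# so the process doing the loading must be able to import that module.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

If the module named in the payload cannot be imported on the loading side, the lookup fails in the same way as the `__import__(module)` call at the bottom of the traceback.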
Rather than attaching a preprocessor function, preprocess the data first and pass the result in when fitting the vectorizer. This removes the need for the Updater class when unpickling.
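A minimal sketch of that suggestion, assuming the documents expose a `.text` attribute as in the question. `Doc`, `preprocess` and `fit_and_pickle` are hypothetical names introduced for illustration, and scikit-learn is imported inside `fit_and_pickle` so that the preprocessing half runs on its own. On Python 3, `encode` returns bytes, so the sketch decodes back to `str`; the Python 2 code in the question would not need that step.

```python
import pickle
from collections import namedtuple

# Hypothetical stand-in for the document objects, which have a .text attribute.
Doc = namedtuple("Doc", ["text"])


def preprocess(docs):
    """Drop non-ASCII characters up front, instead of via vectorizer.preprocessor."""
    return [doc.text.encode("ascii", "ignore").decode("ascii") for doc in docs]


def fit_and_pickle(texts):
    """Fit a stock CountVectorizer on already-cleaned strings and pickle it."""
    from sklearn.feature_extraction.text import CountVectorizer

    vectorizer = CountVectorizer(stop_words="english")
    vectorizer.fit(texts)
    # No custom callable is attached, so the pickle refers only to library
    # classes (scikit-learn and its dependencies), not to the updater package.
    return pickle.dumps(vectorizer)


docs = [Doc(u"caf\u00e9 menu"), Doc(u"plain text")]
clean_texts = preprocess(docs)
print(clean_texts)
```

With this arrangement the classify side would still need scikit-learn installed, but not the updater package, and it would have to apply the same cleaning to incoming text before calling `transform`.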