hadoop - How can I add external Python libraries into HDFS?


Is there a way to add external libraries, like this one, to HDFS? It seems PySpark needs external libraries to be in a shared folder on HDFS. But since I'm using a shell script that runs a PySpark script with external libraries, it fails when importing them.

See the post here about the ImportError.

You can add external libraries with the --py-files option. You can provide either a .py file or a .zip archive.

For example, using spark-submit:

spark-submit --master yarn --py-files ./hdfs.zip myjob.py 
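The archive passed to --py-files is placed on the Python path, so the package folder has to sit at the top level of the zip for the import to resolve. Here is a minimal sketch of building such an archive with the standard library; the helper name `build_py_files_zip` and the paths in the usage note are hypothetical, not part of Spark.

```python
import os
import zipfile


def build_py_files_zip(package_dir, zip_path):
    """Zip a Python package so the package folder sits at the archive root.

    Hypothetical helper for illustration; Spark only needs the resulting .zip.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                # Skip compiled bytecode; the sources are enough.
                if name.endswith(".pyc"):
                    continue
                full = os.path.join(root, name)
                # Store paths relative to the parent so that the archive
                # contains "<package>/__init__.py", not an absolute path.
                zf.write(full, os.path.relpath(full, parent))
    return zip_path
```

Assuming the library was unpacked into a local ./hdfs folder, calling `build_py_files_zip("./hdfs", "./hdfs.zip")` would produce the hdfs.zip used in the command above.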

Check the corresponding documentation: Submitting Applications.

