hadoop - How can I add external Python libraries into HDFS?
Is there a way to add external libraries, like this one, to HDFS? It seems that PySpark needs external libraries to be in a shared folder on HDFS. But since I am using a shell script that runs a PySpark script with external libraries, it fails when importing them.
See the post here about the ImportError.
You can add external libraries with the --py-files option. You can provide either a .py file or a .zip archive.
For example, using spark-submit:
spark-submit --master yarn --py-files ./hdfs.zip myjob.py
Check the corresponding documentation: Submitting Applications.
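If the library is only available as a source folder, it has to be zipped before it can be passed to --py-files. Here is a minimal sketch of one way to do that, assuming a pure-Python package. The helper name build_py_files_zip and the folder name used in the usage note are hypothetical, not something from the question. The package folder is kept at the root of the archive so that a plain import of the package name can resolve from the zip.

```python
import os
import zipfile


def build_py_files_zip(package_dir, zip_path):
    """Zip a pure-Python package so its folder sits at the archive root.

    Hypothetical helper for illustration; only .py files are included,
    so packages that ship data files or compiled extensions need more work.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full_path = os.path.join(root, name)
                    # Store paths relative to the parent directory so that
                    # "import <package>" resolves from the zip root.
                    archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path
```

For instance, build_py_files_zip("./hdfs", "./hdfs.zip") would produce an archive that can be passed to the spark-submit command above, assuming ./hdfs is the package folder. Inside the job itself, SparkContext.addPyFile is an alternative way to ship the same archive.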