scala - Spark job runs longer locally on subsequent runs - tuning Spark job


I have a Spark job that completes in about 5 minutes on its first run, but takes much longer, 20-30 minutes or more, on subsequent runs. I'm reading a Parquet file once, creating a DataFrame, and writing it out in JSON format. I have not used cache(), persist(), or unpersist() anywhere in the code. This is a local instance. What could be the issue?
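For context, the job body is roughly the following (a minimal sketch with hypothetical input/output paths, since the actual read/write code isn't shown):

   // read the Parquet file once into a DataFrame
   val df = spark.read.parquet("/path/to/input.parquet")

   // write it back out as JSON (output path is a placeholder)
   df.write.mode("overwrite").json("/path/to/output_json")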

Configuration parameters:

   import org.apache.spark.sql.SparkSession

   val spark = SparkSession
     .builder()
     .appName("example")
     .config("spark.sql.warehouse.dir", warehouseLocation)
     .config("spark.master", "local")
     .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
     .getOrCreate()

   // set new runtime options
   spark.conf.set("spark.sql.shuffle.partitions", 14)
   spark.conf.set("spark.executor.memory", "6g")
   spark.conf.set("spark.driver.host", "localhost")
   spark.conf.set("spark.cores.max", "8")
   spark.conf.set("spark.eventLog.enabled", true)

   spark.sparkContext.setCheckpointDir("somedirectorypath")
   spark.sparkContext.setLogLevel("WARN")

