scala - Spark job runs longer locally on subsequent runs - tuning Spark job
I have a Spark job that runs in about 5 minutes on its first run, but takes much longer, 20-30 minutes or more, on subsequent runs. I'm reading a Parquet file once, creating a DataFrame, and writing it out in JSON format. I have not used cache(), persist(), or unpersist() anywhere in the code. This is a local instance. What could be the issue?
Configuration parameters:
val spark = SparkSession
  .builder()
  .appName("example")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .config("spark.master", "local")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

// set new runtime options
spark.conf.set("spark.sql.shuffle.partitions", 14)
spark.conf.set("spark.executor.memory", "6g")
spark.conf.set("spark.driver.host", "localhost")
spark.conf.set("spark.cores.max", "8")
spark.conf.set("spark.eventLog.enabled", true)

spark.sparkContext.setCheckpointDir("somedirectorypath")
spark.sparkContext.setLogLevel("WARN")
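The job itself is just a straight read and write, roughly like the sketch below (the input and output paths here are placeholders, not the actual ones used):

// Minimal sketch of the flow described above; "input.parquet" and "output_json"
// are hypothetical placeholder paths.
val df = spark.read.parquet("input.parquet")    // read the Parquet file once into a DataFrame
df.write.mode("overwrite").json("output_json")  // write the DataFrame back out as JSON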