java - Spark fails to serialize on RDD.count() -


So I'm relatively new to Spark, but I guess I'm understanding a bit more every day. I have some methods running, and some that are not. Trying rdd.count() gives me this error:

```
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Failed to serialize task 5, not attempting to retry it. Exception during serialization: java.io.NotSerializableException:
Serialization stack:
	- object not serializable
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 224)
	- field (class: scala.collection.mutable.WrappedArray$ofRef, name: array, type: class [Ljava.lang.Object;)
	- object (class scala.collection.mutable.WrappedArray$ofRef, WrappedArray
	- writeObject data (class: org.apache.spark.rdd.ParallelCollectionPartition)
	- object (class org.apache.spark.rdd.ParallelCollectionPartition, org.apache.spark.rdd.ParallelCollectionPartition@82b)
	- field (class: org.apache.spark.scheduler.ResultTask, name: partition, type: interface org.apache.spark.Partition)
	- object (class org.apache.spark.scheduler.ResultTask, ResultTask(4, 0))
```

The object in question is made of Strings, booleans, and other classes with transient Sets. I have tried implementing Serializable on the classes the error points at, but it becomes very recursive, and it feels like there is no end to implementing Serializable on everything. I feel there must be a more efficient way of doing this. Here is the code:
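For context on the transient-fields point, here is a minimal sketch using plain JDK serialization (no Spark; `Entity` is a stand-in bean, not the actual class from the question). It shows that `transient` fields are simply skipped during serialization, so one common route is to implement `Serializable` on the entity and mark any fields holding non-serializable types as `transient`:

```java
import java.io.*;
import java.util.*;

public class SerializationSketch {
    // Stand-in for the entity in question: regular fields round-trip,
    // transient fields are skipped entirely (and come back as null).
    static class Entity implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        boolean active;
        transient Set<Object> cache = new HashSet<>(); // may hold non-serializable members; skipped

        Entity(String name, boolean active) {
            this.name = name;
            this.active = active;
        }
    }

    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = toBytes(new Entity("row1", true));
        Entity copy;
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            copy = (Entity) ois.readObject();
        }
        // Regular fields survive the round trip; the transient set is null.
        System.out.println(copy.name + " " + copy.active + " " + (copy.cache == null));
    }
}
```

The trade-off is that anything transient is gone after deserialization, so it only fits fields that can be recomputed or are not needed on the executors.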

```java
JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());
List<ObjectInQuestion> rowsInDba = new ArrayList<ObjectInQuestion>();
try
{
    dba.readAllRows(ObjectInQuestion.class, new DbEntityListener<ObjectInQuestion>() {

        @Override
        public void entityReceived(ObjectInQuestion entity)
        {
            rowsInDba.add(entity);
        }
    });
}
catch (Exception e)
{
    // TODO Auto-generated catch block
    e.printStackTrace();
}
JavaRDD<ObjectInQuestion> schemaRDD = sc.parallelize(rowsInDba, 2);
long count = schemaRDD.count();
return (int) count;
```

After debugging and looking at the values, by the time the method reaches sc.parallelize the RDD contains the entities given by rowsInDba. I removed .count() before, and there were no issues running. I'm still looking into it myself and trying to find answers, but I feel someone may know what causes this error way before I come to a solution. Thanks!
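One note on why parallelize "works" while count() fails: count() is an action, so it is the first point where Spark actually has to serialize the partition data and ship tasks to executors; building the RDD does not serialize anything yet. A quick way to find the offending field before involving Spark is to attempt plain JDK serialization yourself (`checkSerializable` below is a hypothetical helper, not a Spark API):

```java
import java.io.*;

public class SerializableCheck {
    // Hypothetical pre-flight check: serialize the object with the same
    // mechanism Spark's default JavaSerializer uses, and surface the
    // failure locally instead of inside a task on the cluster.
    static String checkSerializable(Object o) {
        try (ObjectOutputStream oos = new ObjectOutputStream(OutputStream.nullOutputStream())) {
            oos.writeObject(o);
            return "ok";
        } catch (NotSerializableException e) {
            // getMessage() names the first non-serializable class encountered
            return "not serializable: " + e.getMessage();
        } catch (IOException e) {
            return "io error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(checkSerializable("a plain string")); // String is Serializable
        System.out.println(checkSerializable(new Object()));     // Object is not
    }
}
```

Running this over each element of rowsInDba before calling parallelize would point at the exact class that needs attention, without the noise of Spark's task-serialization stack.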

