java - Spark Streaming Dataset<Row> collectAsList() taking too much time


I've written a Spark Streaming (2.1.1) driver in Java that computes logs and updates stats in a MySQL table, inserting new rows or updating existing ones with the rows coming in each streaming batch. It would be great if I could use the Dataset write() method, but I can't, because there are only 2 ways:
- SaveMode.Append: only inserts new rows
- SaveMode.Overwrite: not valid, because it deletes the whole table and saves only the rows coming in the batch

(I wish there were a "SaveMode.Update" ...)

I've managed the situation by processing the Dataset, fetching the rows and inserting/updating them manually one by one:

    Dataset<Row> dataToWrite;
    List<Row> rows = dataToWrite.collectAsList();
    for (Row row : rows) {
        .... insert/update table
    }

It works fine, but the problem I'm facing is that "collectAsList()" takes about 2s for each RDD processed, as I've seen in the Spark console (SQL tab). I suppose this is due to the time spent collecting the data (executed on the master), but it is strange because it happens whether I have 1 master and 2 workers on the same machine or on different ones. Is anyone else facing this problem? Any thoughts on how to avoid it? Thanks in advance,
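One possible direction, sketched here under assumptions rather than as a confirmed fix, is to skip collecting rows on the driver and have each partition write its own rows with a batched MySQL upsert (`INSERT ... ON DUPLICATE KEY UPDATE`). The class name `UpsertSketch`, the helper names, and the `stats`/`id`/`hits` table and column names used with it are hypothetical; the sketch also assumes the target table has a PRIMARY KEY or UNIQUE index on the key columns. Note that `collectAsList()` is an action, so the time reported for it may include computing the Dataset itself and not only the transfer to the driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of Spark or the original post.
public final class UpsertSketch {

    /** Builds a MySQL upsert statement with one "?" placeholder per column. */
    public static String buildUpsertSql(String table, List<String> keyCols, List<String> valueCols) {
        StringJoiner cols = new StringJoiner(", ");
        StringJoiner marks = new StringJoiner(", ");
        StringJoiner updates = new StringJoiner(", ");
        for (String c : keyCols) {
            cols.add(c);
            marks.add("?");
        }
        for (String c : valueCols) {
            cols.add(c);
            marks.add("?");
            // On a key collision, overwrite the stored value with the incoming one.
            updates.add(c + " = VALUES(" + c + ")");
        }
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")"
                + " ON DUPLICATE KEY UPDATE " + updates;
    }

    /** Sends rows in JDBC batches of batchSize; returns the number of rows submitted. */
    public static int writeBatched(Connection conn, String sql,
                                   Iterator<Object[]> rows, int batchSize) throws SQLException {
        int submitted = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            while (rows.hasNext()) {
                Object[] r = rows.next();
                for (int i = 0; i < r.length; i++) {
                    ps.setObject(i + 1, r[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                pending++;
                submitted++;
                if (pending == batchSize) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the final partial batch
            }
        }
        return submitted;
    }
}
```

In a Spark job this would presumably be called from inside `dataToWrite.foreachPartition(...)`, opening one JDBC connection per partition and mapping each `Row` to an `Object[]` in column order, so the writes run on the executors instead of going through the driver. Whether this removes the 2s depends on where that time is actually spent.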

