hadoop - Spark reads data from HBase: do workers need to get partition data from the remote driver program? -


When Spark reads data from HBase, the RDD is created like this:

// create the RDD
val hBaseRDD = sc.newAPIHadoopRDD(
  conf,
  classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])

For example, if hBaseRDD has 5 partitions and an executor on a worker computes over one partition's data, must that data be fetched from the remote driver program? (This is not the HDFS case, where each worker, as a Hadoop slave, holds HDFS file replicas locally.)

When Spark is integrated with HBase, the same data locality principles apply as in Hadoop MapReduce jobs: Spark tries to assign each input partition (an HBase region) to a worker on the same physical machine, so the data is fetched directly by the executor and never routed through the remote driver.
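To see this locality in action, here is a minimal sketch (the table name "my_table" is a placeholder assumption) that builds the same RDD and prints each partition's preferred locations, i.e. the region-server hosts Spark will try to schedule the corresponding tasks on:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HBaseLocalityCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HBaseLocalityCheck"))

    // Standard HBase configuration; "my_table" is a hypothetical table name
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "my_table")

    val hBaseRDD = sc.newAPIHadoopRDD(
      conf,
      classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])

    // One partition per HBase region; preferredLocations shows the hosts
    // Spark's scheduler will prefer for each partition's task.
    hBaseRDD.partitions.foreach { p =>
      println(s"partition ${p.index}: ${hBaseRDD.preferredLocations(p)}")
    }

    sc.stop()
  }
}

Note that locality is a preference, not a guarantee: if a region's host is busy, the scheduler may run the task elsewhere after the locality wait (spark.locality.wait) expires, but even then the executor reads from the region server directly, not from the driver.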

