hadoop - When Spark reads data from HBase, do the workers need to get partition data from the remote driver program?
Spark reads data from HBase like this:

// create an RDD backed by an HBase table
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable], classOf[org.apache.hadoop.hbase.client.Result])
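(For context, the conf above has to carry the HBase configuration and the name of the table to scan. A minimal sketch of that setup, where "my_table" is a placeholder, not a name from the original question:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val conf = HBaseConfiguration.create()              // picks up hbase-site.xml from the classpath
conf.set(TableInputFormat.INPUT_TABLE, "my_table")  // placeholder: table for TableInputFormat to scan
)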
Suppose, for example, that hBaseRDD has 5 partitions, and the executors on the workers compute over that partition data. Must the data travel through the remote driver program? (This is not the HDFS-read case, where each worker is also a Hadoop slave and holds HDFS file replicas locally.)
When Spark is integrated with HBase, the data-locality principles are the same as in Hadoop MapReduce jobs: Spark tries to assign each input partition (one HBase region) to a worker on the same physical machine as that region's server, so the data is fetched directly by the executor and never passes through the remote driver.
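You can observe this yourself: the input splits produced by TableInputFormat carry the region server hostnames, and Spark exposes them as each partition's preferred locations. A small sketch, assuming the hBaseRDD from the question:

// by default TableInputFormat produces one partition per HBase region
println(s"partitions: ${hBaseRDD.getNumPartitions}")
// each partition's preferred host is the host of its region server
hBaseRDD.partitions.foreach { p =>
  println(s"partition ${p.index} -> ${hBaseRDD.preferredLocations(p)}")
}

If a task runs on its preferred host, the Spark UI reports its locality level as NODE_LOCAL (or PROCESS_LOCAL); RACK_LOCAL or ANY means the executor read the region data over the network from the region server, but still without involving the driver.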