hadoop - When Spark reads data from HBase, do the workers need to get partition data from the remote driver program?
Spark reads data from HBase like this:

// create an RDD backed by an HBase table
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable], classOf[org.apache.hadoop.hbase.client.Result])
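(For context, the conf above has to carry the HBase configuration and the name of the table to scan. A minimal sketch of that setup, where "my_table" is a placeholder, not a name from the original question:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val conf = HBaseConfiguration.create()              // picks up hbase-site.xml from the classpath
conf.set(TableInputFormat.INPUT_TABLE, "my_table")  // placeholder: table for TableInputFormat to scan
)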
Suppose, for example, that hBaseRDD has 5 partitions, and the executors on the workers compute over that partition data. Must the data travel through the remote driver program? (This is not the HDFS-read case, where each worker is also a Hadoop slave and holds HDFS file replicas locally.)
When Spark is integrated with HBase, the data-locality principles are the same as in Hadoop MapReduce jobs: Spark tries to assign each input partition (one HBase region) to a worker on the same physical machine as that region's server, so the data is fetched directly by the executor and never passes through the remote driver.
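You can observe this yourself: the input splits produced by TableInputFormat carry the region server hostnames, and Spark exposes them as each partition's preferred locations. A small sketch, assuming the hBaseRDD from the question:

// by default TableInputFormat produces one partition per HBase region
println(s"partitions: ${hBaseRDD.getNumPartitions}")
// each partition's preferred host is the host of its region server
hBaseRDD.partitions.foreach { p =>
  println(s"partition ${p.index} -> ${hBaseRDD.preferredLocations(p)}")
}

If a task runs on its preferred host, the Spark UI reports its locality level as NODE_LOCAL (or PROCESS_LOCAL); RACK_LOCAL or ANY means the executor read the region data over the network from the region server, but still without involving the driver.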