In Distributed TensorFlow, what is the effect of having multiple parameter servers?
When the parameter servers are updated by the workers, what is the effect of having multiple parameter servers for the same number of workers?
I.e., what happens when you have multiple parameter servers instead of one parameter server?
Thank you.
This is known as having multiple parameter server shards. This paper gives more details: https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf, section 4.1:
To apply SGD to large data sets, we introduce Downpour SGD, a variant of asynchronous stochastic gradient descent that uses multiple replicas of a single DistBelief model. The basic approach is as follows: we divide the training data into a number of subsets and run a copy of the model on each of these subsets. The models communicate updates through a centralized parameter server, which keeps the current state of all parameters for the model, sharded across many machines (e.g., if we have 10 parameter server shards, each shard is responsible for storing and applying updates to 1/10th of the model parameters) (Figure 2).
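To make the sharding idea in the quoted passage concrete, here is a minimal sketch in plain Python. It is not TensorFlow or DistBelief code: the class names `ParameterShard` and `ShardedParameterServer`, the round-robin assignment of parameters to shards, and the plain SGD update are all assumptions chosen for illustration.

```python
from typing import Dict, List


class ParameterShard:
    """One parameter server shard: stores a subset of the parameters
    and applies updates only to the parameters it owns."""

    def __init__(self) -> None:
        self.params: Dict[str, float] = {}

    def apply_gradient(self, name: str, grad: float, lr: float) -> None:
        # Plain SGD step (illustrative; the real update rule may differ).
        self.params[name] -= lr * grad


class ShardedParameterServer:
    """Hypothetical front end that routes each parameter to its shard."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[ParameterShard] = [
            ParameterShard() for _ in range(num_shards)
        ]
        self.owner: Dict[str, int] = {}  # parameter name -> shard index

    def register(self, name: str, value: float) -> None:
        # Round-robin placement: an assumption made for this sketch.
        shard_id = len(self.owner) % len(self.shards)
        self.owner[name] = shard_id
        self.shards[shard_id].params[name] = value

    def push(self, grads: Dict[str, float], lr: float = 0.1) -> None:
        # A worker sends gradients; each one goes only to the owning shard.
        for name, grad in grads.items():
            self.shards[self.owner[name]].apply_gradient(name, grad, lr)

    def pull(self) -> Dict[str, float]:
        # A worker fetches the current parameters from all shards.
        merged: Dict[str, float] = {}
        for shard in self.shards:
            merged.update(shard.params)
        return merged


if __name__ == "__main__":
    ps = ShardedParameterServer(num_shards=10)
    for i in range(100):
        ps.register(f"w{i}", 1.0)
    # Each of the 10 shards ends up holding 1/10th of the 100 parameters.
    print([len(shard.params) for shard in ps.shards])
    ps.push({"w0": 1.0, "w11": 2.0})
    print(ps.owner["w0"], ps.owner["w11"])
```

With the same number of workers, adding shards does not change what is computed. It spreads parameter storage and update traffic across more machines, so a single parameter server is less likely to become the bottleneck.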