hadoop - Mapper and reducer functions in python -

January 15, 2013

I want to know whether there is anything wrong with the mapper and reducer functions below. This is part of a project in Udacity's Intro to Data Science course.

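import sys

def mapper():
    dic={}
    # Read each input line and split it into words on spaces
    for line in sys.stdin:
        data=line.strip().split(" ")
        for i in data:
            dic[i]=1
    # Emit every word with a count of 1
    for key, value in dic.iteritems():
        print key,'\t', value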

Here the input is a string of words separated by spaces, and the function builds a dictionary with each word of the string as the 'key' and its count of 1 as the 'value'.

import sys

def reducer():
    dic={}
    # Each input line is a word and its count, separated by a tab
    for line in sys.stdin:
        data=line.strip().split('\t')
        if data[0] in dic.keys():
            dic[data[0]]+=1
        else:
            dic[data[0]]=data[1]
    for key, value in dic.iteritems():
        print key,'\t',value

Here the input is a string consisting of a word and its count of 1, separated by a tab. The two functions are run separately. I'm not getting the correct output.

It would help if you told us what output you expect, but in dic[data[0]]=data[1] the value data[1] is a string, so you won't be able to add a number such as 1 to it.
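A minimal sketch of that failure, written in Python 3 syntax rather than the Python 2 of the question. The word `hello` and the variable names are made up for illustration; only the string-versus-integer behaviour is the point.

```python
# Values read from stdin are strings, so the first count is stored as '1'.
dic = {}
word, count = "hello\t1".split("\t")
dic[word] = count          # stores the string '1', not the integer 1

try:
    dic[word] += 1         # adding an int to a str raises TypeError
    failed = False
except TypeError:
    failed = True

# Converting the value on the way in avoids the problem.
dic[word] = int(count)
dic[word] += 1
```

After the conversion, `dic["hello"]` holds the integer 2 instead of raising an error.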

Also, surely the point of a reducer is that it may be run multiple times, in which case the input count isn't always going to be 1, so you may want to add the actual value rather than just incrementing.
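To see why this matters, here is a small sketch in Python 3 syntax that compares the two approaches. The input lines are hypothetical partial counts, such as the output of an earlier reduce pass might look like.

```python
# Hypothetical partial counts, not taken from the original question.
lines = ["spam\t3", "spam\t2", "eggs\t1"]

incremented = {}
summed = {}
for line in lines:
    key, value = line.strip().split("\t")
    # Incrementing only counts how many lines mention the key.
    incremented[key] = incremented.get(key, 0) + 1
    # Summing adds up the counts those lines actually carry.
    summed[key] = summed.get(key, 0) + int(value)
```

Incrementing reports 2 for `spam`, while summing reports 5, which is the total the partial counts represent.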

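import collections
import sys

def reducer():
    # Missing keys start at 0, so no membership test is needed
    dic=collections.defaultdict(int)
    for line in sys.stdin:
        key, value=line.strip().split('\t')
        # Add the actual count, converted from string to int
        dic[key] += int(value)
    for key, value in dic.iteritems():
        print key,'\t',value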
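The reducer above uses Python 2 syntax (`print` statement, `iteritems`). For readers on Python 3, the following is a rough equivalent, not the answerer's code. The helper `reduce_counts` is a hypothetical name introduced here so the summing logic can be tried on a plain list of lines without reading from stdin.

```python
import collections
import sys


def reduce_counts(lines):
    """Sum the tab-separated counts for each key (hypothetical helper)."""
    totals = collections.defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split("\t")
        totals[key] += int(value)
    return dict(totals)


def reducer():
    # Read mapper output from stdin and print "key<TAB>total" lines.
    for key, total in reduce_counts(sys.stdin).items():
        print("{}\t{}".format(key, total))
```

Assuming the mapper and reducer are saved as scripts that call their functions (the file names here are placeholders), a local check might look like `cat input.txt | python mapper.py | sort | python reducer.py`.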
