Hadoop - Mapper and reducer functions in Python
I want to know whether there is anything wrong with the mapper and reducer functions below. They are part of a project in Udacity's Intro to Data Science course.
import sys

def mapper():
    dic={}
    for line in sys.stdin:
        data=line.strip().split(" ")
        for i in data:
            dic[i]=1
    for key, value in dic.iteritems():
        print key,'\t', value
Here the input is a string of words separated by spaces. The function builds a dictionary with each word of the string as a key and 1 as its value, then prints each key/value pair separated by a tab.
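One thing worth checking in the mapper above: because `dic[i]=1` overwrites the entry, a word that appears several times is emitted only once, so the reducer can never see its real count. The sketch below contrasts that behaviour with the usual word-count convention of emitting one `word<TAB>1` record per occurrence. It is a Python 3 port for illustration; the function names are hypothetical and are not part of the course code or any Hadoop API.

```python
def question_style_records(lines):
    """Mirrors the question's mapper: dic[i] = 1 keeps one entry per distinct word."""
    dic = {}
    for line in lines:
        for word in line.strip().split(" "):
            dic[word] = 1  # a repeated word just overwrites its existing entry
    return [f"{key}\t{value}" for key, value in dic.items()]


def per_occurrence_records(lines):
    """Emits 'word<TAB>1' once for every occurrence, so the reducer can count repeats."""
    # split() with no argument splits on any run of whitespace and drops empty strings
    return [f"{word}\t1" for line in lines for word in line.split()]
```

For the line `"the cat saw the dog"`, the first function yields four records (one `the`), while the second yields five (two `the`).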
import sys

def reducer():
    dic={}
    for line in sys.stdin:
        data=line.strip().split('\t')
        if data[0] in dic.keys():
            dic[data[0]]+=1
        else:
            dic[data[0]]=data[1]
    for key, value in dic.iteritems():
        print key,'\t',value
Here the input is a string consisting of a word and a count of 1, separated by a tab. The two functions are executed separately, but I'm not getting the correct output.
It would help if you told us what output you expect, but in `dic[data[0]]=data[1]` the value `data[1]` is a string, so you won't be able to add a number such as 1 to it.
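A small reproduction of that type problem, assuming a Python 3 port of the question's reducer (the logic is unchanged; `buggy_reduce` and `fails_with_type_error` are hypothetical names). In both Python 2 and 3, adding an `int` to a `str` raises `TypeError`, so the reducer only "works" while every key appears once, and even then it stores the string `'1'` rather than the number 1.

```python
def buggy_reduce(lines):
    """Same logic as the question's reducer, ported to Python 3."""
    dic = {}
    for line in lines:
        data = line.strip().split("\t")
        if data[0] in dic:
            dic[data[0]] += 1        # str + int: fails once the key repeats
        else:
            dic[data[0]] = data[1]   # data[1] is the string '1', not the number 1
    return dic


def fails_with_type_error(lines):
    """Return True if buggy_reduce raises TypeError on these lines."""
    try:
        buggy_reduce(lines)
    except TypeError:
        return True
    return False
```

Converting with `int(data[1])` before storing or adding removes the error.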
Also, surely the point is that the reducer may run multiple times, in which case the input count isn't always going to be 1, so you may want to add the actual value rather than incrementing.
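To make the difference concrete, here is a hypothetical input in which some counts have already been combined by an earlier pass. Incrementing counts records per key and ignores the value, so it is only correct when every value is 1; summing the parsed values gives the true totals. Both functions are illustrative sketches, not code from the course.

```python
from collections import defaultdict


def reduce_by_incrementing(lines):
    """Counts records per key, ignoring the value (correct only if every value is 1)."""
    counts = defaultdict(int)
    for line in lines:
        key, _value = line.rstrip("\n").split("\t")
        counts[key] += 1
    return dict(counts)


def reduce_by_summing(lines):
    """Adds the actual values, so partially aggregated counts are handled."""
    counts = defaultdict(int)
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        counts[key] += int(value)
    return dict(counts)


# Hypothetical input where an earlier pass already combined some counts
partial = ["cat\t2", "cat\t3", "dog\t1"]
```

On `partial`, incrementing reports `cat` as 2, while summing reports 5.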
import collections
import sys

def reducer():
    dic=collections.defaultdict(int)
    for line in sys.stdin:
        key, value=line.strip().split('\t')
        # value arrives as a string, so convert it before summing
        dic[key] += int(value)
    for key, value in dic.iteritems():
        print key,'\t',value
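One way to check a mapper/reducer pair without a cluster is to simulate the `mapper | sort | reducer` flow in memory. The sketch below pairs a Python 3 port of the `defaultdict` reducer above (`items()` and `write` instead of Python 2's `iteritems()` and `print` statement) with a per-occurrence mapper. The `run_local` helper and the stream-passing signatures are hypothetical, chosen so the functions can be fed either `sys.stdin`/`sys.stdout` or in-memory buffers.

```python
import io
from collections import defaultdict


def mapper(stdin, stdout):
    """Write one 'word<TAB>1' line per word occurrence."""
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")


def reducer(stdin, stdout):
    """Python 3 port of the defaultdict reducer: sum the integer values per key."""
    dic = defaultdict(int)
    for line in stdin:
        key, value = line.strip().split("\t")
        dic[key] += int(value)
    for key, value in sorted(dic.items()):
        stdout.write(f"{key}\t{value}\n")


def run_local(text):
    """Simulate `mapper | sort | reducer` in memory; no Hadoop involved."""
    mapped = io.StringIO()
    mapper(io.StringIO(text), mapped)
    # Hadoop Streaming sorts mapper output by key before the reducer sees it
    shuffled = io.StringIO("".join(sorted(mapped.getvalue().splitlines(keepends=True))))
    reduced = io.StringIO()
    reducer(shuffled, reduced)
    return reduced.getvalue()
```

With the scripts saved as files, the equivalent shell check is usually `cat input.txt | python mapper.py | sort | python reducer.py`, where the file names are placeholders and each script calls its function with `sys.stdin` and `sys.stdout`.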