python - Count unique elements from all the rows of a column
I have a data frame df with a column called "groups". It looks like this:
groups
[u'cn=myusers,ou=groups,dc=sample,dc=com', u'cn=sample-users,ou=groups,dc=sample,dc=com']
[u'cn=myusers,ou=groups,dc=sample,dc=com', u'cn=sample-users,ou=groups,dc=sample,dc=com', u'cn=moreusers,ou=groups,dc=sample,dc=com']
The first row contains 2 groups and the second row contains 3 groups. I want to count how many times each unique group occurs in the whole column. The resulting data frame should be:
group                                          count
u'cn=myusers,ou=groups,dc=sample,dc=com'       2
u'cn=sample-users,ou=groups,dc=sample,dc=com'  2
u'cn=moreusers,ou=groups,dc=sample,dc=com'     1
How can I achieve this? I am trying:
res = df.groups.apply(pd.Series).stack().value_counts()
But it doesn't give me the expected result: it doesn't break the counts out by individual group.
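One possible reason the attempt above does not split the rows (an assumption, since the question does not show how df was built) is that each cell holds the text of a list rather than an actual Python list, for example after reading the column from a CSV file. A minimal sketch of checking for that and parsing the strings with ast.literal_eval before counting; the sample frame below is made up to mimic that situation:

```python
import ast

import pandas as pd

# Hypothetical input: every cell is a *string* that merely looks like a list.
df = pd.DataFrame({
    "groups": [
        "[u'cn=myusers,ou=groups,dc=sample,dc=com', "
        "u'cn=sample-users,ou=groups,dc=sample,dc=com']",
        "[u'cn=myusers,ou=groups,dc=sample,dc=com', "
        "u'cn=sample-users,ou=groups,dc=sample,dc=com', "
        "u'cn=moreusers,ou=groups,dc=sample,dc=com']",
    ]
})

# Check what the cells really are before trying to expand them.
cells_are_strings = df["groups"].map(lambda v: isinstance(v, str)).all()

# Turn each string into a real list; literal_eval only accepts Python literals.
parsed = df["groups"].apply(ast.literal_eval) if cells_are_strings else df["groups"]

# With real lists in place, the approach from the question counts each group.
res = parsed.apply(pd.Series).stack().value_counts()
print(res)
```

If the cells are already lists, the check is False and the column is used unchanged.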
This should work:
from itertools import chain

pd.DataFrame(map(lambda x: (x, 1), chain.from_iterable(df.groups.values))).groupby(0, as_index=False).sum().rename(columns={0: 'group', 1: 'count'})

                                        group  count
0     cn=moreusers,ou=groups,dc=sample,dc=com      1
1       cn=myusers,ou=groups,dc=sample,dc=com      2
2  cn=sample-users,ou=groups,dc=sample,dc=com      2
Alternatively, flatten the lists into a single column and call value_counts on it:
pd.DataFrame(list(chain.from_iterable(df.groups.values)), columns=['group']).group.value_counts()

cn=sample-users,ou=groups,dc=sample,dc=com    2
cn=myusers,ou=groups,dc=sample,dc=com         2
cn=moreusers,ou=groups,dc=sample,dc=com       1
Time tests:
%timeit pd.DataFrame(list(chain.from_iterable(df.groups.values)), columns=['group']).group.value_counts()
1000 loops, best of 3: 899 µs per loop

%timeit pd.DataFrame(list(chain.from_iterable(df.groups.values))).groupby(0, as_index=False).sum().rename(columns={0:'group', 1:'count'})
100 loops, best of 3: 5.5 ms per loop
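On newer pandas versions (0.25 and later) the same flattening can be done with Series.explode instead of itertools.chain. This is a different technique from the two snippets above, sketched here on a sample frame rebuilt from the question's data; it is not timed against them:

```python
import pandas as pd

# Sample frame rebuilt from the question; each cell is a real Python list.
df = pd.DataFrame({
    "groups": [
        ["cn=myusers,ou=groups,dc=sample,dc=com",
         "cn=sample-users,ou=groups,dc=sample,dc=com"],
        ["cn=myusers,ou=groups,dc=sample,dc=com",
         "cn=sample-users,ou=groups,dc=sample,dc=com",
         "cn=moreusers,ou=groups,dc=sample,dc=com"],
    ]
})

# explode() gives every list element its own row; value_counts() tallies them.
counts = df["groups"].explode().value_counts()

# Shape the result as a two-column data frame: group, count.
result = counts.rename_axis("group").reset_index(name="count")
print(result)
```

The rename_axis / reset_index step is only needed if a data frame with named columns is wanted; counts on its own is already a Series indexed by group.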