Pandas - Python - Binary encoding of column containing multiple terms
I need a binary transformation of a column containing lists of comma-separated strings.
Can you help me get from here:
    df = pd.DataFrame({'_id': [1,2,3], 'test': [['one', 'two', 'three'], ['three', 'one'], ['four', 'one']]})
    df

    _id  test
    1    [one, two, three]
    2    [three, one]
    3    [four, one]
to:
    df_result = pd.DataFrame({'_id': [1,2,3], 'one': [1,1,1], 'two': [1,0,0], 'three': [1,1,0], 'four': [0,0,1]})
    df_result[['_id', 'one', 'two', 'three', 'four']]

    _id  one  two  three  four
    1    1    1    1      0
    2    1    0    1      0
    3    1    0    0      1
Any help is appreciated!
You can use str.get_dummies: pop extracts the column, str.join converts each list to a single string, and a final join attaches the result to the original DataFrame:
    df = df.join(df.pop('test').str.join('|').str.get_dummies())
    print (df)

       _id  four  one  three  two
    0    1     0    1      1    1
    1    2     0    1      1    0
    2    3     1    1      0    0
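The one-liner above can be unpacked step by step. This is only a sketch on the question's sample data; the intermediate names (terms, joined, dummies, result) are made up for illustration, and it assumes no term contains the '|' character used as the separator. The last step reorders the columns to match the order asked for in the question, since str.get_dummies returns them sorted alphabetically.

```python
import pandas as pd

df = pd.DataFrame({
    '_id': [1, 2, 3],
    'test': [['one', 'two', 'three'], ['three', 'one'], ['four', 'one']],
})

# Step 1: pop removes the 'test' column from df and returns it as a Series of lists.
terms = df.pop('test')

# Step 2: join each list into one '|'-separated string, e.g. 'one|two|three'.
# Assumption: no term contains the '|' character itself.
joined = terms.str.join('|')

# Step 3: build one 0/1 indicator column per distinct term
# ('|' is the default separator of str.get_dummies).
dummies = joined.str.get_dummies()

# Step 4: attach the indicators to the remaining columns, aligned on the index.
result = df.join(dummies)

# Optional: reorder to the column order requested in the question.
result = result[['_id', 'one', 'two', 'three', 'four']]
print(result)
```

If the terms could contain '|', a different separator can be passed to both calls, for example str.join(';') followed by str.get_dummies(sep=';').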
Instead of pop, it is possible to use drop:
    df = df.drop('test', axis=1).join(df['test'].str.join('|').str.get_dummies())
    print (df)

       _id  four  one  three  two
    0    1     0    1      1    1
    1    2     0    1      1    0
    2    3     1    1      0    0
A solution that builds a new DataFrame:
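    # dtype=int keeps 0/1 output on pandas versions where get_dummies defaults to booleans
    df1 = pd.get_dummies(pd.DataFrame(df.pop('test').values.tolist()), prefix='', prefix_sep='', dtype=int)
    # collapse duplicate column names; groupby(..., axis=1) is deprecated in recent pandas, so transpose instead
    df = df.join(df1.T.groupby(level=0).max().T)
    print (df)

       _id  four  one  three  two
    0    1     0    1      1    1
    1    2     0    1      1    0
    2    3     1    1      0    0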
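The grouping step in this variant may look odd at first, so here is a small sketch of why it is needed. Expanding the lists into positional columns and encoding each one with an empty prefix produces several columns with the same name whenever a term appears in more than one position. The names wide and collapsed are illustrative only, and the transpose-based grouping is one way to merge same-named columns, not necessarily the only one.

```python
import pandas as pd

df = pd.DataFrame({
    '_id': [1, 2, 3],
    'test': [['one', 'two', 'three'], ['three', 'one'], ['four', 'one']],
})

# Spread each list across positional columns 0, 1, 2;
# shorter lists are padded with missing values.
wide = pd.DataFrame(df['test'].tolist(), index=df.index)

# Encode every positional column. With empty prefixes the bare term becomes
# the column name, so a term seen in several positions yields duplicate names.
df1 = pd.get_dummies(wide, prefix='', prefix_sep='', dtype=int)
print(df1.columns.tolist())

# Collapse same-named columns: a term is present in a row
# if any of its duplicate columns holds a 1.
collapsed = df1.T.groupby(level=0).max().T
print(collapsed)
```

Taking the maximum acts as a logical OR across the duplicates, which is why the final frame has exactly one column per term.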
I also tried a solution that converts the lists to strings with astype, but some cleaning is necessary:
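    # regex=True is required on pandas 2.0+, where str.replace matches literally by default
    df1 = df.pop('test').astype(str).str.strip("'[]").str.replace(r"',\s+'", '|', regex=True).str.get_dummies()
    df = df.join(df1)
    print (df)

       _id  four  one  three  two
    0    1     0    1      1    1
    1    2     0    1      1    0
    2    3     1    1      0    0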
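To make the cleaning in this last variant easier to follow, the sketch below shows what each string operation leaves behind. The variable names are illustrative, and the approach assumes every term is a plain word without quotes, brackets, or '|' characters; terms containing such characters would likely need different handling.

```python
import pandas as pd

s = pd.Series([['one', 'two', 'three'], ['three', 'one'], ['four', 'one']])

# astype(str) turns each list into its printed form: "['one', 'two', 'three']"
as_text = s.astype(str)

# strip removes any of the characters ' [ ] from both ends: "one', 'two', 'three"
stripped = as_text.str.strip("'[]")

# the regex matches the "', '" between terms and replaces it with '|': "one|two|three"
piped = stripped.str.replace(r"',\s+'", '|', regex=True)

# from here str.get_dummies splits on '|' as in the other solutions
dummies = piped.str.get_dummies()
print(piped.tolist())
print(dummies)
```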