Python - Pandas Create Link Pairs from Multiple Rows
I have a df with id (the flow id), dttm (the step modification time), and step (the steps in the flow). It is ordered by dttm. There can be any number of steps for a particular id.
Current df:

   id                 dttm  step
0  81  2015-05-26 07:56:03     a
1  81  2015-05-26 08:19:07     b
2  81  2015-05-26 08:32:05     c
3  91  2015-05-26 08:07:12     b
4  91  2015-05-26 08:07:12     c

I want to create link data to feed a Sankey diagram. Therefore I want a df with 3 columns: source, target, and value. The value is the count of ids that have such a step pair.
Desired df:

  source target  value
0      a      b      1
1      b      c      2

I know I can stuff the steps into 1 row with either groupby or possibly cat. However, I think that just creates a different starting point without advancing the solution. Part of what makes this tough is that the steps depend on dttm to stay ordered so that they are appropriately paired. Also, the fact that it has to be dynamic, because there can be any number of steps, adds to the difficulty.
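A minimal sketch of how the ordering and variable-length concerns can be handled without stuffing anything, assuming the sample frame above. The rows are sorted by dttm with a stable sort (id 91 has two steps at the same timestamp, so a stable sort keeps their original b-then-c order), and `groupby('id')['step'].shift(-1)` gives each step the next step of the same id, however many steps that id has. `build_links` is a hypothetical helper name. It counts distinct ids per pair, as the question defines value; using `.size()` instead of `['id'].nunique()` would count every transition.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': [81, 81, 81, 91, 91],
    'dttm': pd.to_datetime(['2015-05-26 07:56:03', '2015-05-26 08:19:07',
                            '2015-05-26 08:32:05', '2015-05-26 08:07:12',
                            '2015-05-26 08:07:12']),
    'step': ['a', 'b', 'c', 'b', 'c'],
})


def build_links(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: consecutive step pairs per id -> source/target/value."""
    # Stable sort: rows tied on dttm (id 91) keep their original order
    ordered = frame.sort_values('dttm', kind='stable')
    pairs = pd.DataFrame({
        'id': ordered['id'],
        'source': ordered['step'],
        # Next step within the same id; the last step of each id gets NaN
        'target': ordered.groupby('id')['step'].shift(-1),
    }).dropna(subset=['target'])
    # value = number of distinct ids containing each source -> target pair
    return (pairs.groupby(['source', 'target'])['id'].nunique()
                 .reset_index(name='value'))


links = build_links(df)
print(links)
```

Ids with a single step simply contribute no pair, because their only `target` is NaN and is dropped.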
How should I dynamically "stuff" the step column to arrive at the link data?
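The "stuffing" idea does lead somewhere once each id's steps become an ordered list: zipping a list with itself offset by one yields its consecutive pairs for any length, and `explode` turns those pairs back into rows. A sketch, assuming the rows are already ordered by dttm within each id as the question states (no re-sort is done here); `paths` and `pairs` are illustrative names.

```python
import pandas as pd

# Sample data from the question, already ordered by dttm within each id
df = pd.DataFrame({
    'id': [81, 81, 81, 91, 91],
    'dttm': pd.to_datetime(['2015-05-26 07:56:03', '2015-05-26 08:19:07',
                            '2015-05-26 08:32:05', '2015-05-26 08:07:12',
                            '2015-05-26 08:07:12']),
    'step': ['a', 'b', 'c', 'b', 'c'],
})

# "Stuff" each id's steps into one ordered list: 81 -> [a, b, c], 91 -> [b, c]
paths = df.groupby('id')['step'].agg(list)

# Zip each list with itself offset by one; works for any number of steps.
# explode() gives one row per pair; an id with a single step yields NaN, dropped here.
pairs = paths.apply(lambda s: list(zip(s, s[1:]))).explode().dropna()

links = (pd.DataFrame(pairs.tolist(), columns=['source', 'target'], index=pairs.index)
           .reset_index()                                  # brings 'id' back as a column
           .groupby(['source', 'target'])['id'].nunique()  # distinct ids per pair
           .reset_index(name='value'))
print(links)
```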
Or is there a way to join the df into a df of pairs, and then remove the rows created during the join that are nonsense?
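On the join question: a self-merge only creates nonsense rows when it is keyed on id alone, because then every step of an id meets every other step of that id. One way around that, sketched here under the same ordering assumption, is to number the steps within each id with `groupby('id').cumcount()` and merge on `(id, pos)` against a copy whose positions are shifted down by one. Each step then meets only its successor, so nothing needs removing afterwards. `ranked` and `pos` are illustrative names.

```python
import pandas as pd

# Sample data from the question, already ordered by dttm within each id
df = pd.DataFrame({
    'id': [81, 81, 81, 91, 91],
    'dttm': pd.to_datetime(['2015-05-26 07:56:03', '2015-05-26 08:19:07',
                            '2015-05-26 08:32:05', '2015-05-26 08:07:12',
                            '2015-05-26 08:07:12']),
    'step': ['a', 'b', 'c', 'b', 'c'],
})

# Number each step within its id: 0, 1, 2, ...
ranked = df.assign(pos=df.groupby('id').cumcount())

# Self-join on (id, pos): in the right copy, step k + 1 is relabelled pos k,
# so step k on the left matches only step k + 1 of the same id.
pairs = ranked.merge(ranked.assign(pos=ranked['pos'] - 1),
                     on=['id', 'pos'], suffixes=('_source', '_target'))

links = (pairs.rename(columns={'step_source': 'source', 'step_target': 'target'})
              .groupby(['source', 'target'])['id'].nunique()
              .reset_index(name='value'))
print(links)
```

The default inner merge drops each id's last step on its own, since no row exists at the next position.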
Thank you for any insight!
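Let's try:

import pandas as pd

(df.groupby('id')['step']
   .apply(lambda x: pd.DataFrame(list(zip(x, x.iloc[1:])), columns=['source', 'target'])
                      .set_index(['source', 'target']).assign(count=1))
   .rename_axis(['id', 'source', 'target'])
   .groupby(level=['source', 'target']).sum()  # .sum(level=...) was removed in pandas 2.0
   .reset_index())

Output: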
  source target  count
0      a      b      1
1      b      c      2
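If the link table is headed for a Sankey tool that expects integer node positions plus a separate label list rather than step names (plotly's `go.Sankey` takes links that way, for example), the names can be encoded with `pd.factorize` over source and target together so both columns share one node numbering. A sketch, assuming the output above with `count` renamed to the `value` the question asked for; `source_idx`, `target_idx` and `node_labels` are hypothetical names.

```python
import pandas as pd

# The link table from the output above, with count renamed to value
links = pd.DataFrame({'source': ['a', 'b'], 'target': ['b', 'c'], 'count': [1, 2]})
links = links.rename(columns={'count': 'value'})

# Factorize source and target together so both share one node numbering
codes, uniques = pd.factorize(pd.concat([links['source'], links['target']]))
n = len(links)
links['source_idx'] = codes[:n]   # first n codes belong to the source column
links['target_idx'] = codes[n:]   # remaining codes belong to the target column
node_labels = list(uniques)       # node label for each integer index

print(node_labels)
print(links)
```

The three integer/value columns and `node_labels` can then be handed to whatever Sankey API is in use; check that library's documentation for the exact argument names.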