python - Pandas Create Link Pairs from Multiple Rows


I have a df where id is the flow id, dttm is the step modification time, and step is the steps in the flow. It is ordered by dttm. There can be any number of steps for a particular id.

current df:

        id      dttm                  step
    0   81      2015-05-26 07:56:03   a
    1   81      2015-05-26 08:19:07   b
    2   81      2015-05-26 08:32:05   c
    3   91      2015-05-26 08:07:12   b
    4   91      2015-05-26 08:07:12   c
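To make the example reproducible, the frame above could be built roughly like this. It is only a sketch of the sample rows; the first step label `a` is an assumption, since that cell does not survive in the listing.

```python
import pandas as pd

# Sample rows mirroring the table above.
# Assumption: the first step of id 81 is labelled "a".
df = pd.DataFrame({
    "id": [81, 81, 81, 91, 91],
    "dttm": pd.to_datetime([
        "2015-05-26 07:56:03",
        "2015-05-26 08:19:07",
        "2015-05-26 08:32:05",
        "2015-05-26 08:07:12",
        "2015-05-26 08:07:12",
    ]),
    "step": ["a", "b", "c", "b", "c"],
})

print(df)
```

The rows are already in dttm order within each id, which is what the pairing relies on.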

I want to create link data to feed a Sankey diagram. Therefore, I want a df with 3 columns: source, target, and value. value is the count of ids that have such a step pair.

desired df:

        source   target   value
    0   a        b        1
    1   b        c        2

I know I can stuff step into one row with either groupby or possibly cat. However, I think that would only create a different starting point without advancing the solution. Part of what makes this tough is that the steps depend on dttm to stay ordered, so that steps are appropriately paired. Also, the fact that it has to be dynamic, because there can be any number of steps, adds difficulty.

How should I dynamically "stuff" the step column to arrive at the link data?

Is there a way to join the df to itself to get all of the pairs, and then remove the rows created during the join that are nonsense?
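The join idea could look something like the sketch below. It is one possible reading of the question, not a definitive answer. The `pos` column and the `_src`/`_tgt` suffixes are names made up for illustration, and the step label `a` is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [81, 81, 81, 91, 91],
    "step": ["a", "b", "c", "b", "c"],  # "a" is an assumed label
})

# Position of each step within its id (0, 1, 2, ...); relies on df already being ordered by dttm.
df = df.assign(pos=df.groupby("id").cumcount())

# Self-join on id: yields every (source, target) combination within an id.
pairs = df.merge(df, on="id", suffixes=("_src", "_tgt"))

# Keep only rows where the target directly follows the source; the rest are the "nonsense" rows.
pairs = pairs[pairs["pos_tgt"] == pairs["pos_src"] + 1]

# Count the distinct ids that contain each pair.
links = (
    pairs.groupby(["step_src", "step_tgt"])["id"].nunique()
         .rename_axis(["source", "target"])
         .reset_index(name="value")
)
print(links)
```

The join grows quadratically with the number of steps per id, so it is probably only reasonable for short flows.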

Thank you for any insight!

Let's try:

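    import pandas as pd

    # Pair each step with the next step of the same id, then count each pair.
    (df.groupby('id')['step']
       .apply(lambda x: pd.DataFrame(list(zip(x, x.iloc[1:])))
                          .set_index([0, 1]).assign(count=1))
       .rename_axis(['id', 'source', 'target'])
       .groupby(level=['source', 'target']).sum()   # sum(level=...) was removed in pandas 2.0
       .reset_index())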

output:

      source target  count
    0      a      b      1
    1      b      c      2
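A possible alternative, offered as a sketch and not as part of the answer above, is to pair each step with its successor using a grouped `shift(-1)` instead of building a small frame per id. It assumes the rows are already ordered by dttm within each id, and the step label `a` is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [81, 81, 81, 91, 91],
    "step": ["a", "b", "c", "b", "c"],  # "a" is an assumed label
})

links = (
    df.assign(source=df["step"],
              target=df.groupby("id")["step"].shift(-1))  # next step within the same id
      .dropna(subset=["target"])                          # the last step of each id has no successor
      .groupby(["source", "target"])["id"]
      .nunique()                                          # distinct ids per pair
      .reset_index(name="value")
)
print(links)
```

Here `nunique` counts each id at most once per pair, which follows the question's definition of value. Summing a constant column, as in the answer above, would count a pair twice if it repeats within one id.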
