python 3.x - Pandas: How to analyse data with start and end timestamp?


I have to analyze the activity of users who use an application during a given period. Each period has a start and an end timestamp. I tried a bar chart, but I do not know how to include all the hours in the interval. For example, the user with uid=2 uses the application at hours [18, 19, 20, 21].

My dataframe looks like this:

    uid  sex  start                end
    1    0    2000-01-28 16:47:00  2000-01-28 17:47:00
    2    1    2000-01-28 18:07:00  2000-01-28 21:47:00
    3    1    2000-01-28 18:47:00  2000-01-28 20:17:00
    4    0    2000-01-28 08:00:00  2000-01-28 10:00:00
    5    1    2000-01-28 02:05:00  2000-01-28 02:30:00
    6    0    2000-01-28 15:10:00  2000-01-28 18:04:00
    7    0    2000-01-28 01:50:00  2000-01-28 03:00:00

    df['hour_s'] = pd.to_datetime(df['start']).apply(lambda x: x.hour)
    df['hour_e'] = pd.to_datetime(df['end']).apply(lambda x: x.hour)

    uid  sex  start                end                  hour_s  hour_e
    1    0    2000-01-28 16:47:00  2000-01-28 17:47:00  16      17
    2    1    2000-01-28 18:07:00  2000-01-28 21:47:00  18      21
    3    1    2000-01-28 18:47:00  2000-01-28 20:17:00  18      20
    4    0    2000-01-28 08:00:00  2000-01-28 10:00:00  08      10
    5    1    2000-01-28 02:05:00  2000-01-28 02:30:00  02      02
    6    0    2000-01-28 15:10:00  2000-01-28 18:04:00  15      18
    7    0    2000-01-28 01:50:00  2000-01-28 03:00:00  01      03

I have to find the number of users in specific hours.

I'm not sure whether you are looking for a Gantt chart. If so, see the hints from @vinícius aguiar in the comments.
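For reference, a Gantt-style view can be sketched with matplotlib's horizontal bars. This is only an illustration, not the approach from the comments: it assumes every interval starts and ends on the same day, uses the first three sample rows, and the names `start_h` and `duration_h` are made up for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First three rows of the sample data (assumption: same-day intervals)
df = pd.DataFrame({
    "uid": [1, 2, 3],
    "start": pd.to_datetime([
        "2000-01-28 16:47:00", "2000-01-28 18:07:00", "2000-01-28 18:47:00",
    ]),
    "end": pd.to_datetime([
        "2000-01-28 17:47:00", "2000-01-28 21:47:00", "2000-01-28 20:17:00",
    ]),
})

# Start of each interval as a fractional hour of the day
start_h = df["start"].dt.hour + df["start"].dt.minute / 60
# Length of each interval in hours
duration_h = (df["end"] - df["start"]).dt.total_seconds() / 3600

# One horizontal bar per user, from start_h to start_h + duration_h
fig, ax = plt.subplots()
ax.barh(df["uid"], duration_h, left=start_h)
ax.set_xlim(0, 24)
ax.set_xticks(range(0, 25, 2))
ax.set_yticks(df["uid"])
ax.set_xlabel("hour of day")
ax.set_ylabel("uid")
# Call plt.show() here to display the chart interactively
```

Each bar spans the exact start and end time, so this view shows who was active when, rather than how many users were active in each hour.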

From your last line:

"I have to find the number of users in specific hours"

it seems you need a histogram showing the number of users (frequency) per hour of the day. If that is the case, you can do this:

    #!/usr/bin/python3

    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np

    # Read the data
    df = pd.read_csv("data.csv")

    # Hours per user (per observation), end hour included
    def sum_hours(obs):
        return list(range(obs['hour_s'], obs['hour_e'] + 1, 1))

    # Existing activity hours (no matter which user), one list per row
    hours2d = list(df.apply(sum_hours, axis=1))
    # Flatten into a single list of hours
    hoursflat = [hour for sublist in hours2d for hour in sublist]

    # bins=24 gives one bin per hour (the default of 10 bins would merge hours)
    plt.hist(hoursflat, bins=24, rwidth=0.5, range=(0, 24))
    plt.xticks(np.arange(0, 24, 1.0))
    plt.xlabel('hour of day')
    plt.ylabel('users')
    plt.show()

where data.csv is the sample you provided:

    uid,sex,start,end,hour_s,hour_e
    1,0,2000-01-28 16:47:00,2000-01-28 17:47:00,16,17
    2,1,2000-01-28 18:07:00,2000-01-28 21:47:00,18,21
    3,1,2000-01-28 18:47:00,2000-01-28 20:17:00,18,20
    4,0,2000-01-28 08:00:00,2000-01-28 10:00:00,08,10
    5,1,2000-01-28 02:05:00,2000-01-28 02:30:00,02,02
    6,0,2000-01-28 15:10:00,2000-01-28 18:04:00,15,18
    7,0,2000-01-28 01:50:00,2000-01-28 03:00:00,01,03

You should get the following graph:

[Figure: data pivoted, showing the number of users per hour]
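If you only need the numbers rather than a plot, the same per-hour counts can be computed in pandas directly. The following is a minimal sketch, not part of the original answer: it assumes no interval crosses midnight, embeds the sample data in a string so it runs on its own, and the names `CSV`, `hours` and `per_hour` are chosen just for this example.

```python
import io
import pandas as pd

# Sample data from the question, embedded so the sketch is self-contained
CSV = """uid,sex,start,end
1,0,2000-01-28 16:47:00,2000-01-28 17:47:00
2,1,2000-01-28 18:07:00,2000-01-28 21:47:00
3,1,2000-01-28 18:47:00,2000-01-28 20:17:00
4,0,2000-01-28 08:00:00,2000-01-28 10:00:00
5,1,2000-01-28 02:05:00,2000-01-28 02:30:00
6,0,2000-01-28 15:10:00,2000-01-28 18:04:00
7,0,2000-01-28 01:50:00,2000-01-28 03:00:00
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=["start", "end"])

# One list of active hours per row, end hour included
# (assumption: start and end fall on the same day)
df["hours"] = [
    list(range(s, e + 1))
    for s, e in zip(df["start"].dt.hour, df["end"].dt.hour)
]

# One row per (uid, hour), then count the distinct users in each hour;
# reindex so that hours without any activity show up as 0
per_hour = (
    df.explode("hours")
      .astype({"hours": "int64"})
      .groupby("hours")["uid"]
      .nunique()
      .reindex(range(24), fill_value=0)
)

print(per_hour[per_hour > 0])
```

With the sample data, hour 18 has three users (uid 2, 3 and 6). The resulting Series can also be passed to `per_hour.plot.bar()` if a chart is wanted after all.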

