R - Calculation of date difference (BUG?) for POSIXct columns
I am using the following code to compute the difference in hours between two POSIXct dates.
x <- transform(x, hrs = ceiling(as.numeric(ship_date-pick_date)))
This gives accurate results. However, when I tried to find the hour differences for a similar column, I needed this:
x <- transform(x, hrs_adj = ceiling(as.numeric(ship_date-adj_pick_date)/60))
pick_date and ship_date were both extracted using the same formula.
x$ship_date <- ifelse(is.na(as.POSIXct(x$ship_date, format="%d-%b-%Y %H:%M %p")), yes = as.POSIXct(x$ship_date, format="%d-%b-%Y %H:%M"), no = as.POSIXct(x$ship_date, format="%d-%b-%Y %H:%M %p"))
x$ship_date <- as.POSIXct(x$ship_date, origin = "1970-01-01")
adj_pick_date is computed as below:
x$adj_pick_date <- ifelse(x$pick_time=="early", as.POSIXct(paste(format(x$pick_date, "%d-%b-%Y"), "03:00"), format="%d-%b-%Y %H:%M"), x$pick_date)
x$adj_pick_date <- ifelse(x$pick_time=="late", as.POSIXct(paste(format(x$pick_date+86400, "%d-%b-%Y"), "03:00"), format="%d-%b-%Y %H:%M"), x$adj_pick_date)
x$adj_pick_date <- as.POSIXct(x$adj_pick_date, origin = "1970-01-01")
pick_time is computed to adjust pick_date: for orders placed between 16:00 and 03:00, the lead time is calculated from 3 AM.
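The adjustment rule above can also be written with plain vector arithmetic instead of nested ifelse/paste/format calls, which may be faster on large columns (I have not benchmarked it on the original data). This is only a sketch: the helper name adjust_pick_date is made up for illustration, and it assumes the column is already POSIXct in a time zone without daylight saving shifts (UTC here), so that "next day 03:00" is exactly midnight plus 27 hours.

```r
# Hypothetical helper (not from the original post): move "early" picks to
# 03:00 on the same day and "late" picks to 03:00 on the following day.
# Assumes pick_date is POSIXct in a zone without DST jumps (e.g. UTC).
adjust_pick_date <- function(pick_date, pick_time) {
  # Midnight of each pick date; trunc() returns POSIXlt, so convert back.
  day_start <- as.POSIXct(trunc(pick_date, units = "days"))

  # Treat missing pick_time as "no adjustment".
  early <- !is.na(pick_time) & pick_time == "early"
  late  <- !is.na(pick_time) & pick_time == "late"

  adj <- pick_date                            # keeps the POSIXct class
  adj[early] <- day_start[early] + 3 * 3600   # same day, 03:00
  adj[late]  <- day_start[late] + 27 * 3600   # next day, 03:00
  adj
}

# Small illustrative input (values chosen for the example).
pick_date <- as.POSIXct(c("2017-04-01 00:51", "2017-04-01 07:51",
                          "2017-04-01 14:51"), tz = "UTC")
pick_time <- c("early", "okay", "late")
adj_pick_date <- adjust_pick_date(pick_date, pick_time)
```

Because the result is assigned by logical index rather than through ifelse, the POSIXct class is never dropped, so the extra as.POSIXct(..., origin = "1970-01-01") step is not needed in this version.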
Questions:
- How can I efficiently generate the adj_pick_date column (it is slow now)?
- How can I extract the source data into POSIXct using shorter, more efficient code? (It takes 10-15 seconds per million data points on an i7 7th gen CPU.)
- Why did I need to use a different formula for each pair of dates to calculate the number of days?
Sample data (the dates are formatted randomly in the source; pick_date and ship_date both contain "dd-mmm-yyyy hh:mm" and "dd-mmm-yyyy hh:mm am/pm"):
pick_date             ship_date             pick_time
01-apr-2017 00:51     02-apr-2017 06:55
01-apr-2017 00:51     02-apr-2017 12:11 pm
01-apr-2017 07:51     02-apr-2017 12:11 pm  okay
01-apr-2017 02:51 pm  02-apr-2017 09:39     late
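For mixed 24-hour and am/pm strings like the sample above, one possible base R approach is to parse the whole column once with the 12-hour format and then re-parse only the entries that came back NA with the 24-hour format. This avoids ifelse (which strips the POSIXct class) and avoids parsing every value three times. The sketch below makes some assumptions: the helper name parse_mixed_datetime is hypothetical, the time zone is fixed to UTC, and month abbreviations are in an English locale. It uses %I rather than %H together with %p, since the R documentation for strptime says %p is meant to be used with %I.

```r
# Hypothetical helper (not from the original post): parse strings that are
# either "dd-mmm-yyyy hh:mm am/pm" (12-hour) or "dd-mmm-yyyy hh:mm" (24-hour).
# Assumes an English locale for %b and uses UTC unless told otherwise.
parse_mixed_datetime <- function(s, tz = "UTC") {
  # First pass: 12-hour clock with an am/pm marker.
  out <- as.POSIXct(s, format = "%d-%b-%Y %I:%M %p", tz = tz)

  # Second pass: only the values the first pass could not parse.
  miss <- is.na(out)
  out[miss] <- as.POSIXct(s[miss], format = "%d-%b-%Y %H:%M", tz = tz)
  out
}

# Values taken from the sample data above.
raw <- c("01-apr-2017 00:51", "02-apr-2017 12:11 pm",
         "01-apr-2017 02:51 pm", "02-apr-2017 06:55")
parsed <- parse_mixed_datetime(raw)
```

The result stays POSIXct throughout, so no conversion back from numeric with origin = "1970-01-01" is required.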
OK, I have got some of the solutions now.
- Using the lubridate package, this method takes 50% of the processing time:
x$adj_pick_date <- ifelse(x$pick_time=="early", dmy_hm(paste(format(x$pick_date, "%d-%b-%Y"), "03:00")), ifelse(x$pick_time=="late", dmy_hm(paste(format(x$pick_date+86400, "%d-%b-%Y"), "03:00")), x$pick_date))
x$adj_pick_date <- as.POSIXct(x$adj_pick_date, origin = "1970-01-01")
- Again, using lubridate:
x$ship_date <- lubridate::dmy_hm(x$ship_date)
x$pick_date <- lubridate::dmy_hm(x$pick_date)
- Probably a formatting error while doing the conversion. I still need help on this problem.
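One likely explanation for needing the extra division by 60, offered as a sketch rather than a confirmed diagnosis of the original data: subtracting two POSIXct vectors returns a difftime whose units are chosen automatically from the smallest absolute difference in the vector (seconds, minutes, hours or days), and as.numeric then drops the unit. If the smallest gap between ship_date and adj_pick_date is under an hour, the whole column comes back in minutes, while the ship_date/pick_date column may come back in hours. Asking difftime for hours explicitly removes that dependence. The helper name hours_between and the timestamps below are made up for illustration.

```r
# Hypothetical helper (not from the original post): hour difference with the
# unit fixed explicitly instead of relying on the automatic unit choice.
hours_between <- function(later, earlier) {
  ceiling(as.numeric(difftime(later, earlier, units = "hours")))
}

# Illustrative timestamps chosen so that one gap is shorter than an hour.
ship <- as.POSIXct(c("2017-04-02 06:55", "2017-04-02 03:40"), tz = "UTC")
pick <- as.POSIXct(c("2017-04-01 00:51", "2017-04-01 23:10"), tz = "UTC")
adj  <- as.POSIXct(c("2017-04-01 03:00", "2017-04-02 03:00"), tz = "UTC")

# The automatic unit differs between the two subtractions.
unit_pick <- units(ship - pick)
unit_adj  <- units(ship - adj)

# With explicit units both columns can use the same formula.
hrs     <- hours_between(ship, pick)
hrs_adj <- hours_between(ship, adj)
```

With this, the same expression can be used for both hrs and hrs_adj inside transform, without a column-specific divisor.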