My DataFrame looks like this:
TimeWeek TimeSat TimeHoli
0 6:40:00 8:00:00 8:00:00
1 6:45:00 8:05:00 8:05:00
2 6:50:00 8:09:00 8:10:00
3 6:55:00 8:11:00 8:14:00
4 6:58:00 8:13:00 8:17:00
5 7:40:00 8:15:00 8:21:00
I need to find the time difference between consecutive rows in TimeWeek, TimeSat and TimeHoli. The output must be:
TimeWeekDiff TimeSatDiff TimeHoliDiff
00:05:00 00:05:00 00:05:00
00:05:00 00:04:00 00:05:00
00:05:00 00:02:00 00:04:00
00:03:00 00:02:00 00:03:00
00:42:00 00:02:00 00:04:00
I tried df['TimeWeek'] - df['TimeWeek'].shift().fillna(0), but it throws an error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
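Probably because of the ':' characters in the columns. How can I fix this?
Solution:
It looks like the error occurs because the data are strings rather than timestamps, and Python cannot subtract one string from another. Convert them to timestamps first: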
import pandas as pd
df2 = df.apply(lambda x: [pd.Timestamp(ts) for ts in x])  # parse every 'H:MM:SS' string in each column as a Timestamp
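By default the timestamps will carry today's date, but that does not matter once you take differences between them (as long as no interval crosses midnight: going from 23:55 to 00:05 would come out negative rather than as 10 minutes).
After the conversion, simply take the difference of the DataFrame: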
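Since the columns hold only times of day, here is a sketch of an alternative, assuming every value is an 'H:MM:SS' string: parse the columns with pd.to_timedelta so that no date is attached at all, and add one day to any negative step so that a 23:55 to 00:05 transition counts as 10 minutes. The small DataFrame is a hypothetical sample, not the data from the question.

```python
import pandas as pd

# Hypothetical sample whose last step crosses midnight.
df = pd.DataFrame({"TimeWeek": ["23:50:00", "23:55:00", "0:05:00"]})

# Parse each 'H:MM:SS' string as a duration since midnight (no date attached).
td = df.apply(pd.to_timedelta)

# Row-over-row difference; the first row has no predecessor, so it is NaT.
diff = td.diff()

# A negative step means midnight was crossed: add one day to it.
# NaT compares as False and NaT + 1 day stays NaT, so the first row is unaffected.
diff = diff.where(diff >= pd.Timedelta(0), diff + pd.Timedelta(days=1))
print(diff)
```

This only works if consecutive rows are never more than 24 hours apart, which seems a reasonable assumption for timetable-style data like this.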
>>> df2 - df2.shift()
TimeWeek TimeSat TimeHoli
0 NaT NaT NaT
1 00:05:00 00:05:00 00:05:00
2 00:05:00 00:04:00 00:05:00
3 00:05:00 00:02:00 00:04:00
4 00:03:00 00:02:00 00:03:00
5 00:42:00 00:02:00 00:04:00
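pandas also provides DataFrame.diff(), which computes the same row-over-row difference as df2 - df2.shift(). Below is a minimal check on a trimmed sample of the TimeWeek column. Note that it parses the strings with pd.to_datetime and an explicit "%H:%M:%S" format instead of the pd.Timestamp lambda above; that swap is an assumption made to keep parsing unambiguous, and it attaches the date 1900-01-01 instead of today's date.

```python
import pandas as pd

# Trimmed sample taken from the question's TimeWeek column.
df = pd.DataFrame({"TimeWeek": ["6:40:00", "6:45:00", "6:58:00"]})

# Parse with an explicit format; the date part defaults to 1900-01-01.
df2 = df.apply(pd.to_datetime, format="%H:%M:%S")

# diff() subtracts the previous row, exactly like df2 - df2.shift().
by_shift = df2 - df2.shift()
by_diff = df2.diff()
print(by_diff)
```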
Depending on your needs, you can keep only the rows from index 1 onward (dropping the NaT row):
(df2 - df2.shift()).iloc[1:, :]
Or you can fill the NaT values with a zero duration:
(df2 - df2.shift()).fillna(pd.Timedelta(0))
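To get exactly the layout requested in the question (columns named TimeWeekDiff and so on, values shown as HH:MM:SS), one possible sketch is: drop the NaT row, add a "Diff" suffix to the column names, and format each Timedelta by hand. It parses with pd.to_timedelta rather than pd.Timestamp, and the helper fmt is a hypothetical name introduced here; it assumes all differences are non-negative.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "TimeWeek": ["6:40:00", "6:45:00", "6:50:00", "6:55:00", "6:58:00", "7:40:00"],
    "TimeSat": ["8:00:00", "8:05:00", "8:09:00", "8:11:00", "8:13:00", "8:15:00"],
    "TimeHoli": ["8:00:00", "8:05:00", "8:10:00", "8:14:00", "8:17:00", "8:21:00"],
})

def fmt(td: pd.Timedelta) -> str:
    """Hypothetical helper: render a non-negative Timedelta as HH:MM:SS."""
    total = int(td.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# Times of day as durations since midnight, then row-over-row differences,
# skipping the first row (which has no predecessor and would be NaT).
diffs = df.apply(pd.to_timedelta).diff().iloc[1:]

# Rename TimeWeek -> TimeWeekDiff etc., format every cell, renumber rows from 0.
out = (
    diffs.add_suffix("Diff")
    .apply(lambda col: col.map(fmt))
    .reset_index(drop=True)
)
print(out)
```

The result is a DataFrame of strings, so it is meant for display or export; keep the Timedelta version if you still need to do arithmetic on the differences.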