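Processing a pandas column with apply is far faster than looping over it with a for loop. In the test below, the same conversion dropped from about 3 hours to 60 seconds, roughly a 180x speedup: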
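We have a DataFrame loaded with pandas. Its features column holds combinations of 0 and 1 flags, but unfortunately each cell is stored as a str (a comma-separated string). We want to convert each cell into a list of int 0s and 1s.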
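To make the target format concrete, here is a minimal sketch of the conversion for a single cell. The sample string and the helper name parse_features are made up for illustration, since the real data is not shown here.

```python
def parse_features(cell):
    """Turn a comma-separated string such as "0,1,1,0" into a list of ints."""
    return [int(v) for v in cell.split(",")]

sample = "0,1,1,0"  # hypothetical cell value, not taken from the real data
parsed = parse_features(sample)

print(parsed)           # [0, 1, 1, 0]
print(type(parsed[0]))  # <class 'int'>
```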
(1) With a for loop: about 3 hours
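from tqdm import tqdm

# Walk the rows one by one (assumes the default 0..n-1 integer index).
for i in tqdm(range(df.shape[0])):
    # Replace the string with a list of its comma-separated pieces.
    df['features'][i] = df['features'][i].split(",")
    # Convert each piece to int; the df['features'][i] lookup is repeated for every element.
    for j in range(len(df['features'][i])):
        df['features'][i][j] = int(df['features'][i][j])

print(type(df['features'][0]))  # check the type of the first converted cell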
(2) Recommended: the apply method, 60 seconds
from time import time

def func(x):
    # Split the comma-separated string and convert each piece to int.
    return [int(v) for v in x.split(",")]

stime = time()
df['new_features'] = df['features'].apply(func)  # run func once per cell
endtime = time()

print("time:" + str(endtime - stime) + "s")
# df.head()
print("over")
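To try the apply approach on a small scale, the sketch below builds a toy DataFrame by hand. The column values and the names df_demo and to_int_list are illustrative assumptions, not the author's data or code.

```python
import pandas as pd

# Toy stand-in for the real data: each cell is a comma-separated string of 0/1 flags.
df_demo = pd.DataFrame({"features": ["0,1,1", "1,0,0", "1,1,1"]})

def to_int_list(cell):
    """Split on commas and convert every piece to int."""
    return [int(v) for v in cell.split(",")]

# apply calls to_int_list once per cell and collects the results into a new column.
df_demo["new_features"] = df_demo["features"].apply(to_int_list)

print(df_demo["new_features"].tolist())  # [[0, 1, 1], [1, 0, 0], [1, 1, 1]]
```

Writing the result to a new column leaves the original strings untouched, which makes it easy to spot-check a few rows before dropping the old column. The speed gap in the test above most likely comes from the loop version repeating a DataFrame lookup for every single element, which apply avoids.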