(Original) How to fix slow loc/iloc loops over a pandas DataFrame in Python

2024-03-05 10:56:42


1. Problem description

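I have recently been processing large datasets with pandas DataFrames and found it extremely slow. Tracing the cause, I found that loc and iloc are both very slow when called inside a loop, and with very large amounts of data the slowdown becomes severe. I searched many resources and found no detailed explanation, so here is the solution I arrived at.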

2. Solution

Use the pandas Series.apply method to process an entire column of data quickly in a single call.
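As a minimal sketch of the difference, the snippet below builds a small hypothetical DataFrame and computes the same column twice: once with a row-by-row loc loop, once with Series.apply. The column names price and price_with_tax and the helper add_tax are illustrative assumptions, not taken from the original data.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0, 40.0]})


def add_tax(x, rate=0.1):
    """Return x increased by the given tax rate (illustrative helper)."""
    return x * (1 + rate)


# Slow pattern: index into the DataFrame once per row with loc.
looped = df.copy()
looped["price_with_tax"] = 0.0
for idx in looped.index:
    looped.loc[idx, "price_with_tax"] = add_tax(looped.loc[idx, "price"])

# Pattern from this article: hand the whole column to Series.apply.
applied = df.copy()
applied["price_with_tax"] = applied["price"].apply(add_tax)

# Both approaches produce the same column.
print(looped["price_with_tax"].equals(applied["price_with_tax"]))
```

The two versions call the same function on the same values. The apply version simply avoids the per-row label lookup and assignment that loc performs on each iteration.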

Series.apply(func, convert_dtype=True, args=(), **kwds)

Parameters:


func : function

convert_dtype : boolean, default True

    Try to find better dtype for elementwise function results. If False, leave as dtype=object

args : tuple

    Positional arguments to pass to function in addition to the value

**kwds :

    Additional keyword arguments will be passed as keywords to the function
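To illustrate how extra keyword arguments are forwarded, here is a short sketch. The helper add_custom_values and the keyword names june, july and august are made up for this example. Any keyword that Series.apply does not recognise itself is passed through to the function.

```python
import pandas as pd

series = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])


def add_custom_values(x, **kwargs):
    """Add the value of every keyword argument to x (illustrative helper)."""
    for name in kwargs:
        x += kwargs[name]
    return x


# june, july and august are forwarded to add_custom_values as keywords,
# so each element is increased by 30 + 20 + 25.
result = series.apply(add_custom_values, june=30, july=20, august=25)
print(result)
```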

Examples

# First, import the libraries and create the data

>>> import pandas as pd

>>> import numpy as np

>>> series = pd.Series([20, 21, 12], index=['London','New York','Helsinki'])

>>> series

London      20

New York    21

Helsinki    12

dtype: int64

# Example 1: square each value

>>> def square(x):

...     return x**2

>>> series.apply(square)

London      400

New York    441

Helsinki    144

dtype: int64

>>> series.apply(lambda x: x**2)

London      400

New York    441

Helsinki    144

dtype: int64

# Example 2: subtract a custom value from each element

>>> def subtract_custom_value(x, custom_value):

...     return x-custom_value

>>> series.apply(subtract_custom_value, args=(5,))

London      15

New York    16

Helsinki     7

dtype: int64

# Use a function from the NumPy library

>>> series.apply(np.log)

London      2.995732

New York    3.044522

Helsinki    2.484907

dtype: float64
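As a side note to the NumPy example, NumPy universal functions such as np.log can also be called directly on a Series. The sketch below rebuilds the same series and checks that both spellings give the same values. It illustrates an alternative and is not a claim about which is preferable for your data.

```python
import numpy as np
import pandas as pd

series = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])

via_apply = series.apply(np.log)  # the form used in this article
direct = np.log(series)           # calling the ufunc on the Series itself

# The direct call also returns a Series with the original index.
print(type(direct).__name__)
print(np.allclose(via_apply, direct))
```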

3. Summary

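This approach lets you operate on an entire column at once instead of looping over every row and column, which is very useful when processing large amounts of data.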
