python - Why do I get an AttributeError when using pandas apply?

How can I convert NaN values into a categorical value based on a condition? I get an error when I try to convert the NaN values.

category           gender      sub-category   title
health&beauty      NaN         makeup         lipbalm
health&beauty      women       makeup         lipstick
NaN                NaN         NaN            lipgloss

My DataFrame looks like this. The function I use to convert the NaN values in gender into a categorical value looks like this:

def impute_gender(cols):
    category=cols[0]
    sub_category=cols[2]
    gender=cols[1]
    title=cols[3]
    if title.str.contains('Lip') and gender.isnull==True:
        return 'women'
df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)

When I run the code, I get this error:

----> 7     if title.str.contains('Lip') and gender.isnull()==True:
      8         print(gender)
      9 

AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')

Full dataset: https://github.com/lakshmipriya04/py-sample

Solution:

A few things to note here:

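> Calling apply over 4 columns is wasteful when the function only uses two of them.
> Calling apply is generally wasteful anyway, because it is slow and gives you none of the benefits of vectorization.
> Inside apply, each value you pull out of cols is a scalar, so don't use the .str accessor as you would on a pd.Series object. title is a plain Python string (which has no contains method), so a simple 'lip' in title check is enough.
> gender.isnull is simply wrong: gender is a scalar and has no isnull attribute (and without parentheses you would be comparing the method itself to True). Use pd.isnull(gender) for a scalar instead.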
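If you do want to keep an apply-based version, a minimal sketch that fixes the points above might look like the following. It assumes the column names from the sample table (gender, title), rebuilds those sample rows by hand, lowercases the title so 'Lip' and 'lip' both match, and returns the existing gender when the condition does not hold (the original function implicitly returned None in that case). It is still slower than the vectorized options below.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample rows from the question
df = pd.DataFrame({
    'category': ['health&beauty', 'health&beauty', np.nan],
    'gender': [np.nan, 'women', np.nan],
    'sub-category': ['makeup', 'makeup', np.nan],
    'title': ['lipbalm', 'lipstick', 'lipgloss'],
})

def impute_gender(row):
    # With axis=1 each row arrives as a Series; row['title'] and
    # row['gender'] are plain scalars (str or float NaN), not Series.
    title = row['title']
    gender = row['gender']
    # pd.isnull works on scalars; a plain `in` check replaces .str.contains
    if pd.isnull(gender) and isinstance(title, str) and 'lip' in title.lower():
        return 'women'
    return gender  # keep the existing value otherwise

# Only pass the two columns the function actually needs
df['gender'] = df[['gender', 'title']].apply(impute_gender, axis=1)
print(df)
```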

Option 1
np.where

import numpy as np
# True where gender is missing AND the title contains "lip"
m = df.gender.isnull() & df.title.str.contains('lip')
df['gender'] = np.where(m, 'women', df.gender)

df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

This is not only faster but also simpler. If you are worried about case sensitivity, you can make the contains check case-insensitive:

import re
m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
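Series.str.contains also accepts case=False, which gives the same case-insensitive match without importing re, and na=False, which treats missing titles as non-matches so the mask stays purely boolean. A minimal self-contained sketch of Option 1 with both parameters; the mixed-case 'LipBalm' row and the missing-title row are hypothetical additions to the sample data, included only to exercise those two parameters.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, plus two hypothetical rows:
# one with a mixed-case title and one with a missing title
df = pd.DataFrame({
    'gender': [np.nan, 'women', np.nan, np.nan, np.nan],
    'title': ['lipbalm', 'lipstick', 'lipgloss', 'LipBalm', np.nan],
})

# case=False ignores case; na=False makes a missing title count as "no match"
m = df.gender.isnull() & df.title.str.contains('lip', case=False, na=False)
df['gender'] = np.where(m, 'women', df.gender)
print(df)
```

The last row keeps its NaN gender, because its missing title is treated as not containing 'lip'.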

Option 2
Another option is to use pd.Series.mask or pd.Series.where:

df['gender'] = df.gender.mask(m, 'women')

Or:

df['gender'] = df.gender.where(~m, 'women')
df
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

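mask replaces the values wherever the condition is True, while where keeps the values wherever the condition is True and replaces the rest; that is why the where version uses the inverted condition ~m.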
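A quick way to convince yourself that the two spellings in Option 2 agree is to compare them directly. This sketch rebuilds the gender and title columns by hand; the extra 'shampoo' row is a hypothetical addition to show that a row with missing gender but no "lip" in its title is left untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'gender': [np.nan, 'women', np.nan, np.nan],
    'title': ['lipbalm', 'lipstick', 'lipgloss', 'shampoo'],
})
m = df.gender.isnull() & df.title.str.contains('lip', na=False)

# mask: replace values where m is True
via_mask = df.gender.mask(m, 'women')
# where: keep values where ~m is True, replace the rest (i.e. where m is True)
via_where = df.gender.where(~m, 'women')

# Series.equals treats NaNs in the same positions as equal
print(via_mask.equals(via_where))
print(via_mask.tolist())
```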
