dropna: handling missing data

From the official pandas API reference

  1. Function signature
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
  2. Parameters
  • axis : {0 or ‘index’, 1 or ‘columns’}, default 0

    • Determine if rows or columns which contain missing values are removed.
      • 0, or ‘index’ : Drop rows which contain missing values.
      • 1, or ‘columns’ : Drop columns which contain missing values.
    • Changed in version 1.0.0: Passing a tuple or list to drop on multiple axes is no longer supported. Only a single axis is allowed.
  • how : {‘any’, ‘all’}, default ‘any’

    • Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.
      • ‘any’ : If any NA values are present, drop that row or column.
      • ‘all’ : If all values are NA, drop that row or column.
  • thresh : int, optional

    • Require that many non-NA values. Rows (or columns, with axis=1) that have fewer non-NA values are dropped.
  • subset : array-like, optional

    • Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include.
  • inplace : bool, default False

    • If True, do operation inplace and return None.
  • Returns

    • DataFrame or None
    • DataFrame with NA entries dropped from it or None if inplace=True.
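
The return-value rule above is easy to trip over, so here is a minimal sketch of it. The frame `scores` and the variable names are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame used only for this illustration.
scores = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                       "b": [4.0, 5.0, np.nan]})

# Default call: returns a new DataFrame and leaves `scores` untouched.
cleaned = scores.dropna()

# inplace=True: modifies the object it is called on and returns None.
working = scores.copy()
result = working.dropna(inplace=True)
```

Here `cleaned` and `working` end up holding the same single complete row, while `result` is None, so writing `df = df.dropna(inplace=True)` would replace the frame with None.
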
  3. Examples
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})
df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

Default behavior: drop every row that contains at least one missing value

df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25

Drop every column that contains at least one missing value

df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman
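
Dropping along columns can also be combined with thresh. As a small sketch on the same sample frame (the variable name `wide` is only illustrative), the call below keeps columns that have at least two non-NA values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"],
                   "toy": [np.nan, "Batmobile", "Bullwhip"],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# Non-NA counts per column: name has 3, toy has 2, born has 1.
# With thresh=2 only the "born" column falls below the threshold.
wide = df.dropna(axis="columns", thresh=2)
```

Compared with the plain axis='columns' call, this keeps the toy column, because one missing value is tolerated.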

Drop only rows in which every value is missing (the sample has no such row, so nothing is removed)

df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
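
Because the sample frame has no fully empty row, the difference between how='all' and how='any' does not show up there. A minimal sketch with a hypothetical frame `readings`, in which row 1 is entirely missing:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is entirely missing, row 2 only partly.
readings = pd.DataFrame({"x": [1.0, np.nan, 3.0],
                         "y": [2.0, np.nan, np.nan]})

# how="all" removes a row only when every value in it is NA.
all_dropped = readings.dropna(how="all")

# how="any" (the default) removes a row as soon as one value is NA.
any_dropped = readings.dropna(how="any")
```

With how='all' rows 0 and 2 survive, whereas how='any' leaves only row 0.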

Keep only rows that have at least 2 non-NA values (rows with fewer are dropped)

df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
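
One way to read thresh is as a filter on the number of non-NA cells per row. The sketch below rebuilds the same result with a boolean mask. It is only meant to illustrate the semantics, not to describe how pandas implements it, and the names `non_na_per_row`, `by_mask` and `by_thresh` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"],
                   "toy": [np.nan, "Batmobile", "Bullwhip"],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# Count non-NA cells per row: Alfred has 1, Batman 3, Catwoman 2.
non_na_per_row = df.notna().sum(axis=1)

# Keeping rows whose count is at least 2 selects the same rows as thresh=2.
by_mask = df[non_na_per_row >= 2]
by_thresh = df.dropna(thresh=2)
```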

Drop rows where name or toy is missing (other columns are ignored)

df.dropna(subset=['name', 'toy'])
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
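
subset can also be combined with how. As a hedged sketch on the same sample frame (the variable names are illustrative), the first call checks a single column, and the second drops a row only when both listed columns are missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alfred", "Batman", "Catwoman"],
                   "toy": [np.nan, "Batmobile", "Bullwhip"],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# Only the "born" column is checked, so just Batman remains.
born_known = df.dropna(subset=["born"])

# how="all" within the subset: a row is dropped only if toy AND born
# are both missing, which is true for Alfred alone.
either_known = df.dropna(subset=["toy", "born"], how="all")
```

Catwoman survives the second call because her toy value is present even though born is missing.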

Drop in place: inplace=True modifies df itself and returns None, so display df afterwards

df.dropna(inplace=True)
df
     name        toy       born
1  Batman  Batmobile 1940-04-25