Call func on self producing a Series with transformed values. The produced Series will have the same axis length as self.

Parameters
    func : function, str, list or dict
        Function to use for transforming the data. If a function, must either work when passed a Series or when passed to Series.apply. Accepted combinations are:
        - function
        - string function name
        - list of functions and/or function names, e.g. [np.exp, 'sqrt']
        - dict of axis labels -> functions, function names or list of such
    axis : {0 or 'index'}
        Parameter needed for compatibility with DataFrame.
    *args
        Positional arguments to pass to func.
    **kwargs
        Keyword arguments to pass to func.

Returns
    Series
        A Series that must have the same length as self.

Raises
    ValueError
        If the returned Series has a different length than self.
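As a minimal sketch of the behaviour described above (the Series values and variable names are made up for illustration), a single function keeps the length of the input, a list of functions gives one column per function, and a reducing function such as 'sum' is rejected:

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the original text.
s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

# A single function: the result is a Series with the same length as s.
roots = s.transform(np.sqrt)

# A list of functions and/or function names: one column per function.
both = s.transform([np.sqrt, "exp"])

# A reduction collapses the Series to a scalar, so transform raises ValueError.
try:
    s.transform("sum")
    reduction_rejected = False
except ValueError:
    reduction_rejected = True

print(roots)
print(both)
print("reduction rejected:", reduction_rejected)
```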
Series.agg(self, func, axis=0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

New in version 0.20.0.

Parameters
    func : function, str, list or dict
        Function to use for aggregating the data. If a function, must either work when passed a Series or when passed to Series.apply. Accepted combinations are:
        - function
        - string function name
        - list of functions and/or function names, e.g. [np.sum, 'mean']
        - dict of axis labels -> functions, function names or list of such
    axis : {0 or 'index'}
        Parameter needed for compatibility with DataFrame.
    *args
        Positional arguments to pass to func.
    **kwargs
        Keyword arguments to pass to func.

Returns
    scalar, Series or DataFrame
        The return can be:
        - scalar : when Series.agg is called with single function
        - Series : when DataFrame.agg is called with a single function
        - DataFrame : when DataFrame.agg is called with several functions
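A small sketch of the return types listed above for a Series, using made-up data: one function name gives a scalar, while a list of names gives a Series indexed by those names.

```python
import pandas as pd

# Illustrative data, not taken from the original text.
s = pd.Series([1, 2, 3, 4])

# Single function name: the result is a scalar.
total = s.agg("sum")

# List of function names: the result is a Series labelled by function name.
summary = s.agg(["min", "max"])

print(total)
print(summary)
```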
DataFrame.agg(self, func, axis=0, *args, **kwargs)

Aggregate using one or more operations over the specified axis.

Parameters
    func : function, str, list or dict
        Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. Accepted combinations are:
        - function
        - string function name
        - list of functions and/or function names, e.g. [np.sum, 'mean']
        - dict of axis labels -> functions, function names or list of such
    axis : {0 or 'index', 1 or 'columns'}, default 0
        If 0 or 'index': apply function to each column. If 1 or 'columns': apply function to each row.
    *args
        Positional arguments to pass to func.
    **kwargs
        Keyword arguments to pass to func.

Returns
    scalar, Series or DataFrame
        The return can be:
        - scalar : when Series.agg is called with single function
        - Series : when DataFrame.agg is called with a single function
        - DataFrame : when DataFrame.agg is called with several functions

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).

agg is an alias for aggregate. Use the alias.
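The axis behaviour and the contrast with numpy can be sketched as follows. The frame and the variable names are invented for illustration; the numpy calls operate on the plain array obtained with to_numpy().

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the original text.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Default axis=0: one result per column, returned as a Series.
per_column = df.agg("mean")

# axis=1: one result per row.
per_row = df.agg("mean", axis=1)

# Several functions: a DataFrame with one row per function.
several = df.agg(["sum", "min"])

# numpy aggregates the flattened array unless an axis is given.
arr_2d = df.to_numpy()
flat_mean = np.mean(arr_2d)             # mean of all six values
column_means = np.mean(arr_2d, axis=0)  # matches df.agg("mean")

print(per_column, per_row, several, flat_mean, column_means, sep="\n")
```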
>>> import pandas as pd
>>> df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
>>> df
   A  B
0  0  1
1  1  2
2  2  3
>>> df.transform(lambda x: x + 1)
   A  B
0  1  2
1  2  3
2  3  4
Similarities and differences between apply(), transform() and agg():
- pandas.core.groupby.GroupBy
- pandas.DataFrame
- pandas.Series
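2. agg() and transform() can call methods by name (passed as a string), such as 'sum', 'max', 'min' and 'count', e.g. agg('sum'). apply() is not used this way directly; instead it is called with a custom function that operates on the column features.
3. transform() cannot be given a custom function that makes features interact with each other, because transform computes on each element separately (that is, it operates on one feature column at a time).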
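The two points above can be sketched on a small grouped frame. The data and the column names (key, price, qty) are assumptions made up for this illustration. The string form is shown for agg() and transform(), a function that combines two columns is passed to apply(), and transform() is given a function that only needs one column at a time. How transform() handles multi-column functions may differ between pandas versions, so that case is deliberately left out.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "price": [1.0, 2.0, 4.0],
    "qty": [10, 20, 30],
})
grouped = df.groupby("key")

# Calling a built-in method by name.
totals = grouped["price"].agg("sum")           # one value per group
broadcast = grouped["price"].transform("sum")  # one value per original row

# apply() receives each group as a DataFrame, so the custom function
# can combine several columns.
revenue = grouped[["price", "qty"]].apply(
    lambda g: (g["price"] * g["qty"]).sum()
)

# transform() with a function that works on a single column at a time.
centered = grouped[["price", "qty"]].transform(lambda col: col - col.mean())

print(totals, broadcast, revenue, centered, sep="\n")
```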
2.1 transform() with a custom function
2.2 transform() with a built-in method
2.3 apply() with a custom function
2.4 agg() with a custom function
2.5 agg() with a built-in method
2.6 Conclusion
agg() with a built-in method is the fastest, followed by transform() with a built-in method. transform() with a custom function is the slowest combination and should be avoided!
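One way to reproduce such a comparison is sketched below. The synthetic data, its size and the labels are assumptions for illustration and are not the setup used for the measurements above. Absolute timings and even the ranking can vary with the pandas version, the data and the machine.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption; not the original benchmark data).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "key": rng.integers(0, 200, size=20_000),
    "val": rng.random(20_000),
})
grouped = df.groupby("key")["val"]

# The five combinations compared in sections 2.1 to 2.5.
candidates = {
    "transform + custom function": lambda: grouped.transform(lambda x: x.sum()),
    "transform + built-in method": lambda: grouped.transform("sum"),
    "apply + custom function": lambda: grouped.apply(lambda x: x.sum()),
    "agg + custom function": lambda: grouped.agg(lambda x: x.sum()),
    "agg + built-in method": lambda: grouped.agg("sum"),
}

# Best of three single runs for each combination.
timings = {
    name: min(timeit.repeat(fn, number=1, repeat=3))
    for name, fn in candidates.items()
}

# Print from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:30s} {seconds * 1000:8.2f} ms")
```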