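If I have a number of base features and generate polynomial features of moderate degree from them, it gets confusing to know which column of the feature array preprocess_XX corresponds to which transformation of the base features.
I used to do something like the following with an older version of sklearn (maybe 0.14?):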
import numpy as np
from sympy import Symbol
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(4)
x1 = Symbol('x1')
x2 = Symbol('x2')
x3 = Symbol('x3')
XX = np.random.rand(1000, 3) # replace with the actual data array
preprocess_symXX = poly.fit_transform([x1, x2, x3])
preprocess_XX = poly.fit_transform(XX)
print(preprocess_symXX)
This worked well: it produced output like [1, x1, x2, x3, x1**2, …], which told me which polynomial term each column of preprocess_XX actually came from.
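The column order in that listing follows a regular pattern: a bias column, then terms grouped by total degree, and within each degree the feature-index combinations in lexicographic order. A minimal pure-Python sketch that reproduces such labels, assuming PolynomialFeatures' default settings (include_bias=True, interaction_only=False); `polynomial_term_names` is a hypothetical helper, not part of sklearn:

```python
from itertools import combinations_with_replacement


def polynomial_term_names(names, degree, include_bias=True):
    """Hypothetical helper: label each output column, assuming sklearn's
    default ordering (by total degree, then lexicographic feature indices)."""
    labels = ['1'] if include_bias else []
    for d in range(1, degree + 1):
        # Every multiset of d feature indices is one monomial of degree d
        for combo in combinations_with_replacement(range(len(names)), d):
            factors = []
            for i, name in enumerate(names):
                power = combo.count(i)  # exponent of feature i in this term
                if power == 1:
                    factors.append(name)
                elif power > 1:
                    factors.append(f"{name}**{power}")
            labels.append('*'.join(factors))
    return labels


print(polynomial_term_names(['x1', 'x2', 'x3'], 2))
```

For three features and degree 2 this yields 1, x1, x2, x3, x1**2, x1*x2, x1*x3, x2**2, x2*x3, x3**2; comparing against the fitted transformer's powers_ is the safer way to confirm the order on a given sklearn version.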
Now, however, the same code fails with TypeError: can't convert expression to float. The exception is raised by check_array() in sklearn.utils.validation, which tries to convert the input of poly.fit_transform() to dtype=float.
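One way around the float conversion is to fit only on numeric data and then apply the learned exponents (the fitted transformer's powers_ attribute) to sympy symbols yourself. A sketch, assuming both sympy and a recent scikit-learn are installed; the random XX stands in for real data:

```python
import numpy as np
import sympy
from sklearn.preprocessing import PolynomialFeatures

XX = np.random.rand(1000, 3)  # numeric data only: check_array requires floats
poly = PolynomialFeatures(4)
preprocess_XX = poly.fit_transform(XX)

# Build the symbolic term for each output column from its row of exponents,
# instead of passing sympy symbols to fit_transform
x1, x2, x3 = sympy.symbols('x1 x2 x3')
symbols = [x1, x2, x3]
preprocess_symXX = [
    sympy.Mul(*[s**int(p) for s, p in zip(symbols, row)])
    for row in poly.powers_
]
print(preprocess_symXX)
```

The all-zero row collapses to 1, so the printed list starts like the old output: 1, x1, x2, x3, x1**2, x1*x2, and so on.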
Is there a way to see which column of the fit_transform() output corresponds to which polynomial of the base features? It seems sympy no longer works with fit_transform.
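Solution:
Use poly.powers_ to get the exponents: row i gives the power of each input feature in output column i. You can then convert them into something human-readable: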
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
X = np.random.rand(1000, 3)
poly = PolynomialFeatures(4)
Y = poly.fit_transform(X)
features = ['X1', 'X2', 'X3']
print(poly.powers_)
# Each row of powers_ holds the exponent of every input feature for one output column
for entry in poly.powers_:
    newFeature = []
    for feat, coef in zip(features, entry):
        if coef > 0:
            newFeature.append(feat + '**' + str(coef))
    if not newFeature:
        print(1)  # If all powers are 0 (the bias column)
    else:
        # The factors of a polynomial term are multiplied, not added
        print(' * '.join(newFeature))
Output (after the printed poly.powers_ array):
1
X1**1
X2**1
X3**1
X1**2
X1**1 * X2**1
X1**1 * X3**1
X2**2
X2**1 * X3**1
X3**2
X1**3
X1**2 * X2**1
X1**2 * X3**1
X1**1 * X2**2
X1**1 * X2**1 * X3**1
X1**1 * X3**2
X2**3
X2**2 * X3**1
X2**1 * X3**2
X3**3
X1**4
X1**3 * X2**1
X1**3 * X3**1
X1**2 * X2**2
X1**2 * X2**1 * X3**1
X1**2 * X3**2
X1**1 * X2**3
X1**1 * X2**2 * X3**1
X1**1 * X2**1 * X3**2
X1**1 * X3**3
X2**4
X2**3 * X3**1
X2**2 * X3**2
X2**1 * X3**3
X3**4