sklearn XGBModel: A Detailed Guide to XGBModel's feature_importances_ and plot_importance (Introduction and Usage), Part 1

feature_importances_


1. Explanation of the feature_importances_ property


XGBRegressor().feature_importances_


Parameters


Note: Feature importance is defined only for tree boosters, i.e. only when a decision tree model is chosen as the base learner (``booster=gbtree``). It is not defined for other base learner types, such as linear learners (``booster=gblinear``).

Returns


feature_importances_ : array of shape ``[n_features]``

Note: ``importance_type`` (string, default "gain") sets the feature importance type used by the ``feature_importances_`` property: one of "gain", "weight", "cover", "total_gain" or "total_cover".
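In practice the property is read off a fitted model. A minimal sketch (the toy data and shapes below are made up purely for illustration):

import numpy as np
from xgboost import XGBRegressor

# Toy regression data: 100 samples, 5 features.
rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = 10 * X[:, 0] + 5 * X[:, 1] + rng.rand(100)

model = XGBRegressor()          # importance_type defaults to "gain"
model.fit(X, y)

importances = model.feature_importances_   # array of shape (5,)
print(importances)
print(importances.sum())                   # scores are normalized to sum to 1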



2. Source code of feature_importances_

class XGBModel(XGBModelBase):
    # pylint: disable=too-many-arguments, too-many-instance-attributes, invalid-name
    """Implementation of the Scikit-Learn API for XGBoost.

    Parameters
    ----------
    max_depth : int
        Maximum tree depth for base learners.
    learning_rate : float
        Boosting learning rate (xgb's "eta")
    n_estimators : int
        Number of boosted trees to fit.
    silent : boolean
        Whether to print messages while running boosting.
    objective : string or callable
        Specify the learning task and the corresponding learning objective or
        a custom objective function to be used (see note below).
    booster : string
        Specify which booster to use: gbtree, gblinear or dart.
    nthread : int
        Number of parallel threads used to run xgboost.  (Deprecated, please use ``n_jobs``)
    n_jobs : int
        Number of parallel threads used to run xgboost.  (replaces ``nthread``)
    gamma : float
        Minimum loss reduction required to make a further partition on a leaf node of the tree.
    min_child_weight : int
        Minimum sum of instance weight (hessian) needed in a child.
    max_delta_step : int
        Maximum delta step we allow each tree's weight estimation to be.
    subsample : float
        Subsample ratio of the training instances.
    colsample_bytree : float
        Subsample ratio of columns when constructing each tree.
    colsample_bylevel : float
        Subsample ratio of columns for each split, in each level.
    reg_alpha : float (xgb's alpha)
        L1 regularization term on weights
    reg_lambda : float (xgb's lambda)
        L2 regularization term on weights
    scale_pos_weight : float
        Balancing of positive and negative weights.
    base_score :
        The initial prediction score of all instances, global bias.
    seed : int
        Random number seed.  (Deprecated, please use random_state)
    random_state : int
        Random number seed.  (replaces seed)
    missing : float, optional
        Value in the data which needs to be present as a missing value. If
        None, defaults to np.nan.
    importance_type : string, default "gain"
        The feature importance type for the feature_importances_ property: either "gain",
        "weight", "cover", "total_gain" or "total_cover".
    \*\*kwargs : dict, optional
        Keyword arguments for XGBoost Booster object.  Full documentation of parameters can
        be found here: https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst.
        Attempting to set a parameter via the constructor args and \*\*kwargs dict simultaneously
        will result in a TypeError.

        .. note:: \*\*kwargs unsupported by scikit-learn
            \*\*kwargs is unsupported by scikit-learn.  We do not guarantee that parameters
            passed via this argument will interact properly with scikit-learn.

    Note
    ----
    A custom objective function can be provided for the ``objective``
    parameter. In this case, it should have the signature
    ``objective(y_true, y_pred) -> grad, hess``:

    y_true : array_like of shape [n_samples]
        The target values
    y_pred : array_like of shape [n_samples]
        The predicted values
    grad : array_like of shape [n_samples]
        The value of the gradient for each sample point.
    hess : array_like of shape [n_samples]
        The value of the second derivative for each sample point
    """

    def __init__(self, max_depth=3, learning_rate=0.1, n_estimators=100,
                 silent=True, objective="reg:linear", booster='gbtree',
                 n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0,
                 subsample=1, colsample_bytree=1, colsample_bylevel=1,
                 reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
                 base_score=0.5, random_state=0, seed=None, missing=None,
                 importance_type="gain", **kwargs):
        if not SKLEARN_INSTALLED:
            raise XGBoostError('sklearn needs to be installed in order to use this module')
        self.max_depth = max_depth
        self.learning_rate = learning_rate
        self.n_estimators = n_estimators
        self.silent = silent
        self.objective = objective
        self.booster = booster
        self.gamma = gamma
        self.min_child_weight = min_child_weight
        self.max_delta_step = max_delta_step
        self.subsample = subsample
        self.colsample_bytree = colsample_bytree
        self.colsample_bylevel = colsample_bylevel
        self.reg_alpha = reg_alpha
        self.reg_lambda = reg_lambda
        self.scale_pos_weight = scale_pos_weight
        self.base_score = base_score
        self.missing = missing if missing is not None else np.nan
        self.kwargs = kwargs
        self._Booster = None
        self.seed = seed
        self.random_state = random_state
        self.nthread = nthread
        self.n_jobs = n_jobs
        self.importance_type = importance_type
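The constructor follows the scikit-learn convention: it only records hyperparameters on self, and no training happens until fit() is called. A minimal illustration (get_params() is inherited via the scikit-learn base class that XGBModelBase wraps):

from xgboost import XGBRegressor

# Construction only stores hyperparameters; nothing is trained yet.
model = XGBRegressor(max_depth=5, n_estimators=200, importance_type="weight")
print(model.get_params()["importance_type"])   # 'weight'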

The feature_importances_ property itself, defined on the same class:

    @property
    def feature_importances_(self):
        """
        Feature importances property

        .. note:: Feature importance is defined only for tree boosters

            Feature importance is only defined when the decision tree model is chosen as base
            learner (`booster=gbtree`). It is not defined for other base learner types, such
            as linear learners (`booster=gblinear`).

        Returns
        -------
        feature_importances_ : array of shape ``[n_features]``

        """
        if getattr(self, 'booster', None) is not None and self.booster != 'gbtree':
            raise AttributeError(
                'Feature importance is not defined for Booster type {}'.format(self.booster))
        b = self.get_booster()
        score = b.get_score(importance_type=self.importance_type)
        all_features = [score.get(f, 0.) for f in b.feature_names]
        all_features = np.array(all_features, dtype=np.float32)
        return all_features / all_features.sum()
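Two things in the code above are worth verifying by hand: the property is just the Booster's raw scores normalized to sum to one, and it raises for non-tree boosters. A sketch, assuming the wrapper version shown above (later releases may handle gblinear differently):

import numpy as np
from xgboost import XGBRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = rng.rand(200)

model = XGBRegressor(importance_type="gain").fit(X, y)

# Reproduce the property by hand: raw per-feature scores from the Booster...
booster = model.get_booster()
score = booster.get_score(importance_type="gain")
raw = np.array([score.get(f, 0.) for f in booster.feature_names], dtype=np.float32)

# ...then normalize, exactly as the property does.
assert np.allclose(raw / raw.sum(), model.feature_importances_)

# With a linear base learner, the property raises per the guard above.
linear = XGBRegressor(booster="gblinear").fit(X, y)
try:
    _ = linear.feature_importances_
except AttributeError as err:
    print(err)   # Feature importance is not defined for Booster type gblinear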

