ML之4PolyR:利用四次多项式回归4PolyR模型+两种正则化(Lasso/Ridge)在披萨数据集上拟合(train)、价格回归预测(test)
输出结果
设计思路
核心代码
lasso_poly4 = Lasso()
lasso_poly4.fit(X_train_poly4, y_train)
ridge_poly4 = Ridge()
ridge_poly4.fit(X_train_poly4, y_train)
class Lasso(ElasticNet):
"""Linear Model trained with L1 prior as regularizer (aka the Lasso)
The optimization objective for Lasso is::
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
Technically the Lasso model is optimizing the same objective function as
the Elastic Net with ``l1_ratio=1.0`` (no L2 penalty).
Read more in the :ref:`User Guide <lasso>`.
Parameters
----------
alpha : float, optional
Constant that multiplies the L1 term. Defaults to 1.0.
``alpha = 0`` is equivalent to an ordinary least square, solved
by the :class:`LinearRegression` object. For numerical
reasons, using ``alpha = 0`` with the ``Lasso`` object is not advised.
Given this, you should use the :class:`LinearRegression` object.
fit_intercept : boolean
whether to calculate the intercept for this model. If set
to false, no intercept will be used in calculations
(e.g. data is expected to be already centered).
normalize : boolean, optional, default False
This parameter is ignored when ``fit_intercept`` is set to False.
If True, the regressors X will be normalized before regression by
subtracting the mean and dividing by the l2-norm.
If you wish to standardize, please use
:class:`sklearn.preprocessing.StandardScaler` before calling ``fit``
on an estimator with ``normalize=False``.
precompute : True | False | array-like, default=False
Whether to use a precomputed Gram matrix to speed up
calculations. If set to ``'auto'`` let us decide. The Gram
matrix can also be passed as argument. For sparse input
this option is always ``True`` to preserve sparsity.
copy_X : boolean, optional, default True
If ``True``, X will be copied; else, it may be overwritten.
max_iter : int, optional
The maximum number of iterations
tol : float, optional
The tolerance for the optimization: if the updates are
smaller than ``tol``, the optimization code checks the
dual gap for optimality and continues until it is smaller
than ``tol``.
warm_start : bool, optional
When set to True, reuse the solution of the previous call to fit as
initialization, otherwise, just erase the previous solution.
positive : bool, optional
When set to ``True``, forces the coefficients to be positive.
random_state : int, RandomState instance or None, optional, default
None
The seed of the pseudo random number generator that selects a
random
feature to update. If int, random_state is the seed used by the random
number generator; If RandomState instance, random_state is the
random
number generator; If None, the random number generator is the
RandomState instance used by `np.random`. Used when ``selection`` ==
'random'.
selection : str, default 'cyclic'
If set to 'random', a random coefficient is updated every iteration
rather than looping over features sequentially by default. This
(setting to 'random') often leads to significantly faster convergence
especially when tol is higher than 1e-4.
Attributes
----------
coef_ : array, shape (n_features,) | (n_targets, n_features)
parameter vector (w in the cost function formula)
sparse_coef_ : scipy.sparse matrix, shape (n_features, 1) | \
(n_targets, n_features)
``sparse_coef_`` is a readonly property derived from ``coef_``
intercept_ : float | array, shape (n_targets,)
independent term in decision function.
n_iter_ : int | array-like, shape (n_targets,)
number of iterations run by the coordinate descent solver to reach
the specified tolerance.
Examples
--------
>>> from sklearn import linear_model
>>> clf = linear_model.Lasso(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
normalize=False, positive=False, precompute=False,
random_state=None,
selection='cyclic', tol=0.0001, warm_start=False)
>>> print(clf.coef_)
[ 0.85 0. ]
>>> print(clf.intercept_)
0.15
See also
--------
lars_path
lasso_path
LassoLars
LassoCV
LassoLarsCV
sklearn.decomposition.sparse_encode
Notes
-----
The algorithm used to fit the model is coordinate descent.
To avoid unnecessary memory duplication the X argument of the fit
method
should be directly passed as a Fortran-contiguous numpy array.
"""
path = staticmethod(enet_path)
def __init__(self, alpha=1.0, fit_intercept=True, normalize=False,
precompute=False, copy_X=True, max_iter=1000,
tol=1e-4, warm_start=False, positive=False,
random_state=None, selection='cyclic'):
super(Lasso, self).__init__(alpha=alpha, l1_ratio=1.0,
fit_intercept=fit_intercept, normalize=normalize,
precompute=precompute, copy_X=copy_X, max_iter=max_iter, tol=tol,
warm_start=warm_start, positive=positive, random_state=random_state,
selection=selection)
######################################################
#########################
# Functions for CV with paths functions
class Ridge(_BaseRidge, RegressorMixin):
"""Linear least squares with l2 regularization.
This model solves a regression model where the loss function is
the linear least squares function and regularization is given by
the l2-norm. Also known as Ridge Regression or Tikhonov regularization.
This estimator has built-in support for multi-variate regression
(i.e., when y is a 2d-array of shape [n_samples, n_targets]).
Read more in the :ref:`User Guide <ridge_regression>`.
Parameters
----------
alpha : {float, array-like}, shape (n_targets)
Regularization strength; must be a positive float. Regularization
improves the conditioning of the problem and reduces the variance of
the estimates. Larger values specify stronger regularization.
Alpha corresponds to ``C^-1`` in other linear models such as
LogisticRegression or LinearSVC. If an array is passed, penalties are
assumed to be specific to the targets. Hence they must correspond in
number.
fit_intercept : boolean
Whether to calculate the intercept for this model. If set
to false, no intercept will be used in calculations
(e.g. data is expected to be already centered).
normalize : boolean, optional, default False
This parameter is ignored when ``fit_intercept`` is set to False.
If True, the regressors X will be normalized before regression by
subtracting the mean and dividing by the l2-norm.
If you wish to standardize, please use
:class:`sklearn.preprocessing.StandardScaler` before calling ``fit``
on an estimator with ``normalize=False``.
copy_X : boolean, optional, default True
If True, X will be copied; else, it may be overwritten.
max_iter : int, optional
Maximum number of iterations for conjugate gradient solver.
For 'sparse_cg' and 'lsqr' solvers, the default value is determined
by scipy.sparse.linalg. For 'sag' solver, the default value is 1000.
tol : float
Precision of the solution.
solver : {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga'}
Solver to use in the computational routines:
- 'auto' chooses the solver automatically based on the type of data.
- 'svd' uses a Singular Value Decomposition of X to compute the Ridge
coefficients. More stable for singular matrices than
'cholesky'.
- 'cholesky' uses the standard scipy.linalg.solve function to
obtain a closed-form solution.
- 'sparse_cg' uses the conjugate gradient solver as found in
scipy.sparse.linalg.cg. As an iterative algorithm, this solver is
more appropriate than 'cholesky' for large-scale data
(possibility to set `tol` and `max_iter`).
- 'lsqr' uses the dedicated regularized least-squares routine
scipy.sparse.linalg.lsqr. It is the fastest but may not be available
in old scipy versions. It also uses an iterative procedure.
- 'sag' uses a Stochastic Average Gradient descent, and 'saga' uses
its improved, unbiased version named SAGA. Both methods also use an
iterative procedure, and are often faster than other solvers when
both n_samples and n_features are large. Note that 'sag' and
'saga' fast convergence is only guaranteed on features with
approximately the same scale. You can preprocess the data with a
scaler from sklearn.preprocessing.
All last five solvers support both dense and sparse data. However,
only 'sag' and 'saga' supports sparse input when `fit_intercept` is
True.
.. versionadded:: 0.17
Stochastic Average Gradient descent solver.
.. versionadded:: 0.19
SAGA solver.
random_state : int, RandomState instance or None, optional, default
None
The seed of the pseudo random number generator to use when
shuffling
the data. If int, random_state is the seed used by the random number
generator; If RandomState instance, random_state is the random
number
generator; If None, the random number generator is the RandomState
instance used by `np.random`. Used when ``solver`` == 'sag'.
.. versionadded:: 0.17
*random_state* to support Stochastic Average Gradient.
Attributes
----------
coef_ : array, shape (n_features,) or (n_targets, n_features)
Weight vector(s).
intercept_ : float | array, shape = (n_targets,)
Independent term in decision function. Set to 0.0 if
``fit_intercept = False``.
n_iter_ : array or None, shape (n_targets,)
Actual number of iterations for each target. Available only for
sag and lsqr solvers. Other solvers will return None.
.. versionadded:: 0.17
See also
--------
RidgeClassifier, RidgeCV, :class:`sklearn.kernel_ridge.KernelRidge`
Examples
--------
>>> from sklearn.linear_model import Ridge
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = Ridge(alpha=1.0)
>>> clf.fit(X, y) # doctest: +NORMALIZE_WHITESPACE
Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
normalize=False, random_state=None, solver='auto', tol=0.001)
"""
def __init__(self, alpha=1.0, fit_intercept=True, normalize=False,
copy_X=True, max_iter=None, tol=1e-3, solver="auto",
random_state=None):
super(Ridge, self).__init__(alpha=alpha, fit_intercept=fit_intercept,
normalize=normalize, copy_X=copy_X, max_iter=max_iter, tol=tol,
solver=solver, random_state=random_state)
def fit(self, X, y, sample_weight=None):
"""Fit Ridge regression model
Parameters
----------
X : {array-like, sparse matrix}, shape = [n_samples, n_features]
Training data
y : array-like, shape = [n_samples] or [n_samples, n_targets]
Target values
sample_weight : float or numpy array of shape [n_samples]
Individual weights for each sample
Returns
-------
self : returns an instance of self.
"""
return super(Ridge, self).fit(X, y, sample_weight=sample_weight)