I. The mlr package for R
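After running install.packages("mlr") and loading the package, you can list which machine learning algorithms are available in R and which package each one comes from.
library(mlr)
a <- listLearners()
I learned about this package from Yu Wenhua's (余文华) CDA online course 《R语言与机器学习实战》 (R Language and Machine Learning in Practice). It looks excellent and deserves a deeper look later. The table below lists the 57 learners returned by listLearners() in R, with the package each one comes from and the data properties it supports.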
No. | class | name | short.name | package | note | type | installed | numerics | factors | ordered | missings | weights | prob | oneclass | twoclass | multiclass | class.weights | se | lcens | rcens | icens |
1 | classif.avNNet | Neural Network | avNNet | nnet | `size` has been set to `3` by default. Doing bagging training of `nnet` if set `bag = TRUE`. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
2 | classif.binomial | Binomial Regression | binomial | stats | Delegates to `glm` with freely choosable binomial link function via learner parameter `link`. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
3 | classif.C50 | C50 | C50 | C50 | | classif | TRUE | TRUE | TRUE | FALSE | TRUE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
4 | classif.cforest | Random forest based on conditional inference trees | cforest | party | See `?ctree_control` for possible breakage for nominal features with missingness. | classif | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
5 | classif.ctree | Conditional Inference Trees | ctree | party | See `?ctree_control` for possible breakage for nominal features with missingness. | classif | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
6 | classif.cvglmnet | GLM with Lasso or Elasticnet Regularization (Cross Validated Lambda) | cvglmnet | glmnet | The family parameter is set to `binomial` for two-class problems and to `multinomial` otherwise. Factors automatically get converted to dummy columns, ordered factors to integer. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
7 | classif.gausspr | Gaussian Processes | gausspr | kernlab | Kernel parameters have to be passed directly and not by using the `kpar` list in `gausspr`. Note that `fit` has been set to `FALSE` by default for speed. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
8 | classif.gbm | Gradient Boosting Machine | gbm | gbm | `keep.data` is set to FALSE to reduce memory requirements. Note on param 'distribution': gbm will select 'bernoulli' by default for 2 classes, and 'multinomial' for multiclass problems. The latter is the only setting that works for > 2 classes. | classif | TRUE | TRUE | TRUE | FALSE | TRUE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
9 | classif.glmnet | GLM with Lasso or Elasticnet Regularization | glmnet | glmnet | The family parameter is set to `binomial` for two-class problems and to `multinomial` otherwise. Factors automatically get converted to dummy columns, ordered factors to integer. Parameter `s` (value of the regularization parameter used for predictions) is set to `0.1` by default, but needs to be tuned by the user. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
10 | classif.h2o.deeplearning | h2o.deeplearning | h2o.dl | h2o | | classif | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
11 | classif.h2o.gbm | h2o.gbm | h2o.gbm | h2o | 'distribution' is set automatically to 'gaussian'. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
12 | classif.h2o.glm | h2o.glm | h2o.glm | h2o | 'family' is always set to 'binomial' to get a binary classifier. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
13 | classif.h2o.randomForest | h2o.randomForest | h2o.rf | h2o | | classif | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
14 | classif.knn | k-Nearest Neighbor | knn | class | | classif | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
15 | classif.ksvm | Support Vector Machines | ksvm | kernlab | Kernel parameters have to be passed directly and not by using the `kpar` list in `ksvm`. Note that `fit` has been set to `FALSE` by default for speed. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE |
16 | classif.lda | Linear Discriminant Analysis | lda | MASS | Learner parameter `predict.method` maps to `method` in `predict.lda`. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
17 | classif.logreg | Logistic Regression | logreg | stats | Delegates to `glm` with `family = binomial(link = "logit")`. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
18 | classif.lssvm | Least Squares Support Vector Machine | lssvm | kernlab | `fitted` has been set to `FALSE` by default for speed. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
19 | classif.lvq1 | Learning Vector Quantization | lvq1 | class | | classif | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
20 | classif.mlp | Multi-Layer Perceptron | mlp | RSNNS | | classif | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
21 | classif.multinom | Multinomial Regression | multinom | nnet | | classif | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
22 | classif.naiveBayes | Naive Bayes | nbayes | e1071 | | classif | TRUE | TRUE | TRUE | FALSE | TRUE | FALSE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
23 | classif.nnet | Neural Network | nnet | nnet | `size` has been set to `3` by default. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
24 | classif.plsdaCaret | Partial Least Squares (PLS) Discriminant Analysis | plsdacaret | caret | | classif | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
25 | classif.probit | Probit Regression | probit | stats | Delegates to `glm` with `family = binomial(link = "probit")`. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
26 | classif.qda | Quadratic Discriminant Analysis | qda | MASS | Learner parameter `predict.method` maps to `method` in `predict.qda`. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
27 | classif.randomForest | Random Forest | rf | randomForest | Note that the rf can freeze the R process if trained on a task with 1 feature which is constant. This can happen in feature forward selection, also due to resampling, and you need to remove such features with removeConstantFeatures. | classif | TRUE | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | FALSE | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE |
28 | classif.rpart | Decision Tree | rpart | rpart | `xval` has been set to `0` by default for speed. | classif | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
29 | classif.svm | Support Vector Machines (libsvm) | svm | e1071 | | classif | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE |
30 | classif.xgboost | eXtreme Gradient Boosting | xgboost | xgboost | All settings are passed directly, rather than through `xgboost`'s `params` argument. `nrounds` has been set to `1` by default. `num_class` is set internally, so do not set this manually. | classif | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
31 | cluster.dbscan | DBScan Clustering | dbscan | fpc | A cluster index of NA indicates noise points. Specify `method = "dist"` if the data should be interpreted as dissimilarity matrix or object. Otherwise Euclidean distances will be used. | cluster | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
32 | cluster.kkmeans | Kernel K-Means | kkmeans | kernlab | `centers` has been set to `2L` by default. The nearest center in kernel distance determines cluster assignment of new data points. Kernel parameters have to be passed directly and not by using the `kpar` list in `kkmeans` | cluster | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
33 | regr.avNNet | Neural Network | avNNet | nnet | `size` has been set to `3` by default. | regr | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
34 | regr.cforest | Random Forest Based on Conditional Inference Trees | cforest | party | See `?ctree_control` for possible breakage for nominal features with missingness. | regr | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
35 | regr.ctree | Conditional Inference Trees | ctree | party | See `?ctree_control` for possible breakage for nominal features with missingness. | regr | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
36 | regr.gausspr | Gaussian Processes | gausspr | kernlab | Kernel parameters have to be passed directly and not by using the `kpar` list in `gausspr`. Note that `fit` has been set to `FALSE` by default for speed. | regr | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE |
37 | regr.gbm | Gradient Boosting Machine | gbm | gbm | `keep.data` is set to FALSE to reduce memory requirements, `distribution` has been set to `"gaussian"` by default. | regr | TRUE | TRUE | TRUE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
38 | regr.glm | Generalized Linear Regression | glm | stats | 'family' must be a character and every family has its own link, i.e. family = 'gaussian', link.gaussian = 'identity', which is also the default. | regr | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE |
39 | regr.glmnet | GLM with Lasso or Elasticnet Regularization | glmnet | glmnet | Factors automatically get converted to dummy columns, ordered factors to integer. Parameter `s` (value of the regularization parameter used for predictions) is set to `0.1` by default, but needs to be tuned by the user. | regr | TRUE | TRUE | TRUE | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
40 | regr.h2o.deeplearning | h2o.deeplearning | h2o.dl | h2o | | regr | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
41 | regr.h2o.gbm | h2o.gbm | h2o.gbm | h2o | 'distribution' is set automatically to 'gaussian'. | regr | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
42 | regr.h2o.glm | h2o.glm | h2o.glm | h2o | 'family' is always set to 'gaussian'. | regr | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
43 | regr.h2o.randomForest | h2o.randomForest | h2o.rf | h2o | | regr | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
44 | regr.ksvm | Support Vector Machines | ksvm | kernlab | Kernel parameters have to be passed directly and not by using the `kpar` list in `ksvm`. Note that `fit` has been set to `FALSE` by default for speed. | regr | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
45 | regr.lm | Simple Linear Regression | lm | stats | | regr | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE |
46 | regr.mob | Model-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node | mob | party | | regr | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
47 | regr.nnet | Neural Network | nnet | nnet | `size` has been set to `3` by default. | regr | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
48 | regr.randomForest | Random Forest | rf | randomForest | See `?regr.randomForest` for information about se estimation. Note that the rf can freeze the R process if trained on a task with 1 feature which is constant. This can happen in feature forward selection, also due to resampling, and you need to remove such features with removeConstantFeatures. | regr | TRUE | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE |
49 | regr.rpart | Decision Tree | rpart | rpart | `xval` has been set to `0` by default for speed. | regr | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
50 | regr.rvm | Relevance Vector Machine | rvm | kernlab | Kernel parameters have to be passed directly and not by using the `kpar` list in `rvm`. Note that `fit` has been set to `FALSE` by default for speed. | regr | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
51 | regr.svm | Support Vector Machines (libsvm) | svm | e1071 | | regr | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
52 | regr.xgboost | eXtreme Gradient Boosting | xgboost | xgboost | All settings are passed directly, rather than through `xgboost`'s `params` argument. `nrounds` has been set to `1` by default. | regr | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
53 | surv.cforest | Random Forest based on Conditional Inference Trees | crf | party,survival | See `?ctree_control` for possible breakage for nominal features with missingness. | surv | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE |
54 | surv.coxph | Cox Proportional Hazard Model | coxph | survival | | surv | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE |
55 | surv.cvglmnet | GLM with Regularization (Cross Validated Lambda) | cvglmnet | glmnet | Factors automatically get converted to dummy columns, ordered factors to integer. | surv | TRUE | TRUE | TRUE | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE |
56 | surv.glmnet | GLM with Regularization | glmnet | glmnet | Factors automatically get converted to dummy columns, ordered factors to integer. Parameter `s` (value of the regularization parameter used for predictions) is set to `0.1` by default, but needs to be tuned by the user. | surv | TRUE | TRUE | TRUE | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE |
57 | surv.rpart | Survival Tree | rpart | rpart | `xval` has been set to `0` by default for speed. | surv | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE |
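The property columns in the table (missings, prob, and so on) are what you would normally filter on when choosing a learner. The sketch below is only an illustration in base R: it copies six rows from the table into a small data frame (the name `learners` is made up for this example) and selects the classification learners that accept missing values and can return probabilities. It does not call mlr itself.

```r
# Illustrative excerpt: six rows copied from the table, keeping only the
# columns used below. `learners` is a hypothetical name for this sketch.
learners <- data.frame(
  class    = c("classif.C50", "classif.knn", "classif.rpart",
               "regr.lm", "regr.rpart", "surv.coxph"),
  package  = c("C50", "class", "rpart", "stats", "rpart", "survival"),
  type     = c("classif", "classif", "classif", "regr", "regr", "surv"),
  missings = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  prob     = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Classification learners that tolerate missing values and predict probabilities.
keep   <- learners$type == "classif" & learners$missings & learners$prob
picked <- learners[keep, c("class", "package")]
print(picked)

# Number of learners per task type in this excerpt.
type_counts <- table(learners$type)
print(type_counts)
```

Assuming the object `a` returned by listLearners() is a data frame with the same column names as the table header, the same subsetting should work on it directly. mlr can also do the filtering itself, for example listLearners("classif", properties = c("missings", "prob")); check ?listLearners in your installed version before relying on this.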
II. Cross-referencing ML in Python and R