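Standard workflow:
- Understand the problem: identify its core, plus the related domain issues, prior experience, constraints, conventions, and internal and external influences.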
- Collect input features
- Preprocess: fill missing values (with 0, the mean, or a model's prediction, e.g. a random forest); handle outliers
- Feature engineering:
  - Normalize: min-max, z-score, PCA whitening, ZCA whitening
  - Transform: square, log, exp, sin, cos, rotate
  - Encode categories: one-hot, category embedding
  - Binning: e.g. age 0-14: 1, 14-20: 2
  - Cross feature: e.g. X1 * X2
  - Remove periodicity: e.g. with an FFT
  - Time differencing (TD): y[n] = x[n] - x[n-t]
- Sampling: uniform, stratified, pool, undersampling, oversampling, MCMC, Gibbs, SMOTE
- Build model: deep learning (DL) or classical machine learning (ML)
- Train: tune hyperparameters (e.g. grid search), cross-validation
- Validate: compute evaluation metrics
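
A few of the preprocessing and feature engineering steps above can be sketched with NumPy. This is only an illustration: the toy arrays, the assumed valid age range of 0-20, and the lag `t = 2` are made up for the example and are not part of the notes.

```python
import numpy as np

# Hypothetical toy column: one missing value and one implausible value.
age = np.array([3.0, 15.0, np.nan, 18.0, 9.0, 120.0])

# Outlier handling (assumption: valid ages lie in [0, 20]); NaN stays NaN.
age = np.clip(age, 0.0, 20.0)

# Fillna: replace the missing value with the mean of the observed values.
age = np.where(np.isnan(age), np.nanmean(age), age)

# Normalize: min-max scaling to [0, 1] and z-score standardization.
age_min_max = (age - age.min()) / (age.max() - age.min())
age_z = (age - age.mean()) / age.std()

# Transform: log(1 + x), which is defined at x = 0.
age_log = np.log1p(age)

# Binning: ages below 14 go to bin 1, ages of 14 and above to bin 2.
age_bin = np.digitize(age, bins=[14.0]) + 1

# One-hot encoding of the bin labels (one column per bin).
age_one_hot = np.eye(2)[age_bin - 1]

# Cross feature: element-wise product of two columns.
height = np.array([0.9, 1.6, 1.4, 1.7, 1.3, 1.7])  # hypothetical second feature
age_x_height = age * height

# Time differencing: y[n] = x[n] - x[n-t], defined for n >= t.
x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
t = 2
y = x[t:] - x[:-t]

print(age_bin, y)
```

Mean imputation and clipping are only the simplest choices here; a model-based fill or a percentile-based outlier rule would slot into the same places.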
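
The sampling, training, and validation steps can be sketched with scikit-learn. Everything specific below is an assumption chosen for the example: the synthetic dataset, the random forest model, the parameter grid, and the accuracy and F1 metrics.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data standing in for real input features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Stratified sampling: keep the class ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Train: grid search over hyperparameters, scored by 5-fold cross-validation.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="f1",
)
search.fit(X_train, y_train)

# Validate: compute metrics on the held-out split with the best model found.
y_pred = search.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(search.best_params_, metrics)
```

Grid search grows multiplicatively with each added hyperparameter, so for larger grids a randomized search such as `RandomizedSearchCV` may be a more practical choice.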