学习ML.NET(1): 构建流水线

ML.NET使用LearningPipeline类定义执行期望的机器学习任务所需的步骤,让机器学习的流程变得直观。

下面用鸢尾花瓣预测快速入门的示例代码讲解流水线是如何工作的。

using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System;

namespace myApp
{
    class Program
    {
        // STEP 1: Define your data structures

        // IrisData is used to provide training data, and as
        // input for prediction operations
        // - First 4 properties are inputs/features used to predict the label
        // - Label is what you are predicting, and is only set when training
        public class IrisData
        {
            [Column("0")]
            public float SepalLength;

            [Column("1")]
            public float SepalWidth;

            [Column("2")]
            public float PetalLength;

            [Column("3")]
            public float PetalWidth;

            [Column("4")]
            [ColumnName("Label")]
            public string Label;
        }

        // IrisPrediction is the result returned from prediction operations
        public class IrisPrediction
        {
            [ColumnName("PredictedLabel")]
            public string PredictedLabels;
        }

        static void Main(string[] args)
        {
            // STEP 2: Create a pipeline and load your data
            var pipeline = new LearningPipeline();

            // If working in Visual Studio, make sure the 'Copy to Output Directory'
            // property of iris-data.txt is set to 'Copy always'
            string dataPath = "iris-data.txt";
            pipeline.Add(new TextLoader(dataPath).CreateFrom<IrisData>(separator: ','));

            // STEP 3: Transform your data
            // Assign numeric values to text in the "Label" column, because only
            // numbers can be processed during model training
            pipeline.Add(new Dictionarizer("Label"));

            // Puts all features into a vector
            pipeline.Add(new ColumnConcatenator("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth"));

            // STEP 4: Add learner
            // Add a learning algorithm to the pipeline.
            // This is a classification scenario (What type of iris is this?)
            pipeline.Add(new StochasticDualCoordinateAscentClassifier());

            // Convert the Label back into original text (after converting to number in step 3)
            pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });

            // STEP 5: Train your model based on the data set
            var model = pipeline.Train<IrisData, IrisPrediction>();

            // STEP 6: Use your model to make a prediction
            // You can change these numbers to test different predictions
            var prediction = model.Predict(new IrisData()
            {
                SepalLength = 3.3f,
                SepalWidth = 1.6f,
                PetalLength = 0.2f,
                PetalWidth = 5.1f,
            });

            Console.WriteLine($"Predicted flower type is: {prediction.PredictedLabels}");
        }
    }
}

创建工作流实例

首先,创建LearningPipeline实例

var pipeline = new LearningPipeline();

添加步骤

然后,调用LearningPipeline实例的Add方法向流水线添加步骤,每个步骤都继承自ILearningPipelineItem接口。

一个基本的工作流包括以下几个步骤,其中,蓝色部分是可选的。

学习ML.NET(1): 构建流水线

  • 加载数据集

继承自ILearningPipelineLoader接口。

一个工作流必须包含至少1个加载数据集步骤。

//使用TextLoader加载数据
string dataPath = "iris-data.txt";
pipeline.Add(new TextLoader(dataPath).CreateFrom<IrisData>(separator: ','));
  • 数据预处理

继承自CommonInputs.ITransformInput接口。

一个工作流可以包含0到多个数据预处理步骤,用于将已加载的数据集标准化,示例代码中就包含2了个数据预处理步骤。

//由于Label文本数据,算法不能识别数据,需要将其转换为字典
pipeline.Add(new Dictionarizer("Label")); 

//算法只能从Features列获取数据,需要数据中的多列连接到Features列中
pipeline.Add(new ColumnConcatenator("Features", "SepalLength", "SepalWidth", "PetalLength", "PetalWidth"));
  • 选择学习算法

继承自CommonInputs.ITrainerInput接口。

一个工作流必须且只能包含1个学习算法。

//使用线性分类器
pipeline.Add(new StochasticDualCoordinateAscentClassifier());
  • 标签转换

继承自CommonInputs.ITransformInput接口。

一个工作流可以包含0到多个标签转换步骤,用于将预测得到的标签转换成方便识别的数据。

//将Label从字典转换成文本数据
pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });

执行工作流

最后,调用LearningPipeline实例的Train方法,就可以执行工作流得到预测模型。

var model = pipeline.Train<IrisData, IrisPrediction>();
上一篇:java String 怎么看里面有几个指定字符


下一篇:python编程 之 PyMysql包接口,python中如何使用数据库