Five Minute Coder: October 2020

What is ML.NET

ML.NET is an open-source machine learning framework for .NET created by Microsoft. It allows us to use C# to quickly build machine learning models using its built-in training algorithms. ML.NET can also be extended to tap into other machine learning platforms, such as TensorFlow, for tasks that ML.NET does not yet support natively.

What is supervised learning?

Supervised learning is when we train a model with known labels for our data. The learning is supervised because we are able to give the training algorithm the correct answer for what the data represents. When training a real model, you will want a large data set representing different scenarios for your model.
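To make the idea of training on labeled data concrete, here is a tiny supervised learner in plain C#, with no ML.NET involved. It is a nearest-centroid classifier, a deliberately simple technique chosen only for illustration: training averages the feature values for each known label, and prediction returns the label whose average is closest. The class name NearestCentroidSketch is hypothetical, and ML.NET's trainers work very differently under the hood.

```csharp
using System.Collections.Generic;
using System.Linq;

//Hypothetical illustration of supervised learning: every training example
//comes with a known label, and the "model" is learned from those labels.
public static class NearestCentroidSketch
{
    //Training: average the feature vectors of all examples that share a label.
    public static Dictionary<string, float[]> Train(IEnumerable<(float[] Features, string Label)> examples)
    {
        return examples
            .GroupBy(e => e.Label)
            .ToDictionary(
                g => g.Key,
                g => Enumerable.Range(0, g.First().Features.Length)
                               .Select(i => g.Average(e => e.Features[i]))
                               .ToArray());
    }

    //Prediction: return the label whose centroid is closest (squared Euclidean distance).
    public static string Predict(Dictionary<string, float[]> centroids, float[] features)
    {
        return centroids
            .OrderBy(c => c.Value.Zip(features, (a, b) => (a - b) * (a - b)).Sum())
            .First()
            .Key;
    }
}
```

For example, training on two points labeled "small" around (1, 1) and two labeled "large" around (10, 10), then predicting (1.5, 1.2), returns "small".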

Create a Supervised Learning Model in about 5 minutes.

The Data Set

For this example, we will be using the Iris Flower Species data set, which can be found on the Kaggle website here: https://www.kaggle.com/uciml/iris.

Create the Project

Once you have downloaded the data set, we need to create the project. Since ML.NET is still new, it is worth noting that this article was written using ML.NET version 1.5.1 and .NET Core 3.1. As machine learning evolves, some of these techniques may change.

To start, create a new .NET Core console application called "mlnet_intro":

        dotnet new console --name "mlnet_intro"

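Now that we have our new project, make sure you have the folder open and add the NuGet package "Microsoft.ML". If you are using VS Code with a NuGet extension installed, use CTRL+SHIFT+P to search for the package; otherwise run dotnet add package Microsoft.ML from the project folder.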

Data Models

We now have all the necessary components to start creating our supervised learning application. We need two data models: one representing the iris being fed into the model, and one for displaying the prediction results. We will start with the iris model, aptly named "IrisModel".



        using Microsoft.ML.Data;

        namespace mlnet_intro
        {
            //one row of the Iris CSV file
            //LoadColumn = zero-based position in the CSV, ColumnName = name used by the pipeline
            public class IrisModel
            {
                [ColumnName("Id"), LoadColumn(0)]
                public int Id {get;set;}
                [ColumnName("SepalLengthCm"), LoadColumn(1)]
                public float SepalLengthCm {get;set;}
                [ColumnName("SepalWidthCm"), LoadColumn(2)]
                public float SepalWidthCm {get;set;}
                [ColumnName("PetalLengthCm"), LoadColumn(3)]
                public float PetalLengthCm {get;set;}
                [ColumnName("PetalWidthCm"), LoadColumn(4)]
                public float PetalWidthCm {get;set;}
                //the label we want to predict
                [ColumnName("Species"), LoadColumn(5)]
                public string Species {get;set;}
            }
        }

Notice that we have the ColumnName and LoadColumn attributes, which come from the Microsoft.ML.Data namespace. LoadColumn is the zero-based position of the column in our CSV, and ColumnName is the name we use to refer to that column when training our model. These names matter because they let us choose exactly which columns are features and which column is the label, so the label does not end up in the data being trained on. In this case the column named "Species" is our label.
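As a rough picture of what the LoadColumn positions and the feature/label split buy us, the hypothetical helper below does the same job by hand for a single line in the format of the Kaggle Iris CSV: fields 1 to 4 become the feature vector, field 5 becomes the label, and field 0 (Id) is left out of the features entirely. The name IrisRowSketch is made up, and ML.NET's text loader does far more (types, quoting, headers) than this sketch.

```csharp
using System.Globalization;
using System.Linq;

//Hypothetical hand-rolled version of what the LoadColumn positions and the
//"Features"/"Label" columns accomplish for one CSV line.
public static class IrisRowSketch
{
    //Keeps fields 1-4 as the feature vector and field 5 as the label.
    //Field 0 (Id) is never used as a feature.
    public static (float[] Features, string Label) Parse(string csvLine)
    {
        string[] fields = csvLine.Split(',');
        float[] features = fields
            .Skip(1).Take(4)
            .Select(f => float.Parse(f.Trim(), CultureInfo.InvariantCulture))
            .ToArray();
        string label = fields[5].Trim();
        return (features, label);
    }
}
```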

Next, we need to create our prediction model, called "PredictionModel". Again we use the ColumnName attribute so we can map the model to our training output. PredictedSpecies will hold the predicted label, and Score holds the model's confidence for each possible label.


        using Microsoft.ML.Data;

        namespace mlnet_intro
        {
            public class PredictionModel
            {
                [ColumnName("PredictedSpecies")]
                public string PredictedSpecies {get;set;}
                [ColumnName("Score")]
                public float[] Score {get;set;}
            }
        }
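Since Score has one entry per species, picking the winner is just a matter of finding the largest entry. The hypothetical helper below shows that idea; it assumes the caller already knows the species names in the same order as the scores, which this sketch does not work out for you.

```csharp
//Hypothetical helper: given a Score array and the label names in the same
//order as the scores, return the winning label and its score.
public static class ScoreSketch
{
    public static (string Label, float Confidence) Best(float[] scores, string[] labelsInScoreOrder)
    {
        int best = 0;
        for (int i = 1; i < scores.Length; i++)
        {
            if (scores[i] > scores[best]) best = i;
        }
        return (labelsInScoreOrder[best], scores[best]);
    }
}
```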

Create the Iris Prediction Application

Now that we have our two models created, we can create the application that will train our model to predict iris species. In the Program.cs file we need some static fields for holding our ML context, model and data, along with a reference to the Microsoft.ML namespace. Typically this would be a separate class, but we are getting close to 5 minutes.


    
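        using System;
        using System.IO;
        using System.Linq;
        using System.Collections.Generic;
        using Microsoft.ML;

        namespace mlnet_intro
        {
            class Program
            {
                //ML.NET context used for loading data, training and predicting
                static MLContext context;
                //model for training/testing
                static ITransformer model;
                static IEnumerable<IrisModel> trainingData;
                static IEnumerable<IrisModel> testingData;

                static string fileName = "irisModel.zip";

                //TrainModel, TestModel, SaveModel, LoadModel, Predict and Main
                //from the rest of this article also go inside this class
            }
        }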

With our fields defined, the next thing we must do is train our model. To do that we load our CSV data and split it into training and testing data. We then tell the pipeline which columns represent our features and which column represents our label, and select a training algorithm; in this case we use the SdcaMaximumEntropy multiclass classification trainer. Finally, we map the predicted value back to a string so it fits our prediction model.


        static void TrainModel()
        {
            //Load data from csv file (relative path, so the file must be in the working directory)
            var data = context.Data.LoadFromTextFile<IrisModel>("datasets_19_420_Iris.csv", hasHeader: true, separatorChar: ',', allowQuoting: true, allowSparse: true, trimWhitespace: true);

            //Splits data into training and testing data (by default 10% of the rows go to the test set)
            var split = context.Data.TrainTestSplit(data);

            //create data sets for training and testing
            trainingData = context.Data.CreateEnumerable<IrisModel>(split.TrainSet, reuseRowObject: false);
            testingData = context.Data.CreateEnumerable<IrisModel>(split.TestSet, reuseRowObject: false);

            //Create our pipeline and set our training algorithm
            var pipeline = context.Transforms.Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: "Species") //converts the string label to a key value for training
                .Append(context.Transforms.Concatenate("Features", new[]{"SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"})) //combines the four measurements into one feature vector
                .Append(context.MulticlassClassification.Trainers.SdcaMaximumEntropy(labelColumnName: "Label", featureColumnName: "Features")) //sets the trainer and identifies features and label
                .Append(context.Transforms.Conversion.MapKeyToValue(outputColumnName: "PredictedSpecies", inputColumnName: "PredictedLabel")); //converts the predicted key (PredictedLabel) back to the species string

            //trains the model
            model = pipeline.Fit(context.Data.LoadFromEnumerable(trainingData));
        }
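The split step can be pictured as a shuffle followed by a cut. The sketch below is a plain C# illustration, not ML.NET's implementation: it shuffles a copy of the rows with a seeded Random (a Fisher-Yates shuffle) and sends testFraction of them to the test set. The default of 0.1 mirrors the default testFraction of TrainTestSplit; the class name SplitSketch and the seed default are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

//Hypothetical illustration of a random train/test split.
//ML.NET's TrainTestSplit works differently internally.
public static class SplitSketch
{
    public static (List<T> Train, List<T> Test) Split<T>(IList<T> rows, double testFraction = 0.1, int seed = 0)
    {
        var random = new Random(seed);
        //Fisher-Yates shuffle of a copy so the caller's list is untouched
        var shuffled = new List<T>(rows);
        for (int i = shuffled.Count - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);
            (shuffled[i], shuffled[j]) = (shuffled[j], shuffled[i]);
        }
        //the first testFraction of the shuffled rows become the test set
        int testCount = (int)Math.Round(shuffled.Count * testFraction);
        var test = shuffled.Take(testCount).ToList();
        var train = shuffled.Skip(testCount).ToList();
        return (train, test);
    }
}
```

With the 150 rows of the Iris data set and the default fraction, this sketch puts 15 rows in the test set and 135 in the training set.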

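In the training method, the main things to note are the two conversion steps, MapValueToKey and MapKeyToValue. MapValueToKey turns the string in our Species column into a numeric key stored in the Label column, because the multiclass trainer needs a key-typed label. After prediction, MapKeyToValue turns the predicted key in PredictedLabel back into its original string, so our prediction model returns the iris species name instead of a number.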
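Here is a rough, hypothetical stand-in for what the two conversions do, written as a plain dictionary lookup. It assumes ML.NET's default behavior as we understand it: keys are handed out in order of first occurrence, key 0 is reserved for missing values, and a value never seen during training (such as "hello") maps to that missing key. LabelKeyMapSketch is a made-up name, not an ML.NET type.

```csharp
using System.Collections.Generic;

//Hypothetical stand-in for MapValueToKey / MapKeyToValue: each distinct
//string label gets a numeric key in order of first occurrence, and the
//reverse lookup turns a predicted key back into the original string.
//Keys start at 1 because key 0 means "missing".
public class LabelKeyMapSketch
{
    private readonly Dictionary<string, uint> toKey = new Dictionary<string, uint>();
    private readonly List<string> toValue = new List<string>();

    //Learns the keys from the training labels.
    public void Fit(IEnumerable<string> labels)
    {
        foreach (string label in labels)
        {
            if (!toKey.ContainsKey(label))
            {
                toValue.Add(label);
                toKey[label] = (uint)toValue.Count; //first label -> 1, second -> 2, ...
            }
        }
    }

    //Unseen labels map to 0, the missing-value key.
    public uint MapValueToKey(string label) =>
        toKey.TryGetValue(label, out uint key) ? key : 0u;

    //Key 0 (or any unknown key) has no string value.
    public string MapKeyToValue(uint key) =>
        key >= 1 && key <= toValue.Count ? toValue[(int)key - 1] : null;
}
```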

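Now that our model is trained, we want to test it against our test data and check its accuracy. ML.NET has this built in: context.MulticlassClassification.Evaluate compares the model's predictions with the known labels and returns a set of metrics.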


        static void TestModel()
        {
            //transform data to a view that can be evaluated
            IDataView testDataPredictions = model.Transform(context.Data.LoadFromEnumerable(testingData));
            //evaluate test data against trained model for accuracy
            var metrics = context.MulticlassClassification.Evaluate(testDataPredictions);
            double accuracy = metrics.MicroAccuracy;

            Console.WriteLine("Accuracy {0}", accuracy.ToString());

        }
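The code above prints MicroAccuracy, but the metrics object also reports MacroAccuracy, and the two can differ when the classes are unbalanced. The hypothetical helper below recomputes both from (actual, predicted) pairs to show the difference: micro-accuracy is the fraction of all rows predicted correctly, while macro-accuracy averages the accuracy of each class. It illustrates the definitions only and is not ML.NET's code.

```csharp
using System.Collections.Generic;
using System.Linq;

//Hypothetical re-implementation of the two accuracy definitions,
//computed from (actual, predicted) label pairs.
public static class AccuracySketch
{
    //Micro: fraction of all rows that were predicted correctly.
    public static double Micro(IList<(string Actual, string Predicted)> rows) =>
        rows.Count(r => r.Actual == r.Predicted) / (double)rows.Count;

    //Macro: accuracy computed per actual class, then averaged,
    //so a small class counts as much as a large one.
    public static double Macro(IList<(string Actual, string Predicted)> rows) =>
        rows.GroupBy(r => r.Actual)
            .Average(g => g.Count(r => r.Actual == r.Predicted) / (double)g.Count());
}
```

For three "a" rows predicted correctly and one "b" row predicted as "a", micro-accuracy is 0.75 while macro-accuracy is 0.5.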

Accuracy may vary from run to run, since the data is split randomly and the data set is small (150 rows); this is for learning, so we are not too concerned. Next we will save the model to a file and load it back. This is helpful for reusing your model in web applications or other services.


        static void SaveModel()
        {
            //the schema of the training data tells the saved model what input to expect
            IDataView dataView = context.Data.LoadFromEnumerable(trainingData);
            context.Model.Save(model, dataView.Schema, fileName);
        }

        static ITransformer LoadModel()
        {
            DataViewSchema modelSchema;
            //opens the saved file as a stream and loads the model from it
            using (Stream s = File.Open(fileName, FileMode.Open))
            {
                return context.Model.Load(s, out modelSchema);
            }
        }

Finally, we can use our newly saved model to predict the species of an iris.


        static void Predict(IrisModel iris)
        {
            ITransformer trainedModel = LoadModel();

            //Creates a prediction engine from the loaded model; the in-memory model would work as well
            var predictFunction = context.Model.CreatePredictionEngine<IrisModel, PredictionModel>(trainedModel);

            //pass the iris to the engine to get the prediction outputs
            PredictionModel prediction = predictFunction.Predict(iris);

            //Score has one entry per species; the highest entry belongs to the predicted species
            float score = prediction.Score.Max();

            Console.WriteLine("Prediction: {0}, confidence: {1}", prediction.PredictedSpecies, score);
        }
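The number printed next to the prediction is the largest entry of Score. For a maximum entropy trainer such as SdcaMaximumEntropy, those entries should already be softmax-normalized probabilities that add up to 1, which is why the value reads as a confidence rather than an accuracy. The hypothetical helper below shows the softmax function itself in plain C#; ML.NET applies its own version internally, so you never need to call anything like this.

```csharp
using System;
using System.Linq;

//Hypothetical illustration of softmax, which turns raw per-class values
//into probabilities that are all positive and sum to 1.
public static class SoftmaxSketch
{
    public static float[] Softmax(float[] raw)
    {
        //subtracting the maximum keeps Math.Exp from overflowing without changing the result
        float max = raw.Max();
        double[] exps = raw.Select(v => Math.Exp(v - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => (float)(e / sum)).ToArray();
    }
}
```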

Our AI setup is complete; we just need to call our newly created methods and see the results. In my example I set the species to "hello". This demonstrates that the model did not cheat by using the label as a feature: the prediction comes only from the four measurements.


        static void Main(string[] args)
        {
            context = new MLContext();
            Console.WriteLine("Training Iris Model");
            TrainModel();
            Console.WriteLine("Testing Iris Model");
            TestModel();
            SaveModel();

            IrisModel test = new IrisModel(){
                SepalLengthCm = 5.2f,
                SepalWidthCm = 3.5f,
                PetalLengthCm = 1.4f,
                PetalWidthCm = 0.2f,
                Species = "hello"
            };

            Predict(test);

            Console.Read();
        }

Clone the project

You can find the full project on my GitHub site here: https://github.com/fiveminutecoder/blogs/tree/master/mlnet_intro

Tuesday, October 6, 2020

Get Started with ML.NET in 5 Minutes