What is Natural Language Processing?
Natural language processing, or NLP, is taking text and converting it into something your application can use. We expect someone to type in a word or sentence, and the application should be able to understand and process the command. The challenge is that not everyone communicates the same way. For example:
- Please save document.
- Save document.
- Update Document.
- I need my document to be put into my accounting folder.
Create NLP FAQ application in 5 minutes
The Data
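The FAQ file itself is not shown here, so the following is only a sketch of the layout the loader later in this post expects: a header row, then one question and answer pair per line, separated by a comma. The answers are the ones used in the test set later in this post. The questions and the FaqSample helper are made up for illustration.

```csharp
using System.IO;

public static class FaqSample
{
    // Writes a tiny, hypothetical faq.csv: column 0 is the question, column 1 is the answer.
    public static void Write(string path)
    {
        string[] lines =
        {
            "Question,Answer",
            "What are your hours?,Our hours are 9 am to 5pm Monday through Friday",
            "Are you open on weekends?,Our hours are 9 am to 5pm Monday through Friday",
            // An answer that contains commas must be wrapped in quotes.
            "How can I pay?,\"Our payment options are Credit, Check, or Bitcoin\"",
            "Do you take credit cards?,\"Our payment options are Credit, Check, or Bitcoin\"",
            "What is your phone number?,Our phone number is 555-5555 and our fax is 555-5557",
            "How do I reach you?,Our phone number is 555-5555 and our fax is 555-5557"
        };
        File.WriteAllLines(path, lines);
    }
}
```

Calling FaqSample.Write("faq.csv") before training gives the loader something to read. Several questions share one answer on purpose, because each distinct answer acts as a class the model learns to predict.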
Creating the project
If you have not read my previous blog post, "Getting Started with ML.NET", which can be found at https://fiveminutecoder.blogspot.com/2020/07/getting-started-with-mlnet.html, please do so before continuing, since we will refer back to it frequently. Following our previous example, we will create a new console application called "mlnet_NLP". Once the project is created, we will install the "Microsoft.ML" package from NuGet.
Once you have the project set up, we need to create our two data models: one for the input features of the FAQ (our question and answer), and one for our predictions.
public class FAQModel
{
    [ColumnName("Question"), LoadColumn(0)]
    public string Question { get; set; }
    [ColumnName("Answer"), LoadColumn(1)]
    public string Answer { get; set; }
}
public class PredictionModel
{
    [ColumnName("PredictedAnswer")]
    public string PredictedAnswer { get; set; }
    [ColumnName("Score")]
    public float[] Score { get; set; }
}
Once we have our data models set up, we need to set up our program. We will use the same 5 functions from our previous example, which will train, test, predict, save, and load our machine learning model. Before we begin, we need to add our references and the fields that will hold our context and data. Again, this should be a separate class, but for the sake of time we will do all of this in our Program.cs.
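//references, placed at the top of Program.cs
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
//Main context
static MLContext context;
//model for training/testing
static ITransformer model;
static IEnumerable<FAQModel> trainingData;
static IEnumerable<FAQModel> testingData;
static string fileName = "FAQModel.zip";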
Our training function will be very similar to the previous one, except for how we create our features. Instead of our features being spread across several columns, we have one column containing several words. Luckily, ML.NET has a function that featurizes text, making it a quick swap for our previous concatenate features line. The FeaturizeText method is very powerful and performs several operations under the hood, such as normalizing and tokenizing the text and turning it into n-gram counts. It can also be configured to remove stop words like "the", "and", and "or". To learn more, visit Microsoft's documentation on preparing data:
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/prepare-data-ml-net
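As far as I can tell from the ML.NET documentation, stop word removal is not applied unless you ask for it through TextFeaturizingEstimator.Options. The sketch below shows one way to turn it on and to look at the tokens that survive. The QuestionRow, FeaturizedRow, and FeaturizeSketch names and the "Tokens" column are my own choices for illustration, not part of the FAQ project.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

public class QuestionRow
{
    public string Question { get; set; }
}

public class FeaturizedRow
{
    // Tokens left after normalization and stop word removal
    public string[] Tokens { get; set; }
}

public static class FeaturizeSketch
{
    public static string[] Tokenize(string question)
    {
        var context = new MLContext();
        var options = new TextFeaturizingEstimator.Options
        {
            // Also emit the cleaned tokens so we can inspect them
            OutputTokensColumnName = "Tokens",
            CaseMode = TextNormalizingEstimator.CaseMode.Lower,
            KeepPunctuations = false,
            // Use the built-in English stop word list
            StopWordsRemoverOptions = new StopWordsRemovingEstimator.Options
            {
                Language = TextFeaturizingEstimator.Language.English
            }
        };
        var pipeline = context.Transforms.Text.FeaturizeText("Features", options, "Question");
        var data = context.Data.LoadFromEnumerable(new[] { new QuestionRow { Question = question } });
        var transformed = pipeline.Fit(data).Transform(data);
        return context.Data
            .CreateEnumerable<FeaturizedRow>(transformed, reuseRowObject: false)
            .First()
            .Tokens;
    }
}
```

Printing string.Join(", ", FeaturizeSketch.Tokenize("When are the offices open?")) is a quick way to see which words the featurizer actually keeps. In the FAQ pipeline, the same options object could be passed to FeaturizeText in place of the two-argument call.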
static void TrainModel()
{
    context = new MLContext();
    //Load data from csv file
    var data = context.Data.LoadFromTextFile<FAQModel>("faq.csv", hasHeader: true, separatorChar: ',', allowQuoting: true, allowSparse: true, trimWhitespace: true);
    //create data sets for training and testing
    trainingData = context.Data.CreateEnumerable<FAQModel>(data, reuseRowObject: false);
    testingData = new List<FAQModel>()
    {
        new FAQModel() {Question = "When are you open?", Answer = "Our hours are 9 am to 5pm Monday through Friday"},
        new FAQModel() {Question = "Can i pay using a visa card?", Answer = "Our payment options are Credit, Check, or Bitcoin"},
        new FAQModel() {Question = "How can i contact you.", Answer = "Our phone number is 555-5555 and our fax is 555-5557"}
    };
    //Create our pipeline and set our training model
    var pipeline = context.Transforms.Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: "Answer") //converts string to key value for training
        .Append(context.Transforms.Text.FeaturizeText("Features", "Question")) //creates features from our text string
        .Append(context.MulticlassClassification.Trainers.SdcaMaximumEntropy(labelColumnName: "Label", featureColumnName: "Features")) //set up our model
        .Append(context.Transforms.Conversion.MapKeyToValue(outputColumnName: "PredictedAnswer", inputColumnName: "PredictedLabel")); //convert our key back to a label
    //trains the model
    model = pipeline.Fit(context.Data.LoadFromEnumerable(trainingData));
}
Now that our training method is set up, we want to test our model. This FAQ is too small to split into training and testing sets; when I tried, the accuracy came back as 0. To remedy this, I manually added a few test questions to our testingData enumerable.
static void TestModel()
{
    //transform data to a view that can be evaluated
    IDataView testDataPredictions = model.Transform(context.Data.LoadFromEnumerable(testingData));
    //evaluate test data against trained model for accuracy
    var metrics = context.MulticlassClassification.Evaluate(testDataPredictions);
    double accuracy = metrics.MacroAccuracy;
    Console.WriteLine("Accuracy {0}", accuracy.ToString());
}
Now that our model is trained, we will save it so it can be loaded in our prediction engine.
static void SaveModel()
{
    IDataView dataView = context.Data.LoadFromEnumerable(trainingData);
    context.Model.Save(model, dataView.Schema, fileName);
}
static ITransformer LoadModel()
{
    DataViewSchema modelSchema;
    //gets a file from a stream, and loads it
    using (Stream s = File.Open(fileName, FileMode.Open))
    {
        return context.Model.Load(s, out modelSchema);
    }
}
Now we can set up our prediction engine. Again, this is exactly how we set it up in the previous example. Our NLP uses a multiclass supervised learning model, so predicting our answer is handled the same way: pass our question in, and the machine learning algorithm will spit out an answer.
static void Predict(FAQModel question)
{
    ITransformer trainedModel = LoadModel();
    //Creates prediction function from loaded model, you can use the in-memory model as well
    var predictFunction = context.Model.CreatePredictionEngine<FAQModel, PredictionModel>(trainedModel);
    //pass model to function to get prediction outputs
    PredictionModel prediction = predictFunction.Predict(question);
    //get score, score is an array and the max score will align to key.
    float score = prediction.Score.Max();
    Console.WriteLine("Prediction: {0}, score: {1}", prediction.PredictedAnswer, score);
}
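The score array holds one value per known answer, and the largest value belongs to the answer the model picked. A minimal sketch of how that could be used to avoid answering when the model is unsure is shown below. The ScoreHelper class, the 0.5 threshold, and the fallback message are assumptions of mine, not something ML.NET provides.

```csharp
using System;
using System.Linq;

public static class ScoreHelper
{
    // Returns the position of the largest score, i.e. the class the model prefers.
    public static int BestIndex(float[] scores)
    {
        int best = 0;
        for (int i = 1; i < scores.Length; i++)
        {
            if (scores[i] > scores[best])
            {
                best = i;
            }
        }
        return best;
    }

    // Hypothetical guard: only return the answer when the top score clears a threshold.
    public static string AnswerOrFallback(string predictedAnswer, float[] scores, float threshold = 0.5f)
    {
        return scores.Max() >= threshold
            ? predictedAnswer
            : "Sorry, I do not know the answer to that.";
    }
}
```

Inside Predict, this could be called as ScoreHelper.AnswerOrFallback(prediction.PredictedAnswer, prediction.Score). The right threshold depends on how many answers the FAQ has, so treat 0.5 as a starting point to tune.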
Now that we have our functions set up, we can call them in the static Main method and start answering questions.
static void Main(string[] args)
{
    TrainModel();
    TestModel();
    SaveModel();
    FAQModel question = new FAQModel()
    {
        Question = "can i Pay online?",
        Answer = ""
    };
    Predict(question);
}