Tuesday, December 8, 2020

Predicting Bitcoin Prices Using ML.Net and Time Series Techniques

Obligatory Machine Learning Stock Predictor 

It seems that anyone who starts learning machine learning and big data analysis thinks they can predict patterns in something as volatile as the stock market. Once you go down the rabbit hole, it is easy to believe that things are not random and that we just need enough data, so the attraction of trying to predict stock prices is easy to see.

Unfortunately, there are many external factors that data cannot predict. This was very clear when the 2020 pandemic hit, and stock prices tanked. Even so, trying is fun and helps us understand concepts. 

I am not a stock guru, and everything in this post is purely for learning ML.Net and how to use its time series functionality. Please use the code at your own risk for anything beyond learning how to create a time series model.

What is a Time Series Model?

Time series forecasting in machine learning tries to predict values over several future periods. This is different from regression, which predicts a single next value from a series. Some common examples are housing prices, gas prices, and sales predictions. In big data analytics we can use linear regression to plot thousands of points and fit the line that best represents them (the line that minimizes the squared distance to all of the points). This is a good route if you have two values, for example price and date, since you can create a clean graph that represents sales.
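To make the regression idea concrete, here is a minimal least-squares line fit in plain C#, independent of ML.NET. The class name LinearFit is hypothetical; this is a sketch of ordinary least squares, not how ML.NET's trainers are implemented.

```csharp
using System;

// Hypothetical helper: ordinary least squares for y = slope * x + intercept.
// The slope is cov(x, y) / var(x); the intercept makes the line pass through (mean x, mean y).
public static class LinearFit
{
    public static (double Slope, double Intercept) Fit(double[] xs, double[] ys)
    {
        if (xs.Length != ys.Length || xs.Length < 2)
            throw new ArgumentException("Need at least two (x, y) pairs of equal length.");

        double meanX = 0, meanY = 0;
        for (int i = 0; i < xs.Length; i++) { meanX += xs[i]; meanY += ys[i]; }
        meanX /= xs.Length;
        meanY /= ys.Length;

        double covXY = 0, varX = 0;
        for (int i = 0; i < xs.Length; i++)
        {
            covXY += (xs[i] - meanX) * (ys[i] - meanY);
            varX += (xs[i] - meanX) * (xs[i] - meanX);
        }
        if (varX == 0)
            throw new ArgumentException("All x values are identical, so the slope is undefined.");

        double slope = covXY / varX;
        return (slope, meanY - slope * meanX);
    }
}
```

For example, the points (1, 3), (2, 5), (3, 7), (4, 9) give a slope of 2 and an intercept of 1.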


Data prep

Before getting into the coding, note that data prep is key here. We are predicting items over time, so you need to know your time frame. With sales data, for example, you usually compare year over year, meaning last year's sales vs. this year's sales. If we used a time series model to predict the next 3 years of sales, we would break the data into yearly chunks, usually made up of daily prices.
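As a rough illustration of chunking data by time frame, the sketch below buckets fine-grained (time, price) observations into one average per calendar day using LINQ. Resample.DailyAverages is a hypothetical helper, not part of the original project, and averaging is only one of several ways you could summarize a period.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: collapses fine-grained (time, price) observations
// into one average price per calendar day, in chronological order.
public static class Resample
{
    public static List<(DateTime Day, double AveragePrice)> DailyAverages(
        IEnumerable<(DateTime Time, double Price)> observations)
    {
        return observations
            .GroupBy(o => o.Time.Date)      // bucket by calendar day
            .OrderBy(g => g.Key)            // keep buckets in time order
            .Select(g => (Day: g.Key, AveragePrice: g.Average(o => o.Price)))
            .ToList();
    }
}
```

Swapping `o.Time.Date` for a year key (for example `o.Time.Year`, with a matching return type) would give the yearly chunks described above.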



Data


With our Bitcoin example we will be breaking our data into daily changes in price with 1-minute increments. The original version of this post used the btc.csv file on my GitHub page https://github.com/fiveminutecoder/blogs/blob/master/%20mlnet_BTCTimeSeries/btc.csv. *Update: this file turned out to have inconsistent timestamps, so the code has been updated to use the Kaggle Bitcoin Historical Data dataset instead. It is too large to add to the GitHub site; this dataset was uploaded by Zielak.
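Because inconsistent timestamps were the reason for switching datasets, a quick check like the one below can confirm that sorted Unix timestamps advance by exactly 60 seconds. TimestampCheck.FindIrregularSteps is a hypothetical helper; it assumes the timestamps are Unix seconds and are already sorted.

```csharp
using System.Collections.Generic;

// Hypothetical sanity check: returns the indexes where consecutive sorted
// Unix timestamps (in seconds) do not differ by exactly expectedStepSeconds.
// A step of 0 means a duplicate row; a larger step means missing minutes.
public static class TimestampCheck
{
    public static List<int> FindIrregularSteps(IReadOnlyList<long> sortedUnixSeconds, long expectedStepSeconds = 60)
    {
        var irregular = new List<int>();
        for (int i = 1; i < sortedUnixSeconds.Count; i++)
        {
            long step = sortedUnixSeconds[i] - sortedUnixSeconds[i - 1];
            if (step != expectedStepSeconds)
                irregular.Add(i); // index of the row that breaks the 1-minute cadence
        }
        return irregular;
    }
}
```

Since BTCDataModel stores TimeStamp as an int, you would convert first, for example `data.Select(d => (long)d.TimeStamp).ToList()`.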

Our dataset has multiple columns, but when forecasting we are tracking a value over time, so only price will be used.

Creating the project

If you have not read my previous blog "Getting Started with ML.NET", which can be found here https://fiveminutecoder.blogspot.com/2020/07/getting-started-with-mlnet.html, please do so before continuing.

For this project we will continue to use the Microsoft.ML NuGet package, but we will also need to add the Microsoft.ML.TimeSeries NuGet package to get the forecasting estimator.

Once you have created a project and installed the NuGet packages, we can create the data models used for our training data and our predictions. 

The first model is the BTCDataModel. It contains our pricing and timestamps we will use for training.


using Microsoft.ML.Data;

namespace mlnet_BTCTimeSeries
{
    public class BTCDataModel
    {
        //Unix timestamp in seconds
        [LoadColumn(0)]
        public int TimeStamp {get;set;}

        [LoadColumn(1)]
        public float Open {get;set;}
        [LoadColumn(2)]
        public float High {get;set;}
        [LoadColumn(3)]
        public float Low {get;set;}
        [LoadColumn(4)]
        public float Close {get;set;}

        //Volume_(BTC) in the Kaggle file
        [LoadColumn(5)]
        public float Volume {get;set;}
        //Volume_(Currency) in the Kaggle file
        [LoadColumn(6)]
        public float Currency {get;set;}

        //Weighted_Price in the Kaggle file; this is the value we forecast
        [LoadColumn(7)]
        public float Amount {get;set;}
    }
}

Once we have created our training model, let's create our prediction model. The time series prediction model is slightly different from our supervised learning model. Instead of an array of confidence scores, each forecasted price comes with a lower and an upper confidence bound.


using Microsoft.ML.Data;

namespace mlnet_BTCTimeSeries
{
    public class PredictedSeriesDataModel
    {
        public float[] ForecastedPrice { get; set; }
        public float[] ConfidenceLowerBound { get; set; }
        public float[] ConfidenceUpperBound { get; set; }
    }
}

Now that our models are set up, we can add our context and training dataset placeholders. We will declare these as static fields so we can easily access them from our functions.


using System;
using System.Linq;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Transforms.TimeSeries;
using System.Collections.Generic;

...

static MLContext mlContext = new MLContext();
//timestamp of the last training point, used to label the times of our future predictions
static DateTime lastTime; 
static List<BTCDataModel> trainingData;
static List<BTCDataModel> testingData;
static string fileName = "btcModel.zip";
//how far out we want to predict
static int horizon = 5; 
//holds our in-memory model
static ITransformer forecastTransformer; 

The first step of our project is to create our training and testing data. Our testing data will be the size of our horizon field, since that is the number of points we are trying to forecast; everything else will be used for training. We will also capture the last timestamp from the training data, which lets us label the next 5 predictions with times that match our test data.


static void GetTrainingData()
{
	//load our dataset
	IDataView trainingDataFile = mlContext.Data.LoadFromTextFile<BTCDataModel>("bitstampUSD_1-min_data_2012-01-01_to_2020-12-31.csv", hasHeader: true, separatorChar: ',');

	//create enumerable to manipulate data
	List<BTCDataModel> data = mlContext.Data.CreateEnumerable<BTCDataModel>(trainingDataFile, reuseRowObject: false, ignoreMissingColumns: true).ToList();

	//times in the data set are not guaranteed to be in order, so sort the rows by timestamp
	data = data.OrderBy( o => o.TimeStamp).ToList();

	//determines the size of our testing data
	int dataSubset = data.Count() - horizon;

	//create our training data up to the dates we are trying to predict
	trainingData = data.GetRange(0, dataSubset).ToList(); 

	// will get the number of items we are trying to predict
	testingData = data.GetRange(dataSubset, horizon);
	
	//We want to capture time of last item in training data so we can increment the time stamp for our output and put a date/time to the forecast
	lastTime = ConvertTimeStamp(trainingData.Last().TimeStamp);
}

//helper for converting timestamp to date time
static DateTime ConvertTimeStamp(double TimeStamp)
{
	var offset = TimeSpan.FromSeconds(TimeStamp);
	DateTime startTime = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
	return startTime.Add(offset).ToLocalTime();
}
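The hold-out logic in GetTrainingData can also be expressed as a small generic helper, sketched below. SeriesSplit and its methods are hypothetical names. The sketch also uses DateTimeOffset.FromUnixTimeSeconds, a built-in .NET alternative to the manual epoch arithmetic in ConvertTimeStamp; it returns UTC, so call .ToLocalTime() on the result if you want the same local-time output as the original helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers mirroring GetTrainingData and ConvertTimeStamp.
public static class SeriesSplit
{
    // Holds out the last `horizon` points (already in time order) for testing;
    // everything before them becomes training data.
    public static (List<T> Train, List<T> Test) HoldOutLast<T>(IReadOnlyList<T> ordered, int horizon)
    {
        if (horizon <= 0 || horizon >= ordered.Count)
            throw new ArgumentOutOfRangeException(nameof(horizon));
        int cut = ordered.Count - horizon;
        return (ordered.Take(cut).ToList(), ordered.Skip(cut).ToList());
    }

    // Converts Unix seconds to a UTC DateTime using the framework helper.
    public static DateTime FromUnixSeconds(long seconds) =>
        DateTimeOffset.FromUnixTimeSeconds(seconds).UtcDateTime;
}
```

Keeping the split purely positional (no shuffling) matters for time series: the test points must come after every training point, or the model would be evaluated on data from its own past.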

With our data sets created, we can move on to creating our estimator. Unlike the supervised learning estimator, there is very little data manipulation. With the time series we just need to map the property names from our models to the appropriate columns. For this we will use "nameof" to get our property names instead of typing each one as a string. There are a lot of settings in the forecasting estimator, but the 3 main settings to worry about are "windowSize", "seriesLength", and "horizon". Window size is the length of the sliding window used to find patterns in the series, series length is how many of the most recent data points the model keeps in its buffer, and horizon is how far out we want to predict. 


static void TrainModel()
{
	IDataView trainingDataView = mlContext.Data.LoadFromEnumerable(trainingData);

	// creates our estimator; as you can see, we are using the forecasting estimator (singular spectrum analysis)
	var estimator = mlContext.Forecasting.ForecastBySsa(outputColumnName: nameof(PredictedSeriesDataModel.ForecastedPrice),
					inputColumnName: nameof(BTCDataModel.Amount), //column used for time series prediction
					windowSize: 60, //length of the sliding window: 60 one-minute points, or one hour of prices
					seriesLength: 1440, //number of recent points kept in the model's buffer: 1440 minutes, or one day
					trainSize: trainingData.Count(), //how many data points we want to sample
					horizon: horizon,
					confidenceLevel: 0.45f, //the lower the confidence level, the narrower the band between the lower and upper bounds
					confidenceLowerBoundColumn: nameof(PredictedSeriesDataModel.ConfidenceLowerBound),
					confidenceUpperBoundColumn: nameof(PredictedSeriesDataModel.ConfidenceUpperBound)
	);

	//creates our fitted model
	forecastTransformer = estimator.Fit(trainingDataView); 
}
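To get a feel for what windowSize controls, the sketch below builds the trajectory (Hankel) matrix used in the embedding step of singular spectrum analysis, the technique behind ForecastBySsa. It is an illustration of the textbook step only, not ML.NET's implementation, and SsaEmbedding is a hypothetical name. In textbook SSA, a buffer of N = 1440 points with a window of L = 60 gives 1440 - 60 + 1 = 1381 columns, each holding one hour of consecutive minute prices.

```csharp
using System;

// Illustrative sketch of SSA's embedding step (not ML.NET's code).
public static class SsaEmbedding
{
    // Builds the windowSize x K trajectory matrix, where K = series.Length - windowSize + 1.
    // Column j holds the lagged window series[j], series[j + 1], ..., series[j + windowSize - 1].
    public static double[,] TrajectoryMatrix(double[] series, int windowSize)
    {
        if (windowSize < 1 || windowSize > series.Length)
            throw new ArgumentOutOfRangeException(nameof(windowSize));

        int k = series.Length - windowSize + 1;
        var matrix = new double[windowSize, k];
        for (int col = 0; col < k; col++)
            for (int row = 0; row < windowSize; row++)
                matrix[row, col] = series[col + row];
        return matrix;
    }
}
```

SSA then decomposes this matrix (via singular value decomposition) into trend and periodic components, which is why a window that matches a natural cycle in the data tends to work better than an arbitrary one.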

Notice that our trainer does not call save yet. Because time series data needs updating to stay relevant, we save a little differently, and that will happen later in our predict function. With our data fitted to the estimator, we can now predict future Bitcoin prices.


static void Predict()
{
	//prediction engine based on our fitted model
	TimeSeriesPredictionEngine<BTCDataModel, PredictedSeriesDataModel> forecastEngine = forecastTransformer.CreateTimeSeriesEngine<BTCDataModel, PredictedSeriesDataModel>(mlContext);

	//call to predict the next 5 minutes
	 PredictedSeriesDataModel predictions = forecastEngine.Predict();

	 //write our predictions
	 for(int i = 0; i < predictions.ForecastedPrice.Count(); i++)
	{   
		lastTime = lastTime.AddMinutes(1);
		Console.WriteLine("{0} price: {1}, low: {2}, high: {3}, actual: {4}", lastTime, predictions.ForecastedPrice[i].ToString(), predictions.ConfidenceLowerBound[i].ToString(), predictions.ConfidenceUpperBound[i].ToString(), testingData[i].Amount);
	}


	//instead of saving, we use a checkpoint. This allows us to keep updating the model with new data without needing to keep such a large data set,
	//so we could append January 2021 data without retraining on everything before it, speeding up the process
	forecastEngine.CheckPoint(mlContext, fileName);
}

The predict function is very similar to our earlier predict functions, but we display the data a bit differently since we want to see all of our values, not just the top result. At the end of this function we call "CheckPoint" on our forecastEngine. This saves the model along with its current state, so we can later load it and keep adding data points without retraining the entire model.


With our training and prediction functions complete, all that is left is calling them and viewing our results.


static void Main(string[] args)
{
	GetTrainingData();
	TrainModel();
	Predict();
}

Results

As we can see, the forecaster is fairly accurate at predicting the next minute's price (off by about $5), but unfortunately our model predicted a downward trend instead of the upward one we see in the actual results. This shows that price history alone is not enough to predict the stock market.
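To put numbers on the error and the trend mismatch, you could score the forecast against testingData with a few simple metrics. The sketch below computes mean absolute error, the fraction of actual prices that land inside the confidence bounds, and whether the forecast and the actual prices move in the same direction. ForecastScore is a hypothetical helper, not part of the original project; you could pass it predictions.ForecastedPrice, the two bound arrays, and the Amount values from testingData.

```csharp
using System;

// Hypothetical scoring helpers for a short forecast horizon.
public static class ForecastScore
{
    // Average of |forecast - actual| across the horizon.
    public static double MeanAbsoluteError(float[] forecast, float[] actual)
    {
        if (forecast.Length != actual.Length || forecast.Length == 0)
            throw new ArgumentException("Arrays must be non-empty and the same length.");
        double sum = 0;
        for (int i = 0; i < forecast.Length; i++)
            sum += Math.Abs(forecast[i] - actual[i]);
        return sum / forecast.Length;
    }

    // Fraction of actual values that fall inside [lower, upper].
    public static double Coverage(float[] lower, float[] upper, float[] actual)
    {
        if (lower.Length != actual.Length || upper.Length != actual.Length || actual.Length == 0)
            throw new ArgumentException("Arrays must be non-empty and the same length.");
        int inside = 0;
        for (int i = 0; i < actual.Length; i++)
            if (actual[i] >= lower[i] && actual[i] <= upper[i]) inside++;
        return (double)inside / actual.Length;
    }

    // True when the first-to-last change has the same sign in both series.
    public static bool SameDirection(float[] forecast, float[] actual) =>
        Math.Sign(forecast[forecast.Length - 1] - forecast[0]) ==
        Math.Sign(actual[actual.Length - 1] - actual[0]);
}
```

A low error together with a false SameDirection result is exactly the situation described above: each point is close, but the forecast trend is wrong.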



Clone the project


You can find the full project on my GitHub site here https://github.com/fiveminutecoder/blogs/tree/master/%20mlnet_BTCTimeSeries
