Tuesday, January 5, 2021

Create an Azure Document Queue for Loading and Tagging SharePoint Documents - Part 1

 

Why do we need a document queue?

SharePoint Online limitations


SharePoint is not known for being the most developer-friendly platform; in fact, the Stack Overflow developer survey ranks it as the most dreaded platform among developers https://insights.stackoverflow.com/survey/2018 . With so many companies moving to O365 and SharePoint Online, developers have had even more struggles designing custom solutions on top of SharePoint.

One of the most frustrating of those is the dreaded 429 error, which appears when SharePoint throttles you. It is expected that using a software-as-a-service product will come with some throttling. Microsoft does not want us bombarding their servers with requests, but there is no clearly published point at which their system throttles you, and Microsoft adjusts those limits as it sees fit. The best way to reduce throttling is to limit your calls. When dealing with list items, this can be done by batching requests using the REST or Graph API batch endpoints, but uploading documents does not have that luxury.

Uploading a document with its metadata requires several steps, which makes it very prone to throttling when you have many documents to upload. Looking at a typical CSOM flow for a SharePoint upload, we see about 5 calls to upload and tag a document (sketched in the code after the list):

1. Call to get the list
2. Call to add the document
3. Call to get the uploaded document's list item properties
4. Call to update the list item properties
5. Call to check in the document
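
To make those round trips concrete, here is a rough CSOM sketch of the five calls. The site URL, library name, field names, and file path below are placeholders, and authentication is omitted:

using Microsoft.SharePoint.Client;

using (ClientContext context = new ClientContext("https://tenant.sharepoint.com/sites/customers"))
{
	//1. Call to get the list
	List list = context.Web.Lists.GetByTitle("Customer Documents");
	context.Load(list, l => l.RootFolder);
	context.ExecuteQuery();

	//2. Call to add the document
	FileCreationInformation fileInfo = new FileCreationInformation()
	{
		Url = "record.pdf",
		Content = System.IO.File.ReadAllBytes("c:\\customers\\OH\\Columbus\\12345\\record.pdf"),
		Overwrite = true
	};
	Microsoft.SharePoint.Client.File file = list.RootFolder.Files.Add(fileInfo);
	context.ExecuteQuery();

	//3. Call to get the uploaded document's list item
	ListItem item = file.ListItemAllFields;
	context.Load(item);
	context.ExecuteQuery();

	//4. Call to update the list item properties
	item["State"] = "OH";
	item["City"] = "Columbus";
	item["AccountNumber"] = "12345";
	item.Update();
	context.ExecuteQuery();

	//5. Call to check in the document
	file.CheckIn("Tagged by migration", CheckinType.MinorCheckIn);
	context.ExecuteQuery();
}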

We can reduce these calls down to 3 using the REST API, but a document still requires more calls than a list item. Uploading 1 or 2 documents at a time will not get your requests throttled, but what if you need to migrate a shared drive, or you have an application that compiles refunds into documents that are sent back to your customers? This is when throttling begins. The solution: limit the number of calls to SharePoint. From experience, about 30 documents a minute is the most you can upload before you start to get throttled, although this depends on how many total documents you are uploading. 30 documents once a day is fine, but if you are doing batches of 500, uploading 30 documents a minute continuously over a 20 minute period will start to throttle your user account.
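
For reference, the 3-call REST version typically looks something like the requests below. The site, library, and file names are placeholders, and headers such as Authorization, Accept, and If-Match are omitted:

1. POST /sites/customers/_api/web/GetFolderByServerRelativeUrl('Shared Documents')/Files/add(url='record.pdf',overwrite=true) to upload the file
2. POST /sites/customers/_api/web/GetFileByServerRelativeUrl('/sites/customers/Shared Documents/record.pdf')/ListItemAllFields (with the X-HTTP-Method: MERGE header) to update the metadata
3. POST /sites/customers/_api/web/GetFileByServerRelativeUrl('/sites/customers/Shared Documents/record.pdf')/CheckIn(comment='Tagged',checkintype=0) to check the document in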

Getting performance out of my SharePoint app while limiting calls to SharePoint.


Knowing I need to make at least 3 calls per document, how can I create an app that is performant but does not get throttled? The answer is Azure. Azure has throttling limits too, but its thresholds are significantly higher than SharePoint's. Using a storage account, we can create an intermediary between our application and SharePoint. Blob storage is fairly cheap in Azure; at the time of this writing, Azure charges for the average storage used, not the total amount written. This means that if I am storing 2 GB worth of documents at any given time, I am charged for 2 GB worth of storage, even if I load 2 GB every day and then remove it.

A storage account also has several other features, like queues, which we can use to hold our metadata. By serializing the metadata into a simple JSON string, we can push and pop queue messages that reference each document and its metadata.

Once our documents are in Azure, we don't care how long it takes. Our users can continue on while we trickle load our documents into SharePoint using a web job.

Creating a SharePoint migration tool in 5 minutes - Part 1

Since our application is going to be split into two parts, this post will be split into two parts as well. Part 1 will cover creating a tool that uploads our documents to Azure, while Part 2 will cover the job that uploads and tags our documents in SharePoint.

The Data

Our data is simple: we are going to assume we have a file share that stores our customer records. Our folder structure will be set up as follows:

\\customers\{state}\{city}\{account number}

With this folder structure, I can extract the state, city, and account number tags for each document. For example, a document stored at \\customers\OH\Columbus\12345\refund.pdf would be tagged with the state OH, the city Columbus, and the account number 12345.


Setup Azure Storage Account

If you do not have an Azure account, please create a free one before continuing. To get started, click Create a resource and search for "Storage Account". Create the resource and follow the prompts to choose a new resource group and a storage account name. For the purposes of this blog, you can leave the rest as the default values.



Once our account is created, we can see the options for blobs and queues.



First, click into Containers and, at the top, click "+ Container". We will name our container "customer". Repeat the same step for Queues, but click "+ Queue" and name it "customer" as well. Azure is now ready for us to start loading documents.

In order to authenticate to Azure, we will need a connection string. In the left-hand navigation, find Access Keys. Click into Access Keys and select "Show keys" to expose your connection string. Once we have our connection string, we can set up our application.
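
If you prefer to script the setup instead of clicking through the portal, the same resources can be created with the Azure CLI. This is just a sketch; the resource group name, storage account name, and region below are placeholders:

az group create --name rg-spmigration --location eastus
az storage account create --name spmigrationdemo --resource-group rg-spmigration --sku Standard_LRS
az storage container create --name customer --account-name spmigrationdemo
az storage queue create --name customer --account-name spmigrationdemo
az storage account show-connection-string --name spmigrationdemo --resource-group rg-spmigration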




Creating the Project

For our migration tool, we will create 2 projects. The first project, covered in this blog, is a console application that will run on our local machine and upload the documents to Azure. Our second project will be a web job that runs in Azure and will continuously look for items in the queue and try to upload them to SharePoint.

Let us start off by creating our first project, called "SharePointMigration_Client". Once our console app is created, we need to install 3 NuGet packages (CTRL+ALT+P in VS Code). The 3 packages we need are listed below, followed by the equivalent dotnet CLI commands:

  • Azure.Storage.Queues 
  • Azure.Storage.Blobs
  • Newtonsoft.Json
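
If you would rather use the command line, something along these lines will create the project and pull in the packages:

dotnet new console -n SharePointMigration_Client
cd SharePointMigration_Client
dotnet add package Azure.Storage.Blobs
dotnet add package Azure.Storage.Queues
dotnet add package Newtonsoft.Json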

The first two give us references for accessing Azure Storage, and Newtonsoft is just a convenient way to serialize our object for the queue. This project will also need a reference to System.Threading.Tasks to allow for async calls to Azure Storage. Now that our project and references are created, we need to add a new class to represent our SharePoint metadata. I called this file simply "CustomerMetadata.cs":


namespace SharePointMigration_Client
{
    //Represents the metadata that travels with each document through the queue
    public class CustomerMetadata
    {
        public string State { get; set; }
        public string City { get; set; }
        public string AccountNumber { get; set; }
        public string FileName { get; set; }
    }
}

Now that we have our customer object, we can define two properties: one for our queue client, and another for our connection string to Azure. I define the queue as a property so we only have to open the queue once; the blob client needs to be created per document. The connection string is the same for our blob container and our queue, so referencing it as a property lets me define it once and move on.


//making this a property so we do not have to keep recreating the object
static QueueClient queue; 
//connection string to our storage account, it is shared between blobs and queues.
static string cs = "Connection string from Azure";
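
Before these snippets will compile, the program also needs a handful of using directives at the top of Program.cs. The set below should cover everything used in this post; note that HttpUtility lives in the System.Web namespace, which is available on .NET Core:

using System;
using System.IO;
using System.Threading.Tasks;
using System.Web; //HttpUtility, used later to URL encode the queue message
using Azure.Storage.Blobs; //BlobClient
using Azure.Storage.Queues; //QueueClient
using Newtonsoft.Json; //JsonConvert, used to serialize our metadata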

The first function to define is our directory reader. This is a recursive function that will read all of the documents and subdirectories found in our root directory, "customers". When it finds a file, it uploads it to Azure; otherwise it continues on to the next directory.


//Recursive function for iterating a directory and its subdirectories
public static async Task GetFilesInDirectory(string FileDirectory)
{
	Console.WriteLine("Looking for files in " + FileDirectory);
	string[] files = Directory.GetFiles(FileDirectory);

	foreach(string file in files)
	{
		Console.WriteLine(file);
		await UploadFileToAzureBlob(file);
		await UploadFileMetaDataToQueue(file);
	}

	string[] subdirectories = Directory.GetDirectories(FileDirectory);

	foreach(string subdirectory in subdirectories)
	{
		//Recursion for going through all directories
		await GetFilesInDirectory(subdirectory);
	}
}

Once a file is found, we need to upload it to Azure. I prefer to upload the file before adding its message to our queue, since file uploads have more of a chance of breaking due to file size. The call to Azure is pretty straightforward:


public static async Task UploadFileToAzureBlob(string FilePath)
{
	string fileName = Path.GetFileName(FilePath);
	Console.WriteLine("File name {0}", fileName);
	//customer is the name of our blob container where we can view documents in Azure
	//blobs require us to create a connection each time we want to upload a file
	BlobClient blob  = new BlobClient(cs, "customer", fileName); 

	//Gets a file stream to upload to Azure
	using(FileStream stream = File.Open(FilePath, FileMode.Open))
	{
		await blob.UploadAsync(stream);
	}
}

Our file is now loaded into Azure; you can go back to your Azure storage account and view it in the "customer" container. Next we will upload our metadata. Since this is a five minute tutorial, we can assume our directory is set up correctly and we do not need to validate the path before extracting the metadata. We will just split the path on '\' and reference positions in the array. In a production environment, we would want to check that the array is filled correctly. Also, note that in this function I use HttpUtility.UrlEncode. I have found that Azure is picky about characters and can break if we do not encode the string, specifically around non-UTF-8 characters.


public static async Task UploadFileMetaDataToQueue(string FilePath)
{
	string[] metadata = FilePath.Split('\\');

	//we know our metadata position because of our file structure. 
	CustomerMetadata customerMetadata = new CustomerMetadata()
	{
		State = metadata[2],
		City = metadata[3],
		AccountNumber = metadata[4],
		FileName = metadata[5]
	};

	//create a string for our queue
	string data = JsonConvert.SerializeObject(customerMetadata); 
	//We do this to ensure any non-UTF-8 characters are safe for the web service
	data = HttpUtility.UrlEncode(data); 

	//adds message to back of the queue
	await queue.SendMessageAsync(data);
}
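
To make the queue contents concrete: for a hypothetical file at c:\customers\OH\Columbus\12345\refund.pdf, the message pushed onto the queue is the URL-encoded form of a JSON string like this:

{"State":"OH","City":"Columbus","AccountNumber":"12345","FileName":"refund.pdf"}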

That's it. We just need to create our queue client, await our directory reader in Main, and watch as the files are swiftly uploaded to Azure.


static async Task Main(string[] args)
{
	
	//create our queue client
	queue = new QueueClient(cs, "customer"); //customer is the name of our queue where we can view the items uploaded in Azure
	
	await GetFilesInDirectory("c:\\customers");
	Console.Read();
}

Results

After a successful run, you should see your files in your blob and metadata in your queue.

Continue to Part 2

In the next part, we will create a web job that reads the Azure Queue and uploads the documents to SharePoint.
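
As a preview (not the final Part 2 code), reading a message back out of the queue roughly reverses what we did above: receive a message, URL decode it, deserialize it into our CustomerMetadata object, and delete the message once the document is safely in SharePoint. A minimal sketch, assuming the same QueueClient property as before:

public static async Task<CustomerMetadata> GetNextCustomerFromQueue()
{
	//QueueMessage lives in the Azure.Storage.Queues.Models namespace
	QueueMessage[] messages = await queue.ReceiveMessagesAsync(1);

	if (messages.Length == 0)
	{
		//nothing waiting to be processed
		return null;
	}

	//reverse the UrlEncode we applied before queuing
	string data = HttpUtility.UrlDecode(messages[0].MessageText);
	CustomerMetadata customerMetadata = JsonConvert.DeserializeObject<CustomerMetadata>(data);

	//remove the message from the queue; the real web job should only do this after the SharePoint upload succeeds
	await queue.DeleteMessageAsync(messages[0].MessageId, messages[0].PopReceipt);

	return customerMetadata;
}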


Clone the project

You can find the full project on my GitHub site here https://github.com/fiveminutecoder/blogs/tree/master/SharePointMigration_Client




