Tuesday, December 27, 2022

Auto Tagging Invoices Using Azure AI Cognitive Services in 5 Minutes

In a previous blog post, SharePoint Syntex in 5 Minutes, we covered using SharePoint Syntex to auto tag invoices with a content type. Sometimes a document needs more processing outside of SharePoint before it can be uploaded, or external systems must be accessed for metadata properties. This kind of functionality can become very complex when built with a Power App or Flow. Microsoft provides AI services that read invoices and return the extracted data, which can then drive business logic that goes beyond what Syntex can do. These services are a consumption-based API in Azure: you upload an invoice for processing and get back the same metadata results found in SharePoint Syntex.

Setting up Azure

1) Create a Cognitive Services Plan

2) Once the Cognitive Services resource is created, it lists several services, including form services. Selecting this opens the form studio, which allows you to upload and review the forms that will be used to train the service.

3) Since this is a 5 minute tutorial, I will be using the prebuilt invoice recognizer.

4) Since this is a prebuilt model, it comes with several examples already loaded. After clicking the "Analyze" button, the invoice will highlight all the points of interest and assign them metadata. This screen is very similar to the SharePoint Syntex screen seen in my previous blog SharePoint Syntex in 5 Minutes

5) To make sure this predefined model works for your invoices, select the upload option in the top left corner and then upload a sample invoice.

6) Finally, a storage account must be created so the invoice service can access documents. The invoice service can only read files that have been made available to it. For this demo, I will be making my blob storage available to the internet. For security reasons DO NOT DO THIS IN PRODUCTION. For a production environment you will want to set up a network for your AI service that is connected to your blob storage for secure access. For details on how to create a storage account, see my previous blog post Create an Azure Document Queue for Loading and Tagging SharePoint Documents - Part 1

Consuming the service

For this example, I created a WPF app to display our uploaded invoice and its associated properties. To begin, 4 NuGet packages must be installed.

The first 2 are needed for the form recognition service: the Form Recognizer API for reading the invoice and the Azure Storage Blobs API for exposing the invoice.


Azure.AI.FormRecognizer
Azure.Storage.Blobs

The other 2 NuGet packages are needed for drawing our invoice. PdfLibCore will be used to convert the PDF into an image and System.Drawing.Common will be used for drawing the image. It is important to note that this example was done on Windows. As of .NET 6, System.Drawing.Common is only supported on Windows, so this part may not work on Linux or Mac.


System.Drawing.Common
PdfLibCore

The app layout is a simple grid system made up of 3 rows: one for uploading an invoice, one for displaying the invoice and its properties, and the bottom row for any errors while uploading.


<Window x:Class="FiveMintueInvoiceTagger.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        xmlns:local="clr-namespace:FiveMintueInvoiceTagger"
        mc:Ignorable="d"
        Title="Five Minute Invoice Tagger" Height="1000" Width="1600">
    <Grid>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="500"/>
            <ColumnDefinition Width="1100"/>
        </Grid.ColumnDefinitions>
        <Grid.RowDefinitions>
            <RowDefinition Height="50"/>
            <RowDefinition Height="750"/>
            <RowDefinition Height="750"/>
        </Grid.RowDefinitions>
        <StackPanel Grid.Column="0" Grid.Row="0">
            <Label Content="Select an invoice..."/>
            <Button Content="Select Invoice" Click="UploadFile_Click"/>
        </StackPanel>
        <Image Name="InvoiceImage" Grid.Column="0" Grid.Row="1" Width="450" Height="800"/>
        <DataGrid Name="DocumentProperties" Grid.Column="1" Grid.Row="1" Width="1050" Height="800"/>
        <Label Name="ErrorMsg" Grid.Column="0" Grid.Row="2"/>
    </Grid>
</Window>

Next, an object is needed to hold our invoice properties for displaying the results. This class has 3 items: the field's name, the field's value, and the confidence score indicating how sure the API is that it grabbed the right information.


public class InvoiceProperty
{
	//Field name found on invoice
	public string Field {get;set;}
	//Field value
	public string Value {get;set;}
	//How confident AI is that field value is correct
	public string Score {get;set;}
}

References to the NuGet packages must be added to the project, along with some other using statements for displaying the invoice image.


using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using System.Windows;
using System.Windows.Media.Imaging;
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;
using Azure.Storage.Blobs;
using PdfLibCore;
using PdfLibCore.Enums;

A click event is added to the upload button to grab the invoice and process the request. This method is async, so a loading screen should be added. Since this is a 5 minute application, it has been omitted.


private async void UploadFile_Click(object sender, RoutedEventArgs e)  
{  
	try
	{
		ErrorMsg.Content = "";

		//we only want pdf invoices
		Microsoft.Win32.OpenFileDialog openFileDlg = new Microsoft.Win32.OpenFileDialog(); 
		openFileDlg.Filter = "Pdf Files|*.pdf";
		// Launch OpenFileDialog by calling ShowDialog method
		Nullable<bool> result = openFileDlg.ShowDialog();
		// Continue only if the user selected a file
		if (result == true)
		{
			//Upload to azure blob so Azure AI can access file
			string invoicePath = await UploadInvoiceForProcessing(openFileDlg.FileName);

			//perform Invoice tagging
			Task<List<InvoiceProperty>> invoicePropertiesTask = GetDocumentProperties(invoicePath);

			//Convert PDF to image so we can view it next to properties
			UpdateInvoiceImage(openFileDlg.FileName);

			//Wait for Azure to return results, set it to our data grid
			DocumentProperties.ItemsSource = await invoicePropertiesTask;

		}
	}
	catch(Exception ex)
	{
		ErrorMsg.Content = ex.Message;
	}
}

In our button event, there are 3 functions called: one for uploading the invoice to Azure, one for processing the invoice, and one for converting the PDF to an image. Our upload function will upload the invoice to Azure Blob Storage to make the invoice available to the Azure Form Recognizer Service. Again, in a production environment make sure your blob storage is not publicly available.


private async Task<string> UploadInvoiceForProcessing(string FilePath)
{
	//connection string for the storage account
	string cs = "";
	string fileName = System.IO.Path.GetFileName(FilePath);
	Console.WriteLine("File name {0}", fileName);
	//invoice is the name of our blob container where we can view documents in Azure
	//blobs require us to create a connection each time we want to upload a file
	BlobClient blob = new BlobClient(cs, "invoice", fileName);

	//Gets a file stream to upload to Azure
	using(FileStream stream = File.Open(FilePath, FileMode.Open))
	{
		var blobInfo = await blob.UploadAsync(stream);
		
	}
	
	return "blob base storage url" + fileName;
}

Next, the invoice URL is passed to the Form Recognizer Service for processing.


private async Task<List<InvoiceProperty>> GetDocumentProperties(string InvoicePath)
{

	List<InvoiceProperty> invoiceProperties = new List<InvoiceProperty>();

	//Endpoint and key found in Azure AI service
	string endpoint = "ai service url";
	string key = "ai service key";
	AzureKeyCredential credential = new AzureKeyCredential(key);
	DocumentAnalysisClient client = new DocumentAnalysisClient(new Uri(endpoint), credential);

	//create Uri for the invoice
	Uri invoiceUri = new Uri(InvoicePath);

	//Analyzes the invoice
	AnalyzeDocumentOperation operation = await client.AnalyzeDocumentFromUriAsync(WaitUntil.Completed, "prebuilt-invoice", invoiceUri);
	AnalyzeResult result = operation.Value;

	//iterate the results and populates list of field values
	for (int i = 0; i < result.Documents.Count; i++)
	{
		AnalyzedDocument document = result.Documents[i];
		foreach (string field in document.Fields.Keys)
		{
			DocumentField documentField = document.Fields[field];
			InvoiceProperty invoiceProperty = new InvoiceProperty()
			{
				Field = field,
				Value = documentField.Content,
				Score = documentField.Confidence?.ToString()
			};

			invoiceProperties.Add(invoiceProperty);
		}
	}

	return invoiceProperties;
}
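The confidence score can drive business logic before the metadata is trusted. The following is only a minimal sketch and not part of the original app: it assumes the score is kept as a nullable float rather than a string, and the 0.8 threshold and sample values are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample results; in the app these would come from GetDocumentProperties.
var fields = new List<(string Field, string Value, float? Score)>
{
    ("InvoiceId", "INV-100", 0.97f),
    ("VendorName", "Contoso", 0.55f),
    ("Items", "", null) // a field with no confidence reported
};

// Assumed threshold; tune it against your own invoices.
const float MinConfidence = 0.8f;

// Fields that could be tagged automatically.
var trusted = fields
    .Where(f => f.Score.HasValue && f.Score.Value >= MinConfidence)
    .ToList();

// Everything else should be reviewed by a person.
var needsReview = fields.Except(trusted).ToList();

Console.WriteLine($"Trusted: {string.Join(", ", trusted.Select(f => f.Field))}");
Console.WriteLine($"Needs review: {string.Join(", ", needsReview.Select(f => f.Field))}");
```

Routing low-confidence fields to a review step is one way to use the score; the right cut-off depends on how costly a wrong tag is for your process.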

While the invoice is being processed, the application will convert the PDF to an image to be displayed in the application. The form recognizer service also returns the bounding box coordinates of each piece of data it found in the PDF, which could be used to draw highlights onto the image.


private void UpdateInvoiceImage(string FilePath)
{
	using(var pdf = new PdfDocument(File.Open(FilePath, FileMode.Open)))
	{
		//for this example we only want to show the first page
		if(pdf.Pages.Count > 0)
		{
			var pdfPage = pdf.Pages[0];

			var dpiX= 600D;
			var dpiY = 600D;
			var pageWidth = (int) (dpiX * pdfPage.Size.Width / 72);
			var pageHeight = (int) (dpiY * pdfPage.Size.Height / 72);
		
			var bitmap = new PdfiumBitmap(pageWidth, pageHeight, true);                                

			pdfPage.Render(bitmap, PageOrientations.Normal, RenderingFlags.LcdText);
			BitmapImage image = new BitmapImage();
			image.BeginInit();
			image.StreamSource = bitmap.AsBmpStream(dpiX,dpiY);
			image.EndInit();
			InvoiceImage.Source = image;
		}
		
	}
}
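If you want to draw the bounding boxes mentioned above onto the rendered image, the coordinates first have to be scaled to the image resolution. The sketch below assumes the service reports PDF coordinates in inches and reuses the 600 DPI from UpdateInvoiceImage; the polygon values and the ToPixels helper are hypothetical.

```csharp
using System;
using System.Linq;

// Same rendering resolution used in UpdateInvoiceImage.
const double Dpi = 600;

// Converts a point measured in inches on the PDF page to pixel coordinates.
(int X, int Y) ToPixels((double X, double Y) inches) =>
    ((int)Math.Round(inches.X * Dpi), (int)Math.Round(inches.Y * Dpi));

// Hypothetical polygon around a field, in inches, clockwise from the top-left corner.
var polygonInches = new[] { (1.0, 2.0), (3.5, 2.0), (3.5, 2.25), (1.0, 2.25) };

var polygonPixels = polygonInches.Select(p => ToPixels(p)).ToArray();

foreach (var p in polygonPixels)
{
    Console.WriteLine($"({p.X}, {p.Y})");
}
```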

Once this is completed, your application will display the invoice alongside the properties found, all in an application created in 5 minutes.

To view the full code, please visit the Five Minute Coder GitHub here: Five Minute Invoice Tagger

Friday, October 14, 2022

Getting Started with SharePoint Syntex in 5 Minutes

SharePoint is a tool that empowers business users to set up and design sites with little or no code knowledge. With tools like SharePoint Designer for creating workflows, or the newer Power Automate and Power Apps tools, the barriers to creating robust applications have been removed. With everyone looking to incorporate AI into their business, Microsoft has provided several low-code solutions through the Power Platform: users can create sentiment analysis tools, language detection, and even text translation apps using the platform. There is still a barrier to entry with the Power Platform, since some logic is required to design and query resources, but those with an understanding of Excel formulas should find the process similar.

Any SharePoint architect will tell you that metadata is important for creating well-structured search schemas, but sometimes the amount of metadata needed is cumbersome for end users. When documents need to have data extracted and be auto tagged, there must be an easier way than a Power App to achieve this, which is where SharePoint Syntex comes in. SharePoint Syntex can be thought of as a content type hub that auto tags documents simply by uploading them to a library.

Before using SharePoint Syntex, it is important to know that you must purchase an additional license for each user of the service, along with Power Automate credits.

Setting up a Content Center

To begin using SharePoint Syntex, you must set up a content center to hold and host your Syntex content types. Just like the old content type hubs, this is done by creating a new site collection. In the SharePoint admin screen, create a new site collection and select the template "Content Center". If you do not see this option, make sure you have activated the service from the admin portal under Setup by activating Automate Content Understanding. On a developer tenant the service is already available; however, you will not be able to publish the content type, as Microsoft will not sell you the license needed to do that on the developer tenant.

Creating a model

Once the site is created, navigate to the site collection. To create the first understanding model, select "Document understanding model" from the list of options at the top.

Since we are creating this in 5 minutes, I am going to use one of the preexisting models provided by Microsoft. There are 2 models: Invoices and Receipts. To train our model, we must have samples. The more samples, the better the trained model; your samples should also include documents that are NOT invoices to make sure the model doesn't recognize them. Since we are using the prebuilt model, our content type is set up with invoice fields (invoice number, date, amount, etc). To use a custom understanding model, you create columns and then highlight on the document where the data is found so the extractor learns what to look for. This will be covered in a later blog post.

The screen for our model is pretty straightforward. The first step in the process is to analyze our files. To do this, you will upload the samples mentioned above.

In the analyze section we will see a document library that holds the files used for analyzing and training. Only upload documents that are invoices, as the analyzing step confirms data is being found correctly. Click add at the top to upload your documents.

With your samples loaded, highlight the ones you want to analyze and click add.

Now at the bottom of your library, click next to start analyzing the documents.

Clicking next starts the analyzing process. Once complete, a screen with the document and the properties found shows up. This is where you tell Syntex whether it found the correct items and that they should be extracted. When you click each item under extractor, Syntex will ask you if it is the correct extractor. You can say yes in two ways: select yes for each extractor, or click the extract check box. Clicking no flags the extractor as wrong, so for anything that was not found correctly select no from the popup. Once complete, hit next at the bottom of the screen.

The invoice SharePoint Syntex Extractor is complete. The final step would be to apply the extractor to a library.

To make sure your model works on other documents and does not match documents it shouldn't, additional files can be added to the "Training Files" library. When we run the extractor on these files, we can see the prebuilt model only finds the business name, which to me shows the model is ready for production. If dates or items were found that should not match, more training items are needed.

Monday, August 29, 2022

Understanding Asynchronous vs Parallel development in C#

Why do we need asynchronous or parallel processes in C#?

Before diving into the differences between asynchronous and parallel processing, the question needs to be asked: "why do we need asynchronous or parallel processing?" The goal of any program is to provide fast and consistent results. A basic program performs a task synchronously, meaning the application goes through each step one at a time and in the same order every time; in other words, each step is in sync with the next step. As an application grows in complexity, running everything synchronously increases the time it takes to complete the task. Sometimes one step in our task depends on another step completing. Other times an application can run a task in tandem with another task, since it does not require the first task to complete. By starting 2 tasks independently of each other we have created an asynchronous process, which can improve the time it takes the process to complete. Notice I used the word "can". Just because a process runs separately doesn't always equal performance gains. This trade-off introduces other overheads that will be discussed later.

Synchronous process

Now that we understand why we would want to run two processes independently, we need to understand the difference between asynchronous and parallel. As stated above, an asynchronous process runs independently of another process. Depending on the type of task and what it is doing, that does not necessarily mean the tasks run at the same time; it just means we do not rely on Task A to perform Task B. We typically see this when keeping a UI from locking up while it waits for a task to return results, giving the UI a responsive feel. Parallel processing means we can run two tasks at the same time, or in parallel. To do true parallel processing, specific hardware requirements must be met: the processor must have multiple hardware threads that the OS can use to process each task at the same time. This usually means multiple cores. A typical dual core processor has 2 threads, one for each core, while processors with simultaneous multithreading (such as Intel's Hyper-Threading) provide 2 threads per core.

Asynchronous process

This is a lot to take in, but since C# is a managed language we have a set of APIs that make these tasks simpler. This is where the Task Parallel Library or TPL for short comes in. The TPL will not only utilize threads but it will also utilize processor cores and tasks to process tasks simultaneously in the most efficient way possible.
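To make the asynchronous side of this concrete, here is a small sketch that times two simulated I/O waits, first one after the other and then started together. SimulatedIoAsync is a hypothetical stand-in for something like a web request, and the exact timings will vary from machine to machine.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Stands in for any I/O wait, such as a web request (hypothetical).
Task SimulatedIoAsync() => Task.Delay(300);

var watch = Stopwatch.StartNew();

// Sequential: the second wait does not start until the first has finished.
await SimulatedIoAsync();
await SimulatedIoAsync();
long sequentialMs = watch.ElapsedMilliseconds;

watch.Restart();

// Concurrent: both waits are started first, then awaited together.
await Task.WhenAll(SimulatedIoAsync(), SimulatedIoAsync());
long concurrentMs = watch.ElapsedMilliseconds;

Console.WriteLine($"Sequential: {sequentialMs} ms, concurrent: {concurrentMs} ms");
```

Neither version needs a second core, because the work here is waiting rather than computing. That is the distinction between asynchronous and parallel drawn above.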

Multi-Threading in C#

Multithreading has been part of .NET since its earliest versions, and allows developers to send sub processes to a context different from the main context. This means one process does not have to sit in a wait status while another continues. In simpler terms, multi-threading allows two processes to run independently of each other. Multi-threading does not necessarily mean parallel processing; it just allows sub processes to be broken up into "threads" so that one thread can process a task while another thread continues on a separate task. The main context will then wait for the tasks to complete before returning. For example, let's say I am using the repository pattern to update database tables, and I need to update the user and user profile tables. Instead of updating the user table and then updating the user profile table, I can execute each task on a separate thread so the tasks can be processed independently of one another. Once both tasks are complete, I can then send the update to the UI.


void CreateUser(UserModel User)
{
	CreateUserInDB(User);
	
	//create and start threads
	Thread profileThread = new Thread(() => CreateUserProfile(User));
	Thread emailThread = new Thread(() => WelcomeEmail(User.Email));
	profileThread.Start();
	emailThread.Start();
	
	//do more here, the calling thread is a thread as well
	LogRegistrationForMarketing(User);
	
	//wait for threads to complete
	profileThread.Join();
	emailThread.Join();
	
}

void CreateUserProfile(UserModel User)
{
	DBWrite(User);
}

void WelcomeEmail(string Email)
{
	SendEmail(Email);
}

When using threads directly, every time a thread is started a new operating system thread is opened. Depending on the system configuration you can have hundreds of threads running, but opening too many threads will lead to thread exhaustion. This can cause problems when trying to scale out. Using the database example above, if we need to create two threads every time we make a user update, we will start to experience thread exhaustion when we scale to 200 or more users trying to do simultaneous updates. To fix this issue, Microsoft introduced the thread pool. A thread pool can be thought of as a queue of work served by a limited set of reusable threads, so instead of opening 400 threads when 200 users try to update their information, we send the updates to the thread pool and they run as threads become free. This helps manage resources and reduces the chances of thread exhaustion.

Threading does come with some issues. A thread's entry method returns void, so you cannot return a result directly, which means you must update an object outside of the context of the newly created thread. Some objects are not thread safe, and updating them this way can cause a deadlock or a runtime error. Also, since nothing manages how many threads you create, creating too many can cause slowdowns and can become very difficult to debug.
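As a rough sketch of the thread pool idea described above, the example below queues 400 small work items to the shared pool instead of starting 400 threads. Incrementing a counter stands in for a database update (an assumption for illustration), and Interlocked is used because the counter lives outside the worker threads.

```csharp
using System;
using System.Threading;

int completed = 0;
const int Updates = 400;

// CountdownEvent lets the caller wait until every queued item has finished.
using var done = new CountdownEvent(Updates);

for (int i = 0; i < Updates; i++)
{
    // Queue the work to the pool instead of creating a dedicated thread per update.
    ThreadPool.QueueUserWorkItem(_ =>
    {
        // Thread-safe update of state shared across worker threads.
        Interlocked.Increment(ref completed);
        done.Signal();
    });
}

// Block until all queued work items have signaled completion.
done.Wait();
Console.WriteLine($"Completed {completed} updates");
```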

Tasks in C#

Tasks were introduced in .NET 4 and can be thought of as managed threads. What Tasks do is take some of the downsides of threading and abstract them away. Tasks also make coding clearer through the async/await keywords (added in C# 5), which tell the application where it can stop waiting and let other work continue. Probably one of the biggest advantages of a Task is the ability to return a value. Before Tasks, returning a value from a thread was difficult and error prone, since an object had to be updated outside the thread, leading to context issues and deadlocks.

Task Example:


async Task CreateUser(UserModel User)
{
	CreateUserInDB(User);
	
	//create Tasks
	Task profileTask = CreateUserProfile(User);

	Task emailTask = WelcomeEmail(User.Email);
	
	//do more here while the tasks are in flight
	LogRegistrationForMarketing(User);
	
	//wait for tasks to complete
	await profileTask;
	await emailTask;
	
}

async Task CreateUserProfile(UserModel User)
{
	await DBWriteAsync(User);
}

async Task WelcomeEmail(string Email)
{
	await SendEmailAsync(Email);
}

With modern applications, many asynchronous tasks are not CPU intensive; they are waiting on another process to finish and return a result. API calls to 3rd party web services are an example of this. To maximize efficiency, the Task library uses the thread pool by default to manage tasks. This eliminates the need to set up a thread pool yourself and allows the system to manage the resources, along with how many threads are created. ValueTask was also introduced to reduce the overhead of allocating a Task when the result is often already available. If a value does not need to await a response, it is returned immediately. For example, if your application caches a call, there is no need to schedule any extra work to pull from the memory cache. Using ValueTask will return the response from memory on the calling thread without allocating a Task that is never really needed.
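The caching case can be sketched as follows. This is only an illustration under assumed names: the dictionary cache, LoadNameSlowlyAsync, and the "user-1" value are all hypothetical. On a cache hit the method returns a completed ValueTask right away on the calling thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var cache = new Dictionary<int, string>();
int slowLookups = 0;

// Simulates a slow call such as a database or web request (hypothetical).
async Task<string> LoadNameSlowlyAsync(int id)
{
    slowLookups++;
    await Task.Delay(50);
    return $"user-{id}";
}

// Loads the value and stores it so later calls can skip the slow path.
async Task<string> LoadAndCacheAsync(int id)
{
    var name = await LoadNameSlowlyAsync(id);
    cache[id] = name;
    return name;
}

// Returns synchronously from the cache when possible; only a miss creates a Task.
ValueTask<string> GetNameAsync(int id)
{
    if (cache.TryGetValue(id, out var name))
    {
        return new ValueTask<string>(name);
    }

    return new ValueTask<string>(LoadAndCacheAsync(id));
}

string first = await GetNameAsync(1);   // misses the cache and awaits the slow call
string second = await GetNameAsync(1);  // served from the cache
Console.WriteLine($"{first}, {second}, slow lookups: {slowLookups}");
```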

Managing threads through an abstracted layer does come with disadvantages as well. By default, tasks share the thread pool. This means that several pieces of work can be queued to the same pool and can end up waiting on other work to complete. The runtime will recognize a backlog and add threads to the pool, but this may not be as efficient as we want and can lead to longer processing times. Calling Task.Run() will typically move work onto a thread pool thread, but a dedicated thread is not guaranteed like it is with simple threading.

Tasks are not without their issues. For example, tasks are still prone to thread exhaustion and deadlocks; if a service is taking too long to return or times out, the task can become locked. This can be overcome with a cancellation token, but that adds some complexity. Also, since tasks use a managed thread pool, performance may not be what you expect. The Task library will handle sleeping tasks appropriately, but for more resource intensive tasks this may not be an option. The system tries to handle tasks and parallelism, but it does not guarantee an item will run asynchronously. Some expect a task to be removed from the main thread as soon as it is called, but it will not move to a different thread until the 1st await is hit. This can be overcome by wrapping your code in Task.Run(), but this creates additional overhead and can lead to thread exhaustion. Finally, be careful when passing an async function to an API that expects a synchronous delegate, such as Task.Factory.StartNew or Parallel.ForEach: the call will accept the async function, but nothing awaits the task it returns, so completion is not guaranteed. Task.Run() unwraps the inner task, and .NET 6 added Parallel.ForEachAsync for the loop case.
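A cancellation token with a timeout can look like the sketch below. CallSlowServiceAsync and the 200 ms limit are assumptions for illustration; a real service call would need to accept and honor the token in the same way.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Simulates a service that takes far too long to answer (hypothetical).
async Task<string> CallSlowServiceAsync(CancellationToken token)
{
    await Task.Delay(TimeSpan.FromSeconds(30), token);
    return "response";
}

string outcome;

// Cancel automatically if the call has not finished within 200 ms.
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(200));

try
{
    outcome = await CallSlowServiceAsync(cts.Token);
}
catch (OperationCanceledException)
{
    // Task.Delay observes the token and throws once the timeout fires.
    outcome = "timed out";
}

Console.WriteLine(outcome);
```

The extra complexity mentioned above shows up here as the token that has to be threaded through every call, plus the exception handling around the await.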

Parallelism in C#

Up until now, we have been talking about asynchronous operations. Using threads was about reducing the load on a particular thread, usually the UI thread, to create a better user experience by not locking a thread for long processes. To summarize, we didn't want the user to feel like the app was locked up, and we didn't want the app to wait while other items were being processed. While this sounds like the application was running items in parallel, asynchronous tasks do not guarantee this; they are just a way to manage resource wait time by basically using a queue to process tasks in the background.

How do we execute code simultaneously in C#? Microsoft provides us with a set of commands that can be found in the Parallel class in the System.Threading.Tasks namespace. This class handles the management of threads and processors to segment tasks to run concurrently. When thinking of overhead here, make sure to take hardware into account. Remember, true parallelism requires multiple processors/cores. If you are running a system that is single core, this library will not give much benefit. It can still be used, but a single core can still only handle 1 process at a time.

To use parallelism, Microsoft has provided the Parallel.For and Parallel.ForEach loops. Each works similarly to its respective for/foreach loop. The main difference is that the iterations run in parallel instead of sequentially. It is important to remember this when objects outside the loop are being updated; otherwise concurrent actions on an object might occur, causing unexpected overwrites or errors. Starting with .NET 6, Parallel.ForEachAsync also accepts asynchronous loop bodies. To collect results from a parallel loop in a safe manner, the concurrent queues and bags found in System.Collections.Concurrent can be used.


async Task CreateUsers(List<UserModel> Users)
{
	//Update multiple users in parallel
	await Parallel.ForEachAsync(Users, async (User, cancellationToken) =>
	{
		CreateUserInDB(User);

		//create Tasks
		Task profileTask = CreateUserProfile(User);

		Task emailTask = WelcomeEmail(User.Email);

		//do more here while the tasks are in flight
		LogRegistrationForMarketing(User);

		//wait for tasks to complete
		await profileTask;
		await emailTask;
	});
}

async Task CreateUserProfile(UserModel User)
{
	await DBWriteAsync(User);
}

async Task WelcomeEmail(string Email)
{
	await SendEmailAsync(Email);
}
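For collecting results from a parallel loop, a concurrent collection avoids the overwrite problems described above. The sketch below uses Parallel.For with a ConcurrentBag; doubling each number is a made-up stand-in for real work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var orderTotals = new ConcurrentBag<int>();

// Each iteration may run on a different thread, so results go into a thread-safe collection.
Parallel.For(1, 101, i =>
{
    orderTotals.Add(i * 2);
});

// The bag is unordered, so only order-independent results such as a count or sum are predictable.
Console.WriteLine($"Count: {orderTotals.Count}, Sum: {orderTotals.Sum()}");
```

Swapping the ConcurrentBag for a plain List here could lose items or throw, because List is not safe for concurrent writes.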

When not to use the TPL Library?

The discussion has been heavily focused on when to use the TPL, but why shouldn't everything be asynchronous? There are two real reasons not to use one of the TPL methods. The first is when the overhead to create a Task is more than the benefit of the task. Trivial compute tasks should not use the TPL. Math is simple for a PC, so if you are calculating a total for an order it might not make sense to create a separate thread for this task.

The second reason not to create a separate task is when there is communication between tasks. If you think of a pattern like the Observer Pattern, where an object is waiting for an event from another object, running on the same thread keeps things simple. Communicating between threads requires explicit synchronization, so if your observer is on a separate thread, it may never see the event without it.

Thursday, May 26, 2022

How to Use CI/CD for Azure App Services Using Azure Dev Ops

What is Dev Ops?

Before looking at how to configure Azure DevOps, it is important to understand what DevOps is. As the name suggests, DevOps combines development with operations. The goal is to get updates and features to the end user faster and in a more automated fashion. In a waterfall method, gathering requirements, development, testing, and deployment are handled in an IT bubble. The end user isn't part of the process until the end. How is the team supposed to know if they are on the right track? What if something was interpreted wrong? These types of issues can cause lengthy hold ups and serious budget issues. Agile tries to rectify this by working with the business owners more regularly and deploying smaller features at a faster pace. The goal is to get feedback quicker and pivot based on that feedback.

This is where Azure DevOps comes in. In order to get code out quickly, automating processes is necessary. We can use Azure to build our code, run test cases, package our code, and deploy it to our web server, all without human intervention. Additional workflows can be added to alert approvers, schedule deployments, and set up different environments.

Continuous Integration

Continuous integration is the first step of our DevOps automation. Usually a team is made up of more than one developer. Each developer will be working on a feature or part of a feature. When they are done, they must integrate their code with the rest of the code. Using the Git source control system, developers can request that their code be integrated into the rest of the code with a pull request. This can alert a manager or other developers to review the code and then approve the merge. The code is then merged back into the larger code set and the developer can pick up a new feature. To make sure the code integrates correctly, a developer will create a series of unit tests to validate that the code meets the requirements. If the tests do not pass, the merged updates will not be deployed.

Continuous Delivery

Continuous delivery is the automation of deploying the code to an environment. Once the code is accepted into the branch, a release is created and a user is alerted to initiate the deployment of the code. This is especially useful for production-ready code, since it allows a person to intervene and review the code before it is deployed to a production site. Typically, once the review is over, the user will approve and the code is deployed.

Continuous Deployment

Continuous deployment removes the human interaction from the delivery side. This is useful for your dev and staging sites and allows developers to freely deploy their code to a site and see changes immediately once all tests pass on the integration side. It is assumed here that test cases and unit tests have been thoroughly designed and tested; otherwise it is very easy to introduce a bug without anyone knowing.

Setting up branches

When setting up branches, I prefer the Gitflow strategy that can be referenced here https://www.gitkraken.com/learn/git/git-flow

What makes this setup different from other Git strategies is that there are 3 main branches, and all other branches are created from these. Other styles create a new feature branch for each change and create releases from it, instead of merging into one main branch for production.

3 main branches

1) Main branch - production code

2) dev branch - development of new features

3) Hotfix branch - fixes for production.

From the dev branch you would create your feature or release branches, and from the hotfix branch you would create branches for fixing production bugs. Any additional branches should be made from the dev branch or hotfix branch and then merged back into them. With a dev branch, we can create an integration site where changes are tested and approved before deployment. Because multiple features and releases can be merged into the dev branch before making it to production, you must make sure your dev branch is production ready before the merge, which will require a code freeze before promoting to production. This is the main difference between continuous deployment and continuous delivery.

Creating a CI/CD solution in 5 minutes

To begin using Azure DevOps, we need something to deploy and interact with on our site. To begin, we will create a new code repository for source control; I called mine "ci_cd blog". When creating your repository, make sure to choose Git and not TFVC.

With our repository created we can clone it to our local machine.

I will then create a simple web application along with a test project.

dotnet new mvc -o BasicApp

Before I commit my code, I will want to set up my continuous integration pipeline. This is done in the pipeline section.

From within the pipeline section, I will select Pipelines. The pipeline is where I will create the build/test/deploy steps for my application. Before an application is deployed, the pipeline will run several commands to ensure our code is ready to deploy. If any of them fail, the pipeline fails and our code will not be deployed to the application. To create a pipeline, click "Pipelines" and then select "Create Pipeline".

Next we must select where our code is currently stored. Our code for this example will be stored within Azure Dev Ops, so I will select "Azure Repos Git". Azure Dev Ops also integrates with GitHub or other git repositories if those are being used. Notice the bubble that says YAML, this is the language used to develop the pipeline.

Once the Azure Repos Git is selected, the pipeline must be tied to a repo. The next step is to select the repo created earlier.

Finally, we will select how we will configure the pipeline. There are several options to start from that come preconfigured for different applications and languages. For this post, I will be using the Starter Pipeline.

The pipeline should now look like this.

From the image above, we see a drop down with the branch this pipeline is saved to, the trigger to run the pipeline, and the steps the pipeline will take. Since we are deploying a web app, we can delete the current steps. We will leave the VM image as Ubuntu for this blog, but if your app runs on Windows, change this to windows-latest to build on a Windows machine.

//TODO: pool code
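The pool snippet is not included in this post. As a minimal sketch, the top of the starter pipeline might look like the following once the default steps are deleted. The branch name main is an assumption; use whichever branch your pipeline is saved to.

```yaml
# Run the pipeline whenever a commit lands on this branch (assumed name).
trigger:
- main

# Microsoft-hosted Ubuntu agent; use windows-latest to build on Windows.
pool:
  vmImage: ubuntu-latest
```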

A basic pipeline should consist of at least 5 steps: build the project, test it, publish it, copy the published files, then publish those files to the pipeline. The menu to the right lists the available tasks, and here we can select the tasks needed to implement the steps above. Simply search for what you want to accomplish and the task will help you build the basics for each step. Intellisense in the pipeline editor will also give you clues to advanced settings the GUI doesn't offer. We will use the .NET Core task to build, test, and publish our files.

Once you click Add at the bottom, our YAML file will fill in with the appropriate syntax.

//TODO build code
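The build snippet is not included in this post. A rough sketch of what the .NET Core task could generate for the build step is shown here. The display name and the project glob are assumptions for illustration and should be adjusted to match your repository layout.

```yaml
steps:
# Build every project file found in the repository (glob is an assumption).
- task: DotNetCoreCLI@2
  displayName: Build
  inputs:
    command: 'build'
    projects: '**/*.csproj'
```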

The test and publish steps are added the same way. Publish offers more options, but we will use the defaults.

//TODO: test code

//TODO publish code
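The test and publish snippets are not included in this post. As a hedged sketch, both can use the same .NET Core task with a different command. The display names and the test project glob are assumptions; the publish step relies on the task defaults.

```yaml
# These entries belong under the pipeline's steps: section.
# Run the test projects; a failing test fails the pipeline.
- task: DotNetCoreCLI@2
  displayName: Test
  inputs:
    command: 'test'
    projects: '**/*Tests/*.csproj'   # assumed naming convention for test projects

# Publish the web project(s) using the task defaults.
- task: DotNetCoreCLI@2
  displayName: Publish
  inputs:
    command: 'publish'
    publishWebProjects: true
```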

The test step runs our test projects and, if any test fails, stops the pipeline before anything is published. This keeps the published site clean and free from any mistakes.

Now that the project is built and tested, the published files get copied to the staging directory. We do this to keep the folder clean. An advanced setting, "CleanTargetFolder", empties the target first, so the code is copied to an empty directory and old/bad DLLs, files, or zips are not published. We will use the built-in variables for the directories.

//TODO: copy file Yaml
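The copy snippet is not included in this post. One possible shape for the copy step, using the Copy Files task and the built-in directory variables, is sketched here. The Contents pattern is an assumption that the publish step zipped its output; change it to match what your publish step produces.

```yaml
# This entry belongs under the pipeline's steps: section.
- task: CopyFiles@2
  displayName: Copy published files
  inputs:
    SourceFolder: '$(Build.SourcesDirectory)'
    Contents: '**/*.zip'             # assumed: publish output was zipped
    TargetFolder: '$(Build.ArtifactStagingDirectory)'
    CleanTargetFolder: true          # start from an empty staging directory
```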

For the fifth step we can do one of two things. First, we could just deploy directly to Azure from the pipeline. I only recommend this solution for dev environments, because it does not give you control over what is deployed to your environment. If the pipeline builds, it will deploy automatically, which removes any approval control or deferral of deployment. It also does not give you the option to roll back to a previous build: you would need to run the entire pipeline again to revert your deployment, which can be costly.

I prefer to use releases to deploy my code. These can still be continuous and without intervention, but they give more control over what happens when our code deploys. To create a release, a pipeline container must be created. To do this, search "publish" in the task pane to find the "Publish build artifacts" task. For the task details, the default values will work.

//TODO: publish artifacts YAML
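The publish artifacts snippet is not included in this post. A minimal sketch of the "Publish build artifacts" task with its default values might look like this. The display name is an assumption added for readability.

```yaml
# This entry belongs under the pipeline's steps: section.
- task: PublishBuildArtifacts@1
  displayName: Publish build artifacts
  inputs:
    PathtoPublish: '$(Build.ArtifactStagingDirectory)'
    ArtifactName: 'drop'
    publishLocation: 'Container'
```

A release pipeline can then pick up the artifact named drop as its source.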

With the pipeline set up, all the builds can be viewed along with test results, status, and deployment times. Each build can be drilled into to view the branch changes that kicked the build off.

Setup deployment Releases

1) Create new release pipeline

2) Select app service deployment

3) Name deployment stage

4) Click job/task in stage

5) Select step

6) Connect to subscription

Adding Code to the repository

1) Open a project in VSCode

Before connecting to the project, a git repository must be created locally. Open the terminal in VSCode and type the following commands:

2) Run "git init" to create an empty git repository

3) Run "git add ." to add all items to the git repository

4) Run git commit -m "Initial Commit" to commit all items so they are ready to push to the repository

Now, open the DevOps repository to find the Clone button. This will give the repository URL to push our project to.

5) In DevOps, go to your repository, find the Clone button, and copy the URL

Cloning at this point can result in lost code. We do not want to pull the empty project; we want to push what we have to the empty project. To do this, type the following commands in the VSCode terminal.

6) "git remote add origin <url>"

*Be sure to replace <url> with the url copied from DevOps

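Once connected to the remote origin, run "git push -u origin <branch>" to push your committed files, where <branch> is the name of your local branch (for example master or main). The -u flag sets the upstream branch, which the first push to a new remote needs. You should receive a prompt to log in using your email/password. If not, you can create account credentials by selecting the user icon in the corner and going to alternate credentials.

7) "git push -u origin <branch>"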

When the project is pushed, our pipeline will automatically kick off and start publishing our website. With that, a successful DevOps pipeline is created.

Thursday, January 27, 2022

Creating a CosmosDB Azure Function in 5 Minutes

What is an Azure Function?

When talking about cloud solutions, there are many choices. A web application can be deployed using a typical virtual machine or the PaaS (Platform as a Service) solution of an Azure web app. Sometimes a quick function is all you need, and a large application is overkill. This is where an Azure Function comes into play. With an Azure Function, we can implement a web endpoint that can be called by other Azure services or create a typical HTTP endpoint. These functions can be coded directly in the browser for a quick way to roll up an endpoint. In addition to this, there are integration features where you can set up triggers, integrate data sources, and choose the output option for the function, all without code.

Azure Functions work well with a microservice architecture. Functions are well suited for keeping your code to one domain and keeping an application from sprawling into other domains. They are meant to fit a very specific action, and they make it difficult to go beyond that specific action. Also, because a function can be triggered in various ways, you can set up an event bus that reacts and triggers the function when necessary.

Example of a function setup.

Create an Azure Function in 5 minutes

Setup CosmosDB

In this example, we are going to create a function that returns some data from a Cosmos DB. For our function to be configured using integrations, we need to make sure we are using a SQL backed Cosmos DB. We can still use Azure Functions to connect to other NoSQL databases, but that will be a traditional connection made in code, such as MongoDB with BSON.

In the newly created Cosmos DB, we need to create a new container. For this example, a simple Product database will be used. Our container will use the following model, and we will use ProductId as our partition key.


public class GetTestItem
{
    public string id {get;set;}
    public string ProductId {get;set;}
    public string manufacturer {get;set;}
    public string description {get;set;}
}

Here is an example of some data inserted into our newly created database.

Setup Azure Function

With the data source created, the next step is to set up the Azure Function.

With our app created, next we need to set up a trigger. For this example, I am going to call it GETALL for demonstration purposes. A proper REST URL would be "Products". When creating the function, we are also going to select "Development in Portal" and "Anonymous" for authorization. Finally, this will be an HTTP trigger.

In our new function, click "Integration". This will bring up our menu to configure the function.

In our integration screen, a new input needs to be created. This will bring up a message for creating a connection to the Cosmos DB database. Just follow the steps to select your Cosmos DB connection. The "Document Parameter Name" is the name we will use to reference the documents in our function.

Going back to our integration screen, select the function. Now pull up the run.csx. This is where we will write our code to manipulate the Cosmos DB data. The biggest changes will be in the signature of our function's Run method.


public static async Task<IActionResult> Run(HttpRequest req, ILogger log)

In the method signature, we will add a collection that will represent our SQL collection. You will notice in the code below that the signature now has an IEnumerable<Product> named Products to represent the input we declared in the integration screen. Once this is added, our function will connect to the database and pull all items from our container with no code. The example below will just write back all the data as a response to the request.


#r "Newtonsoft.Json"

using System.Net;
using System.Collections.Generic;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Primitives;
using Newtonsoft.Json;



public class Product
{
    public string id {get;set;}
    public string ProductId {get;set;}
    public string manufacturer {get;set;}
    public string description {get;set;}
}

// Products is filled by the Cosmos DB input declared on the integration screen.
public static async Task<IActionResult> Run(HttpRequest req,
                IEnumerable<Product> Products,
                ILogger log)
{
    try
    {
        log.LogInformation("C# HTTP trigger function processed a request.");

        // Write every document from the container back as the response.
        return new OkObjectResult(Products);
    }
    catch (Exception ex)
    {
        log.LogError($"Couldn't read items. Exception thrown: {ex.Message}");
        return new StatusCodeResult(StatusCodes.Status500InternalServerError);
    }
}

That's it. Our function app is ready and will show all items for our Cosmos DB with very little coding.

Bonus! Setup an additional call to get item by id

It is pretty impractical to display all items from our database. So how can we add a query to our integration to pull only the data we need? If you notice, on the bottom of our input integration screen there is a SQL query section. To set up a query function, let's go back and create a new function, GetById. An additional function can be added from the Azure portal in the function settings. We will use the same settings as our GET function: it will be anonymous and use the designer. Once the new function is created, go to the integration screen and create the input. Fill out the database name and collection name from the previous step. This time, a SQL query will be added at the bottom. The query will select the top item whose id matches the id passed in.


SELECT TOP 1 * FROM d where d.id = {id}

In our SQL query you will find the parameter {id} which will represent the data we are wanting to pass to the query. So to get our id parameter we will want to update the request's route to include the id. To do this, open up the Trigger in our integration menu to get the edit screen for adding a route.

In the route template we will add our parameter {id} to the route. For this example I will use GetByID/{id} to show what I am actually doing. In a production environment this route would be something like Products/{id}.

Now in our new function app, we will open the function and go to our run file. I will add the new parameters for the id and my SQL collection to the Run method signature. Since I have a query set up, my SQL collection will now be filtered and, in this case, contain only the one item with the id passed to the function.


#r "Newtonsoft.Json"

using System.Net;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Primitives;
using Newtonsoft.Json;

public class Product
{
    public string id {get;set;}
    public string ProductId {get;set;}
    public string manufacturer {get;set;}
    public string description {get;set;}
}

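// id comes from the {id} route parameter; Products is the Cosmos DB input,
// already filtered by the SQL query set on the integration screen.
public static async Task<IActionResult> Run(HttpRequest req,
                string id,
                IEnumerable<Product> Products,
                ILogger log)
{
    log.LogInformation("C# HTTP trigger function processed a request.");
    log.LogInformation(id);

    // Return 404 when the query matched nothing instead of throwing on First().
    var product = Products.FirstOrDefault();
    if (product == null)
    {
        return new NotFoundResult();
    }

    log.LogInformation(product.ProductId);
    return new OkObjectResult(product);
}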

That's it. We have set up two Azure Functions for pulling data from Cosmos DB without ever opening Visual Studio.
