Previously...
Prepping SharePoint
Prepping Azure
Azure AD comes free with your O365 subscription, and this is where we will register our app. Log in to the Azure portal and go to Azure Active Directory. From there you will find "App Registrations", which is where we will register the new app that uploads our documents. Give the app a name you will remember; I chose Graph API Document Upload. Next, set the supported account type. This setting does not matter much for us since we will use a token to access the API, so choose "Accounts in this organizational directory only" to keep the app restricted to our tenant. You can leave the Redirect URI (Web) field blank; we will set that up next.
Creating a Web Job in 5 minutes
Packages to install:
- Microsoft.Graph.Core
- Microsoft.Graph
- Microsoft.Identity.Client
- Azure.Storage.Blobs
- Azure.Storage.Queues
- Newtonsoft.Json
Now that we have our NuGet packages installed, we can start developing our web job. For the most part, our process is just the reverse of adding items to the queue: we manage the queue by removing (popping) items once they have been uploaded to SharePoint. We also need to bring over the CustomerMetadata model we created in the first part of this blog series, since it represents our queue data.
namespace SharePointMigration_WebJob
{
    public class CustomerMetadata
    {
        public string State { get; set; }
        public string City { get; set; }
        public string AccountNumber { get; set; }
        public string FileName { get; set; }
    }
}
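To make the message format concrete, here is a minimal round-trip sketch, assuming the Part 1 client serialized the metadata to JSON and then URL-encoded it (the HttpUtility.UrlDecode call in the main loop suggests this). To stay dependency-free it swaps in System.Text.Json and a Dictionary in place of Newtonsoft.Json and CustomerMetadata; EncodeMessage, DecodeMessage, and the sample values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Web;

// Producer side (assumed to mirror the Part 1 client): JSON-serialize, then URL-encode.
static string EncodeMessage(Dictionary<string, string> metadata) =>
    HttpUtility.UrlEncode(JsonSerializer.Serialize(metadata));

// Consumer side (the web job): URL-decode, then JSON-deserialize.
static Dictionary<string, string> DecodeMessage(string messageText) =>
    JsonSerializer.Deserialize<Dictionary<string, string>>(HttpUtility.UrlDecode(messageText))!;

// Hypothetical sample metadata using the same keys as CustomerMetadata.
var original = new Dictionary<string, string>
{
    ["State"] = "TX",
    ["City"] = "Austin",
    ["AccountNumber"] = "12345",
    ["FileName"] = "Contract 2020.pdf"
};

string messageText = EncodeMessage(original);
Console.WriteLine(messageText);
Console.WriteLine(DecodeMessage(messageText)["FileName"]); // Contract 2020.pdf
```

Whatever encoding the client used, the web job has to undo exactly those steps in reverse order, which is why the main loop decodes before deserializing.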
Now that our data model is covered, we want to pull our data from the queue. Azure Queue Storage lets us either peek at or receive messages. Peek lets us look at a message without locking it, while receive places a temporary hold on it (the visibility timeout) so other jobs cannot receive or pop it from the queue. The queue also supports batch requests of up to 32 messages per call, which helps us throttle our processing. Our queue function takes two parameters: the number of messages we want to receive and how many seconds we want to lock them. In our example we want to load one document every 10 seconds, which comes to approximately 30 documents every 5 minutes. The function defaults to 1 message and a lock of 60 seconds; when we call it from the main loop, we pull a larger batch.
public static async Task<QueueMessage[]> GetFileMetaDataFromQueue(int MessageCount = 1, int QueueLock = 60)
{
    //receives up to MessageCount messages and hides them from other consumers for QueueLock seconds
    QueueMessage[] queueMessages = (await queue.ReceiveMessagesAsync(MessageCount, TimeSpan.FromSeconds(QueueLock))).Value;
    return queueMessages;
}
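The pacing arithmetic above can be captured in a small helper. This is a sketch under stated assumptions: PlanBatch is a hypothetical name, and it simply divides the window by the per-document interval, clamps the result to the 32-message receive limit, and sizes the lock to match.

```csharp
using System;

// Hypothetical helper: derive the receive batch size and the visibility timeout
// (the "lock") from a target pace. A single receive call returns at most 32
// messages, so the batch is clamped to that limit.
static (int BatchSize, TimeSpan Lock) PlanBatch(TimeSpan perDocument, TimeSpan window)
{
    int batch = (int)(window.Ticks / perDocument.Ticks);
    batch = Math.Clamp(batch, 1, 32);
    // Lock the messages for as long as the batch is expected to take at this pace.
    return (batch, TimeSpan.FromTicks(perDocument.Ticks * batch));
}

var plan = PlanBatch(TimeSpan.FromSeconds(10), TimeSpan.FromMinutes(5));
Console.WriteLine($"{plan.BatchSize} messages, lock {plan.Lock.TotalSeconds} s"); // 30 messages, lock 300 s
```

The lock only covers the 10-second pauses, not the download and upload time, so in practice you may want some headroom; if the lock expires before a message is deleted, it becomes visible again and its pop receipt is no longer valid.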
Now that we have our queue items, we need the document associated with each one. Blob downloads are returned as streams, and the Graph API takes a stream to write to SharePoint, so we copy the blob into a MemoryStream and return it.
public static async Task<MemoryStream> GetFileFromAzureBlob(string FileName)
{
    BlobClient blobClient = new BlobClient(cs, "customer", FileName);
    using (BlobDownloadInfo downloadInfo = await blobClient.DownloadAsync())
    {
        MemoryStream stream = new MemoryStream();
        await downloadInfo.Content.CopyToAsync(stream);
        //rewind so the upload starts at the first byte instead of the end of the stream
        stream.Position = 0;
        return stream;
    }
}
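A stream copied into a MemoryStream is left positioned at its end, so it should be rewound before it is handed to the Graph upload; HTTP content built from a stream generally reads from the current position onward. This dependency-free sketch demonstrates the effect; CopyToMemory and BytesRemaining are hypothetical helpers, and the in-memory payload stands in for the blob download.

```csharp
using System;
using System.IO;
using System.Text;

// Copies a source stream into memory, optionally rewinding afterwards.
static MemoryStream CopyToMemory(Stream source, bool rewind)
{
    var buffer = new MemoryStream();
    source.CopyTo(buffer);           // leaves buffer.Position at the end
    if (rewind) buffer.Position = 0; // move back to the first byte
    return buffer;
}

// Counts the bytes a consumer would read from the stream's current position.
static int BytesRemaining(Stream s)
{
    using var sink = new MemoryStream();
    s.CopyTo(sink);
    return (int)sink.Length;
}

byte[] payload = Encoding.UTF8.GetBytes("hello blob");
Console.WriteLine(BytesRemaining(CopyToMemory(new MemoryStream(payload), rewind: false))); // 0
Console.WriteLine(BytesRemaining(CopyToMemory(new MemoryStream(payload), rewind: true)));  // 10
```

Without the rewind, an upload could succeed but produce an empty file in SharePoint.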
Pulling metadata and documents from the Azure queue is straightforward; the Graph API is not. In the older SharePoint libraries (SSOM, CSOM, JSOM) you could get sites and libraries by title. With Graph, everything is ID based, which means we need to get our site, library, and folder IDs before we can add or update anything in SharePoint. This does not actually add calls, since CSOM required us to load those objects first as well, but it does make the code more involved. Before we can get our IDs, we need to set up Graph to use our new app credentials. To do this, we will create a function that authenticates to Azure AD and returns our token.
public static async Task<string> GetGraphAPIToken()
{
    //Directory (tenant) ID, also called the realm
    string tenantId = "";
    //Application (client) ID of the app we registered in Azure AD
    string clientId = "";
    //client secret created for the app registration
    string clientSecret = "";
    //.default requests the application permissions granted to the app registration
    string[] scopes = new string[] { "https://graph.microsoft.com/.default" };
    IConfidentialClientApplication app = ConfidentialClientApplicationBuilder.Create(clientId)
        .WithClientSecret(clientSecret)
        .WithAuthority(new Uri("https://login.microsoftonline.com/" + tenantId))
        .Build();
    AuthenticationResult result = await app.AcquireTokenForClient(scopes).ExecuteAsync();
    //result contains an ExpiresOn property which can be used to cache the token
    return result.AccessToken;
}
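The ExpiresOn comment hints at caching. Below is a minimal sketch of that idea, assuming a five-minute refresh margin; GetCachedTokenAsync and the fake token source are hypothetical, and in the web job the fetch delegate would wrap AcquireTokenForClient and return (result.AccessToken, result.ExpiresOn).

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Cache state; in the web job these would be static fields next to siteId etc.
string? cachedToken = null;
DateTimeOffset cachedExpiresOn = DateTimeOffset.MinValue;
TimeSpan refreshMargin = TimeSpan.FromMinutes(5); // assumed safety margin

// Returns the cached token unless it is missing or expires within the margin,
// in which case fetch() is called and the result is stored.
async Task<string> GetCachedTokenAsync(
    Func<Task<(string Token, DateTimeOffset ExpiresOn)>> fetch, DateTimeOffset now)
{
    if (cachedToken is null || cachedExpiresOn - now < refreshMargin)
    {
        (string token, DateTimeOffset expiresOn) = await fetch();
        cachedToken = token;
        cachedExpiresOn = expiresOn;
        return token;
    }
    return cachedToken;
}

// Demo with a fake token source that counts how often it is called.
int fetchCount = 0;
DateTimeOffset t0 = DateTimeOffset.UtcNow;
Func<Task<(string Token, DateTimeOffset ExpiresOn)>> fakeFetch = () =>
{
    fetchCount++;
    return Task.FromResult(($"token-{fetchCount}", t0.AddHours(1)));
};

Console.WriteLine(await GetCachedTokenAsync(fakeFetch, t0));                // token-1
Console.WriteLine(await GetCachedTokenAsync(fakeFetch, t0.AddMinutes(30))); // token-1 (cached)
Console.WriteLine(await GetCachedTokenAsync(fakeFetch, t0.AddMinutes(58))); // token-2 (refreshed)
```

An alternative is to build the IConfidentialClientApplication once and reuse it, since MSAL keeps an in-memory token cache for AcquireTokenForClient; GetGraphAPIToken as written builds a new app on every call, so that cache is never reused.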
Now that we can pull our token, we will create a function that pulls our IDs. We only need to do this once, and we call it at the beginning of our main function to reduce the number of calls to the Graph API. The IDs are stored in static fields so the other functions can access them.
public static async Task GetSiteandListIDs()
{
    //Create our graph client
    GraphServiceClient graphClient = new GraphServiceClient("https://graph.microsoft.com/v1.0", new DelegateAuthenticationProvider(
        async (requestMessage) =>
        {
            string token = await GetGraphAPIToken();
            requestMessage.Headers.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", token);
        }
    ));
    //Gets the root site. If your library lives in a different site, you will need to look that site up instead
    var site = await graphClient.Sites.Root.Request().GetAsync();
    siteId = site.Id;
    //gets all libraries (drives) on the site. Since our app is written in 5 minutes, it is easier to filter the entire collection
    var libraries = await graphClient.Sites[siteId].Drives.Request().GetAsync();
    var library = libraries.First(f => f.Name == "CustomerDocs");
    libraryId = library.Id;
    //gets the root folder of our library
    var rootFolder = await graphClient.Sites[siteId].Drives[libraryId].Root.Request().GetAsync();
    rootFolderId = rootFolder.Id;
}
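If no library is named exactly "CustomerDocs", libraries.First throws a generic "Sequence contains no matching element" error. A small sketch of a friendlier lookup follows; FindDriveId is hypothetical, (Name, Id) tuples stand in for the Drive objects Graph returns, and the case-insensitive match is a design choice rather than something Graph requires. Also note the drives request returns one page of results, so a site with many libraries may need paging.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Finds a library ID by display name and reports the available names on failure.
static string FindDriveId(IEnumerable<(string Name, string Id)> candidates, string name)
{
    var match = candidates.FirstOrDefault(d => string.Equals(d.Name, name, StringComparison.OrdinalIgnoreCase));
    if (match.Id is null)
        throw new InvalidOperationException(
            $"No library named '{name}'. Found: {string.Join(", ", candidates.Select(d => d.Name))}");
    return match.Id;
}

// Hypothetical drives on the site.
var drives = new List<(string Name, string Id)> { ("Documents", "b!doc"), ("CustomerDocs", "b!cust") };
Console.WriteLine(FindDriveId(drives, "customerdocs")); // b!cust
```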
We are ready to start uploading documents to SharePoint. The upload follows the same steps we used to get our IDs. It is important to note that Microsoft Graph refers to document libraries as drives. This is because the same API also serves OneDrive and Teams files, which are stored in SharePoint document libraries.
public static async Task UploadDocumentToSharePoint(Stream FileStream, CustomerMetadata Metadata)
{
    //Creates the graph client. The authentication delegate runs before every request, so each call gets a current token
    GraphServiceClient graphClient = new GraphServiceClient("https://graph.microsoft.com/v1.0", new DelegateAuthenticationProvider(
        async (requestMessage) =>
        {
            string token = await GetGraphAPIToken();
            requestMessage.Headers.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", token);
        }
    ));
    //Uploads our file into the root folder of our library (creates it, or replaces an existing file with the same name)
    DriveItem createDocument = await graphClient.Sites[siteId].Drives[libraryId].Items[rootFolderId].ItemWithPath(Metadata.FileName).Content.Request().PutAsync<DriveItem>(FileStream);
    //Our metadata for our document; the keys must match the internal names of the library columns
    FieldValueSet custData = new FieldValueSet
    {
        AdditionalData = new Dictionary<string, object>()
        {
            {"State", Metadata.State},
            {"City", Metadata.City},
            {"AccountNumber", Metadata.AccountNumber}
        }
    };
    //sets the metadata properties for our item
    await graphClient.Sites[siteId].Drives[libraryId].Items[createDocument.Id].ListItem.Fields.Request().UpdateAsync(custData);
    //try checking in the file, since some libraries require it
    try
    {
        await graphClient.Sites[siteId].Drives[libraryId].Items[createDocument.Id].Checkin().Request().PostAsync();
    }
    catch (Exception ex) when (ex.Message.Contains("The file is not checked out"))
    {
        //ignoring this error because the library is not set up for check-in/check-out
    }
}
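For debugging with a REST client, it can help to see roughly which Microsoft Graph v1.0 endpoints the SDK calls above target. The helpers below are hypothetical, the sample IDs are placeholders, and the URL shapes are a sketch based on Graph's drive item addressing rather than output captured from the SDK.

```csharp
using System;

const string GraphRoot = "https://graph.microsoft.com/v1.0";

// PUT: upload (create or replace) file content under a parent folder, addressed by file name.
string UploadUrl(string siteId, string driveId, string parentId, string fileName) =>
    $"{GraphRoot}/sites/{siteId}/drives/{driveId}/items/{parentId}:/{Uri.EscapeDataString(fileName)}:/content";

// PATCH: update the list item fields (metadata columns) of an uploaded file.
string FieldsUrl(string siteId, string driveId, string itemId) =>
    $"{GraphRoot}/sites/{siteId}/drives/{driveId}/items/{itemId}/listItem/fields";

Console.WriteLine(UploadUrl("contoso.sharepoint.com,111,222", "b!abc", "01ROOT", "Q1 report.pdf"));
// https://graph.microsoft.com/v1.0/sites/contoso.sharepoint.com,111,222/drives/b!abc/items/01ROOT:/Q1%20report.pdf:/content
Console.WriteLine(FieldsUrl("contoso.sharepoint.com,111,222", "b!abc", "01ITEM"));
```

Escaping the file name matters because customer file names often contain spaces or other characters that are not valid in a URL path.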
Our documents now sit in SharePoint with their metadata. The last thing to do is clean up our queue and blob storage. Remember that Azure bills blob storage by the average amount of data stored, so keeping as few documents there as possible keeps costs down. We will use each message's MessageId and PopReceipt to remove it from the queue. We don't want processed messages reappearing in the queue once their lock expires, because the loop would pick them up again and try to process documents that have already been uploaded and deleted.
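public static async Task RemoveItemFromQueue(string MessageId, string Receipt)
{
    //deletes the message; the pop receipt proves we are the consumer currently holding it
    await queue.DeleteMessageAsync(MessageId, Receipt);
}
public static async Task RemoveDocumentFromQueue(string FileName)
{
    //deletes the document's blob from the "customer" container (not a queue message)
    BlobClient blobClient = new BlobClient(cs, "customer", FileName);
    await blobClient.DeleteAsync();
}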
Now that we have all our functions, we can put our main loop together. This is a continuous web job, so all of our logic sits in a while loop that keeps it running forever. We get our IDs from the Graph API once, and then pull from the queue. Once we have the queue messages, we loop through them and upload each document to SharePoint. We pause between uploads and between batches so the app runs at a pace SharePoint can handle, which reduces the chance of being throttled.
//Program.cs: these using directives cover the functions above as well as Main
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using System.Web;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Queues;
using Azure.Storage.Queues.Models;
using Microsoft.Graph;
using Microsoft.Identity.Client;
using Newtonsoft.Json;

//a static field so we do not have to keep recreating the queue client
static QueueClient queue;
//connection string to our storage account; it is shared between blobs and queues
static string cs = "";
//The ID of the site our library lives in
static string siteId = "";
//The ID of the library (drive)
static string libraryId = "";
//The ID of the root folder in the library
static string rootFolderId = "";
static async Task Main(string[] args)
{
    queue = new QueueClient(cs, "customer");
    int batchMessageCount = 30; //number of items to pull from the queue at once (a receive call returns at most 32)
    int queueLock = batchMessageCount * 10; //lock in seconds: message count times 10, since each message takes about 10 seconds to process
    //look up the site, library, and root folder IDs once at startup
    await GetSiteandListIDs();
    //Creating an infinite loop for our continuous job
    while (true)
    {
        DateTime startTime = DateTime.Now;
        try
        {
            Console.WriteLine("Getting queue");
            QueueMessage[] messages = await GetFileMetaDataFromQueue(batchMessageCount, queueLock);
            Console.WriteLine("Found {0} items in the queue", messages.Length);
            foreach (QueueMessage message in messages)
            {
                //our client job URL-encoded the message, this will decode it
                string data = HttpUtility.UrlDecode(message.MessageText);
                CustomerMetadata customer = JsonConvert.DeserializeObject<CustomerMetadata>(data);
                Console.WriteLine("Pulling document {0}", customer.FileName);
                using (MemoryStream document = await GetFileFromAzureBlob(customer.FileName))
                {
                    Console.WriteLine("Uploading document {0}", customer.FileName);
                    await UploadDocumentToSharePoint(document, customer);
                }
                Console.WriteLine("Upload was successful, removing {0} from the queue", customer.FileName);
                //remove the message from the queue so it doesn't get pulled again, since we were successful
                await RemoveItemFromQueue(message.MessageId, message.PopReceipt);
                //remove the document from blob storage
                await RemoveDocumentFromQueue(customer.FileName);
                //wait 10 seconds before the next call to SharePoint to prevent throttling
                await Task.Delay(TimeSpan.FromSeconds(10));
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine("Error writing queue to SharePoint: " + ex.Message);
        }
        Console.WriteLine("Finished with current queue list, will wait 5 minutes from last call");
        //if the batch took less than 5 minutes, wait out the rest of the window to prevent throttling
        DateTime endTime = DateTime.Now;
        double totalMinutes = endTime.Subtract(startTime).TotalMinutes;
        if (totalMinutes < 5)
        {
            await Task.Delay(TimeSpan.FromMinutes(5 - totalMinutes));
        }
    }
}
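The end-of-loop wait can be isolated into a small function. This sketch assumes the same 5-minute window idea; RemainingDelay is a hypothetical helper, and it swaps in Stopwatch for DateTime.Now because Stopwatch measures elapsed time independently of wall-clock changes. The demo uses a 200 ms window so it finishes quickly.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// How long to wait so that each loop iteration spans at least `window`.
// If the batch already took longer than the window, no delay is needed.
static TimeSpan RemainingDelay(TimeSpan elapsed, TimeSpan window) =>
    elapsed < window ? window - elapsed : TimeSpan.Zero;

var stopwatch = Stopwatch.StartNew();
await Task.Delay(TimeSpan.FromMilliseconds(50)); // stand-in for processing a batch
TimeSpan wait = RemainingDelay(stopwatch.Elapsed, TimeSpan.FromMilliseconds(200));
Console.WriteLine($"Waiting {wait.TotalMilliseconds:F0} ms before the next batch");
await Task.Delay(wait);
```

Returning TimeSpan.Zero instead of a negative value matters here, because Task.Delay rejects negative durations other than the infinite-timeout sentinel.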
Deploying our Web Job
dotnet publish -c Release