Why do we need asynchronous or parallel processes in C#?
Before diving into the differences between asynchronous and parallel processing, we should ask why we need them at all. The goal of any program is to provide fast and consistent results. A basic program performs its work synchronously, meaning the application goes through each step one at a time and in the same order every time; each step is in sync with the next. As an application grows in complexity, staying fully synchronous increases the time it takes to complete the work. Sometimes one step genuinely depends on another step completing first. Other times an application can run a task in tandem with another task because it does not require the first task to finish. By starting two tasks independently of each other we have created an asynchronous process, which can result in improved completion time. Notice I used the word "can". Just because a process runs separately does not always equal a performance gain; with this trade-off we introduce other overhead that will be discussed later.
Synchronous process
Now that we understand why we would want to run two processes independently, we need to understand the difference between asynchronous and parallel. As stated above, an asynchronous process runs independently of another process. Depending on the type of task and what it is doing, that does not necessarily mean it is running at the same time; it just means Task B does not have to wait on Task A. We typically see this when keeping a UI from locking up while it waits for a task to return results, which gives the UI a responsive feel. Parallel processing means we run two tasks at the same time, or in parallel. True parallel processing has hardware requirements: the processor must expose multiple threads that the OS can use to process each task at the same time, and most processors with multiple hardware threads also have multiple cores. A typical dual-core processor has two threads, one per core, while Intel's hyper-threaded processors (the Core i series) provide two threads per core.
Asynchronous process
This is a lot to take in, but because C# is a managed language we have a set of APIs that make these tasks simpler. This is where the Task Parallel Library, or TPL for short, comes in. The TPL utilizes threads, processor cores, and tasks to run work simultaneously in the most efficient way it can.
Multi-Threading in C#
Multithreading was introduced with .NET 1.1 and allowed developers to send sub-processes to a context different from the main context, so one process does not have to sit in a wait state for another to continue. In simpler terms, multi-threading allows two processes to run independently of each other. Multi-threading does not necessarily mean parallel processing; it just allows work to be broken up into "threads" so that one thread can process a task while another thread continues with a separate task. The main context then waits for the threads to complete before returning. For example, say I am using the repository pattern to update database tables and I need to update the user and user profile tables. Instead of updating the user table and then the user profile table, I can execute each update on a separate thread so the tasks are processed independently of one another. Once both tasks are complete, I can send the update to the UI.
void CreateUser(UserModel User)
{
    CreateUserInDB(User);
    //create and start the threads
    Thread profileThread = new Thread(() => CreateUserProfile(User));
    Thread emailThread = new Thread(() => WelcomeEmail(User.Email));
    profileThread.Start();
    emailThread.Start();
    //the main thread keeps working while the other threads run
    LogRegistrationForMarketing(User);
    //wait for both threads to complete
    profileThread.Join();
    emailThread.Join();
}
void CreateUserProfile(UserModel User)
{
    DBWrite(User);
}
void WelcomeEmail(string Email)
{
    SendEmail(Email);
}
When using threads, every call to start a new thread opens another operating system thread. Depending on the system configuration you can have hundreds of threads running, but opening too many leads to thread exhaustion, which becomes a problem when trying to scale out. Using the database example above, if we need to create two threads every time a user makes an update, we will start to experience thread exhaustion once 200 or more users try to do simultaneous updates. To address this, Microsoft introduced thread pools. A thread pool can be thought of as a queue for threads: instead of opening 400 threads when 200 users try to update their information, we can send the updates to a pool of reusable threads (for example, one pool for each database) instead. This helps manage resources and reduces the chance of thread exhaustion.
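As a rough sketch of queuing work instead of creating raw threads, the built-in System.Threading.ThreadPool (a single shared pool managed by the runtime) lets you hand work items to reusable threads. CreateUserProfile and WelcomeEmail are the hypothetical methods from the example above, and the CountdownEvent is just one way to wait for the queued work:
void CreateUser(UserModel User)
{
    CreateUserInDB(User);
    //signal when both queued work items have finished
    using var done = new CountdownEvent(2);
    //queue the work to the shared pool of reusable threads instead of opening new ones
    ThreadPool.QueueUserWorkItem(_ => { CreateUserProfile(User); done.Signal(); });
    ThreadPool.QueueUserWorkItem(_ => { WelcomeEmail(User.Email); done.Signal(); });
    //the calling thread keeps working while the pool processes the updates
    LogRegistrationForMarketing(User);
    //wait for both work items to complete
    done.Wait();
}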
Threading does come with some issues. A thread's entry point returns void, so you cannot get a result back directly; instead you must update an object outside the context of the newly created thread. Some objects are not thread safe, and this practice can cause deadlocks or runtime errors. Also, since raw threads are not managed for you, creating too many of them can cause slowdowns and can become very difficult to debug.
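Here is a minimal, self-contained sketch of that workaround: the thread writes its result into a captured variable, guarded by a lock in case other threads touch it, and the caller only reads it after Join. ExpensiveComputation is a stand-in for real work:
using System;
using System.Threading;

class ThreadResultExample
{
    static void Main()
    {
        int result = 0;                      //shared variable the thread writes into
        object sync = new object();

        Thread worker = new Thread(() =>
        {
            int computed = ExpensiveComputation();
            lock (sync)                      //guard the shared variable against concurrent writers
            {
                result = computed;
            }
        });

        worker.Start();
        worker.Join();                       //only read the result after the thread has finished

        Console.WriteLine(result);
    }

    static int ExpensiveComputation() => 42; //stand-in for real work
}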
Tasks in C#
Tasks were introduced in .NET 4 and can be thought of as managed threads. Tasks take some of the downsides of threading and abstract them away. They also make code clearer through the async/await keywords, which mark the points where work needs to be awaited rather than blocked on. Probably the biggest advantage of a Task is the ability to return a value. Before Tasks, returning a value from a thread was difficult and error prone, since an object had to be updated outside the thread, leading to context issues and deadlocks.
Task Example:
async Task CreateUser(UserModel User)
{
    CreateUserInDB(User);
    //start the Tasks (they begin running immediately)
    Task profileTask = CreateUserProfile(User);
    Task emailTask = WelcomeEmail(User.Email);
    //do more work on the calling thread while the tasks run
    LogRegistrationForMarketing(User);
    //wait for the tasks to complete
    await profileTask;
    await emailTask;
}
async Task CreateUserProfile(UserModel User)
{
    await DBWriteAsync(User);
}
async Task WelcomeEmail(string Email)
{
    await SendEmailAsync(Email);
}
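Since the example above only uses the non-generic Task, here is a minimal sketch of the return-value advantage using Task<int>; GetOrderCountAsync and DBCountOrdersAsync are hypothetical names for illustration:
async Task ShowOrderCount(UserModel User)
{
    //the caller simply awaits the task to receive the value
    int orders = await GetOrderCountAsync(User);
    Console.WriteLine(orders);
}
async Task<int> GetOrderCountAsync(UserModel User)
{
    //Task<int> carries the result back to the caller, no shared variable or lock required
    return await DBCountOrdersAsync(User);   //hypothetical async database call
}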
In modern applications, many asynchronous tasks are not compute intensive; they are simply waiting on something else to finish and return a result. Calls to third-party web APIs are an example of this. To maximize efficiency, the Task library uses the thread pool by default to manage tasks. This eliminates the need to set up your own thread pool and lets the system manage the resources. ValueTask was also introduced to reduce the overhead of a Task when a result is often already available. It adds a check so that if the value does not actually need to be awaited, it is returned immediately. For example, if your application caches a call, there is no need for any extra scheduling to pull from the in-memory cache; a ValueTask returns the cached response on the calling thread instead of allocating a Task that will never actually wait.
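A minimal sketch of that cached-call scenario, assuming a hypothetical in-memory dictionary cache and a hypothetical FetchPriceFromServiceAsync call:
private readonly Dictionary<string, decimal> _priceCache = new();

async ValueTask<decimal> GetPriceAsync(string productId)
{
    //cache hit: the value is returned synchronously with no Task allocation
    if (_priceCache.TryGetValue(productId, out decimal cached))
        return cached;

    //cache miss: fall back to the real asynchronous call
    decimal price = await FetchPriceFromServiceAsync(productId);   //hypothetical service call
    _priceCache[productId] = price;
    return price;
}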
Managing threads through an abstracted layer does come with disadvantages as well. By default the thread pool is used, which means several pieces of work can be queued to the same pool while still waiting on other work to complete. The pool will inject additional threads when it detects a heavy load, but this may not be as efficient as we want and can lead to longer processing times. Calling Task.Run() will typically schedule the work on a separate thread-pool thread, but a dedicated thread is not guaranteed the way it is when you create a thread yourself.
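A short sketch of handing CPU-bound work to the pool with Task.Run; OrderModel and ComputeOrderTotals are hypothetical:
async Task<decimal> GetOrderTotalsAsync(IEnumerable<OrderModel> orders)
{
    //offload CPU-bound work to a thread-pool thread so the calling thread is not blocked;
    //which pool thread runs it, and when, is decided by the scheduler rather than by us
    return await Task.Run(() => ComputeOrderTotals(orders));
}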
Tasks are not without their issues. They are still prone to thread exhaustion and deadlocks; if a service takes too long to respond or times out, the task can remain stuck. This can be handled with a cancellation token, but that adds some complexity. Also, because tasks use the managed thread pool, performance may not be what you expect. The Task library handles waiting tasks well, but for more resource-intensive work this may not be enough; the system tries to balance tasks and parallelism, but it does not guarantee that an item will actually run asynchronously. Some developers expect a task to leave the calling thread as soon as it is called, but an async method runs synchronously on that thread until the first await is hit. This can be worked around by wrapping the code in a Task, but that creates additional overhead and can contribute to thread exhaustion. Finally, be careful running async work inside other constructs: passing an async lambda to an API that expects a synchronous delegate (for example Parallel.ForEach, before .NET 6 added Parallel.ForEachAsync) means there is nothing to await the inner tasks, so their completion is not guaranteed.
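As a sketch of the cancellation-token approach, assuming a hypothetical SendEmailAsync overload that honors the token:
async Task SendWelcomeEmailWithTimeout(UserModel User)
{
    //give up if the email service has not responded within 5 seconds
    using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
    try
    {
        await SendEmailAsync(User.Email, cts.Token);   //hypothetical overload that accepts a token
    }
    catch (OperationCanceledException)
    {
        //the service took too long; log it, retry, or surface an error to the caller
    }
}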
Parallelism in C#
Up until now, we have been talking about asynchronous operations. Using threads was about reducing the load on a particular thread, usually the UI thread, to create a better user experience by not locking that thread during long processes. To summarize, we did not want the user to feel like the app was locked up, and we did not want the app to sit waiting while other items were being processed. While this sounds like the application was running items in parallel, asynchronous work does not guarantee that; it is simply a way to manage wait time by, in effect, queuing tasks to be processed in the background.
How do we execute code simultaneously in C#? Microsoft provides the Parallel class in the System.Threading.Tasks namespace. It handles the management of threads and processor cores so that work can be segmented and run concurrently. When thinking about overhead here, make sure to take the hardware into account. Remember, true parallelism requires multiple processors/cores. If you are running on a single-core system this class will not give much benefit; it can still be used, but a single core can still only execute one piece of work at a time.
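You can check how much hardware parallelism is actually available before leaning on these APIs:
//number of logical processors available to the process;
//a value of 1 means parallel loops cannot truly run iterations at the same time
Console.WriteLine($"Logical processors: {Environment.ProcessorCount}");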
To use parallelism, Microsoft provides the Parallel.For and Parallel.ForEach methods, which work much like their respective for and foreach loops. The main difference is that the iterations run in parallel instead of sequentially. It is important to remember this when objects are being updated outside the loop; otherwise concurrent actions on an object can cause unexpected overwrites or errors. Starting with .NET 6, parallel loops also gained asynchronous support with Parallel.ForEachAsync, and on any version the thread-safe collections in System.Collections.Concurrent (such as ConcurrentQueue and ConcurrentBag) can be used to update shared items safely.
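Here is a small, self-contained sketch of collecting results from a Parallel.ForEach through a ConcurrentBag; the order amounts and tax rate are made up for illustration:
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

//hypothetical order amounts; in a real app this would be actual order data
decimal[] orderAmounts = { 19.99m, 5.00m, 42.50m };

//thread-safe collection from System.Collections.Concurrent
var totals = new ConcurrentBag<decimal>();

Parallel.ForEach(orderAmounts, amount =>
{
    //each iteration may run on a different thread,
    //so results go into a concurrent collection rather than a plain List<T>
    totals.Add(amount * 1.08m);   //apply a made-up tax rate
});

decimal grandTotal = totals.Sum();   //safe to read once the loop has completed
Console.WriteLine(grandTotal);
Returning to the running user example, Parallel.ForEachAsync (added in .NET 6) combines the parallel loop with asynchronous work: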
async Task CreateUsers(List<UserModel> Users)
{
    //update multiple users in parallel
    await Parallel.ForEachAsync(Users, async (User, cancellationToken) =>
    {
        CreateUserInDB(User);
        //start the Tasks
        Task profileTask = CreateUserProfile(User);
        Task emailTask = WelcomeEmail(User.Email);
        //do more work while the tasks run
        LogRegistrationForMarketing(User);
        //wait for the tasks to complete
        await profileTask;
        await emailTask;
    });
}
async Task CreateUserProfile(UserModel User)
{
    await DBWriteAsync(User);
}
async Task WelcomeEmail(string Email)
{
    await SendEmailAsync(Email);
}
When not to use the TPL?
The discussion has been heavily focused on when to use the TPL, but why shouldn't everything be asynchronous? There are two real reasons not to use the TPL. The first is when the overhead of creating a Task is greater than the benefit of running the work separately. Trivial compute work should not go through the TPL; math is cheap for a PC, so if you are calculating a total for an order it probably does not make sense to push that work to a separate thread.
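For instance, a simple order total is cheaper to compute inline than to schedule; the Order object and its Lines collection here are hypothetical:
//cheap enough to compute right here; wrapping this in Task.Run would cost more
//in scheduling and allocation than the calculation itself
decimal orderTotal = Order.Lines.Sum(line => line.Price * line.Quantity);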
Another reason not to create a separate task is when there is tight communication between tasks. Think of a pattern like the Observer pattern, where an object waits for an event raised by another object; keeping both on the same thread is a must here. Communication between threads does not happen on its own, so if your observer lives on a separate thread without any synchronization in place, it may never learn that the event was raised.