The Retry Design Pattern

The Retry Design Pattern

·

9 min read

As developers, we often come across situations where we need to make requests to external services, such as APIs or databases. However, these requests can sometimes fail due to various reasons, such as network issues or the external service being temporarily unavailable. In such cases, it's important to implement a Retry Pattern in our code to ensure that the request is retried a certain number of times before giving up. In this article, we will discuss how to implement the Retry Pattern in C#.

What is the Retry Pattern?

The Retry Pattern is a design pattern that allows us to automatically retry an operation that has failed, with the aim of making it eventually succeed. The pattern involves repeating the operation a certain number of times or until it succeeds, with a delay between retries. The Retry Pattern is particularly useful in situations where an operation can fail due to transient errors, such as network issues or database connectivity problems.

Implementing the Retry Pattern in C#

Let's consider a simple scenario where we want to make a request to an API using the HttpClient class in C#. We can implement the Retry Pattern by wrapping the HttpClient request in a retry loop. Here's an example:

public async Task<string> MakeRequestWithRetryAsync()
{
    HttpClient httpClient = new HttpClient();
    int retryCount = 0;
    while (retryCount < 3)
    {
        try
        {
            HttpResponseMessage response = await httpClient.GetAsync("https://example.com/api");
            if (response.IsSuccessStatusCode)
            {
                return await response.Content.ReadAsStringAsync();
            }
        }
        catch (Exception ex)
        {
            // Log the exception
        }
        retryCount++;
        await Task.Delay(TimeSpan.FromSeconds(1));
    }
    throw new Exception("Failed to make request after 3 retries.");
}

In this example, we create an instance of the HttpClient class and define a retry count of 3. We then wrap the HttpClient.GetAsync() method in a while loop that repeats the request until it succeeds or until the retry count is exceeded. If an exception is thrown during the request, we log the exception and increment the retry count. We also introduce a delay of one second between each retry.

If the request fails after three retries, we throw an exception. This ensures that we don't get stuck in an infinite loop if the request consistently fails due to an unrecoverable error.

Configuring the Retry Pattern

In the above example, we hardcoded the retry count and delay between retries. However, it's often better to make these values configurable so that they can be adjusted without modifying the code. We can achieve this by passing the retry count and delay as parameters to the method, like this:

public async Task<string> MakeRequestWithRetryAsync(int retryCount, TimeSpan delay)
{
    HttpClient httpClient = new HttpClient();
    int currentRetry = 0;
    while (currentRetry < retryCount)
    {
        try
        {
            HttpResponseMessage response = await httpClient.GetAsync("https://example.com/api");
            if (response.IsSuccessStatusCode)
            {
                return await response.Content.ReadAsStringAsync();
            }
        }
        catch (Exception ex)
        {
            // Log the exception
        }
        currentRetry++;
        await Task.Delay(delay);
    }
    throw new Exception($"Failed to make request after {retryCount} retries.");
}

In this version of the method, we pass the retry count and delay as parameters. We also use a different variable name to make the code more readable. We then use the values of these parameters to control the retry loop.

Using a Separated Class

In the previous examples, we implemented the Retry Pattern directly in the method where we wanted to retry an operation. However, this approach can make the code more difficult to read and maintain, especially if we need to retry the same operation in multiple places. A better approach is to create a separate class that encapsulates the retry logic, and then use that class whenever we need to retry an operation. Here's an example:

public class RetryPolicy
{
    private readonly int _maxAttempts;
    private readonly TimeSpan _delay;

    public RetryPolicy(int maxAttempts, TimeSpan delay)
    {
        _maxAttempts = maxAttempts;
        _delay = delay;
    }

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> operation)
    {
        int currentAttempt = 0;
        while (currentAttempt < _maxAttempts)
        {
            try
            {
                T result = await operation();
                return result;
            }
            catch (Exception ex)
            {
                // Log the exception
            }
            currentAttempt++;
            await Task.Delay(_delay);
        }
        throw new Exception($"Failed to execute operation after {_maxAttempts} attempts.");
    }
}

In this example, we create a RetryPolicy class that takes a maximum number of attempts and a delay between retries as parameters. We then define a public method called ExecuteAsync that takes a Func<Task<T>> as a parameter. This allows us to pass in any operation that returns a Task, including HttpClient requests or database queries.

In the ExecuteAsync method, we wrap the operation in a while loop that repeats the operation until it succeeds or until the maximum number of attempts is exceeded. If an exception is thrown during the operation, we log the exception and increment the current attempt count. We also introduce a delay between retries using the TimeSpan parameter.

To use the RetryPolicy class, we can create an instance of the class and then call the ExecuteAsync method, passing in the operation we want to retry. Here's an example:

public async Task<string> MakeRequestWithRetryAsync()
{
    RetryPolicy retryPolicy = new RetryPolicy(3, TimeSpan.FromSeconds(1));
    HttpClient httpClient = new HttpClient();
    string result = await retryPolicy.ExecuteAsync(async () =>
    {
        HttpResponseMessage response = await httpClient.GetAsync("https://example.com/api");
        if (response.IsSuccessStatusCode)
        {
            return await response.Content.ReadAsStringAsync();
        }
        throw new Exception($"Request failed with status code {response.StatusCode}");
    });
    return result;
}

In this example, we create an instance of the RetryPolicy class with a maximum of three attempts and a one-second delay between retries. We then create an instance of the HttpClient class and pass in an anonymous async method that makes the HttpClient request. Inside the anonymous method, we check if the response is successful, and if so, we return the response body as a string. If the response is not successful, we throw an exception with a message that includes the status code.

By using a separate RetryPolicy class, we can simplify the code where we need to retry an operation, and make it easier to maintain and modify the retry logic. We can also reuse the RetryPolicy class in multiple places throughout our application.

Exponential BackOff

Exponential backoff is a technique used in the Retry Pattern to increase the delay between retries exponentially with each failed attempt. This approach is used to avoid overwhelming the external service with too many retry attempts, while still allowing the application to recover from transient errors.

The basic idea behind exponential backoff is to start with a small delay between retries, and then double the delay with each subsequent retry. This means that the first retry may occur after a one-second delay, the second retry may occur after a two-second delay, the third retry may occur after a four-second delay, and so on.

Here's an example of how we can implement exponential backoff in our RetryPolicy class:

public class RetryPolicy
{
    private readonly int _maxAttempts;
    private readonly TimeSpan _initialDelay;
    private readonly TimeSpan _maxDelay;

    public RetryPolicy(int maxAttempts, TimeSpan initialDelay, TimeSpan maxDelay)
    {
        _maxAttempts = maxAttempts;
        _initialDelay = initialDelay;
        _maxDelay = maxDelay;
    }

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> operation)
    {
        int currentAttempt = 0;
        TimeSpan delay = _initialDelay;

        while (currentAttempt < _maxAttempts)
        {
            try
            {
                T result = await operation();
                return result;
            }
            catch (Exception ex)
            {
                // Log the exception
            }

            currentAttempt++;
            if (currentAttempt < _maxAttempts)
            {
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(Math.Min(_maxDelay.Ticks, delay.Ticks * 2));
            }
        }

        throw new Exception($"Failed to execute operation after {_maxAttempts} attempts.");
    }
}

In this example, we modify the ExecuteAsync method to introduce the exponential backoff delay. We start by initializing the delay to the initial delay value, which is passed as a parameter to the constructor. We then introduce the delay between retries using the Task.Delay method, and double the delay using a simple calculation that ensures that the delay does not exceed the maximum delay value, which is also passed as a parameter to the constructor.

With this implementation, the RetryPolicy class will start with a small delay between retries, and gradually increase the delay with each subsequent retry. This can help avoid overwhelming the external service, and increase the reliability and resilience of our application.

It's worth noting that while exponential backoff can be a useful technique, it's not always the best approach for all situations. In some cases, we may want to use a fixed delay between retries, or a combination of fixed and exponential delays. The best approach will depend on the specific requirements of our application, and the characteristics of the external service or system that we are interacting with.

Jitter

Jitter is a technique that can be used in conjunction with the Retry Pattern to introduce some randomness into the delay between retries. This can help avoid retry storms, where multiple clients are retrying at the same time and overwhelming the external service.

The idea behind jitter is to add a random amount of time to the delay between retries, so that each retry attempt is slightly different. This randomness can help distribute the load on the external service, and reduce the likelihood of multiple clients retrying at the same time.

Here's an example of how we can add jitter to our RetryPolicy class:

public class RetryPolicy
{
    private readonly int _maxAttempts;
    private readonly TimeSpan _initialDelay;
    private readonly TimeSpan _maxDelay;
    private readonly Random _random;

    public RetryPolicy(int maxAttempts, TimeSpan initialDelay, TimeSpan maxDelay)
    {
        _maxAttempts = maxAttempts;
        _initialDelay = initialDelay;
        _maxDelay = maxDelay;
        _random = new Random();
    }

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> operation)
    {
        int currentAttempt = 0;
        TimeSpan delay = _initialDelay;

        while (currentAttempt < _maxAttempts)
        {
            try
            {
                T result = await operation();
                return result;
            }
            catch (Exception ex)
            {
                // Log the exception
            }

            currentAttempt++;
            if (currentAttempt < _maxAttempts)
            {
                int jitter = _random.Next((int)-delay.TotalMilliseconds / 2, (int)delay.TotalMilliseconds / 2);
                delay += TimeSpan.FromMilliseconds(jitter);
                delay = TimeSpan.FromTicks(Math.Min(_maxDelay.Ticks, delay.Ticks * 2));
                await Task.Delay(delay);
            }
        }

        throw new Exception($"Failed to execute operation after {_maxAttempts} attempts.");
    }
}

In this example, we add a new Random object to our RetryPolicy class, which we use to generate a random amount of jitter. We then modify the delay between retries to add this jitter, so that each retry attempt has a slightly different delay.

The amount of jitter we add is based on the current delay value, and we ensure that the total delay does not exceed the maximum delay value, which is also passed as a parameter to the constructor.

With this implementation, the RetryPolicy class will add some randomness to the delay between retries, which can help avoid retry storms and increase the reliability and resilience of our application.

It's worth noting that while jitter can be a useful technique, it's not always necessary or appropriate for all situations. In some cases, a fixed delay between retries may be sufficient, and in other cases, a combination of fixed and exponential delays may be more appropriate. The best approach will depend on the specific requirements of our application, and the characteristics of the external service or system that we are interacting with.