aws / aws-sdk-net

The official AWS SDK for .NET. For more information on the AWS SDK for .NET, see our web site:

Home Page: http://aws.amazon.com/sdkfornet/

connectCases: Add retry if created customer or case is not immediately available

michael-freidgeim-webjet opened this issue

Describe the feature

We noticed that occasionally, if a new customer is created and we then immediately call CreateCaseAsync (https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/ConnectCases/MIConnectCasesCreateCaseAsyncCreateCaseRequestCancellationToken.html) for that customer, the call fails with the error “No customer profile found for customer_id”.
We had to add a few retries (with a delay) until the object became available.

Similarly, “case not found” sometimes happens if connectCases.SearchCasesAsync (https://docs.aws.amazon.com/sdkfornet/v3/apidocs/items/ConnectCases/MIConnectCasesSearchCasesAsyncSearchCasesRequestCancellationToken.html) is called immediately after the case is created.

I’ve looked at https://docs.aws.amazon.com/sdk-for-net/v3/developer-guide/retries-timeouts.html but it only covers retrying requests that fail due to server-side throttling or dropped connections.
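For context, the built-in retry behaviour is configured once per client rather than per call. The fragment below is a minimal sketch of that, assuming the AWSSDK.ConnectCases package is referenced and that region and credentials are resolved from the environment; the value 5 is only illustrative. These settings affect the errors the SDK already treats as retryable, so they would not by themselves cover the “not found” responses described above.

```csharp
using Amazon.ConnectCases;
using Amazon.Runtime;

// Client-wide retry settings; they apply to every call made through this client.
var config = new AmazonConnectCasesConfig
{
    RetryMode = RequestRetryMode.Standard, // Legacy, Standard or Adaptive
    MaxErrorRetry = 5                      // illustrative value, not a recommendation
};

// Assumption: region and credentials come from the environment or a profile.
using var connectCases = new AmazonConnectCasesClient(config);
```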

Use Case

Avoid custom retry code for the errors “No customer profile found for customer_id” and “case not found”.

Proposed Solution

It would be good to include an optional parameter RetryIfNotFound (default false) for these operations, which developers can set when they know that the object may not be immediately available.

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

AWS .NET SDK and/or Package version used

<PackageReference Include="AWSSDK.Connect" Version="3.7.312.13" />
<PackageReference Include="AWSSDK.ConnectCases" Version="3.7.302.15" />

Targeted .NET Platform

.NET 8

Operating System and version

Windows 11, Ubuntu

@michael-freidgeim-webjet Good morning. Thanks for opening this feature request. The default retry mechanism retries only on specific throttling and service timeout error codes. It is generic and handled at the HTTP pipeline level, so it cannot be applied selectively to specific service error codes (such as case not found in your use case). The probable cause of your issue is that after a case is created, it might take some time to replicate across different AWS regions. Hence, if you immediately invoke a service operation for that case, it might not return the expected results.

Also note that the package AWSSDK.ConnectCases is auto-generated from service models. Hence it might not be feasible to add additional parameters to methods generated from the service models. You might want to develop a wrapper package for your specific use case to add this retry mechanism for specific operations.

I will review this with the team to check if there is any other workaround.

Thanks,
Ashish

Thanks @ashishdhingra for your reply.
“Hence it might not be feasible to add additional parameters to methods generated from the service models. You might want to develop a wrapper package for your specific use case to add this retry mechanism for specific operations.”
We’ve already implemented the retry mechanism in our code. However, the problem is not specific to our project, or even to .NET, and it would be beneficial if Amazon resolved it, rather than each team resolving it themselves after hitting intermittent errors.
By “service model”, do you mean the REST API contract documented in https://docs.aws.amazon.com/connect/latest/APIReference/API_connect-cases_CreateCase.html ?
If yes, the parameter should be added to the service model so that it is available to every language implementation.

@michael-freidgeim-webjet Service API models are pushed to downstream SDK systems by the service teams. For the .NET SDK, these are available here. The API operation definition specifies the parameters required for the operation, including any exceptions thrown by the service. Retry behavior is specific to the downstream SDK(s), which define retry behavior for specific AWS service error codes, including throttling and timeout errors. Hence, the parameter you suggested cannot be supported at the API level, as it is not required for the API operation but is instead a client-level implementation detail.

Thanks,
Ashish

@ashishdhingra Thanks for explaining the separation of concerns between the Service API and the SDK. The Service API returns an exception and the SDK implements retries for some known types of exceptions. The SDK currently doesn’t have a mechanism to specify extra retry behavior at the individual call level, only at the ClientConfig level for some known exceptions.

The problem is that the intermittent issue is not an exception from a single call and may be a genuine “not found” response. If the application developer knows that the record was created just milliseconds earlier as a result of a previous call, they should treat “not found” as a reason to retry.

If it is not feasible to implement, please consider adding a note to the documentation for the individual operations that may experience this issue, e.g.:

If related objects (e.g. a customer profile or case) are created just before the method call, they may not be found immediately, as it might take some time to replicate across different AWS regions. If you experience (or are likely to experience) this issue, consider implementing a retry/delay mechanism for your call.
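When the “not found” condition surfaces as an exception rather than as an empty result, the retry/delay mechanism suggested above could look roughly like the sketch below. It is a generic helper, not part of the SDK: the names NotFoundRetry and RetryOnAsync, the default of 4 attempts and the 200 ms base delay are all hypothetical choices for illustration. It needs .NET 6 or later for Random.Shared.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the AWS SDK.
public static class NotFoundRetry
{
    /// <summary>
    /// Runs the operation and retries it when it throws a TException that the
    /// shouldRetry predicate accepts, up to maxAttempts attempts in total.
    /// </summary>
    public static async Task<T> RetryOnAsync<T, TException>(
        Func<Task<T>> operation,
        Func<TException, bool> shouldRetry,
        int maxAttempts = 4,
        int baseDelayMs = 200,
        CancellationToken cancellationToken = default)
        where TException : Exception
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            // On the last attempt the filter is false, so the exception propagates to the caller.
            catch (TException ex) when (attempt < maxAttempts && shouldRetry(ex))
            {
                // Exponential backoff (base, 2x base, 4x base, ...) plus up to 100 ms of random jitter.
                int delayMs = baseDelayMs * (1 << (attempt - 1)) + Random.Shared.Next(0, 100);
                await Task.Delay(delayMs, cancellationToken);
            }
        }
    }
}
```

A caller would pass the SDK call as the operation and name the exception type it expects for the “not found” case. Which exception type carries “No customer profile found for customer_id” is not stated in this thread, so that choice is left to the caller and should be checked against the actual error.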

Hello @michael-freidgeim-webjet,

AWS services, including AWS Connect, operate on the principle of eventual consistency. So, immediately after creating a customer or case, it might not be available yet through the API.

Having said that, we've created an internal ticket (V1422063706) with the Amazon Connect Docs team, requesting that they improve the documentation around eventual consistency and how to handle potential delays when querying recently created data.

I will mark this as Closing-soon. Kindly let me know if you have further queries.

Thanks again for providing your feedback on the encountered issue.

Regards,
Chaitanya

I’ve implemented a generic RetryAsync helper method (with the help of ChatGPT):

using System;
using System.Threading.Tasks;

namespace CommonHelpers
{
    // from https://chatgpt.com/share/aa3a9406-163c-4b50-9c7c-36d5eb801793
    public static class RetryHelper
    {
        /// <summary>
        /// Calls an asynchronous operation repeatedly until its result satisfies the success condition
        /// or the attempts run out. The method is generic, so it works with any return type TResponse.
        /// Exceptions thrown by the operation are not caught.
        /// </summary>
        /// <typeparam name="TResponse">The type of the operation's result.</typeparam>
        /// <param name="function">A delegate representing the asynchronous operation to retry.</param>
        /// <param name="successCondition">A function that determines whether the result is successful.</param>
        /// <param name="maxRetries">The maximum number of attempts, including the first call.</param>
        /// <returns>The first successful result, or the last result if no attempt succeeded.</returns>
        /// <example>return await RetryHelper.RetryAsync(
        /// () => connectCases.SearchCasesAsync(searchCasesRequest),
        /// searchResult => searchResult.Cases.Count > 0,
        /// maxRetries
        /// );
        ///</example>
        public static async Task<TResponse?> RetryAsync<TResponse>(Func<Task<TResponse>> function, Func<TResponse, bool> successCondition, int maxRetries = 3)
        {
            TResponse? result = default;

            for (int attempt = 1; attempt <= maxRetries; attempt++)
            {
                result = await function();
                if (successCondition(result))
                {
                    break; // Exit the loop if the success condition is met
                }

                if (attempt < maxRetries)
                {
                    // Wait for an increasing amount of time (1 s, 2 s, ...) before retrying;
                    // no delay after the final attempt
                    await Task.Delay(1000 * attempt);
                }
            }

            return result;
        }
    }
}

Example call:

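// Requires: using Amazon.ConnectCases; using Amazon.ConnectCases.Model; using CommonHelpers;
public static async Task<SearchCasesResponse?> SearchRetryIfNotFound(
    IAmazonConnectCases connectCases, SearchCasesRequest searchCasesRequest, int maxRetries = 3)
{
    // Retry the search until it returns at least one case or the attempts run out
    return await RetryHelper.RetryAsync(
        () => connectCases.SearchCasesAsync(searchCasesRequest),
        searchResult => searchResult.Cases.Count > 0,
        maxRetries
    );
}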