codeyu / Hangfire.LiteDB

LiteDB storage for Hangfire.

Lock never released

niklr opened this issue · comments

Dear @codeyu

It seems the (distributed) lock is never released under certain circumstances. This happens only after the database reaches a reasonable size around 100MB or more. Trying to access the deleted jobs freezes the dashboard and basically never returns a response. At this point Hangfire can no longer execute any jobs and the dashboard hangs. The amount of deleted jobs is probably not causing the issue but rather the content of them. I just have around 300 deleted jobs but each of them contains a reasonably large aggregated exception message of around 2000+ lines.

Versions used:

  • Hangfire.Core: v1.7.7
  • Hangfire.LiteDB: v0.3.0
  • Hangfire.AspNet: v0.2.0

Hi @niklr ,

Can you tell us what values you are using in the LiteDbStorageOptions class?

On the other hand, if you are using the default values, a 100MB database seems excessive.

Follow these recommendations for using Hangfire, in particular the following:

https://docs.hangfire.io/en/latest/best-practices.html
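For reference, overriding the defaults would look roughly like the sketch below at application startup. The property names shown (e.g. DistributedLockLifetime, QueuePollInterval) are assumptions based on the Mongo-derived storage options this package resembles; verify them against the Hangfire.LiteDB version you actually run:

```csharp
// Sketch: configuring Hangfire.LiteDB with explicit options instead of the
// defaults. Property names are assumptions; check your installed version.
GlobalConfiguration.Configuration.UseLiteDbStorage(
    "Hangfire.db",
    new LiteDbStorageOptions
    {
        // How long a distributed lock may be held before it is considered stale.
        DistributedLockLifetime = TimeSpan.FromSeconds(30),
        // How often workers poll the queue for new jobs.
        QueuePollInterval = TimeSpan.FromSeconds(15)
    });
```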

Hi @felixclase

I'm using the default values for LiteDbStorageOptions indeed, and I'm already following the best practices. However, the job arguments have no impact on the content of a potential exception thrown by the job itself. Let's assume we would like to delete files in a background job:

[DisableConcurrentExecution(10)]
[AutomaticRetry(Attempts = 0, OnAttemptsExceeded = AttemptsExceededAction.Delete)]
public async Task CleanupFilesAsync(IJobCancellationToken ct)
{
    var exceptions = new List<Exception>();

    var fileRepository = new FileRepository();

    // Process the 100 oldest files per run.
    var files = fileRepository.GetQuery().OrderBy(e => e.Id).Take(100).ToList();

    foreach (var file in files)
    {
        try
        {
            ct.ThrowIfCancellationRequested();

            fileRepository.Delete(file);
            System.IO.File.Delete(file.Path);
        }
        catch (OperationCanceledException e)
        {
            // The job is being cancelled; stop processing.
            exceptions.Add(e);
            break;
        }
        catch (Exception e)
        {
            // Collect per-file failures and continue with the next file.
            exceptions.Add(e);
        }
    }

    if (exceptions.Count > 0)
        throw new AggregateException(exceptions);

    await Task.CompletedTask;
}

In some cases we might not have the appropriate permissions to delete any of the files on the file system, resulting in an AggregateException containing 100 UnauthorizedAccessExceptions, which is finally written to LiteDB. Of course this could be implemented differently, but in the end it's just an example to reproduce a potential bug where the distributed lock is never released.
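To make the failure mode concrete, here is a minimal, self-contained sketch (no Hangfire involved) of how an AggregateException's textual form grows with its inner exceptions. It is this text that Hangfire serializes into the job's failed/deleted state, which is what ends up stored in LiteDB:

```csharp
using System;
using System.Linq;

class AggregateSizeDemo
{
    static void Main()
    {
        // 100 inner exceptions, mirroring one failed delete per file.
        var inner = Enumerable.Range(0, 100)
            .Select(i => (Exception)new UnauthorizedAccessException(
                $"Access to the path 'file{i}.tmp' is denied."))
            .ToList();

        var aggregate = new AggregateException(inner);

        // ToString() concatenates every inner exception; with real stack
        // traces attached, the result easily runs to thousands of lines.
        Console.WriteLine(aggregate.ToString().Length);
    }
}
```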

I will try to replicate the error.

In the meantime, you can log these errors with Serilog, or with the extension Hangfire.JobsLogger (https://github.com/raisedapp/Hangfire.JobsLogger). I created it to keep track of job executions, but you can also use it to record exceptions.

If you use Hangfire.JobsLogger, your code could be implemented as follows:

[DisableConcurrentExecution(10)]
[AutomaticRetry(Attempts = 0, OnAttemptsExceeded = AttemptsExceededAction.Delete)]
public async Task CleanupFilesAsync(IJobCancellationToken ct, PerformContext context)
{
    var hasException = false;
    var fileRepository = new FileRepository();

    var files = fileRepository.GetQuery().OrderBy(e => e.Id).Take(100).ToList();

    foreach (var file in files)
    {
        try
        {
            ct.ThrowIfCancellationRequested();

            fileRepository.Delete(file);
            System.IO.File.Delete(file.Path);
        }
        catch (OperationCanceledException e)
        {
            hasException = true;
            context.LogError($"OperationCanceledException.. Detail: {e}");
            break;
        }
        catch (Exception e)
        {
            hasException = true;
            context.LogError($"Exception.. Detail: {e}");
        }
    }

    if (hasException)
        throw new Exception("Error processing one of the files. Check the detail on the dashboard.");

    await Task.CompletedTask;
}
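If you go this route, Hangfire.JobsLogger also has to be registered at startup. A sketch, assuming the UseJobsLogger registration method from that package (verify against the version you install):

```csharp
// Sketch: enabling Hangfire.JobsLogger so the PerformContext logging
// extensions (e.g. LogError) are available inside jobs.
GlobalConfiguration.Configuration
    .UseLiteDbStorage("Hangfire.db")
    .UseJobsLogger();
```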

Thanks @felixclase, fixed in v0.3.1, which has been released on nuget.org.