jondot / sneakers

A fast background processing framework for Ruby and RabbitMQ

Home Page: https://github.com/jondot/sneakers


Memory leak

EmptyLungs opened this issue · comments

I'm starting the sneakers (2.12.0) process like this:

CMD ["bundle", "exec", "rails", "sneakers:run"]

with this configuration:

  Sneakers.configure(
    amqp: RabbitClient::CONFIG.amqp,
    vhost: RabbitClient::CONFIG.vhost,
    heartbeat: 5,
    workers: 24,
    threads: 1,
    prefetch: 24,
    durable: true,
    log: $stdout,
    env: Rails.env,
    ack: true
  )

Even though I specify 1 thread per worker, I still see 19 threads for each worker process:

ps huH p 381 | wc -l
19

As a result, we see what looks like a huge memory leak after the first job starts:

$ ps axo rss,comm,pid | awk '{ proc_list[$2] += $1; } END { for (proc in proc_list) { printf("%d\t%s\n", proc_list[proc],proc); }}' | sort -n | tail -n 10 | sort -rn | awk '{$1/=1024;printf "%.0fMB\t",$1}{print $2}'

23257MB	ruby
4MB	bash

I'm not very familiar with Ruby, so could you please tell me how to profile such an issue?
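For a first look at memory growth, Ruby's standard library exposes basic heap statistics via `GC.stat` and the `objspace` extension, with no extra gems. A minimal sketch (the allocation in the middle just simulates a workload):

```ruby
require 'objspace'

# Start from a clean heap so the before/after delta is meaningful
GC.start
before = GC.stat(:heap_live_slots)

# Simulate a workload that retains objects (e.g. messages kept in memory)
junk = Array.new(100_000) { "x" * 50 }

after = GC.stat(:heap_live_slots)
puts "live heap slots grew by #{after - before}"

# Approximate total bytes retained by all reachable objects
puts "retained bytes: #{ObjectSpace.memsize_of_all}"
```

Calling snapshots like this before and after a batch of jobs (or dumping the heap with `ObjectSpace.dump_all` for offline analysis) helps distinguish a genuine leak (live slots keep growing across GC runs) from mere peak usage.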

You have 24 workers, so why do you expect fewer than that number of processes?

I don't really see any proof of a memory leak. Peak memory use is not the same thing as a leak.
You have 24 workers and a prefetch value of 24, which means you can have up to 24 * 24 = 576 messages delivered and unacknowledged at any given moment.

Depending on their size it can have a massive effect on the amount of memory used.

Perhaps try using fewer workers, and if your messages can be large (say, in megabytes), a lower prefetch of 8-16.
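For reference, the suggested tuning applied to the original `Sneakers.configure` call could look like this (the worker and prefetch values are illustrative, not a recommendation for every workload):

```ruby
Sneakers.configure(
  amqp: RabbitClient::CONFIG.amqp,
  vhost: RabbitClient::CONFIG.vhost,
  heartbeat: 5,
  workers: 8,     # fewer processes: 8 workers * prefetch 8 = at most 64 unacked messages
  threads: 1,
  prefetch: 8,    # lower prefetch bounds how many messages each worker buffers in memory
  durable: true,
  log: $stdout,
  env: Rails.env,
  ack: true
)
```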

@michaelklishin thanks for your reply!
I've added monitoring with Node Exporter and Prometheus; here are some metrics:
[screenshot: memory usage graph from Node Exporter / Prometheus]

To elaborate on the screenshot: we rebooted the sneakers Docker container at around 19:00, and it had some minor tasks to process. Then, just past 00:00, there is a major workload that lasts roughly 45 minutes. After it finished, the memory was not released.

Could you please explain how a Worker class instance handles tasks: is it destroyed after its messages are acked/nacked? If not, it looks like we should avoid storing any data in those workers.
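To illustrate the concern: if a worker instance is long-lived and handles many messages over its lifetime, anything stored in its instance variables is retained across jobs and never freed, while data held only in local variables becomes eligible for GC once each job finishes. A stdlib-only sketch of the difference (no Sneakers dependency; class and method names are illustrative):

```ruby
# Accumulates state across jobs: memory grows with every message handled
class LeakyWorker
  def initialize
    @seen = []
  end

  def work(msg)
    @seen << msg # retained on the instance forever
    :ack
  end
end

# Keeps per-message data in locals only: nothing outlives the call
class StatelessWorker
  def work(msg)
    payload = msg.dup # local variable, collectible after the method returns
    :ack
  end
end

leaky = LeakyWorker.new
10_000.times { |i| leaky.work("message #{i}") }
puts leaky.instance_variable_get(:@seen).size # => 10000 objects still retained
```

If worker instances are reused rather than destroyed per message, the stateless pattern is the safe default: derive everything you need inside `work` and let it go out of scope.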