kevoreilly / CAPEv2

Malware Configuration And Payload Extraction

Home Page: https://capesandbox.com/analysis/

Memory Leak in REST API

cccs-kevin opened this issue · comments

commented

About accounts on capesandbox.com

  • Issues aren't the way to ask for account activation. Ping capesandbox on Twitter with your username.

This is open source and you are getting free support, so be friendly!

  • Free support from doomedraven has ended: no whiskey, no support. At least he updated the documentation :)

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest version
  • I did read the README!
  • I checked the documentation and found no answer
  • I checked to make sure that this issue has not already been filed
  • I'm reporting the issue to the correct repository (for multi-repository projects)
  • I have read all the configs, including all optional parts

Expected Behavior

Please describe the behavior you are expecting. If your x64 samples are stuck in pending, ensure that you set tags=x64 in the hypervisor conf for your x64 VMs.

I should be able to run the REST API cape-web.service indefinitely without it restarting because it has run out of RAM.

Current Behavior

What is the current behavior?

Running the REST API cape-web.service for a long period of time eventually maxes out the RAM on the CAPE nest and causes the REST API to restart with status code 247. There is definitely a memory leak somewhere in this process.

Failure Information (for bugs)

Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.

See the logs I've attached. There are 6 things that I use the REST API for:

  1. Get the status of CAPE via GET /apiv2/cuckoo/status/
  2. Search for the SHA256 of a sample via GET /apiv2/tasks/search/sha256/<sha256>/
  3. Submit a sample for file analysis via POST /apiv2/tasks/create/file/
  4. Poll the task by task ID until it is completed via GET /apiv2/tasks/view/<task-id>/
  5. Get the lite JSON report and ZIP generated via GET /apiv2/tasks/get/report/<task-id>/lite/zip/
  6. Delete the task via GET /apiv2/tasks/delete/<task-id>/

Somewhere in these calls, the memory is being leaked...
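For reference, the call sequence above translates to roughly the sketch below (Python + requests). This is not my actual client; the base URL, the token header, the polling interval and the response field names are illustrative assumptions, so check them against your own api.conf and responses:

import time
import requests

BASE = "http://cape-nest:8000/apiv2"           # assumed host/port
HEADERS = {"Authorization": "Token <token>"}   # assumed auth; drop if the API is open

def analyse(path, sha256):
    # 1. Get the status of CAPE
    requests.get(f"{BASE}/cuckoo/status/", headers=HEADERS)

    # 2. Search for the SHA256 of the sample
    requests.get(f"{BASE}/tasks/search/sha256/{sha256}/", headers=HEADERS)

    # 3. Submit the sample for file analysis
    with open(path, "rb") as sample:
        resp = requests.post(f"{BASE}/tasks/create/file/", files={"file": sample}, headers=HEADERS)
    task_id = resp.json()["data"]["task_ids"][0]   # field names from memory, verify on your build

    # 4. Poll the task by ID until it is completed
    while True:
        task = requests.get(f"{BASE}/tasks/view/{task_id}/", headers=HEADERS).json()["data"]
        if task["status"] in ("reported", "failed_analysis", "failed_processing"):
            break
        time.sleep(30)

    # 5. Get the lite JSON report and ZIP
    requests.get(f"{BASE}/tasks/get/report/{task_id}/lite/zip/", headers=HEADERS)

    # 6. Delete the task
    requests.get(f"{BASE}/tasks/delete/{task_id}/", headers=HEADERS)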

Steps to Reproduce

Please provide detailed steps for reproducing the issue.

  1. Submit a bunch of files, running the following logic for each one:
  2. Get the status of CAPE via GET /apiv2/cuckoo/status/
  3. Search for the SHA256 of a sample via GET /apiv2/tasks/search/sha256/<sha256>/
  4. Submit a sample for file analysis via POST /apiv2/tasks/create/file/
  5. Poll the task by task ID until it is completed via GET /apiv2/tasks/view/<task-id>/
  6. Get the lite JSON report and ZIP generated via GET /apiv2/tasks/get/report/<task-id>/lite/zip/
  7. Delete the task via GET /apiv2/tasks/delete/<task-id>/
  8. Watch the memory usage go to the moon 🚀 (a small monitoring sketch follows after this list)
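For step 8, one way to put numbers on the growth is a small psutil loop that polls the web process from the outside. This is only a sketch, not part of CAPE; the process match string and the 60-second interval are arbitrary and would need adjusting for a uwsgi or gunicorn setup:

import time
import psutil

def find_web_proc():
    # Match the runserver_plus worker; change the needle for uwsgi/gunicorn.
    for proc in psutil.process_iter(["pid", "cmdline"]):
        if "manage.py runserver_plus" in " ".join(proc.info["cmdline"] or []):
            return proc
    return None

proc = find_web_proc()
while proc is not None and proc.is_running():
    rss_mb = proc.memory_info().rss / 1024 ** 2
    print(f"{time.strftime('%H:%M:%S')} rss={rss_mb:.1f} MiB")
    time.sleep(60)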

Context

Please provide any relevant information about your setup. This is important in case the issue is not reproducible except for under certain conditions.

Question Answer
Git commit Most recent (I think... I don't use Git to update my CAPE)
OS version Ubuntu 20.04

Failure Logs

Please include any relevant log snippets or files here.

cape@cape-nest-vm:~$ sudo systemctl status cape-web.service
● cape-web.service - CAPE WSGI app
     Loaded: loaded (/lib/systemd/system/cape-web.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-08-24 15:10:59 UTC; 5h 2min ago
       Docs: https://github.com/kevoreilly/CAPEv2
   Main PID: 3763817 (python)
      Tasks: 18 (limit: 77163)
     Memory: 6.6G
     CGroup: /system.slice/cape-web.service
             ├─3763817 /home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.8/bin/python manage.py runserver_plus 0.0.0.0:8000 --traceback --keep-meta-shutdown
             └─3763841 /home/cape/.cache/pypoetry/virtualenvs/capev2-t2x27zRb-py3.8/bin/python /opt/CAPEv2/web/manage.py runserver_plus 0.0.0.0:8000 --traceback --keep-meta-shutdown

Aug 24 20:13:17 cape-nest-vm python3[3763841]: <ip> - - [24/Aug/2022 20:13:17] "GET>
Aug 24 20:13:17 cape-nest-vm python3[3763841]: Get task: 6598.5546875
Aug 24 20:13:17 cape-nest-vm python3[3763841]: <ip> - - [24/Aug/2022 20:13:17] "GET >
Aug 24 20:13:17 cape-nest-vm python3[3763841]: Get task: 6598.5546875
Aug 24 20:13:17 cape-nest-vm python3[3763841]: <ip> - - [24/Aug/2022 20:13:17] "GET>
Aug 24 20:13:17 cape-nest-vm python3[3763841]: Get report 6598.5546875
Aug 24 20:13:17 cape-nest-vm python3[3763841]: <ip> - - [24/Aug/2022 20:13:17] "GET>
Aug 24 20:13:17 cape-nest-vm python3[3763841]: Get task: 6598.5546875
Aug 24 20:13:17 cape-nest-vm python3[3763841]: <ip> - - [24/Aug/2022 20:13:17] "GET>

The line generating the Get task: <memory usage for process> output is print("Get task:", psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2). That value has been slowly increasing over the day from a starting value of ~100 MB.
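Since RSS only shows that the process is growing, not where, a pure-Python next step would be a tracemalloc snapshot diff taken across a batch of requests. This is just a sketch of the approach (the frame depth and the number of entries printed are arbitrary), not code that exists in CAPE:

import tracemalloc

tracemalloc.start(25)                      # keep 25 frames per allocation
baseline = tracemalloc.take_snapshot()

# ... serve a few hundred API requests here ...

current = tracemalloc.take_snapshot()
for stat in current.compare_to(baseline, "lineno")[:15]:
    # The largest positive diffs point at the code paths that keep allocating.
    print(stat)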

Hello, that's interesting; that will be hard to debug. I personally use it with uwsgi and recycle workers after X jobs, so I guess that's why I didn't spot it before. We already saw something similar in process.py, but Sandor added a garbage collector call there, so we were able to spot the problem relatively easily.
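For context, the process.py mitigation mentioned above boils down to forcing a collection after each unit of work. The snippet below is only a paraphrase of that idea, not the actual CAPE code; process_report and build_report are placeholder names:

import gc

def process_report(task_id, build_report):
    try:
        return build_report(task_id)  # placeholder for the real report work
    finally:
        # Free large temporary object graphs immediately instead of waiting
        # for the next automatic collection cycle.
        gc.collect()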

commented

Okay cool, do you have any documentation for setting up the REST API with uWSGI rather than the Django development server?

Good point, I will have to add it. We run it as uwsgi + nginx, but you can also use it standalone.

cat /etc/uwsgi/apps-enabled/cape.ini
[uwsgi]
lazy-apps = True
vacuum = True
;http-socket = 127.0.0.1:8000
http-socket = 0.0.0.0:8000
static-map = /static=/opt/CAPEv2/web/static
plugins = python38
callable = application
chdir = /opt/CAPEv2/web
file = web/wsgi.py
env = DJANGO_SETTINGS_MODULE=web.settings
uid = cape
gid = cape
enable-threads = true
master = true
processes = 10
workers = 10
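; un-commenting max-requests below makes uwsgi recycle each worker after that
; many requests - this is the worker recycling mentioned above and keeps a slow
; leak from accumulating in any single worker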
;max-requests = 300
manage-script-name = true
;disable-logging = True
listen = 2056
;harakiri = 30
thunder-lock = True
#max-worker-lifetime = 30
;Some files found in this directory are processed by uWSGI init.d script as
;uWSGI configuration files.

After placing that in that file, just start or restart uwsgi and it will serve the web app through uwsgi; you might need to adjust the number of workers/processes.

Check this for installation: https://capev2.readthedocs.io/en/latest/usage/dist.html?highlight=uwsgi#good-practice-for-production
I need to fix the formatting there; the plugin/module name can also differ, as you can see here it is plugins = python38.

commented

For those who are looking at this issue: using uWSGI to run Django, rather than the default cape-web.service, avoids the RAM buildup reaching the point where the Django process crashes, BUT it does not fix the memory leak that is still present in the REST API...

Using Gunicorn here, I don't see any leaks in the APIv2 paths you've mentioned. Not to say they aren't there - just that I can't reproduce them.

If there is a leak in those paths, I'd suspect POST /apiv2/tasks/create/file/ and GET /apiv2/tasks/get/report/<task-id>/lite/zip/.

FWIW you can get a quick sense of memory just by looking at the systemctl status output:

[screenshot: systemctl status output showing the service's Memory usage]

If you can narrow it down a bit more, it should be pretty easy to debug. Toss some Python debugging symbols in there and wrap the whole thing in Valgrind:

$ sudo apt install python3-dbg valgrind
$ valgrind --tool=memcheck --leak-check=yes --log-file=whats-eating-gilbert-cape.log /usr/bin/python3 -m poetry run python manage.py runserver_plus 0.0.0.0:8000 --traceback --keep-meta-shutdown

I can't reproduce it. Kevin, do you want to help test this, or can we close it since you have solved it with uwsgi?

commented

I can't reproduce it. Kevin, do you want to help test this, or can we close it since you have solved it with uwsgi?

I don't have the cycles to test this out, so we can close this ticket as long as people are aware that UWSGI is the preferred method of running the API.