gtnao / embulk-memory-leak-debug


This repository is dedicated to reproducing the memory leak issue associated with the buffer in Embulk.



Copy seed.csv into data/ many times to generate the input files.

Copying may take some time. In my case, with a 1 GB heap, garbage collection activity increased once the number of files reached around 20,000. Adjust the heap size and the number of input files to suit your environment.

for i in {1..30000}; do cp seed.csv data/input$i.csv; done
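
Since the file count that triggers the problem depends on the heap size, it can help to make the count a parameter. The following is a minimal sketch, not part of the repository: the function name duplicate_seed and its arguments are hypothetical, and it assumes bash. It also creates the destination directory if it does not exist yet.

```shell
# Hypothetical helper (not part of this repository).
# Usage: duplicate_seed SEED_FILE DEST_DIR COUNT
# Copies SEED_FILE to DEST_DIR/input1.csv ... DEST_DIR/inputCOUNT.csv.
duplicate_seed() {
  local seed="$1" dest="$2" count="$3"
  local i

  # Create the destination directory if it is missing.
  mkdir -p "$dest" || return 1

  for ((i = 1; i <= count; i++)); do
    cp "$seed" "$dest/input$i.csv" || return 1
  done
}
```

For example, duplicate_seed seed.csv data 30000 produces the same files as the loop above, and a smaller count can be used with a smaller heap.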

Run Embulk with an explicit heap size.

Because results can vary with the number of CPU cores in each environment, the exec.max_threads and exec.min_output_tasks parameters are fixed in config.yml.

embulk -J-Xms1024m -J-Xmx1024m run config.yml
Screenshot 2023-11-22 12 09 00
