comprna / CHEUI

Concurrent identification of m6A and m5C modifications in individual molecules from nanopore sequencing


How long will the preprocessing step take?

ssscj opened this issue


Hi,
Thanks for developing CHEUI. I ran CHEUI on my own data, and the nanopolish result file is about 700 GB. I have been running the preprocess_m6A step with 20 CPUs for 10 days and it hasn't finished yet. How long does this step normally take, and is there any way to accelerate it? Thank you.

Chujie

Hi,
Just a follow-up: the preprocessing step can be run in parallel using the -n flag with the number of cores you want to use. However, the storage and compute requirements are relatively high, so I would recommend running CHEUI on a cloud/HPC system.


Hi,
I used 20 CPUs for the preprocessing step with the -n flag on a cluster, but it is taking much longer than I expected.

Hi,

Sorry about the issue. We are working on making the preprocessing faster.

In the meantime, could you try a much larger value of -n, say -n 400? The -n flag also defines how many small files are created from the input file. The number of parallel processes will still be limited by the number of CPUs you have, but since each small file finishes faster, the overall time may be reduced.
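For example, with the Python preprocessing script, a command along these lines would create 400 chunks (the script and flag names here follow the CHEUI README, and the input, k-mer model, and output paths are placeholders, so please check --help on your installed version):

python3 ../scripts/CHEUI_preprocess_m6A.py -i nanopolish_output.txt -m ../kmer_models/model_kmer.csv -o ./out_A_signals+IDs -n 400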

Thanks,
Akanksha

Hi,
I generated a nanopolish result file of about 4.2 TB and used 20 CPUs for the preprocessing step, but it seems to use only a single core. It is taking far too long; any suggestions?

Hi,
Yes, I used the C++ version, and it generates so many folders that I cannot open the directory; it takes too much time to open.
Thanks a lot

Hi,

The preprocessing step first creates a new folder and generates some temp files. The number of temp files is the same as the number of CPUs you specify on the command line, and this splitting step uses only one CPU. After all the temp files are generated, the C++ program runs in parallel with multiple threads, and the temp files are removed at the end.

Upgrading your GCC compiler to a later version can speed this up. For the C++ version, I recommend setting -n CPU / --cpu CPU to your actual number of physical CPUs.
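To check your GCC version and how many physical cores you have (standard Linux commands, nothing CHEUI-specific), you can run:

gcc --version
lscpu | grep -E '^(Socket|Core|Thread)'

The physical CPU count is Socket(s) multiplied by Core(s) per socket; use that value for -n/--cpu rather than the logical count reported by nproc.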

Hope this helps,
Eileen

Hi there,

A faster preprocessing solution (a rough sketch of the commands follows the list):

  1. Split the huge nanopolish file into smaller files
  2. Run the preprocessing on each of the smaller files
  3. Run CHEUI model 1 on the preprocessed split files
  4. Merge and sort the CHEUI model 1 predictions, e.g. cat *model1.tsv | sort -k1 --parallel=32 --buffer-size=128G > "./${sample_name}_combined_read_level_sorted.txt"
  5. Run model 2 on the merged file
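The sketch below is indicative only: the CHEUI script flags should be checked against each script's --help in your installation, all paths are placeholders, and a naive line-based split can cut one read's events across two chunks, so splitting on read boundaries is safer if your data allows it.

sample_name=my_sample

# 1. split the nanopolish eventalign output into smaller chunks
split -l 50000000 nanopolish_output.txt chunk_

# 2-3. preprocess each chunk and run CHEUI model 1 on it
for f in chunk_*; do
    python3 ../scripts/CHEUI_preprocess_m6A.py -i "$f" -m ../kmer_models/model_kmer.csv -o "${f}_out" -n 20
    python3 ../scripts/CHEUI_predict_model1.py -i "${f}_out" -o "${f}_model1.tsv"
done

# 4. merge and sort the read-level (model 1) predictions
cat chunk_*_model1.tsv | sort -k1 --parallel=32 --buffer-size=128G > "./${sample_name}_combined_read_level_sorted.txt"

# 5. run model 2 on the merged, sorted file
python3 ../scripts/CHEUI_predict_model2.py -i "./${sample_name}_combined_read_level_sorted.txt" -o "./${sample_name}_site_level.txt"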

Hope this helps,
Eileen

Hi,
Great! I will try it.

Thanks a lot
Bai

Hi Eileen,
I followed your suggestions, but I am not familiar with the preprocessing output files because they are binary. Can you please provide a script for combining all the preprocessing outputs into one file?

Thanks a lot,
Bai

Hi Bai,

To combine the split files, please run combine_binary_file.py from the scripts folder as below:
python3 ../scripts/combine_binary_file.py -i <folder with split binary files> -o <combined output file name>

Here you need to provide the path of the folder containing all the split files and the name of the combined output file.
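For example (the folder and file names here are just placeholders for your own paths):

python3 ../scripts/combine_binary_file.py -i ./split_preprocess_output/ -o ./combined_signals+IDs.p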
Thanks,
Akanksha

Hi Akanksha,

Big thanks for your help.
I still have a question: the split preprocessed files and the combined file have the same format, but why is the combined file smaller than the split files?

Thanks a lot,
Bai

Hi Bai,

Yes, we noticed that as well in our test case for the script, but it could be because it is a binary file. Also, the number of processed signals in the combined file is equal to the sum of the processed signals in the individual files, so it should be fine for the next step. However, I would recommend not deleting the individual split files until you have the final results.

Thanks,
Akanksha

Hi Akanksha,

The combined file does not seem to work in the next step; it throws the following error:
2023-05-16 15:09:28.362177: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2023-05-16 15:09:29.863226: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2023-05-16 15:09:29.863463: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-05-16 15:09:29.863479: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2023-05-16 15:09:29.863506: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (retox): /proc/driver/nvidia/version does not exist
2023-05-16 15:09:29.863697: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA. To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-16 15:09:29.867528: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
dictionary update sequence element #0 has length 152; 2 is required
All signals have been processed 1
It worked with my individual split files.

Thanks a lot
Bai

Hi Bai,

Sorry, there was a bug in the code: it was only combining the keys and not the values.
I have updated it now. Could you please give it a try?

Thanks,
Akanksha

Hi Akanksha,

Yes, it works, but it consumes too much memory.

Thanks,
Bai

Hi Bai,

Sorry, could you please try the latest updated version of the script? It should solve the memory issue.

Thanks,
Akanksha

Hi Akanksha,

Big thanks to you; the memory issue is solved.

Thanks,
Bai