JayYip / m3tl

BERT for Multitask Learning

Home Page: https://jayyip.github.io/m3tl/

Out-of-memory issue

autapomorphy opened this issue

I tried to run the notebook Run Pre-defined problems.ipynb. After calling

train_bert_multitask(problem='weibo_ner&weibo_cws', num_gpus=1, num_epochs=3)

I got the error message:

Traceback (most recent call last):
File "/cluster/kappa/90-days-archive///g_transformer/git/bert-multitask-learning/bert_multitask_learning/params.py", line 206, in assign_problem
self.get_data_info(self.problem_list, self.ckpt_dir)
File "/cluster/kappa/90-days-archive///g_transformer/git/bert-multitask-learning/bert_multitask_learning/params.py", line 270, in get_data_info
list(self.read_data_fnproblem))
File "/cluster/kappa/90-days-archive///g_transformer/git/bert-multitask-learning/bert_multitask_learning/create_generators.py", line 300, in create_single_problem_generator
example_list=example) for example in example_list
File "/cluster/tufts//lib/anaconda3/envs/1001-nlp/lib/python3.7/site-packages/joblib/parallel.py", line 1017, in __call__
self.retrieve()
File "/cluster/tufts//lib/anaconda3/envs/1001-nlp/lib/python3.7/site-packages/joblib/parallel.py", line 909, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/cluster/tufts//lib/anaconda3/envs/1001-nlp/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 562, in wrap_future_result
return future.result(timeout=timeout)
File "/cluster/tufts//lib/anaconda3/envs/1001-nlp/lib/python3.7/concurrent/futures/_base.py", line 435, in result
return self.__get_result()
File "/cluster/tufts/**/lib/anaconda3/envs/1001-nlp/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGKILL(-9)}

How much RAM do I need?
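
Since the error points at excessive memory usage as one possible cause, a rough way to put a number on it is to read the peak resident set size after a run that succeeds, for example on a smaller input. The following is only a minimal sketch using Python's standard `resource` module (Unix only). The helper name `peak_rss_mib` and the stand-in workload are hypothetical and not part of this library. The figure for child processes covers only children that have already terminated and been waited for, so it may under-report what live joblib workers are using.

```python
import resource
import sys


def peak_rss_mib(who=resource.RUSAGE_SELF):
    """Return the peak resident set size in MiB for this process or its children."""
    peak = resource.getrusage(who).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    # Stand-in workload (hypothetical); replace with the real preprocessing call.
    data = [list(range(1000)) for _ in range(1000)]
    print(f"parent peak RSS:   {peak_rss_mib():.1f} MiB")
    print(f"children peak RSS: {peak_rss_mib(resource.RUSAGE_CHILDREN):.1f} MiB")
```

Comparing these numbers with the memory limit of the cluster job would show whether the SIGKILL is plausibly the out-of-memory killer rather than a segmentation fault.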