oomkilled
jentur-zabbeJ-8basdy opened this issue
The default script:
`set -x
export BS=${BS:-16}
export MEMCAP=${MEMCAP:-0}
export GPUNUM=${GPUNUM:-1}
# MODEL_PATH is not used below; the local checkpoint path is passed instead.
export MODEL_PATH="facebook/opt-${MODEL}"
model_name_or_path=./opt6.7b
# HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1
torchrun \
  --nproc_per_node ${GPUNUM} \
  --master_port 19198 \
  train_gemini_opt.py \
  --mem_cap ${MEMCAP} \
  --model_name_or_path ${model_name_or_path} \
  --batch_size ${BS} `
Environment
Version: torch 1.12+cu113
deepspeed: 0.7.7
Memory: 80G
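For context, here is a back-of-envelope estimate that is not part of the original report. It assumes mixed-precision Adam training at roughly 16 bytes per parameter (fp16 weights and gradients, plus fp32 master weights, momentum, and variance) and about 6.7 billion parameters for the `./opt6.7b` checkpoint. The helper name `estimate_model_state_gb` is hypothetical. Activations, buffers, and data loading are not counted, and whether this explains the OOM kill depends on how tensors are split between GPU and host memory.

```shell
#!/usr/bin/env bash
# Rough model-state memory estimate for mixed-precision Adam training.
# Assumption (not from the issue): 16 bytes per parameter =
#   2 (fp16 weights) + 2 (fp16 gradients)
#   + 4 + 4 + 4 (fp32 master weights, momentum, variance)
estimate_model_state_gb() {
  local params=$1
  local bytes_per_param=16
  # Integer division, result in decimal gigabytes (10^9 bytes).
  echo $(( params * bytes_per_param / 1000000000 ))
}

# Assumption: OPT-6.7B has about 6.7 billion parameters.
estimate_model_state_gb 6700000000   # prints 107
```

Under these assumptions the model states alone come to about 107 GB, which is more than the 80G listed in the environment above.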
Originally posted by @iMountTai in hpcaitech/ColossalAI#2772