imoneoi / openchat

OpenChat: Advancing Open-source Language Models with Imperfect Data

Home Page: https://openchat.team

I tried to run pre-tokenization but it failed

userdsr opened this issue

I tried to run 'python -m ochat.data.generate_dataset --model-type MODEL_TYPE --model-path BASE_REPO --in-files data.jsonl --out-prefix PRETOKENIZED_DATA_OUTPUT_PATH' but it failed with the following error:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/public/home/szlab_daisr/Deploy/openchat/ochat/data/generate_dataset.py", line 167, in <module>
    generate_dataset(**vars(args))
  File "/public/home/szlab_daisr/Deploy/openchat/ochat/data/generate_dataset.py", line 149, in generate_dataset
    generate_split(model_type, model_path, train_conversations, "train", out_prefix, per_sequence_loss)
  File "/public/home/szlab_daisr/Deploy/openchat/ochat/data/generate_dataset.py", line 131, in generate_split
    parquet.write_table(pyarrow.concat_tables([ray.get(handle) for handle in handles]), f"{out_prefix}.{split_name}.parquet")
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/home/szlab_daisr/Deploy/openchat/ochat/data/generate_dataset.py", line 131, in <listcomp>
    parquet.write_table(pyarrow.concat_tables([ray.get(handle) for handle in handles]), f"{out_prefix}.{split_name}.parquet")
                                               ^^^^^^^^^^^^^^^
  File "/public/home/szlab_daisr/anaconda3/envs/openchat2/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/public/home/szlab_daisr/anaconda3/envs/openchat2/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/public/home/szlab_daisr/anaconda3/envs/openchat2/lib/python3.11/site-packages/ray/_private/worker.py", line 2624, in get
    raise value.as_instanceof_cause()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/home/szlab_daisr/anaconda3/envs/openchat2/lib/python3.11/site-packages/ray/exceptions.py", line 148, in as_instanceof_cause
    class cls(RayTaskError, cause_cls):
TypeError: type 'pydantic_core._pydantic_core.ValidationError' is not an acceptable base type
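
Reading the last two frames, the TypeError appears to come from the statement `class cls(RayTaskError, cause_cls)`: Ray tries to build an exception class that inherits from both its own task error and the exception type raised inside the worker, and that fails when the worker's exception type does not allow subclassing. If that reading is right, the error that actually matters is the pydantic ValidationError raised in the worker (possibly a record in data.jsonl that does not match the expected schema), and it is being hidden by this wrapper failure. Below is a minimal stand-alone sketch of the mechanism only. It does not use Ray or pydantic: `FakeTaskError` and `wrap_cause` are hypothetical names, and `bool` is just a stand-in for a type that cannot be subclassed.

```python
# Hypothetical stand-in for ray.exceptions.RayTaskError (not Ray's real class).
class FakeTaskError(Exception):
    pass


def wrap_cause(cause_cls):
    """Mimic the `class cls(RayTaskError, cause_cls)` statement in the traceback."""
    class Wrapped(FakeTaskError, cause_cls):
        pass
    return Wrapped


# An ordinary exception type can be combined with the task error class.
ok_cls = wrap_cause(ValueError)

# `bool` is used here only because Python refuses to subclass it; it plays
# the role that the traceback assigns to pydantic_core's ValidationError.
try:
    wrap_cause(bool)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```

On a recent CPython the printed message should have the same "is not an acceptable base type" wording as the last line of the traceback, which suggests the place to look is the worker's original validation error rather than the Ray frames.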