jerryji1993 / DNABERT

DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome

Home Page: https://doi.org/10.1093/bioinformatics/btab083


Conflict in package version

MagpiePKU opened this issue

Hi,

The current repo pins tokenizers==0.5.0, but the code calls a tokenizer function that is only available in tokenizers 0.10.0.
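
For reference, a quick way to confirm which version actually got installed (a minimal Python sketch; it assumes the pinned package is the HuggingFace tokenizers library, which is my reading and not confirmed in this issue):

# Check the installed tokenizers version (assumption: the HuggingFace `tokenizers`
# package is the one pinned by the repo's requirements).
import tokenizers

print(tokenizers.__version__)  # the repo pins 0.5.0; the failing code path expects 0.10.x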

I installed with conda, strictly following the order in the README, but run_finetune.py fails with import errors.

export PATH_TO_DNABERT_REPO=/gpfs/bin/DNABERT
export SOURCE=/gpfs/bin/DNABERT
export KMER=6
export MODEL_PATH=/gpfs/bin/DNABERT/pretrained/6-new-12w-0
export DATA_PATH=sample_data/ft/prom-core/$KMER
export OUTPUT_PATH=./ft/prom-core/$KMER

python run_finetune.py \
    --model_type dna \
    --tokenizer_name=dna$KMER \
    --model_name_or_path $MODEL_PATH \
    --task_name dnaprom \
    --do_train \
    --do_eval \
    --data_dir $DATA_PATH \
    --max_seq_length 75 \
    --per_gpu_eval_batch_size=16 \
    --per_gpu_train_batch_size=16 \
    --learning_rate 2e-4 \
    --num_train_epochs 3.0 \
    --output_dir $OUTPUT_PATH \
    --evaluate_during_training \
    --logging_steps 100 \
    --save_steps 4000 \
    --warmup_percent 0.1 \
    --hidden_dropout_prob 0.1 \
    --overwrite_output \
    --weight_decay 0.01 \
    --n_process 8

Traceback (most recent call last):
File "run_finetune.py", line 69, in
from transformers import glue_compute_metrics as compute_metrics
ImportError: cannot import name 'glue_compute_metrics'
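
As a side note, in the transformers 2.x line this symbol is only re-exported from the top-level package when scikit-learn is available, so a probe like the one below (a sketch based on that assumption, not something reported in this issue) can help tell a missing dependency apart from a version mismatch:

# Probe where glue_compute_metrics can be imported from (assumes transformers 2.x,
# where the top-level re-export depends on scikit-learn/scipy being importable).
try:
    from transformers import glue_compute_metrics
    print("top-level import works")
except ImportError:
    # Fall back to the defining module; if this also fails, scikit-learn/scipy
    # are likely missing rather than transformers being the wrong version.
    from transformers.data.metrics import glue_compute_metrics
    print("imported from transformers.data.metrics")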

Yi

Hi,

Thanks for pointing this out. Can you successfully run the code after upgrading tokenizers to 0.10.0?

Closing this for now, but we can continue the discussion if needed.