DLoRA-TrOCR

Architecture:

In the Transformer layer of the image encoder, DoRA weight was embedded into the multi-head attention weight part. In the Transformer layer of the text decoder, the LoRA weights are partially embedded for the multi-head attention weights and the masked multi-head attention weights. The overall architecture of the method is as follows,

Usage:

Dataset setting

Folder:
Dataset/
    IAM/
        train/
            img1.jpg
            ...
        gt_train.txt
        test/
            ...
        gt_test.txt
    SROIE/
        train/
            img1.jpg
            ...
        test/
            ...
    STR_BENCHMARKS/
        IC13/
            train/
                img1.jpg
                ...
            test/
                ...
        IC15/
            train/
                img1.jpg
                ...
            test/
                ...
        III5K/
            train/
                img1.jpg
                ...
            test/
                ...
        SVT
            train/
                img1.jpg
                ...
            test/
                ...
        SVTP
            test/
                ...
        CU80
            test/
                ...

train

# use huggingface trainer
python train.py --processor_dir path to processor  \
--model_dir path to model --save_dir path to save \
--data_dir path to datasets --log_file_dir ./log_file/ \
--epochs 20 --batch_size 16 --task IAM_SROIE_STR --max_len 32 \
--peft_mode --peft_method LORA \
--target_modules e_qkv --rank 8 --alpha 32 --lr 5e-5 \
--eval_mode --use_trainer

# use lora
python train.py --processor_dir path to processor  \
--model_dir path to model --save_dir path to save \
--data_dir path to datasets --log_file_dir ./log_file/ \
--epochs 20 --batch_size 16 --task IAM_SROIE_STR --max_len 32 \
--peft_mode --peft_method LORA \
--target_modules e_qkv --rank 8 --alpha 32 --lr 5e-5 \
--eval_mode

test

python main.py --processor_dir path to processor  \
--model_dir path to model --save_dir path to save \
--data_dir path to datasets --log_file_dir ./log_file/ \
--epochs 20 --batch_size 16 --task IAM_SROIE_STR --max_len 32 \
--peft_mode --peft_method LORA \
--target_modules e_qkv --rank 8 --alpha 32 --lr 5e-5 \
--eval_mode --only_test

VceChang / DLoRA-TrOCR