How to use ./test.py to test a model on my own dataset?
codeandbit opened this issue · comments
I have made my own train, val, and test datasets. The file structure is:
data
├── test
│   ├── data.mdb
│   └── lock.mdb
├── train
│   └── real
│       ├── data.mdb
│       └── lock.mdb
└── val
    ├── data.mdb
    └── lock.mdb
I have trained a model using my own dataset under data/train/real/ and data/val/. But I don't know how to test this model using the dataset under data/test/.
In test.py, I can't find a parameter specifying the test dataset:
Lines 65 to 75 in 8fa5100
And if I run ./test.py outputs/<model>/<timestamp>/checkpoints/last.ckpt directly, it reports an error:
Additional keyword arguments: {'charset_test': '0123456789abcdefghijklmnopqrstuvwxyz'}
Traceback (most recent call last):
File "./test.py", line 133, in <module>
main()
...
...
...
lmdb.Error: data/test/CUTE80: No such file or directory
My charset_test is not '0123456789abcdefghijklmnopqrstuvwxyz' and I don't want to test the model on CUTE80.
In strhub/data/module.py, at the top of the SceneTextDataModule class, there are several variables. These are tuples that contain test dataset names, including CUTE80. By default, test.py only looks at TEST_BENCHMARK and TEST_BENCHMARK_SUB.
There is already a parameter in test.py called new that lets you include the TEST_NEW datasets in your test run.
If you want to test only on your own dataset, you can create another tuple, let's say TEST_CUSTOM = ('MyDataset',), and then in test.py change the way test datasets are selected:
parser.add_argument('--std', action='store_true', default=False, help='Evaluate on standard benchmark datasets')
parser.add_argument('--new', action='store_true', default=False, help='Evaluate on new benchmark datasets')
# With default=True the custom datasets are always evaluated; use default=False to make the flag opt-in.
parser.add_argument('--custom', action='store_true', default=True, help='Evaluate on custom personal datasets')
[...]
# Collect the dataset names to evaluate, depending on the flags.
test_set = tuple()
if args.std:
    test_set = SceneTextDataModule.TEST_BENCHMARK_SUB + SceneTextDataModule.TEST_BENCHMARK
if args.custom:
    test_set += SceneTextDataModule.TEST_CUSTOM
if args.new:
    test_set += SceneTextDataModule.TEST_NEW
test_set = sorted(set(test_set))
[...]
# Group the results the same way for the final report.
result_groups = dict()
if args.std:
    result_groups.update({'Benchmark (Subset)': SceneTextDataModule.TEST_BENCHMARK_SUB})
    result_groups.update({'Benchmark': SceneTextDataModule.TEST_BENCHMARK})
if args.custom:
    result_groups.update({'Custom': SceneTextDataModule.TEST_CUSTOM})
if args.new:
    result_groups.update({'New': SceneTextDataModule.TEST_NEW})
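The selection logic above can be tried in isolation. The following is a minimal sketch, not the repository's code: DataModuleStub is a hypothetical stand-in for SceneTextDataModule, its tuples hold placeholder contents (only CUTE80 and MyDataset come from this thread), and the data/test/<name> layout is an assumption inferred from the "lmdb.Error: data/test/CUTE80" message above.

```python
import argparse
from pathlib import Path


class DataModuleStub:
    """Hypothetical stand-in for SceneTextDataModule (placeholder contents)."""
    TEST_BENCHMARK_SUB = ('CUTE80',)   # shortened; the real tuple lists more datasets
    TEST_BENCHMARK = ('CUTE80',)       # shortened; the real tuple lists more datasets
    TEST_NEW = ('SomeNewDataset',)     # placeholder name, not a real dataset
    TEST_CUSTOM = ('MyDataset',)       # the tuple added for personal datasets


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--std', action='store_true', default=False)
    parser.add_argument('--new', action='store_true', default=False)
    # Unlike the snippet above, --custom is opt-in here so all three flags behave alike.
    parser.add_argument('--custom', action='store_true', default=False)
    return parser


def select_test_sets(args):
    """Return the sorted, de-duplicated dataset names chosen by the flags."""
    test_set = tuple()
    if args.std:
        test_set += DataModuleStub.TEST_BENCHMARK_SUB + DataModuleStub.TEST_BENCHMARK
    if args.custom:
        test_set += DataModuleStub.TEST_CUSTOM
    if args.new:
        test_set += DataModuleStub.TEST_NEW
    return sorted(set(test_set))


def expected_dirs(root, names):
    """Assumed layout: one LMDB directory per dataset name under <root>/test/."""
    return [Path(root) / 'test' / name for name in names]


args = build_parser().parse_args(['--custom'])
names = select_test_sets(args)
print(names)
print([p.as_posix() for p in expected_dirs('data', names)])
```

If that layout assumption holds, the custom LMDB files would need to live in data/test/MyDataset/ rather than directly in data/test/.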
Now if you want to change your charset_test:
- Open configs/main.yaml
- Under model, set charset_test: ???
- Open configs/charset/your_custom_file_or_we_else.yaml
- Add the line charset_test: "myCustomCharsetTest"
- Make sure in configs/main.yaml that at the top, under defaults, you have charset: your_custom_file_or_we_else
Edit: The modification of the .yaml file must be done prior to training of the model even though it concerns charset_test
Alternatively, you can just change the value of charset_test in main.yaml, but the steps above allow you to have multiple test charsets, provided that you explicitly set charset_test in every charset_like.yaml file.
That's so cool! I successfully tested the model using your method. Thank you very much!
Excuse me, but is this necessary? Do I have to change the code:
Line 84 in 8fa5100
to
kwargs.update({'charset_test': 'Lots of characters in my charset_test...'})
When I tested the model on my test set 2 hours ago, it worked normally, but I hadn't changed this line of code. The output was:
Additional keyword arguments: {'charset_test': '0123456789abcdefghijklmnopqrstuvwxyz'}
MyDataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 121/121 [01:20<00:00, 1.49it/s]
And the accuracy was 94%.
Did I just use '0123456789abcdefghijklmnopqrstuvwxyz' to test the model? But if so, the accuracy should be very low.
Also, I didn't change the value of charset_test in main.yaml directly. I just performed the following actions:
- Open configs/main.yaml
- Under model, set charset_test: ???
- Open configs/charset/your_custom_file_or_we_else.yaml
- Add the line charset_test: "myCustomCharsetTest"
- Make sure in configs/main.yaml that at the top, under defaults, you have charset: your_custom_file_or_we_else
I have run a few tests on my own, and it seems that a charset_test from a .yaml file, whether it is in some custom_case.yaml or even in main.yaml, never reaches the model in test.py if you changed it after you trained the model. If I print hp.charset_test, I get the full 94 characters (which was my charset when I trained my model), regardless of the current value of the charset_test field in the .yaml files.
My thinking is that since you load an already built model, charset_test is already defined within the model. This is why changing the value in the .yaml a posteriori doesn't change anything. So I guess the trick of editing the .yaml file only works if you did it before training.
Though, as mentioned by baudm, kwargs.update({'charset_test': charset_test}) overrides all of that, so you can define your charset_test at will, even after you have trained your model. By default, the charset_test passed to kwargs.update({'charset_test': charset_test}) is set to '0123456789abcdefghijklmnopqrstuvwxyz'.
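The behaviour described here (a value stored at training time, then overridden by keyword arguments at load time) can be modelled with plain dictionaries. This is a simplified sketch and not the actual loading code: load_hparams and the checkpoint dict below are hypothetical, and only the precedence rule is the point.

```python
import string


def load_hparams(checkpoint, **kwargs):
    """Return the saved hyperparameters with keyword overrides applied on top."""
    hparams = dict(checkpoint['hyper_parameters'])  # copy of the training-time values
    hparams.update(kwargs)                          # load-time overrides take precedence
    return hparams


# Hypothetical checkpoint: the model was trained with a 94-character test charset.
full_94 = string.digits + string.ascii_letters + string.punctuation
checkpoint = {'hyper_parameters': {'charset_test': full_94}}

# Without an override, the training-time value comes back, whatever the .yaml says now.
print(len(load_hparams(checkpoint)['charset_test']))

# With an override, in the spirit of kwargs.update({'charset_test': ...}), the new value wins.
overridden = load_hparams(checkpoint, charset_test=string.digits + string.ascii_lowercase)
print(overridden['charset_test'])
```

The stored checkpoint is left untouched in this sketch; the override only affects the values handed to the loaded model.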
Now, regarding the result of your test, it is indeed very strange. Something is off somewhere, and I couldn't figure out what.
I would recommend retrying the test after directly editing kwargs.update({'charset_test': "My very long charset test"})
Yes, it is necessary to change the code:
Line 84 in 8fa5100
to
kwargs.update({'charset_test': 'Lots of characters in my charset_test...'})
Before the change, if I print hp.charset_test, it prints the full 94 characters.
After the change, if I print hp.charset_test, it prints my own charset.
Thank you!
To put it in a nutshell, you can either:
- Change the charset_test field in the .yaml before training, so that the model retains this value when loaded for testing. If you want to use this charset for testing, you need to disable the kwargs.update of charset_test.
- Change the charset_test value of the model at test time, after training, with kwargs.update({'charset_test': "My Charset Test"}).
By default, charset_test is overridden by kwargs.update with the value "0123456789abcdefghijklmnopqrstuvwxyz". This default string can be further modified with the native parameters cased and punctuation of test.py.
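As a rough sketch of how that default string could be assembled: the code below assumes that cased adds the uppercase ASCII letters and that punctuation adds the ASCII punctuation characters. Both meanings are assumptions made for illustration, and build_charset_test is a hypothetical helper; check test.py for the actual behaviour.

```python
import string


def build_charset_test(cased=False, punctuation=False):
    """Assemble a test charset from the default plus two optional extensions."""
    charset = string.digits + string.ascii_lowercase  # the default quoted in this thread
    if cased:
        charset += string.ascii_uppercase             # assumed meaning of `cased`
    if punctuation:
        charset += string.punctuation                 # assumed meaning of `punctuation`
    return charset


kwargs = {}
kwargs.update({'charset_test': build_charset_test()})
print(kwargs['charset_test'])
print(len(build_charset_test(cased=True, punctuation=True)))
```

With both options enabled, this sketch yields 94 characters, the same size as the training charset mentioned earlier in the thread.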
@baudm Do you concur?
To clarify:
- charset_test (from the yaml config) is only used by the model for validation during training.
- test.py always overrides charset_test.
It became a bit confusing because I decided not to use Hydra in the scripts for inference (test.py, read.py). Still thinking about whether to refactor that bit.
In summary, for now just specify charset_test directly inside test.py.
Aaah, thank you for the clarification!
But then, to make it more transparent, shouldn't charset_test from the yaml config be called charset_val?
@bmusq it can't be, since all models (i.e. BaseSystem subclasses) expect a charset_test parameter.
Since a model checkpoint (either .ckpt or .pt) already contains the hyperparameters used for training, the config files aren't used anymore during inference (hence the decision to not use Hydra for test.py and read.py).
Hydra is only a train-time dependency. It's not required for using the model during inference. charset_test is used by validation_step and test_step.
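To make the role of a test charset in an evaluation step more concrete, here is one plausible scheme, given purely as an illustration: both the prediction and the label are normalised to the test charset before they are compared. The thread does not show how the models actually apply charset_test, so make_charset_filter and accuracy are hypothetical helpers and the lowercasing rule is an assumption.

```python
import re


def make_charset_filter(charset):
    """Build a function that maps a string onto the given charset."""
    lowercase_only = charset == charset.lower()
    unsupported = re.compile('[^' + re.escape(charset) + ']')

    def adapt(text):
        if lowercase_only:
            text = text.lower()           # fold case when the charset has no uppercase
        return unsupported.sub('', text)  # drop characters outside the charset

    return adapt


def accuracy(predictions, labels, charset):
    """Fraction of exact matches after normalising both sides to the charset."""
    adapt = make_charset_filter(charset)
    correct = sum(adapt(p) == adapt(t) for p, t in zip(predictions, labels))
    return correct / len(labels)


default_36 = '0123456789abcdefghijklmnopqrstuvwxyz'
predictions = ['Hello!', 'w0rld']
labels = ['hello', 'World']
print(accuracy(predictions, labels, default_36))
```

Under such a scheme, testing with the 36-character default would not necessarily give a very low accuracy, because case and punctuation differences are ignored. Whether that explains the 94% result reported above is not established in this thread.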