How to evaluate the `output_file` using `m2scorer` and `errant`

Question

hezy29 opened this issue 2 years ago

Hi, I'm trying to evaluate with m2scorer, but our model's output_file is in a special format (i.e., the preprocessed format used to train the GECToR model) rather than parallel sentences.

How can I use m2scorer in this case to evaluate the model's performance? Thanks!

Matt · Answer 1 · Thu Nov 10 2022 14:15:53 GMT+0800 (China Standard Time)

I figured out that only training requires preprocessing the data into the special format.
For prediction, you only need to pass the plain source text (.src) as input.
Thanks for your work!
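Since prediction runs on the plain .src file, the output is ordinary text with one corrected sentence per line, so it can be scored directly. Below is a minimal sketch of that last step, not an official script. It assumes the M2Scorer executable is reachable as `m2scorer` (a hypothetical default path) and that ERRANT is installed, which provides the `errant_parallel` and `errant_compare` commands. The helper names `check_aligned`, `build_commands`, and `evaluate` are hypothetical. Which scorer fits depends on the benchmark: CoNLL-2014 results are conventionally reported with M2Scorer and BEA-2019 results with ERRANT. For M2Scorer, the hypothesis should also be tokenized the same way as the gold M2 source lines.

```python
import subprocess
from pathlib import Path
from typing import List


def count_lines(path: str) -> int:
    """Number of lines (sentences) in a UTF-8 text file."""
    return len(Path(path).read_text(encoding="utf-8").splitlines())


def check_aligned(src_path: str, hyp_path: str) -> None:
    """Both scorers expect exactly one hypothesis line per source sentence."""
    n_src, n_hyp = count_lines(src_path), count_lines(hyp_path)
    if n_src != n_hyp:
        raise ValueError(f"{hyp_path}: {n_hyp} lines, but {src_path}: {n_src} lines")


def build_commands(scorer: str, src_path: str, hyp_path: str, gold_m2: str,
                   hyp_m2: str = "hyp.m2",
                   m2scorer_bin: str = "m2scorer") -> List[List[str]]:
    """Command lines for scoring one hypothesis file (m2scorer_bin is an assumed path)."""
    if scorer == "m2scorer":
        # M2Scorer: system output first, gold M2 second.
        return [[m2scorer_bin, hyp_path, gold_m2]]
    if scorer == "errant":
        return [
            # Align the (source, hypothesis) pair and write it as an M2 file...
            ["errant_parallel", "-orig", src_path, "-cor", hyp_path, "-out", hyp_m2],
            # ...then compare that M2 file against the reference M2.
            ["errant_compare", "-hyp", hyp_m2, "-ref", gold_m2],
        ]
    raise ValueError(f"unknown scorer: {scorer!r}")


def evaluate(scorer: str, src_path: str, hyp_path: str, gold_m2: str,
             dry_run: bool = True) -> List[List[str]]:
    check_aligned(src_path, hyp_path)
    cmds = build_commands(scorer, src_path, hyp_path, gold_m2)
    for cmd in cmds:  # order matters: errant_parallel must run before errant_compare
        if dry_run:
            print(" ".join(cmd))  # only show what would run
        else:
            subprocess.run(cmd, check=True)  # raises if a scorer exits with an error
    return cmds
```

For example, `evaluate("errant", "test.src", "test.pred", "gold.m2", dry_run=False)` would score a prediction file against an ERRANT-annotated reference (the file names are placeholders). Checking line alignment first catches the most common failure, a prediction file that dropped or merged sentences.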

Lj4040 · Answer 2 · Tue Jan 17 2023 11:46:36 GMT+0800 (China Standard Time)

@hezy29 Hello, for the stage 2 training file, do I need to convert the M2 data into two parallel files first and then preprocess them? How do you convert M2 data into parallel files? I'd really appreciate your help. Thank you!

Matt · Answer 3 · Tue Jan 17 2023 18:00:05 GMT+0800 (China Standard Time)

Hi @Lj4040, the code to convert .m2 files into parallel plain-text files can be found here.
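The linked script is not reproduced in this thread, so the following is not that code. It's a minimal sketch of the usual approach, assuming a standard M2 file: an `S` line with the tokenized source, followed by `A start end|||type|||correction|||...|||annotator` lines, with blank lines between sentences. Edits are applied left to right while tracking how earlier edits shifted token positions. Only one annotator's edits are kept (annotator 0 by default, an assumption that matters for multi-annotator sets such as CoNLL-2014), and `noop`/`UNK`/`Um` edits are skipped. The names `parse_m2` and `m2_to_parallel` are hypothetical.

```python
from typing import Iterator, Tuple

# Edit types that carry no correction and are skipped.
SKIP_TYPES = {"noop", "UNK", "Um"}


def parse_m2(text: str, annotator: int = 0) -> Iterator[Tuple[str, str]]:
    """Yield (source, corrected) sentence pairs from M2-formatted text."""
    for block in text.strip().split("\n\n"):
        lines = block.strip().split("\n")
        if not lines[0].startswith("S "):
            continue  # empty block from extra blank lines
        src_tokens = lines[0].split()[1:]  # drop the leading "S"
        cor_tokens = list(src_tokens)
        offset = 0  # net length change caused by edits applied so far
        for line in lines[1:]:
            fields = line.split("|||")
            edit_type, correction, coder = fields[1], fields[2], int(fields[-1])
            if edit_type in SKIP_TYPES or coder != annotator:
                continue
            start, end = map(int, fields[0].split()[1:3])  # drop the leading "A"
            new_tokens = correction.split()  # empty correction means deletion
            cor_tokens[start + offset:end + offset] = new_tokens
            offset += len(new_tokens) - (end - start)
        yield " ".join(src_tokens), " ".join(cor_tokens)


def m2_to_parallel(m2_path: str, src_out: str, tgt_out: str, annotator: int = 0) -> None:
    """Write line-aligned source and target files from one M2 file."""
    with open(m2_path, encoding="utf-8") as f:
        text = f.read()
    with open(src_out, "w", encoding="utf-8") as fs, \
         open(tgt_out, "w", encoding="utf-8") as ft:
        for src, cor in parse_m2(text, annotator):
            fs.write(src + "\n")
            ft.write(cor + "\n")
```

For instance, a block with `S This are a sentence .`, a replacement `A 1 2|||R:VERB:SVA|||is|||...|||0`, and an insertion `A 3 3|||M:ADJ|||good|||...|||0` yields the target `This is a good sentence .`. The resulting source/target pair is the kind of parallel input that GECToR's preprocessing step then turns into its tagged training format.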