Rounding error for the number of frames
aidiary opened this issue · comments
Hi.
I found that there was a slight difference in the number of frames between linguistic features (X) and acoustic features (Y).
For example,
(filename, X_acoustic.shape, Y_acoustic.shape) (#frames, #features)
BASIC5000_0619.npy (1242, 541) (1281, 199)
BASIC5000_0538.npy (2651, 541) (2761, 199)
BASIC5000_0537.npy (587, 541) (587, 199)
https://github.com/r9y9/nnmnkwii/blob/master/nnmnkwii/frontend/merlin.py#L186
I think it would be good to modify the implementation of this part as follows.
frame_number = int((end_time - start_time) / frame_shift_in_micro_sec)
↓
frame_number = int(end_time / frame_shift_in_micro_sec) - int(start_time / frame_shift_in_micro_sec)
The original implementation in Merlin looks like this.
frame_number = int(end_time/50000) - int(start_time/50000)
Regards.
Thank you for the report. I can confirm the rounding error and your fix is right. I will add the change to master with a unit test soon.
e.g.
> tail -1 BASIC5000_0619.lab
55725000 58324999 r^u-sil+xx=xx/A:xx+xx+xx/B:20-4_2/C:xx_xx+xx/D:xx+xx_xx/E:2_2!0_xx-xx/F:xx_xx#xx_xx@xx_xx|xx_xx/G:xx_xx%xx_xx_xx/H:6_19/I:xx-xx@xx+xx&xx-xx|xx+xx/J:xx_xx/K:2+11-41
The end time 58324999 was causing the rounding error: (58324999 - 55725000) / 50000 = 51.99998, which int() truncates to 51, but it should be 52.
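For reference, here is a minimal sketch that reproduces the difference using the start and end times from the label line above. The function names `frames_from_duration` and `frames_from_boundaries` are hypothetical and used only for illustration (they are not part of nnmnkwii or Merlin), and the frame shift of 50000 is taken from the Merlin line quoted earlier.

```python
# Frame shift value taken from the Merlin snippet quoted in this issue.
FRAME_SHIFT = 50000


def frames_from_duration(start_time, end_time, frame_shift=FRAME_SHIFT):
    """Truncate the segment duration once (the behaviour reported here)."""
    return int((end_time - start_time) / frame_shift)


def frames_from_boundaries(start_time, end_time, frame_shift=FRAME_SHIFT):
    """Truncate each boundary to a frame index first, then subtract
    (the form used in the Merlin line quoted above)."""
    return int(end_time / frame_shift) - int(start_time / frame_shift)


# Start and end times from the last line of BASIC5000_0619.lab.
start_time, end_time = 55725000, 58324999

print(frames_from_duration(start_time, end_time))    # 51
print(frames_from_boundaries(start_time, end_time))  # 52
```

Truncating each boundary separately keeps every segment aligned to the same frame grid, so a segment that ends one time unit short of a frame boundary, like this one, does not lose a frame.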