r9y9 / nnmnkwii

Library to build speech synthesis systems designed for easy and fast prototyping.

Home Page: https://r9y9.github.io/nnmnkwii/latest/

Rounding error for the number of frames

aidiary opened this issue

Hi.

I found that there was a slight difference in the number of frames between linguistic features (X) and acoustic features (Y).

For example (shapes are (#frames, #features)):

filename              X_acoustic.shape   Y_acoustic.shape
BASIC5000_0619.npy    (1242, 541)        (1281, 199)
BASIC5000_0538.npy    (2651, 541)        (2761, 199)
BASIC5000_0537.npy    (587, 541)         (587, 199)

https://github.com/r9y9/nnmnkwii/blob/master/nnmnkwii/frontend/merlin.py#L186

I think this part of the implementation should be changed as follows:

frame_number = int((end_time - start_time) / frame_shift_in_micro_sec)
↓
frame_number = int(end_time / frame_shift_in_micro_sec) - int(start_time / frame_shift_in_micro_sec)
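To see why the per-phone truncation matters, here is a minimal standalone sketch. The helper names `frames_old`/`frames_new` and the phone boundaries are hypothetical and not taken from nnmnkwii; the frame shift of 50000 label time units is the value used in the Merlin snippet. With the original formula each phone can lose up to one frame, and those losses accumulate over an utterance. Quantizing each boundary first makes the per-phone counts telescope.

```python
FRAME_SHIFT = 50000  # label time units per frame, as in the Merlin snippet


def frames_old(start_time, end_time, frame_shift=FRAME_SHIFT):
    # Truncates the phone duration: each phone can lose up to one frame.
    return int((end_time - start_time) / frame_shift)


def frames_new(start_time, end_time, frame_shift=FRAME_SHIFT):
    # Quantizes each boundary to a frame index first, then subtracts.
    return int(end_time / frame_shift) - int(start_time / frame_shift)


# Hypothetical contiguous phone boundaries; two end one unit short of a frame.
segments = [(0, 99999), (99999, 249999), (249999, 400000)]

total_old = sum(frames_old(s, e) for s, e in segments)
total_new = sum(frames_new(s, e) for s, e in segments)
# Frame count of the whole span, quantized the same way as frames_new.
expected = int(segments[-1][1] / FRAME_SHIFT) - int(segments[0][0] / FRAME_SHIFT)
print(total_old, total_new, expected)  # prints: 7 8 8
```

Because the new formula subtracts quantized boundaries, the sum over contiguous phones collapses to `int(last_end / 50000) - int(first_start / 50000)`, so the total no longer depends on where the individual phone boundaries fall.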

The original Merlin implementation looks like this:

https://github.com/CSTR-Edinburgh/merlin/blob/9160d9f1d18fee45d1f0398779883a410a511112/src/frontend/label_normalisation.py#L209

frame_number = int(end_time/50000) - int(start_time/50000)

Regards.

Thank you for the report. I can confirm the rounding error, and your fix is correct. I will add the change to master with a unit test soon.

For example:

> tail -1 BASIC5000_0619.lab
55725000 58324999 r^u-sil+xx=xx/A:xx+xx+xx/B:20-4_2/C:xx_xx+xx/D:xx+xx_xx/E:2_2!0_xx-xx/F:xx_xx#xx_xx@xx_xx|xx_xx/G:xx_xx%xx_xx_xx/H:6_19/I:xx-xx@xx+xx&xx-xx|xx+xx/J:xx_xx/K:2+11-41

The end time 58324999 was causing the rounding error: (58324999 - 55725000) / 50000 = 51.99998, which int() truncates to 51, whereas the phone should span 52 frames.
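These numbers can be reproduced with plain Python arithmetic. This is a standalone check using the start and end times from the label line above, not a call into nnmnkwii:

```python
start_time, end_time = 55725000, 58324999  # last line of BASIC5000_0619.lab
frame_shift = 50000

duration = (end_time - start_time) / frame_shift
old = int(duration)  # original formula: truncates 51.99998 down to 51
new = int(end_time / frame_shift) - int(start_time / frame_shift)  # 1166 - 1114
print(duration, old, new)  # prints: 51.99998 51 52
```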