albertaparicio / tfg-voice-conversion

Deep Learning-based Voice Conversion system



A question about "Zaska" and "dtw -b": how can I get more features by running "compute_dtw.sh"?

HudsonHuang opened this issue

I tried the solution provided by lf0_lstm.py and the related scripts. When I tried to modify the training parameters, the script /data/training/compute_dtw.sh confused me:

```bash
ZASKA="Zaska -P $PRM_NAME $PRM_OPT"

# Compute mfcc $DIR_REF/${FILENAME}.wav $DIR_TST/${FILENAME}.wav => mfcc/$DIR_REF/${FILENAME}.prm mfcc/$DIR_TST/${FILENAME}.prm
${ZASKA} -t RAW -x wav=msw -n . -p mfcc -F ${DIR_REF}/${FILENAME}_sil ${DIR_TST}/${FILENAME}_sil

# Align: mfcc/${DIR_REF}/${FILENAME}.prm, mfcc/${DIR_TST}/${FILENAME}.prm => dtw/${DIR_REF}-${DIR_TST}/${FILENAME}.dtw
b=2
dtw -b -${b} -t mfcc/${DIR_TST} -r mfcc/${DIR_REF} -a ${DIR_DTW}/beam${b} -w -B -f -F ${FILENAME}_sil
```
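The UPC `dtw` tool is not documented in this thread, so what `-b` does is unknown (the `beam${b}` output directory hints at a beam width, but that is a guess). For readers looking for a substitute, here is a minimal Python sketch of classic dynamic time warping between two sequences of feature frames, such as MFCC vectors. It uses a Euclidean local distance and symmetric steps with no beam pruning, so it does not reproduce the UPC tool's exact behavior; the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Local distance between two feature frames (e.g. two MFCC vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(ref, tst):
    """Align two sequences of feature frames with classic DTW.

    Returns (total_cost, path), where path is a list of (ref_index, tst_index)
    pairs running from (0, 0) to (len(ref) - 1, len(tst) - 1).
    Allowed steps: diagonal, vertical and horizontal (no beam or band limit).
    """
    n, m = len(ref), len(tst)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative distance aligning ref[:i+1] with tst[:j+1]
    cost = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = euclidean(ref[i], tst[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            best_prev = min(
                cost[i - 1][j - 1] if i > 0 and j > 0 else inf,
                cost[i - 1][j] if i > 0 else inf,
                cost[i][j - 1] if j > 0 else inf,
            )
            cost[i][j] = d + best_prev

    # Backtrack from the last cell to recover the warping path
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return cost[n - 1][m - 1], path
```

For example, aligning `[[0.0], [1.0], [2.0]]` with `[[0.0], [0.0], [1.0], [2.0]]` gives a total cost of 0.0, and the path maps both leading test frames onto the first reference frame. The path is what a training-table builder would use to pair source and target frames.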

Running the script is difficult, because the "Zaska" command does not exist in any package I could find, and the "dtw" command I have doesn't accept a "-b" parameter. How can I solve this?

By the way, I want to run this script because I want to add more parameters to the training. I modified "tfglib" and tried to build /data/train_datatable.h5 again.

The converted output contains very few harmonic components. It may be necessary to use higher-order feature extraction and adjust the network to fit these additional high-order features. (The default training also seems to over-fit.)
In addition, the converted speech sounds dull and low and lacks presence, possibly because the high-order features that carry the harmonic detail are missing.
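To illustrate why a low feature order can discard harmonic detail: MFCC-style features end with a DCT of log spectral energies, and keeping only the first coefficients smooths away fine periodic structure, such as the ripple that harmonics leave in a log spectrum. The sketch below uses a synthetic log spectrum (a linear envelope plus a cosine ripple, both made up for illustration), skips the mel filterbank, and compares the reconstruction error for several truncation orders. It only demonstrates the effect of truncation; it does not show that this is what makes the baseline's output sound dull.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def truncate_and_reconstruct(log_spectrum, order):
    """Keep cepstral coefficients 0..order, zero the rest, and map back."""
    c = dct2(log_spectrum)
    kept = [v if k <= order else 0.0 for k, v in enumerate(c)]
    return idct2(kept)


def rms_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Synthetic log-magnitude spectrum: smooth envelope plus harmonic-like ripple
bins = 128
envelope = [-0.02 * i for i in range(bins)]
ripple = [0.5 * math.cos(2 * math.pi * i / 8) for i in range(bins)]
log_spec = [e + r for e, r in zip(envelope, ripple)]

for order in (12, 24, 40):
    approx = truncate_and_reconstruct(log_spec, order)
    print(order, round(rms_error(log_spec, approx), 3))
```

Because the orthonormal DCT preserves energy, the error can only shrink as the order grows. The ripple, with a period of 8 bins, is carried largely by coefficients around index 32, so most of it is lost at orders 12 and 24.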

Dear Hudson,

Sorry for the delay in my response, I have been busy working on the seq2seq model.

Regarding the Zaska and dtw scripts, they belong to the Signal Theory and Communications department at the UPC university in Catalonia (this project is being developed for my bachelor thesis).

I have contacted the people at the department to ask whether I can distribute these scripts. I'll get back to you as soon as I have a response.

Regarding the resulting sound of the system, I am aware that it does not give very accurate results. You see, the scripts you write about belong to the 'baseline' of the system. This version was developed only to have a reference level of results quality, as we have been focusing our efforts (and still are) on the sequence-to-sequence model.

If you find a way to improve this baseline, that is great news, but we are not going to work on it anymore.

As always, thank you for your interest in this project.

Cheers!

Dear Albert,

Thank you so much for your response. The seq2seq model is definitely a good idea.

As a reference, you can also check out this company: https://lyrebird.ai/. They are trying to offer an API-level voice conversion solution for commercial use, and it seems they have a good team, including Yoshua Bengio.

But as you can see, they still haven't reached much higher quality than the Mixture Neural Network solution in your project. Maybe current voice conversion systems have hit a ceiling, and their output still doesn't sound very natural, so don't be discouraged if the seq2seq solution doesn't work much better than the Mixture Neural Network solution.

Best regards!

@MissPassenger
I found that Zaska is a DTW toolkit developed at UPC, and dtw is the DTW tool inside it.
So I am trying to replace it with the mfcc and dtw commands from SPTK, like this:
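```bash
b=2
# Convert the silence-trimmed WAV files into headerless raw samples
sox mfcc/${DIR_REF}/${FILENAME}_sil.wav mfcc/${DIR_REF}/${FILENAME}_sil.raw
sox mfcc/${DIR_TST}/${FILENAME}_sil.wav mfcc/${DIR_TST}/${FILENAME}_sil.raw

# short -> float, 480-sample frames (30 ms at 16 kHz) every 80 samples (5 ms),
# then 20th-order MFCCs
x2x +sf < mfcc/${DIR_REF}/${FILENAME}_sil.raw | frame -l 480 -p 80 | \
	mfcc -l 480 -m 20 -s 16 > mfcc/${DIR_REF}/${FILENAME}.mfcc

x2x +sf < mfcc/${DIR_TST}/${FILENAME}_sil.raw | frame -l 480 -p 80 | \
	mfcc -l 480 -m 20 -s 16 > mfcc/${DIR_TST}/${FILENAME}.mfcc

# Align the test MFCC sequence against the reference one
# (">" instead of ">>" so that re-running does not append to an old result)
dtw -l 480 mfcc/${DIR_REF}/${FILENAME}.mfcc < mfcc/${DIR_TST}/${FILENAME}.mfcc > ${DIR_DTW}/${FILENAME}_ascii.dtw

# x2x writes its result to stdout, so redirect it into the output file
x2x +af ${DIR_DTW}/${FILENAME}_ascii.dtw > ${DIR_DTW}/beam${b}/${FILENAME}.dtw
```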

But the dtw command produces output in a format that neither x2x nor build_datatable can read,
and it seems to be ASCII.
I used x2x +af to convert it, but that fails.
Any idea?
Thanks.
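SPTK commands generally read and write headerless binary 32-bit floats rather than text, so the dtw output is likely binary float data. In that case `x2x +af` (ASCII to float) does not apply; `x2x +fa` goes the other way and prints floats as text, which is a quick way to inspect the file. Below is a minimal Python sketch for loading such a file into frames. The function names are hypothetical, and the frame width `dim` is an assumption the caller must supply: for `mfcc -m 20` it should be 20 if no c0 or energy term is appended. The layout of the dtw output (for example, whether test and reference frames are concatenated on each row) should be checked against the SPTK documentation before choosing `dim`.

```python
import struct


def parse_sptk_floats(data, dim):
    """Split raw bytes of 32-bit floats into frames of `dim` values."""
    if len(data) % 4 != 0:
        raise ValueError(f"{len(data)} bytes is not a whole number of 32-bit floats")
    n_values = len(data) // 4
    if n_values % dim != 0:
        raise ValueError(f"{n_values} floats cannot be split into frames of {dim}")
    # "=" means native byte order with standard 4-byte floats, i.e. the layout
    # a tool writing raw C floats on the same machine would produce
    values = struct.unpack(f"={n_values}f", data)
    return [list(values[i:i + dim]) for i in range(0, n_values, dim)]


def read_sptk_floats(path, dim):
    """Read a headerless float file (e.g. the output of SPTK's mfcc) as frames."""
    with open(path, "rb") as fh:
        return parse_sptk_floats(fh.read(), dim)


# Hypothetical usage (placeholder path):
# frames = read_sptk_floats("path/to/file.mfcc", 20)
```

The size check doubles as a sanity test: if a file's length is not a multiple of `4 * dim` bytes, the assumed frame width (or data type) is wrong.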

That helps a lot~ many thanks.