Skuldur / Classical-Piano-Composer


Training is not working

jonDel opened this issue · comments

When using the hardcoded weights, the predict script works fine and produces a multi-note MIDI file. However, when training from scratch using the lstm script to learn the weights, the predict script produces a single-note MIDI file. I tried several combinations of subsets (from the midi songs folder), and the result is always the same.

Any thoughts on this?

Thanks! I will try again and report back here if it works.

Hi, I am having the same problem as the one mentioned above.
When training the LSTM on my own music files, it generates a MIDI file with only a single repeating note.

Any update on this? Am I doing something wrong, or are there some conditions on the dataset?
Also, when you say "single instrument", does that mean it works with a MIDI file that has a left-hand and a right-hand piano track, or should I only keep one of them?

It only works with a single track. If you can combine the left and right piano tracks into a single track, that would work best.
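
A minimal sketch of that kind of merge with music21, assuming a two-staff piano MIDI (the file names here are placeholders); it simply flattens the notes and chords of both staves into one part:

    from music21 import converter, stream

    score = converter.parse("two_hand_piece.mid")   # hypothetical input file
    merged = stream.Part()
    # .flatten() pulls notes and chords out of both staves (use .flat on older music21)
    for element in score.flatten().notes:
        merged.insert(element.offset, element)
    merged.write('midi', fp="single_track_piece.mid")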

Regarding the repeating note, I had that issue at the start. How big is your dataset?

OK, so I will try to mix them into a single-instrument track.

Sadly, I have a really small dataset (I'm trying to generate melodies corresponding to an emotion, so it's quite hard to find single-instrument, emotion-bearing examples). Depending on the dataset I am using, it ranges from 10 to 20 files.

For now, I am brute-forcing this by increasing the number of epochs (200 -> 500) and it kind of works depending on the dataset. I suppose you had the same problem when your dataset was small?

The next thing I'm going to try is augmenting the dataset by transposing each piece up or down by one or two tones; I hope it will work.
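
A minimal sketch of that kind of transposition augmentation with music21, assuming the training files live in a midi_songs folder (the folder name and output naming are just placeholders):

    import glob
    from music21 import converter

    # Shift each piece up and down by one and two whole tones (2 and 4 semitones)
    # and write the transposed copies next to the originals.
    for file in glob.glob("midi_songs/*.mid"):
        score = converter.parse(file)
        for semitones in (-4, -2, 2, 4):
            shifted = score.transpose(semitones)
            shifted.write('midi', fp=file.replace(".mid", "_shift%+d.mid" % semitones))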

Hey there! So I am facing the same issue as the one above. I have a decent dataset of 32 MIDI files taken from http://www.piano-midi.de/beeth.htm. I believe they all consist of a single instrument, but I wouldn't bet my life on it. I tried different combinations of epochs (100 with a single MIDI file, 10 with the above-mentioned dataset), number of notes to generate, etc.
Here is a portion of the value of 'prediction' from predict.py:
[[1.13396626e-02 2.28804900e-04 9.67228334e-05 4.83751064e-04
2.05806704e-04 1.13458314e-04 6.99275770e-05 1.14716301e-02
1.34334527e-03 7.15563074e-05 1.54286786e-03 2.23979194e-04
9.47747903e-04 4.45510354e-03 7.15357764e-03 7.32331682e-05
2.32197126e-04 6.38070842e-03 2.12815910e-04 4.92099812e-03
.....]]
just to give you an idea.
Every time numpy.argmax is applied to this 'prediction' value, it picks the 240th note.
Any suggestions?

Also, I run TensorFlow on a CPU (Intel i5), so increasing the number of epochs wouldn't be my first choice, as it took about 2 hours just for a single epoch.
Still, any thoughts on the matter would be appreciated.
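
As a debugging aside (not something suggested in this thread): it can help to check whether the predicted distribution really is peaked on the same index for every input, and, as a stopgap, to sample from it instead of taking the argmax. A rough sketch, assuming the prediction array above and an int_to_note mapping like the tutorial's predict.py uses:

    import numpy as np

    # Inspect the top of the distribution: if the same few indices dominate for
    # every seed sequence, the model has collapsed rather than learned.
    top = np.argsort(prediction[0])[::-1][:5]
    print("top-5 indices:", top, "probs:", prediction[0][top])

    # Stopgap (not the repo's approach): sample with a temperature instead of
    # argmax, so the generator doesn't lock onto a single note.
    temperature = 1.0  # hypothetical knob
    logits = np.log(prediction[0] + 1e-9) / temperature
    probs = np.exp(logits) / np.sum(np.exp(logits))
    index = np.random.choice(len(probs), p=probs)
    print(int_to_note[index])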

Hi!
To check whether your MIDIs are single-instrument, try the music21 instrument functions or something like that.
Also, just to give you an idea, it took me about 30 hours or so to train 200 epochs with the given dataset on an Nvidia P5000.
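
A minimal sketch of that kind of check with music21, assuming the files sit in a midi_songs folder (the path is a placeholder):

    import glob
    from music21 import converter, instrument

    # Print the set of instrument names found in each MIDI file; anything with
    # more than one name (or more than one part) is not single-instrument.
    for file in glob.glob("midi_songs/*.mid"):
        midi = converter.parse(file)
        parts = instrument.partitionByInstrument(midi)
        if parts:
            names = {part.getInstrument().instrumentName for part in parts.parts}
        else:
            names = {"(no instrument parts, flat notes)"}
        print(file, "->", names)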

Hey! Thanks for replying @KPlanas.
I'm positive that I have single-instrument MIDI files. I plan on training the model for 200 epochs on a Tesla K80 (Google Colab). Maybe that will resolve the problem? I certainly hope so.
What I still don't understand is this: since I trained it for a small number of epochs, I'd be okay if it generated some random garbage tune, but why on earth would it pick a single note during generation, EVERY SINGLE TIME?! It's still a mystery to me.

Update: After ~180 epochs, the model finally works fine! Although the loss plateaued in the last 50 epochs, it's still pretty good.

Hi!
I have no idea why (maybe I should study the theory a bit more), but it also gives me repeated single-note "music" at first, when the loss is around 4. Still a mystery...
Anyway, good that it works for you :)
As for me, I'm now doing my training with a sequence length of 25 and it learns very quickly! Compared to training with a length of 100, the time per epoch went from 30 s to 9 s, and it takes ~160 epochs instead of ~300 to get OK results and ~350 instead of ~700 to get really good results.
I don't know if this is only because of my small dataset, but it's worth a try if you want to make it quicker.
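
For anyone looking for where that knob lives: the sequence length is just the window size used when building the training pairs. A simplified sketch of a prepare_sequences-style function (it skips the reshaping, normalization, and one-hot encoding the real script does, and the names are assumptions based on the tutorial):

    def prepare_sequences(notes, n_vocab, sequence_length=25):  # was 100 originally
        """Build (input window, next note) index pairs from the flat note list."""
        pitchnames = sorted(set(notes))
        note_to_int = {note: number for number, note in enumerate(pitchnames)}

        network_input, network_output = [], []
        for i in range(len(notes) - sequence_length):
            network_input.append([note_to_int[n] for n in notes[i:i + sequence_length]])
            network_output.append(note_to_int[notes[i + sequence_length]])
        return network_input, network_output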

I see. Yeah, I guess decreasing the sequence length might be a good idea after all. How many epochs did you train the model for to obtain a loss of about 0.12? Mine took about 200.
Also, what dataset did you use and how big was it?

Another advantage when you decrease the sequence length is that it avoids "overfitting" when you have a small dataset like mine (when I was at seq_len = 100, I was basically regenerating one of the dataset pieces with only slight changes).
A loss value of 0.12 should correspond to what I call "OK results", so ~160 epochs now (I will have to check on Monday to be sure).
For the dataset, I use several that I made myself; each has around 10-15 single-instrument MIDIs, and each MIDI is about 3 minutes long.

I see. Thanks for the info!
Good luck with your model. :)

Hi everyone, I'm a little late to the discussion, but I am having a somewhat similar issue, though not exactly the same. First of all, pattern.append(index) never worked for me. It always gave me an error that the np.ndarray object has no attribute 'append', so instead I had to use pattern = np.append(pattern, index). Did anyone else have this issue?

After doing this and running the program again, I also was getting single note patterns. But, after training it for 100 epochs, I got much better results.

BUT here is the part that is really stumping me:
I have used start = np.random.randint(0, len(network_input) - 1) in the code, like the tutorial said to, and (because I was printing the start point to make sure) I can see that it does in fact start at a different sequence point each time. However, every time I go through to generate a new output, it always gives me the exact same output as before, no matter what the sequence starting point is. Does anyone have any ideas about this?
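
One quick sanity check here (just a debugging suggestion, not something from the thread) is to verify whether the trained network's output depends on its input at all; if it doesn't, the random start point can't change the generated file. A sketch, assuming the network_input array, n_vocab, and the trained model from predict.py:

    import numpy as np

    # Feed two different seed sequences through the model. If the prediction
    # vectors are (near-)identical, the network has collapsed to a constant
    # output and every start point will produce the same song.
    seq_len = len(network_input[0])
    a = np.reshape(network_input[0], (1, seq_len, 1)) / float(n_vocab)
    b = np.reshape(network_input[len(network_input) // 2], (1, seq_len, 1)) / float(n_vocab)
    print(np.allclose(model.predict(a), model.predict(b), atol=1e-4))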

@cvantass
Hey. I am facing the same issue!
pattern.append never worked.
predict.py gives single-note patterns, even for different start positions.

Did it finally work out for you?

@jsinghvi
Hi. I'm having the same issue. numpy.argmax picks the same note index every single time! And hence it generates a single-note pattern.
@Skuldur
But the real surprise is that the problem occurs even with the trained weights that you provided in your repo. How do we go about fixing this?

@satashree27 Yes, I did get it to work eventually. Took me forever, and I don't remember everything I had to adjust, but I do know that I had to use

    pattern = np.append(pattern, index)
    pattern = pattern[1:len(pattern)]

As for getting it to stop picking the same note every single time, I do remember having to train it for many more epochs than the article suggests, but I also definitely changed some other things that I unfortunately can't recall right now. Even then, it was still overfitting like crazy on whatever piece of music I used as my reference, so I eventually just went rogue and made an entirely new script for everything, but this was a great jumping-off point for me anyway! I hope this was at least a little helpful. Feel free to email me directly if you want to talk more about what worked/didn't work in general. I'd be happy to help.
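
For context, here is roughly how that rolling window fits into the generation loop; this is a simplified sketch, and the names (network_input, int_to_note, n_vocab) are assumptions based on the tutorial's predict.py:

    import numpy as np

    pattern = np.asarray(network_input[start])  # seed sequence of note indices
    prediction_output = []

    for _ in range(500):                        # number of notes to generate
        prediction_input = np.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
        prediction = model.predict(prediction_input, verbose=0)
        index = np.argmax(prediction)
        prediction_output.append(int_to_note[index])

        # np.ndarray has no .append(), so grow the array with np.append and
        # drop the oldest element to keep the window length constant.
        pattern = np.append(pattern, index)
        pattern = pattern[1:len(pattern)]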

Hi, the problem is that the current model on the master branch is very fragile. Try using this model instead:

    from keras.models import Sequential
    from keras.layers import LSTM, Dense, Dropout, Activation, BatchNormalization

    model = Sequential()
    model.add(LSTM(
        512,
        input_shape=(network_input.shape[1], network_input.shape[2]),
        recurrent_dropout=0.3,
        return_sequences=True
    ))
    model.add(LSTM(512, return_sequences=True, recurrent_dropout=0.3))
    model.add(LSTM(512))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))
    model.add(Dense(256))
    model.add(Activation('relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))
    model.add(Dense(n_vocab))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

The differences are:

  • We're using recurrent_dropout instead of regular dropout for the LSTM layers. This is because using regular dropout in between recurrent layers such as LSTM can actually harm the performance of the model.
  • Added BatchNormalization, which normalizes the outputs of layers and helps the model converge. In my experience it almost always prevents the model from collapsing to a single note (see the training sketch after this list).
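
For completeness, a short sketch of how this network would typically be trained with a checkpoint callback; the filename pattern, epoch count, and batch size below are illustrative, not taken from this thread:

    from keras.callbacks import ModelCheckpoint

    # Save the weights whenever the training loss improves, so a long run can be
    # resumed and the best checkpoint reused by the predict script.
    checkpoint = ModelCheckpoint(
        "weights-{epoch:02d}-{loss:.4f}.hdf5",  # hypothetical filename pattern
        monitor='loss',
        save_best_only=True,
        mode='min'
    )
    model.fit(network_input, network_output, epochs=200, batch_size=128, callbacks=[checkpoint])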

Regarding overfitting: this model will not generalize as well as Transformer networks such as GPT-2 and is expected to overfit at least a bit, especially if you don't have enough training data. The goal of this repository is to keep the model simple so that it can serve as an entry point into music generation.

I'm glad that you could use this library as a jumping-off point to more advanced research. I recommend checking out the varying_speed_notes branch to see how the model can be expanded to generate better and more varied songs.