Skuldur / Classical-Piano-Composer


Training is not working

jonDel opened this issue · comments

When using the hardcoded weights, the predict script works fine and produces a multi-note MIDI file. However, when training from scratch using the lstm script to learn the weights, the predict script produces a single-note MIDI file. I tried several combinations of subsets (from the midi songs folder), and the result is always the same.

Any thoughts on this?

Thanks! I will try again and report back here if it works.

Hi, I am having the same problem as the one mentioned above.
When training the LSTM on my own music files, it generates a MIDI file with only a single repeating note.

Any update on this? Am I doing something wrong, or are there some conditions on the dataset?
Also, when you say "single instrument", does that mean it works with a MIDI file that has a left-hand and a right-hand piano track, or should I only keep one of them?

It only works with a single track. If you can combine the left and right piano tracks into a single track, that would work best.
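
A minimal sketch of that kind of merge with music21, assuming a two-staff piano MIDI (the file names here are placeholders); it simply flattens the notes and chords of both staves into one part:

    from music21 import converter, stream

    score = converter.parse("two_hand_piece.mid")   # hypothetical input file
    merged = stream.Part()
    # .flatten() pulls notes and chords out of both staves (use .flat on older music21)
    for element in score.flatten().notes:
        merged.insert(element.offset, element)
    merged.write('midi', fp="single_track_piece.mid")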

Regarding the repeating note, I had that issue at the start. How big is your dataset?

OK, so I will try to mix them into a single-instrument track.

Sadly, I have a really small dataset (I'm trying to generate melodies corresponding to an emotion, so it's quite hard to find single-instrument, emotion-bearing examples). Depending on the dataset I am using, it ranges from 10 to 20 files.

For now, I am brute-forcing this by increasing the number of epochs (200 -> 500) and it kind of works depending on the dataset. I suppose you had the same problem when your dataset was small?

The next thing I'm going to try is augmenting the dataset by transposing each piece up or down by one or two tones; I hope it will work.
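
A minimal sketch of that kind of transposition augmentation with music21, assuming the training files live in a midi_songs folder (the folder name and output naming are just placeholders):

    import glob
    from music21 import converter

    # Shift each piece up and down by one and two whole tones (2 and 4 semitones)
    # and write the transposed copies next to the originals.
    for file in glob.glob("midi_songs/*.mid"):
        score = converter.parse(file)
        for semitones in (-4, -2, 2, 4):
            shifted = score.transpose(semitones)
            shifted.write('midi', fp=file.replace(".mid", "_shift%+d.mid" % semitones))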

Hey there! So I am facing the same issue as the one above. I have a decent dataset of 32 MIDI files taken from http://www.piano-midi.de/beeth.htm. I believe they all consist of a single instrument, but I wouldn't bet my life on it. I tried different combinations of epochs (100 with a single MIDI file, 10 with the above-mentioned dataset), number of notes to generate, etc.
Here is a portion of the value of 'prediction' from predict.py:
[[1.13396626e-02 2.28804900e-04 9.67228334e-05 4.83751064e-04
2.05806704e-04 1.13458314e-04 6.99275770e-05 1.14716301e-02
1.34334527e-03 7.15563074e-05 1.54286786e-03 2.23979194e-04
9.47747903e-04 4.45510354e-03 7.15357764e-03 7.32331682e-05
2.32197126e-04 6.38070842e-03 2.12815910e-04 4.92099812e-03
.....]]
just to give you an idea.
Every time numpy.argmax is applied to this 'prediction' value, it picks the 240th note.
Any suggestions?

Also, I run TensorFlow on a CPU (Intel i5), so increasing the number of epochs wouldn't be my first choice, as it took about 2 hours just for a single epoch.
Still, any thoughts on the matter would be appreciated.
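
As a debugging aside (not something suggested in this thread): it can help to check whether the predicted distribution really is peaked on the same index for every input, and, as a stopgap, to sample from it instead of taking the argmax. A rough sketch, assuming the prediction array above and an int_to_note mapping like the tutorial's predict.py uses:

    import numpy as np

    # Inspect the top of the distribution: if the same few indices dominate for
    # every seed sequence, the model has collapsed rather than learned.
    top = np.argsort(prediction[0])[::-1][:5]
    print("top-5 indices:", top, "probs:", prediction[0][top])

    # Stopgap (not the repo's approach): sample with a temperature instead of
    # argmax, so the generator doesn't lock onto a single note.
    temperature = 1.0  # hypothetical knob
    logits = np.log(prediction[0] + 1e-9) / temperature
    probs = np.exp(logits) / np.sum(np.exp(logits))
    index = np.random.choice(len(probs), p=probs)
    print(int_to_note[index])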

Hi!
To check whether your MIDIs are single-instrument, try the music21 instrument functions or something like that.
Also, just to give you an idea, it took me about 30 hours or so to train 200 epochs with the given dataset on an Nvidia P5000.
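
A minimal sketch of that kind of check with music21, assuming the files sit in a midi_songs folder (the path is a placeholder):

    import glob
    from music21 import converter, instrument

    # Print the set of instrument names found in each MIDI file; anything with
    # more than one name (or more than one part) is not single-instrument.
    for file in glob.glob("midi_songs/*.mid"):
        midi = converter.parse(file)
        parts = instrument.partitionByInstrument(midi)
        if parts:
            names = {part.getInstrument().instrumentName for part in parts.parts}
        else:
            names = {"(no instrument parts, flat notes)"}
        print(file, "->", names)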

Hey! Thanks for replying @KPlanas.
I'm positive that I have single-instrument MIDI files. I plan on training the model for 200 epochs on a Tesla K80 (Google Colab). Maybe that will resolve the problem? I certainly hope so.
What I still don't understand is this: since I trained it for a small number of epochs, I'd be okay if it generated some random garbage tune, but why on earth would it pick a single note during generation, EVERY SINGLE TIME?! It's still a mystery to me.

Update: After ~180 epochs, the model finally works fine! Although the loss plateaued in the last 50 epochs, it's still pretty good.

Hi!
I have no idea why (maybe I should study the theory a bit more), but it also gives me repeated single-note "music" at first, when the loss is around 4. Still a mystery...
Anyway, good that it works for you :)
As for me, I'm now doing my training with a sequence length of 25 and it learns very quickly! Compared to training with a length of 100, the time per epoch went from 30 s to 9 s, and it takes ~160 epochs instead of ~300 to get OK results and ~350 instead of ~700 to get really good results.
I don't know if this is only because of my small dataset, but it's worth a try if you want to make it quicker.
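
For anyone looking for where that knob lives: the sequence length is just the window size used when building the training pairs. A simplified sketch of a prepare_sequences-style function (it skips the reshaping, normalization, and one-hot encoding the real script does, and the names are assumptions based on the tutorial):

    def prepare_sequences(notes, n_vocab, sequence_length=25):  # was 100 originally
        """Build (input window, next note) index pairs from the flat note list."""
        pitchnames = sorted(set(notes))
        note_to_int = {note: number for number, note in enumerate(pitchnames)}

        network_input, network_output = [], []
        for i in range(len(notes) - sequence_length):
            network_input.append([note_to_int[n] for n in notes[i:i + sequence_length]])
            network_output.append(note_to_int[notes[i + sequence_length]])
        return network_input, network_output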

I see. Yeah, I guess decreasing the sequence length might be a good idea after all. How many epochs did you train the model for to obtain a loss of about 0.12? Mine took about 200.
Also, what dataset did you use and how big was it?

Another advantage when you decrease the sequence length is that it avoids "overfitting" when you have a small dataset like mine (when I was at seq_len = 100, I was basically regenerating one of the dataset pieces with only slight changes).
A loss value of 0.12 should correspond to what I call "OK results", so ~160 epochs now (I will have to check on Monday to be sure).
For the dataset, I use several that I made myself; each has around 10-15 single-instrument MIDIs, and each MIDI is about 3 minutes long.

I see. Thanks for the info!
Good luck with your model. :)

Hi everyone, I'm a little late to the discussion, but I am having a somewhat similar issue, though not exactly the same. First of all, pattern.append(index) never worked for me. It always gave me an error that the np.ndarray object has no attribute 'append', so instead I had to use pattern = np.append(pattern, index). Did anyone else have this issue?

After doing this and running the program again, I also was getting single note patterns. But, after training it for 100 epochs, I got much better results.

BUT here is the part that is really stumping me:
I have used start = np.random.randint(0, len(network_input) - 1) in the code, like the tutorial said to, and (because I was printing the start point to make sure) I can see that it does in fact start at a different sequence point each time. However, every time I go through to generate a new output, it always gives me the exact same output as before, no matter what the sequence starting point is. Does anyone have any ideas about this?
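
One quick sanity check here (just a debugging suggestion, not something from the thread) is to verify whether the trained network's output depends on its input at all; if it doesn't, the random start point can't change the generated file. A sketch, assuming the network_input array, n_vocab, and the trained model from predict.py:

    import numpy as np

    # Feed two different seed sequences through the model. If the prediction
    # vectors are (near-)identical, the network has collapsed to a constant
    # output and every start point will produce the same song.
    seq_len = len(network_input[0])
    a = np.reshape(network_input[0], (1, seq_len, 1)) / float(n_vocab)
    b = np.reshape(network_input[len(network_input) // 2], (1, seq_len, 1)) / float(n_vocab)
    print(np.allclose(model.predict(a), model.predict(b), atol=1e-4))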

@cvantass
Hey. I am facing the same issue!
pattern.append never worked.
predict.py gives single-note patterns, even for different start positions.

Did it finally work out for you?

@jsinghvi
Hi. I'm having the same issue. numpy.argmax picks the same note index every single time! And hence it generates a single-note pattern.
@Skuldur
But the real surprise is that the problem occurs even with the trained weights that you provided in your repo. How do we go about fixing this?

@satashree27 Yes, I did get it to work eventually. Took me forever, and I don't remember everything I had to adjust, but I do know that I had to use

    pattern = np.append(pattern, index)
    pattern = pattern[1:len(pattern)]

As for getting it to stop picking the same note every single time, I do remember having to train it for many more epochs than the article suggests, but I also definitely changed some other things that I unfortunately can't recall right now. Even then, it was still overfitting like crazy on whatever piece of music I used as my reference, so I eventually just went rogue and made an entirely new script for everything, but this was a great jumping-off point for me anyway! I hope this was at least a little helpful. Feel free to email me directly if you want to talk more about what worked/didn't work in general. I'd be happy to help.
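
For context, here is roughly how that rolling window fits into the generation loop; this is a simplified sketch, and the names (network_input, int_to_note, n_vocab) are assumptions based on the tutorial's predict.py:

    import numpy as np

    pattern = np.asarray(network_input[start])  # seed sequence of note indices
    prediction_output = []

    for _ in range(500):                        # number of notes to generate
        prediction_input = np.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
        prediction = model.predict(prediction_input, verbose=0)
        index = np.argmax(prediction)
        prediction_output.append(int_to_note[index])

        # np.ndarray has no .append(), so grow the array with np.append and
        # drop the oldest element to keep the window length constant.
        pattern = np.append(pattern, index)
        pattern = pattern[1:len(pattern)]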

Hi, the problem is that the current model on the master branch is very fragile. Try using this model instead:

    from keras.models import Sequential
    from keras.layers import LSTM, Dense, Dropout, Activation, BatchNormalization

    model = Sequential()
    model.add(LSTM(
        512,
        input_shape=(network_input.shape[1], network_input.shape[2]),
        recurrent_dropout=0.3,
        return_sequences=True
    ))
    model.add(LSTM(512, return_sequences=True, recurrent_dropout=0.3))
    model.add(LSTM(512))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))
    model.add(Dense(256))
    model.add(Activation('relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.3))
    model.add(Dense(n_vocab))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

The differences are:

  • We're using recurrent_dropout instead of regular dropout for the LSTM layers. This is because using regular dropout in between recurrent layers such as LSTM can actually harm the performance of the model.
  • Added BatchNormalization, which normalizes the outputs of layers and helps the model converge. In my experience it almost always prevents the model from collapsing to a single note (see the training sketch after this list).
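
For completeness, a short sketch of how this network would typically be trained with a checkpoint callback; the filename pattern, epoch count, and batch size below are illustrative, not taken from this thread:

    from keras.callbacks import ModelCheckpoint

    # Save the weights whenever the training loss improves, so a long run can be
    # resumed and the best checkpoint reused by the predict script.
    checkpoint = ModelCheckpoint(
        "weights-{epoch:02d}-{loss:.4f}.hdf5",  # hypothetical filename pattern
        monitor='loss',
        save_best_only=True,
        mode='min'
    )
    model.fit(network_input, network_output, epochs=200, batch_size=128, callbacks=[checkpoint])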

Regarding overfitting: this model will not generalize as well as Transformer networks such as GPT-2 and is expected to overfit at least a bit, especially if you don't have enough training data. The goal of this repository is to keep the model simple so that it can serve as an entry point into music generation.

I'm glad that you could use this library as a jumping-off point to more advanced research. I recommend checking out the varying_speed_notes branch to see how the model can be expanded to generate better and more varied songs.