SongDriver2: Real-time Emotion-based Music Arrangement with Soft Transition

SongDriver2 first recognizes the music emotion of the last timestep and then fuses it with the target input emotion of the current timestep. The fused emotion then guides SongDriver2 in generating the upcoming music from the input melody data. SongDriver2 achieves a balance between real-time emotion fit and soft transitions, enhancing the coherence of the generated music.
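The fusion step can be illustrated with a minimal sketch. It assumes emotions are points in a valence-arousal plane and that fusion is a simple linear blend; the `Emotion` class, the `fuse_emotion` function, and the `alpha` weight are hypothetical names chosen for illustration and are not taken from the SongDriver2 code, which may fuse emotions differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Emotion:
    """Hypothetical emotion representation: a point in valence-arousal space."""
    valence: float  # assumed range: -1.0 (negative) to 1.0 (positive)
    arousal: float  # assumed range: -1.0 (calm) to 1.0 (excited)


def fuse_emotion(previous: Emotion, target: Emotion, alpha: float = 0.5) -> Emotion:
    """Blend the recognized emotion of the last timestep with the target emotion.

    alpha = 0.0 keeps the previous emotion (softest transition);
    alpha = 1.0 jumps straight to the target (best real-time fit).
    The linear blend is an assumption made for this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return Emotion(
        valence=(1.0 - alpha) * previous.valence + alpha * target.valence,
        arousal=(1.0 - alpha) * previous.arousal + alpha * target.arousal,
    )


if __name__ == "__main__":
    last = Emotion(valence=0.0, arousal=0.0)     # recognized from the last timestep
    wanted = Emotion(valence=1.0, arousal=-1.0)  # target input for this timestep
    print(fuse_emotion(last, wanted, alpha=0.25))
```

A small `alpha` favors soft transitions, while a large `alpha` favors real-time emotion fit; the balance described above corresponds to a value in between.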

The repository structure

APP_Code: the application code, including the frontend and backend. Users interact with the application by selecting their current emotion; the application then plays either music generated by SongDriver2 or recommended music.
SongDriver2_Code: the code of the SongDriver2 pipeline, implemented in PyTorch.

Downsampling pipeline & w/o Downsampling pipeline

We do not adopt the w/o Downsampling pipeline in our research, because its low similarity to the original music is a serious drawback. Considering application value, we ultimately chose the Downsampling pipeline, which generates a high-resolution version, including harmony and melody details, from the low-resolution representation based on the real-time input emotion.
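To make the idea of a low-resolution representation concrete, here is a rough sketch. It assumes the melody is a list of MIDI pitches on a fixed time grid and that downsampling keeps one pitch per group of steps; the `downsample` function and this decimation rule are illustrative assumptions, not the representation used in SongDriver2_Code.

```python
from typing import List, Optional


def downsample(pitches: List[Optional[int]], factor: int) -> List[Optional[int]]:
    """Keep the first pitch of every `factor` steps (hypothetical low-resolution view).

    `pitches` holds MIDI pitch numbers on a fixed grid, with None for rests.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [pitches[i] for i in range(0, len(pitches), factor)]


if __name__ == "__main__":
    # Eight sixteenth-note steps reduced to one pitch per four steps.
    melody = [60, 62, 64, 65, 67, 69, 71, 72]
    print(downsample(melody, 4))
```

In this picture, the generation model would be responsible for restoring the melodic and harmonic detail that the coarse sequence leaves out, conditioned on the input emotion.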

However, we are quite fond of the music arrangements generated by the w/o Downsampling pipeline, which constructs positive samples by applying noise masking, major-minor key conversion, pitch transposition, etc. to randomly selected segments of a song. From a musician's perspective, these arrangements are more creative and open up more possibilities, so we do not want to rule this pipeline out. For application scenarios that need to retain only a small amount of similarity while offering more possibilities, we recommend the w/o Downsampling pipeline. For scenarios that require obvious similarity to the original music, we suggest the Downsampling pipeline.
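The positive-sample construction mentioned above (noise masking, major-minor key conversion, pitch transposition on a randomly selected segment) could look roughly like the following. This is a sketch under stated assumptions: melodies are lists of MIDI pitches with `None` as the mask token, masking drops notes at random, and the key conversion maps a major key to its parallel natural minor. All function names and parameter values are hypothetical and are not taken from SongDriver2_Code.

```python
import random
from typing import List, Optional

Melody = List[Optional[int]]  # MIDI pitches; None marks a masked step (assumed)


def transpose(pitches: Melody, semitones: int) -> Melody:
    """Shift every pitch by `semitones`, clamped to the MIDI range 0..127."""
    return [p if p is None else min(127, max(0, p + semitones)) for p in pitches]


def major_to_minor(pitches: Melody, tonic: int) -> Melody:
    """Lower the 3rd, 6th and 7th degrees of a major key by one semitone."""
    lowered = {4, 9, 11}  # semitone offsets of those degrees above the tonic
    return [
        p if p is None or (p - tonic) % 12 not in lowered else max(0, p - 1)
        for p in pitches
    ]


def noise_mask(pitches: Melody, rate: float, rng: random.Random) -> Melody:
    """Replace each step with the mask token with probability `rate`."""
    return [None if rng.random() < rate else p for p in pitches]


def make_positive_sample(pitches: Melody, segment_len: int,
                         rng: random.Random, tonic: int = 60) -> Melody:
    """Pick a random segment and apply one randomly chosen augmentation."""
    if not 0 < segment_len <= len(pitches):
        raise ValueError("segment_len must be between 1 and len(pitches)")
    start = rng.randrange(len(pitches) - segment_len + 1)
    segment = pitches[start:start + segment_len]
    augmentation = rng.choice(["mask", "minor", "transpose"])
    if augmentation == "mask":
        return noise_mask(segment, 0.15, rng)  # 0.15 is an arbitrary rate
    if augmentation == "minor":
        return major_to_minor(segment, tonic)
    return transpose(segment, rng.choice([-3, -2, -1, 1, 2, 3]))


if __name__ == "__main__":
    c_major_scale = [60, 62, 64, 65, 67, 69, 71, 72]
    print(make_positive_sample(c_major_scale, 4, random.Random(0)))
```

Each augmentation keeps the segment recognizable while altering its surface, which is consistent with the small amount of retained similarity described above.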

