keshavoct98 / DANCING-AI

An LSTM network is trained on dance videos, with audio (songs) as input and human pose coordinates estimated by OpenPose as output. The trained LSTM models are then used to generate dance videos from songs.
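As a rough illustration of that setup, here is a minimal sketch of a sequence model mapping windows of audio features to pose coordinates, using the tensorflow/keras versions pinned below; the layer sizes, feature dimension, window length and training call are assumptions for illustration, not the repo's actual architecture.

from tensorflow import keras
from tensorflow.keras import layers

# Assumed dimensions, purely illustrative: 36 audio features per video frame,
# 36 outputs = 18 (x, y) OpenPose COCO keypoints, 30 frames of audio context.
N_FEATURES = 36
N_COORDS = 36
SEQ_LEN = 30

model = keras.Sequential([
    layers.Input(shape=(SEQ_LEN, N_FEATURES)),
    layers.LSTM(128, return_sequences=True),
    layers.LSTM(128),
    layers.Dense(N_COORDS),  # predicted keypoint coordinates for the next frame
])
model.compile(optimizer="adam", loss="mse")

# X: (samples, SEQ_LEN, N_FEATURES) audio windows, y: (samples, N_COORDS) poses
# model.fit(X, y, epochs=100, batch_size=32)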

DANCING-AI

  1. Extract pose coordinates from dance videos using OpenPose human pose estimation (a sketch of this step follows the list).
  2. Train an LSTM network on the extracted coordinates, with songs as input and coordinates as output.
  3. Use the trained LSTM to predict dance coordinates for the remainder of the song (95% of the audio is used for training, the remaining 5% for predictions).
  4. Display the output video by joining the predicted coordinates into a dancing human stick figure.
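For step 1, a minimal sketch along the lines of the LearnOpenCV OpenPose tutorial listed in the references, reading keypoints from each video frame with OpenCV's DNN module and the pose_iter_440000.caffemodel from the setup steps; the prototxt path, input size and confidence threshold are assumptions.

import cv2

# Assumed paths: the COCO prototxt ships with the OpenPose/LearnOpenCV samples,
# the caffemodel is the one downloaded into the "models" folder during setup.
PROTO = "models/pose_deploy_linevec.prototxt"
MODEL = "models/pose_iter_440000.caffemodel"
N_POINTS = 18  # COCO keypoints

net = cv2.dnn.readNetFromCaffe(PROTO, MODEL)

def extract_pose(frame, in_size=368, threshold=0.1):
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (in_size, in_size),
                                 (0, 0, 0), swapRB=False, crop=False)
    net.setInput(blob)
    out = net.forward()  # confidence maps, shape (1, channels, H, W)
    points = []
    for i in range(N_POINTS):
        _, prob, _, loc = cv2.minMaxLoc(out[0, i, :, :])
        x = w * loc[0] / out.shape[3]
        y = h * loc[1] / out.shape[2]
        points.append((x, y) if prob > threshold else (None, None))
    return points

cap = cv2.VideoCapture("data/0.mp4")
coords = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    coords.append(extract_pose(frame))
cap.release()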

Requirements

   opencv-contrib-python==4.7.0.72
   pandas==2.0.1
   librosa==0.10.0.post2
   moviepy==1.0.3
   yt-dlp==2023.3.4
   tensorflow==2.12.0
   keras==2.12.0
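On the audio side, librosa can turn a song into one feature vector per video frame, which is roughly the input the LSTM needs. A minimal sketch, assuming MFCC features and a 30 fps video frame rate (both illustrative choices, not necessarily what the repo uses):

import librosa

FPS = 30  # assumed video frame rate

y, sr = librosa.load("data/0.wav", sr=None)
hop = int(sr / FPS)  # hop length that lines audio frames up with video frames
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=36, hop_length=hop)
features = mfcc.T  # shape (n_frames, 36): one feature vector per video frame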

Training/Demo

  1. Run get_data.py to download videos and audio to the "data" folder. You can add YouTube video links to the "video_links.txt" file for downloading (see the download sketch after these steps). Alternatively, copy videos ('.mp4' format) and audio ('.wav' format) directly to the data folder.
  2. Download pretrained weights for pose estimation from here: save pose_iter_440000.caffemodel in the "models" folder.
  3. Run main.py to train the LSTM and display the predicted dance video.
 python main.py --video "path to input video" --audio "path to input audio" --background "path to background image" --display
 Example - python main.py --video data/0.mp4 --audio data/0.wav --background inputs/bg0.jpg --display
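For step 1, a minimal sketch of downloading a clip and extracting its audio with yt-dlp and moviepy (both pinned in the requirements); the URL and file names below are hypothetical placeholders, in the repo the links come from "video_links.txt".

from yt_dlp import YoutubeDL
from moviepy.editor import VideoFileClip

urls = ["https://www.youtube.com/watch?v=EXAMPLE"]  # hypothetical link

ydl_opts = {"format": "mp4", "outtmpl": "data/%(id)s.%(ext)s"}
with YoutubeDL(ydl_opts) as ydl:
    ydl.download(urls)

# Extract the audio track of a downloaded clip to .wav for training.
clip = VideoFileClip("data/EXAMPLE.mp4")
clip.audio.write_audiofile("data/EXAMPLE.wav")
clip.close()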

   #Note - If your GPU has 3 GB of RAM or less, reduce the memory limit in this line to a value below your GPU's RAM (a sketch of the relevant TensorFlow call follows).
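A minimal sketch of capping TensorFlow's GPU memory with the tf.config API; the 2048 MB figure is an example value, not the repo's default.

import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Example cap of 2048 MB; pick a value below your card's total memory.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=2048)])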

Pose estimation using OpenPose

Predictions

References

  1. https://www.learnopencv.com/deep-learning-based-human-pose-estimation-using-opencv-cpp-python/
  2. https://github.com/CMU-Perceptual-Computing-Lab/openpose
  3. https://python-pytube.readthedocs.io/en/latest/
  4. https://zulko.github.io/moviepy/
  5. https://librosa.org/librosa/
  6. https://www.youtube.com/channel/UCX9y7I0jT4Q5pwYvNrcHI_Q

About

License: Apache License 2.0


Languages

Language: Python 100.0%