keshavoct98 / DANCING-AI

An LSTM network is trained on dance videos, with audio (songs) as input and human pose coordinates estimated by OpenPose as output. The trained LSTM models are then used to generate dance videos from songs.
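As a rough illustration of that setup, here is a minimal sketch of a sequence model mapping windows of audio features to pose coordinates, using the tensorflow/keras versions pinned below; the layer sizes, feature dimension, window length and training call are assumptions for illustration, not the repo's actual architecture.

from tensorflow import keras
from tensorflow.keras import layers

# Assumed dimensions, purely illustrative: 36 audio features per video frame,
# 36 outputs = 18 (x, y) OpenPose COCO keypoints, 30 frames of audio context.
N_FEATURES = 36
N_COORDS = 36
SEQ_LEN = 30

model = keras.Sequential([
    layers.Input(shape=(SEQ_LEN, N_FEATURES)),
    layers.LSTM(128, return_sequences=True),
    layers.LSTM(128),
    layers.Dense(N_COORDS),  # predicted keypoint coordinates for the next frame
])
model.compile(optimizer="adam", loss="mse")

# X: (samples, SEQ_LEN, N_FEATURES) audio windows, y: (samples, N_COORDS) poses
# model.fit(X, y, epochs=100, batch_size=32)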

DANCING-AI

  1. Extract pose coordinates from dance videos using OpenPose human pose estimation (a sketch of this step follows the list).
  2. Train an LSTM network on the extracted coordinates, with songs as input and coordinates as output.
  3. Use the trained LSTM to predict dance coordinates for the remainder of the song (95% of the audio is used for training, the remaining 5% for predictions).
  4. Display the output video by joining the predicted coordinates into a dancing human stick figure.
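For step 1, a minimal sketch along the lines of the LearnOpenCV OpenPose tutorial listed in the references, reading keypoints from each video frame with OpenCV's DNN module and the pose_iter_440000.caffemodel from the setup steps; the prototxt path, input size and confidence threshold are assumptions.

import cv2

# Assumed paths: the COCO prototxt ships with the OpenPose/LearnOpenCV samples,
# the caffemodel is the one downloaded into the "models" folder during setup.
PROTO = "models/pose_deploy_linevec.prototxt"
MODEL = "models/pose_iter_440000.caffemodel"
N_POINTS = 18  # COCO keypoints

net = cv2.dnn.readNetFromCaffe(PROTO, MODEL)

def extract_pose(frame, in_size=368, threshold=0.1):
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (in_size, in_size),
                                 (0, 0, 0), swapRB=False, crop=False)
    net.setInput(blob)
    out = net.forward()  # confidence maps, shape (1, channels, H, W)
    points = []
    for i in range(N_POINTS):
        _, prob, _, loc = cv2.minMaxLoc(out[0, i, :, :])
        x = w * loc[0] / out.shape[3]
        y = h * loc[1] / out.shape[2]
        points.append((x, y) if prob > threshold else (None, None))
    return points

cap = cv2.VideoCapture("data/0.mp4")
coords = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    coords.append(extract_pose(frame))
cap.release()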

Requirements

   opencv-contrib-python==4.7.0.72
   pandas==2.0.1
   librosa==0.10.0.post2
   moviepy==1.0.3
   yt-dlp==2023.3.4
   tensorflow==2.12.0
   keras==2.12.0
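On the audio side, librosa can turn a song into one feature vector per video frame, which is roughly the input the LSTM needs. A minimal sketch, assuming MFCC features and a 30 fps video frame rate (both illustrative choices, not necessarily what the repo uses):

import librosa

FPS = 30  # assumed video frame rate

y, sr = librosa.load("data/0.wav", sr=None)
hop = int(sr / FPS)  # hop length that lines audio frames up with video frames
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=36, hop_length=hop)
features = mfcc.T  # shape (n_frames, 36): one feature vector per video frame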

Training/Demo

  1. Run get_data.py to download videos and audio to the "data" folder. You can add YouTube video links to the "video_links.txt" file for downloading (see the download sketch after these steps). Alternatively, copy videos ('.mp4' format) and audio ('.wav' format) directly to the data folder.
  2. Download pretrained weights for pose estimation from here: save pose_iter_440000.caffemodel in the "models" folder.
  3. Run main.py to train the LSTM and display the predicted dance video.
 python main.py --video "path to input video" --audio "path to input audio" --background "path to background image" --display
 Example - python main.py --video data/0.mp4 --audio data/0.wav --background inputs/bg0.jpg --display
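For step 1, a minimal sketch of downloading a clip and extracting its audio with yt-dlp and moviepy (both pinned in the requirements); the URL and file names below are hypothetical placeholders, in the repo the links come from "video_links.txt".

from yt_dlp import YoutubeDL
from moviepy.editor import VideoFileClip

urls = ["https://www.youtube.com/watch?v=EXAMPLE"]  # hypothetical link

ydl_opts = {"format": "mp4", "outtmpl": "data/%(id)s.%(ext)s"}
with YoutubeDL(ydl_opts) as ydl:
    ydl.download(urls)

# Extract the audio track of a downloaded clip to .wav for training.
clip = VideoFileClip("data/EXAMPLE.mp4")
clip.audio.write_audiofile("data/EXAMPLE.wav")
clip.close()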

   #Note - If your GPU has 3 GB of RAM or less, reduce the memory limit in this line to a value below your GPU's RAM (a sketch of the relevant TensorFlow call follows).
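A minimal sketch of capping TensorFlow's GPU memory with the tf.config API; the 2048 MB figure is an example value, not the repo's default.

import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Example cap of 2048 MB; pick a value below your card's total memory.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=2048)])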

Pose estimation using OpenPose

Predictions

References

  1. https://www.learnopencv.com/deep-learning-based-human-pose-estimation-using-opencv-cpp-python/
  2. https://github.com/CMU-Perceptual-Computing-Lab/openpose
  3. https://python-pytube.readthedocs.io/en/latest/
  4. https://zulko.github.io/moviepy/
  5. https://librosa.org/librosa/
  6. https://www.youtube.com/channel/UCX9y7I0jT4Q5pwYvNrcHI_Q

About

License: Apache License 2.0


Languages

Language: Python 100.0%