Imitation learning without policy optimization!
https://www.notion.so/Adversarial-Soft-Advantage-Fitting-441698eb0ccb40eab4f59275d637466a
Every expert demo file must be a pickle file containing a list of state-action pairs, one pair per step, in this form: [[np.array([state0]), np.array([action0])], [np.array([state1]), np.array([action1])], ...]
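As a minimal sketch of producing a file in that layout, the snippet below pairs each state with its action and pickles the resulting list. The helper names (`build_demo`, `save_demo`, `load_demo`), the file name `expert_demo.pkl`, and the toy data (4-dimensional states, integer actions) are assumptions made for illustration and are not part of this repo. Use the shapes your environment actually produces.

```python
import os
import pickle
import tempfile

import numpy as np


def build_demo(states, actions):
    """Pair each state with its action as [np.array([state]), np.array([action])]."""
    return [[np.array([s]), np.array([a])] for s, a in zip(states, actions)]


def save_demo(demo, path):
    """Write the list of state-action pairs to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(demo, f)


def load_demo(path):
    """Read a demo file back, e.g. to check its layout before training."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical toy trajectory: 3 steps, 4-dimensional states, integer actions.
states = [np.zeros(4), np.ones(4), np.full(4, 2.0)]
actions = [0, 1, 0]

demo = build_demo(states, actions)

# Hypothetical output location; point this at wherever your demos are stored.
path = os.path.join(tempfile.gettempdir(), "expert_demo.pkl")
save_demo(demo, path)
loaded = load_demo(path)
```

Loading the file back and inspecting `loaded[0]` is a quick way to confirm that each entry is a two-element list of NumPy arrays before pointing the training script at it.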
To train, adjust the parameters in ./src/train.py and then run it.
To test, adjust the parameters in ./src/test.py and then run it.