berlino / seq_icl

In-context Language Learning: Architectures and Algorithms [WIP]

This repo contains the experiments for the paper:

Title: In-context Language Learning: Architectures and Algorithms

Authors: Ekin Akyürek, Bailin Wang, Yoon Kim, Jacob Andreas

Setup

conda create -n seq_icl python=3.11
conda activate seq_icl  # install into the new environment, not the base one
pip install -r requirements.txt

Experiments

Experiments on DFA

To run training, pick one of the experiment configs:

python -m train experiment=dfa/lstm
python -m train experiment=dfa/retnet
python -m train experiment=dfa/gla
python -m train experiment=dfa/transformer+
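
To run all four configs back to back, a small driver can wrap the commands above. This is a minimal sketch and not part of the repo: `build_command` and `run_all` are hypothetical helper names, and it assumes the script is launched from the repository root inside the seq_icl environment.

```python
import subprocess
import sys

# Experiment configs listed in the commands above.
EXPERIMENTS = ["dfa/lstm", "dfa/retnet", "dfa/gla", "dfa/transformer+"]


def build_command(experiment):
    """Return the argv list equivalent to `python -m train experiment=<name>`."""
    return [sys.executable, "-m", "train", f"experiment={experiment}"]


def run_all(experiments=EXPERIMENTS, dry_run=False):
    """Run each experiment in sequence, stopping at the first failure."""
    for name in experiments:
        cmd = build_command(name)
        print("Running:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
```

Calling `run_all(dry_run=True)` only prints the commands, which is a cheap way to check the list before starting a long run.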

Troubleshooting

  • Add export PATH=$PATH:/usr/local/sbin:/usr/sbin:/sbin to your shell so that ldconfig can be found.
  • The MHA in simple_lm.py uses num_heads, while the other modules use n_heads. The names should be unified for consistency, but they are kept as is for now.
  • You might need to build causal-conv1d from source, following the commands in this issue:
git clone https://github.com/Dao-AILab/causal-conv1d.git
cd causal-conv1d
git checkout v1.0.2  # this is the highest compatible version allowed by Mamba
CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install .
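
To check whether the PATH fix from the first item is needed on a given machine, something like the following could be used. It is a sketch under stated assumptions: `extended_path` and `find_ldconfig` are hypothetical helpers, not repo code, and the directory list simply mirrors the export line above.

```python
import os
import shutil

# Directories from the PATH fix above.
SBIN_DIRS = ["/usr/local/sbin", "/usr/sbin", "/sbin"]


def extended_path(path=None):
    """Return PATH with the sbin directories appended, skipping ones already present."""
    current = os.environ.get("PATH", "") if path is None else path
    parts = [p for p in current.split(os.pathsep) if p]
    for d in SBIN_DIRS:
        if d not in parts:
            parts.append(d)
    return os.pathsep.join(parts)


def find_ldconfig(path=None):
    """Return the full path to ldconfig on the extended PATH, or None if not found."""
    return shutil.which("ldconfig", path=extended_path(path))
```

If `shutil.which("ldconfig")` returns None but `find_ldconfig()` returns a path, the export line is what makes ldconfig visible.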

Acknowledgements

This repo is adapted from safari. Triton implementations are taken from linear rnn.

About

License: Apache License 2.0

