asciinema / asciinema

Terminal session recorder 📹

Home Page: https://asciinema.org

Proposal: collect dataset

elexunix opened this issue

Hello guys!

Maybe it would be interesting or fun to collect a moderately large dataset of asciinema recordings from many different users. To contribute, just asciinema rec your terminal, as long as you are not doing anything too personal there, and then share the recording. We could gather the casts into a dataset and then, since the asciinema recording structure is luckily simple, train an LLM on that corpus and have fun watching "realistic" (in the view of that NN) casts of someone doing something in a terminal.

What do you think about collecting such a dataset together? I have a 4090 and can train the LLM on it.

:)

How would you train the model? Are you thinking of an RNN such as a GRU, or rather a transformer model? What about the timing information: would it be part of the model as well, or were you thinking of training on the raw output only?
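
To make the timing question concrete, here is a minimal sketch of one way timing could be folded into the training text. It assumes casts are in the asciicast v2 format (a JSON header line followed by `[time, event_type, data]` lines, with `time` in seconds since the start of the recording). The `<wait:N>` token, the function names, and the `step` and `max_steps` parameters are hypothetical choices made for illustration; nothing in this thread settles on them.

```python
import json

def parse_cast(text):
    """Split asciicast v2 text into (header dict, list of events)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = json.loads(lines[0])                  # first line: metadata
    events = [json.loads(ln) for ln in lines[1:]]  # [time, type, data]
    return header, events

def to_training_text(events, step=0.1, max_steps=50):
    """Interleave quantized delay tokens with terminal output chunks.

    `<wait:N>` is a made-up token meaning "N * step seconds passed";
    long pauses are clipped to max_steps so idle time does not dominate.
    """
    parts = []
    prev = 0.0
    for time, kind, data in events:
        if kind != "o":        # keep only output events
            continue
        steps = min(int(round((time - prev) / step)), max_steps)
        parts.append(f"<wait:{steps}>")
        parts.append(data)
        prev = time
    return "".join(parts)

# A tiny hand-written cast, used only as example input.
SAMPLE = "\n".join([
    '{"version": 2, "width": 80, "height": 24}',
    '[0.5, "o", "$ "]',
    '[1.5, "o", "ls\\r\\n"]',
    '[2.0, "o", "README.md\\r\\n"]',
])

header, events = parse_cast(SAMPLE)
print(repr(to_training_text(events, step=0.5)))
```

Training on the raw output only would correspond to dropping the `<wait:N>` tokens and concatenating the `data` fields. Keeping them lets a sequence model (RNN or transformer) learn pauses as ordinary tokens, at the cost of a slightly larger vocabulary.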