Proposal: collect dataset
elexunix opened this issue · comments
Hello guys!
Maybe it would be interesting or fun to collect a moderately large dataset of asciinema recordings from many different users. To contribute, just run asciinema rec in your terminal (as long as you are not doing anything too personal there) and share the recording. We could gather the recordings into a dataset of casts and then, since the asciinema recording format is luckily simple, train an LLM on that corpus and have fun watching "realistic" (in the view of that NN) casts of someone doing something in a terminal.
What do you think about collecting such a dataset together? I have a 4090 and can train the LLM on it.
:)
How would you train the model? Are you thinking of an RNN such as a GRU, or rather a transformer model? What about the timing information: would it be part of the model as well, or were you thinking of training on the raw output only?
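To make the timing question concrete, here is a rough sketch of one way timing could be folded into the training text. It assumes the recordings are in the asciicast v2 format (a JSON header line followed by one JSON array per event, [time, event type, data]). The <dt:N> delay tokens, the 0.1 s bucket size, and the function names are hypothetical choices made up for this illustration, not anything asciinema defines or anything proposed in this thread.

```python
import json


def parse_cast(text):
    """Parse asciicast v2 text into (header, events).

    Assumes line 1 is a JSON object and every later non-empty line
    is a JSON array of the form [time, event_type, data].
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = json.loads(lines[0])
    events = [tuple(json.loads(ln)) for ln in lines[1:]]
    return header, events


def to_training_text(events, bucket=0.1, max_buckets=50):
    """Interleave hypothetical <dt:N> delay tokens with raw terminal output.

    N is the gap since the previous output event, measured in `bucket`
    seconds and capped at `max_buckets` so long pauses do not explode
    the vocabulary. Gaps that round to zero emit no token.
    """
    parts = []
    prev = 0.0
    for t, kind, data in events:
        if kind != "o":  # keep only output events, skip input/markers
            continue
        n = min(int(round((t - prev) / bucket)), max_buckets)
        prev = t
        if n > 0:
            parts.append(f"<dt:{n}>")
        parts.append(data)
    return "".join(parts)


def to_raw_text(events):
    """The 'raw output only' alternative: drop timing entirely."""
    return "".join(data for _, kind, data in events if kind == "o")
```

Calling to_training_text(parse_cast(open("demo.cast").read())[1]) on a recording (demo.cast is a placeholder name) would give a single string that a character-level or byte-level model could be trained on, while to_raw_text gives the timing-free variant for comparison.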