Benchmark and identify the best ways to speed up LLM inference.
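As a starting point for benchmarking, a minimal throughput harness can time a generation callable and report tokens/sec. The sketch below is generic: `generate_fn` and `dummy_generate` are hypothetical stand-ins for a real inference call, and the whitespace token count is a rough proxy for the model tokenizer.

```python
import time

def benchmark_generation(generate_fn, prompt, n_runs=3):
    """Time a text-generation callable and return average tokens/sec.

    `generate_fn` is any function taking a prompt string and returning
    generated text (a hypothetical stand-in for a real LLM call).
    """
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        output = generate_fn(prompt)
        elapsed = time.perf_counter() - start
        # Rough token count via whitespace split; a real benchmark
        # should count tokens with the model's own tokenizer.
        n_tokens = len(output.split())
        rates.append(n_tokens / elapsed)
    return sum(rates) / len(rates)

# Usage with a dummy generator standing in for the model:
def dummy_generate(prompt):
    return "token " * 100

avg_tps = benchmark_generation(dummy_generate, "Hello")
```

Swapping in different backends (e.g. plain transformers vs. an optimized server) behind the same `generate_fn` interface makes the comparisons directly comparable.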
Resources
- prompt-engineering-guide: includes Mistral-specific details.
- Coursera prompt-engineering course
- another prompt-engineering course
- For structuring experiments (MLOps): https://github.com/vin136/MLOPS
Fine-tuning LLMs
https://www.youtube.com/playlist?list=PL23FjyM69j92o_j5JFH9sNlbhCx4n0ZYh
Hugging Face blog/references
- generation strategies
- chat-template prompting
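Chat templating turns a list of role/content messages into the single prompt string a chat model was trained on. The sketch below hand-rolls a Mistral-style `[INST]` format purely for illustration; the format string is an approximation, and real code should use `tokenizer.apply_chat_template` from Hugging Face transformers instead.

```python
def apply_chat_template(messages):
    """Render a list of {role, content} dicts into one prompt string,
    roughly following a Mistral-style [INST] chat format.

    Illustrative approximation only: in practice, use
    tokenizer.apply_chat_template so the template matches the model.
    """
    parts = ["<s>"]
    for msg in messages:
        if msg["role"] == "user":
            parts.append(f"[INST] {msg['content']} [/INST]")
        elif msg["role"] == "assistant":
            # Assistant turns are closed with the end-of-sequence token.
            parts.append(f" {msg['content']}</s>")
    return "".join(parts)

prompt = apply_chat_template([
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "4"},
    {"role": "user", "content": "And 3+3?"},
])
```

Getting this formatting wrong (missing special tokens, wrong turn markers) silently degrades output quality, which is why the library-provided template is preferred.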
End-point detection
Basically, use an ML model to detect the end of a spoken command.
Practical solutions:
- VAD (voice activity detection) to detect the end point: faster-whisper, whisper-live
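To make the end-pointing idea concrete, here is a toy energy-threshold end-pointer: declare end-of-command after N consecutive low-energy frames following speech. This is a simplified stand-in for a trained VAD model (like the ones used by faster-whisper and whisper-live); the threshold and frame counts are illustrative assumptions.

```python
def detect_endpoint(frames, energy_threshold=0.01, silence_frames=30):
    """Return the index of the frame where the utterance is judged to
    have ended, or None if no end point is found.

    `frames` is a list of audio frames (each a list of float samples).
    A frame is "silent" when its mean-squared energy falls below the
    threshold; after `silence_frames` consecutive silent frames that
    follow some speech, we declare end-of-command. A production system
    would replace the energy test with a trained VAD model.
    """
    consecutive_silence = 0
    speech_seen = False
    for i, frame in enumerate(frames):
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= energy_threshold:
            speech_seen = True
            consecutive_silence = 0
        elif speech_seen:
            consecutive_silence += 1
            if consecutive_silence >= silence_frames:
                return i
    return None

# Usage: 10 "loud" frames followed by silence.
frames = [[0.5] * 160] * 10 + [[0.0] * 160] * 40
end_idx = detect_endpoint(frames, silence_frames=30)  # fires at frame 39
```

The `silence_frames` budget is the latency/accuracy knob: a smaller value cuts response latency but risks cutting off pauses mid-command.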
TODO: