There are 3 repositories under the local-inference topic.
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Tool for testing different large language models without writing code.
LLM chatbot example using OpenVINO with RAG (Retrieval-Augmented Generation).