Simon S. Viloria's starred repositories
awesome-public-real-time-datasets
A list of publicly available datasets with real-time data, maintained by the team at bytewax.io
text-generation-inference
Large Language Model Text Generation Inference
ensemble-instruct
Codebase release for the EMNLP 2023 paper
instructlab
InstructLab Command-Line Interface. Use this to chat with a model and execute the InstructLab workflow to train a model using custom taxonomy data.
dspy-redteam
Red-Teaming Language Models with DSPy
prometheus
[ICLR 2024 & NeurIPS 2023 WS] An open-source evaluator LM that offers reproducible evaluation and is inexpensive to use. Specifically designed for fine-grained evaluation on a customized score rubric, Prometheus is a good alternative to human evaluation and GPT-4 evaluation.
text-generation-inference
IBM development fork of https://github.com/huggingface/text-generation-inference
llm4regression
Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their context, without any parameter updates
ml-engineering
Machine Learning Engineering Open Book
distilabel
⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers who require high-quality outputs, full data ownership, and overall efficiency.
presidio-research
This package features data-science-related tasks for developing new recognizers for Presidio. It is used to evaluate the entire system, as well as specific PII recognizers or PII detection models.
terraform-ibm-cloud-pak
Terraform modules and examples to support installation of IBM Cloud Paks on OpenShift clusters
cloud-pak-cli
Cloudctl is a command-line tool to manage Container Application Software for Enterprises (CASE)