jorirsan

Universitat Politècnica de València

Valencia

Jorge Iranzo's starred repositories

UTMOSv2

Language: Python · License: MIT · 1000

llama-recipes

Scripts for fine-tuning Meta Llama 3 with composable FSDP and PEFT methods, covering single- and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and Q&A, as well as a number of candidate inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps showcasing Meta Llama 3 on WhatsApp and Messenger.

Language: Jupyter Notebook · 1075000

RapidFuzz

Rapid fuzzy string matching in Python using various string metrics

Language: C++ · License: MIT · 250600
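RapidFuzz's description mentions string metrics. As a rough, plain-Python illustration of one classic metric, the sketch below computes Levenshtein edit distance, normalizes it to a 0 to 100 score, and picks the closest choice from a list. This is not RapidFuzz's API: the names `levenshtein`, `similarity`, and `best_match` are hypothetical, and RapidFuzz's own scorers may normalize differently and run in compiled code for speed.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalized score in [0, 100]; 100 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (longest - levenshtein(a, b)) / longest


def best_match(query: str, choices: list[str]) -> tuple[str, float]:
    """Return the choice most similar to the query, together with its score."""
    return max(((c, similarity(query, c)) for c in choices), key=lambda pair: pair[1])
```

For example, `best_match("appel", ["apple", "banana", "maple"])` selects "apple" with a score of 60.0, since two substitutions separate the two five-letter strings.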

jiwer

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

Language: Python · License: Apache-2.0 · 57600
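jiwer's headline metric is word error rate. Assuming the standard definition WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions in a minimum-edit alignment and N is the number of reference words, a minimal plain-Python sketch looks like this. The function name `word_error_rate` is hypothetical; jiwer itself exposes the metric as `jiwer.wer(reference, hypothesis)`.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

Scoring "the cat sit on mat" against the reference "the cat sat on the mat" finds one substitution and one deletion over six reference words, a WER of 2/6 (about 0.33). Because insertions also count, WER can exceed 1.0.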

outlines

Structured Text Generation

Language: Python · License: Apache-2.0 · 732400

epitran

A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)

Language: Python · License: MIT · 61300

WMT-Biomed-Test

1400

toLLMatch

toLLMatch🔪: Context-aware LLM-based simultaneous translation

Language: Jupyter Notebook · License: MIT · 300

simul_whisper

Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

Language: Python · 1100

NAT_vs_AT

License: MIT · 200

ccextractor

CCExtractor - Official version maintained by the core team

Language: C · License: GPL-2.0 · 68900

whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Language: Python · License: AGPL-3.0 · 174000

Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding, plus support for many more LMs such as miniGPT4, StableLM, and MOSS.

Language: Python · License: MIT · 289500

MLVU

🔥🔥MLVU: Multi-task Long Video Understanding Benchmark

Language: Python · 9600

VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Language: Python · License: Apache-2.0 · 57100

NAST-S2x

A fast speech-to-any translation model that supports simultaneous decoding and offers a 28× speedup.

Language: Python · 5100

paella-core

Paella Player core library

Language: JavaScript · License: ECL-2.0 · 2000

mbrs

A library for minimum Bayes risk (MBR) decoding

Language: Python · License: MIT · 1400
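mbrs implements minimum Bayes risk decoding. The core idea is that, instead of keeping the single most probable output, one chooses the candidate with the highest expected utility against a set of pseudo-references (commonly the sampled candidates themselves): h* = argmax over h in H of (1/|R|) Σ_{r ∈ R} u(h, r). The sketch below is a plain-Python illustration, not the mbrs API; the unigram-F1 utility is a deliberately simple, hypothetical stand-in for the evaluation metrics used in practice, and all function names are assumptions.

```python
from collections import Counter
from typing import Callable, Optional


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Toy utility (hypothetical): F1 over overlapping word counts."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum((hyp & ref).values())  # matches clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mbr_decode(
    candidates: list[str],
    pseudo_references: Optional[list[str]] = None,
    utility: Callable[[str, str], float] = unigram_f1,
) -> str:
    """Return the candidate with the highest average utility against the pseudo-references."""
    # Common practice: reuse the candidate pool itself as the pseudo-references.
    refs = candidates if pseudo_references is None else pseudo_references

    def expected_utility(hypothesis: str) -> float:
        return sum(utility(hypothesis, r) for r in refs) / len(refs)

    # max() keeps the first candidate among ties.
    return max(candidates, key=expected_utility)
```

With the candidates ["the cat sat", "the cat ran", "the cat sat", "a dog barked"], "the cat sat" wins because it agrees most with the rest of the pool. The loop makes |H| × |R| utility calls, which becomes costly when the utility is a neural metric.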

eole

Open language modeling toolkit based on PyTorch

Language: Python · License: MIT · 2600

eamt24-linguistic-mt

A repository of resources for our EAMT 2024 tutorial

600

llm-foundry

LLM training code for Databricks foundation models

Language: Python · License: Apache-2.0 · 388500

ParaDocs

Language: Python · 100

CroCoAlign

A Cross-Lingual, Context-Aware and Fully-Neural Sentence Alignment System for Long Texts.

Language: Python · License: NOASSERTION · 600

gpt-neox

An implementation of model-parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Language: Python · License: Apache-2.0 · 672900

apptainer

Apptainer: Application containers for Linux

Language: Go · License: NOASSERTION · 100100

compare-mt

A tool for holistic analysis of language generation systems

Language: Python · License: BSD-3-Clause · 46600

lingua-py

The most accurate natural language detection library for Python, suitable for short text and mixed-language text

Language: Python · License: Apache-2.0 · 103500

tensorrt_backend

The Triton backend for TensorRT.

Language: C++ · License: BSD-3-Clause · 5800

exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

Language: Python · License: MIT · 329600

konoha

🌿 An easy-to-use Japanese text processing tool that lets you switch tokenizers with only small code changes.

Language: Python · License: MIT · 22400