Israel Gonzalez-Brooks's starred repositories
awesome-public-datasets
A topic-centric list of HQ open datasets.
traingenerator
🧙 A web app to generate template code for machine learning
openwebtext
Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed.
recipes
Recipes are a standard, well-supported set of blueprints for machine learning engineers to rapidly train models using the latest research techniques without significant engineering overhead. Specifically, recipes aims to provide:
- Consistent access to pre-trained SOTA models ready for production
- Reference implementations for SOTA research reproducibility, and infrastructure to guarantee correctness, efficiency, and interoperability.
peerlibrary
Facilitating the global conversation on academic literature
Taskmaster
Please see the README file as well as our 2019 EMNLP paper.
Adventures-in-TensorFlow-Lite
This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.
iOS-Shortcuts-Reference
Reference documentation for the iOS Shortcuts app file structure
krita_stable_diffusion
A Stable Diffusion plugin for Krita
c4-dataset-script
Inspired by Google's C4, a series of scripts for cleaning CommonCrawl data into a colossal clean corpus, including Chinese data processing and the cleaning methods used in MassiveText.
circa
The Circa (meaning ‘approximately’) dataset aims to help machine learning systems solve the problem of interpreting indirect answers to polar questions. The dataset contains pairs of yes/no questions and indirect answers, together with annotations for the interpretation of each answer. The data was collected in 10 different social conversation situations (e.g., a friend's food preferences).
distributed-model-training
An approach to implementing distributed training of an ML model on mobile devices.
pdfTextanalyzer
PDF text extractor; the extracted text is analyzed with NLTK, with GPT-3 being tested as the last stage.