Joseph Imperial's repositories
Philippine-Languages-Online-Corpora
This repository contains the Philippine Languages Online Corpora (PLOC).
BasahaCorpus-HierarchicalCrosslingualARA
This repository contains the code and data for the BasahaCorpus paper, accepted at EMNLP 2023 (Main).
getting-started-with-the-twitter-api-v2-for-academic-research
A course on getting started with the Twitter API v2 for academic research
readability-standard-alignment
Code and data repository for the Readability Standard Alignment paper by Joseph Imperial and Harish Tayyar Madabushi at GEM 2023.
ara-close-lang
Code for Automatic Readability Assessment for Closely Related Philippine Languages (ACL 2023)
ACL2023-Retrieval-LM.github.io
https://acl2023-retrieval-lm.github.io/
BIG-bench
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
CEFR-SP
Repository for the CEFR-SP corpus and sentence-level assessment
egyptians-in-ai
A website dedicated to showcasing the profiles of prominent Egyptian researchers in the field of AI.
evaluation
Code and Data for Evaluation WG
filipino-tiktok-hatespeech
A dataset containing hate speech in text form transcribed from Filipino TikTok videos related to politics.
gpt-2-simple
Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts
imperialite
My personal repository
llama
Inference code for LLaMA models
mteb
MTEB: Massive Text Embedding Benchmark
reinforcement-learning
Minimal and Clean Reinforcement Learning Examples
seacrowd-datahub
A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
sgnlp
Machine learning models from Singapore's NLP research community
standardize
This repository contains the code, data, and website assets for the Standardize paper.
StoryPlot-RewardShaping
Code from the IJCAI 2019 paper "Controllable Neural Story Plot Generation via Reward Shaping"
TSAR-2022-Shared-Task
TSAR2022 Shared Task on Lexical Simplification - Datasets and Evaluation scripts