bjamil / my-reading-list

Publications, books, and articles I've been reading or am planning on reading.

My Reading List

Publications, books, and web pages I've been reading or am planning on reading.

Why

I've recently been trying to level up on ML, LLMs, NLU, etc., and whenever I read a paper, I feel there are ten others I should read as well :). This repo is for tracking what I've read and what I want to read, and for jotting down some learnings along the way.

I also want to give this Learning in Public thing a shot. Let's see how it goes!

ML Reading List

General

Paper Read Date Last Revised Date Notes
Evaluating Large Language Models Trained on Code 2023-03-12
Understanding HTML with Large Language Models 2023-03-12 2022-10-08 Notes
Multi-Task Sequence to Sequence Learning 2016-03-01
Emergent Abilities of Large Language Models 2022-10-06
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model 2022-12-11
Finetuned Language Models Are Zero-Shot Learners 2022-02-08
LLaMA: Open and Efficient Foundation Language Models 2023-03-27
Training language models to follow instructions with human feedback 2022-03-04
HTLM: Hyper-Text Pre-Training and Prompting of Language Models 2021-07-14
Environment Generation for Zero-Shot Compositional Reinforcement Learning 2022-01-21
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
LaMDA: Language Models for Dialog Applications

Training Speedups/Scaling

Paper Read Date Last Revised Date Notes
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
PaLM: Scaling Language Modeling with Pathways
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Training Compute-Optimal Large Language Models 2022-03-29
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts 2021-12-13

Non-LLMs

Paper Read Date Last Revised Date Notes
World of Bits: An Open-Domain Platform for Web-Based Agents 2017
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
User-Driven Automation of Web Form Filling 2013
Learning Transferable Visual Models from Natural Language Supervision 2021-02-26
Learning to Generate Reviews and Discovering Sentiment 2017-04-06
WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset 2021-07-20
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Extracting Structured Data from Templatic Documents 2020-06-12

Bloom Filters

Read Date Resource Notes
2024-01-30 Bloom Filters by ByteByteGo Gives a decent intuition
2024-01-30 What are Bloom Filters? Not the best example; the previous video was better.
2024-01-30 Advancing Spark - Bloom Filter Indexes in Databricks Delta Interesting, but more about Delta than Spark, as the title implies.
The Case for Learned Index Structures
Optimizing Learned Bloom Filters by Sandwiching
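A minimal Bloom filter sketch I wrote to check my understanding: a bit array plus k hash positions per item, so lookups can return false positives but never false negatives. The sizing formulas (m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes) are the standard ones; the SHA-256 double-hashing scheme and the class and method names are my own assumptions, not taken from the resources above.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        n, p = expected_items, false_positive_rate
        # Standard sizing: number of bits m and number of hash positions k
        self.num_bits = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive k positions from one SHA-256 digest via h1 + i * h2
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be nonzero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit the item hashes to
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # "Maybe present" only if all k bits are set; any unset bit means "definitely absent"
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

Usage: `bf = BloomFilter(1000, 0.01)`, then `bf.add("user-42")` and `"user-42" in bf`. For 1,000 items at a 1% target rate this sizes to about 9.6k bits (roughly 1.2 KB) and 7 hash positions, which is the space saving that makes the join pre-filtering ideas in the Spark/Delta resources attractive.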

Quantization, Model Compression & Optimization

Read Date Resource Notes
Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks
2024-01-30 How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs (Roblox) Interesting. Always nice to read actual case studies. I'd like to see how ONNX compares to their benchmarks.
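To keep the core idea of post-training quantization straight, here's a toy symmetric, per-tensor int8 scheme in plain Python: pick one scale so the largest absolute weight maps to 127, round, and clamp. This is a generic sketch with hypothetical function names, not the method from the paper or the Roblox post above.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    # Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127]
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    # Recover approximate float weights; per-weight error is at most scale / 2
    return [v * scale for v in q]
```

Storing `q` as int8 plus one float `scale` takes roughly a quarter of the memory of float32 weights, at the cost of a rounding error bounded by half the scale.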

Blog Posts I Liked

Read Date Post Notes
2024-01-30 How we reduced our text similarity runtime by 99.96% (Microsoft) I skimmed through it. It seems interesting and worth a reread.
2024-01-30 How Roblox Reduces Spark Join Query Costs With Machine Learning Optimized Bloom Filters Interesting read. I wonder if this can be applied to other use cases too, not just fact/dim table joins.

Blog Posts to Read

Post Notes
Using machine learning to index text from billions of images (Dropbox) Curious about the OCR/PDF text extraction part here. Need some caffeine in me to read this.
