Daniel Furman's repositories
polyglot-or-not
Are foundation LMs multilingual knowledge bases? (EMNLP 2023)
Python-species-distribution-modeling
A brief Python tutorial for geospatial classification.
awesome-chatgpt-prompts-clustering
Text clustering: HDBSCAN is probably all you need.
chat-all-in
RAG chatbot built on top of trending M&A news.
evals-with-chat-formats
Experiments applying chat templates to generative language model evaluations.
chat-gpt-3.5-turbo
Lightweight demo of gpt-3.5-turbo conversation completion.
flask-boiler
Docker backend running on Flask, MySQL, and Redis Queue. Intended as lightweight boilerplate.
HyperSpectralDRL
Deep RL for unsupervised hyperspectral band selection.
llm-reasoning-pop-quiz
Do open-source LLMs reason as well as closed-source ones?
daniel-furman.github.io
Portfolio and blog running on Hugo pages.
CV-feature-eng-experiments
Hugging Face models are all you need for “vanilla” image classification.
danielryanfurman
My personal website & portfolio.
NLP-dataset-mixing-experiments
Data-source mixing for social media caption classification.
blockCV
The blockCV package creates spatially or environmentally separated training and testing folds for cross-validation to provide a robust error estimation in spatially structured environments.
cellpose-training-kaggle-data21
Data for the Sartorius Cell Instance Segmentation comp with cellpose transforms.
computational-mathematics
A collection of MATLAB scripts from undergraduate math classes. Focus on numerical methods and computational linear algebra.
dash-rq-demo
Long-running tasks in Dash using RQ.
DS-case-prep
DS case interview questions with framework-based answers. Questions are sourced from various resources, primarily FAANG companies.
lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
online-dating-field-experiment
Final project for INFO 241 at UC Berkeley, Spring 2022.
Random-recipes
A variety of ML utils across different clouds and frameworks.
scattertext
Beautiful visualizations of how language differs among document types.
stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
Stanford_Penn_MIDRC_Deidentifier
A deidentifier / deidentification pipeline developed by Stanford and Penn as part of the MIDRC organization.
test
Measuring Massive Multitask Language Understanding | ICLR 2021
transformers_llama
Code and models for BERT on STILTs