OpenMOSS's repositories
Say-I-Dont-Know
[ICML'2024] Can AI Assistants Know What They Don't Know?
Language-Model-SAEs
The OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research.
OpenMOSS shares a collection of our research on large language models. The team is affiliated with the FudanNLP lab.
Location: China