ejnnr / cupbearer

A library for mechanistic anomaly detection

Add more example notebooks

ejnnr opened this issue

We should at least cover:

  • Showing how to use WaNet correctly without shooting yourself in the foot
  • Demonstrating adversarial examples
  • Demonstrating a detector that uses some untrusted training data

Less crucial, but ideally we'd also have some actually interesting examples, e.g. non-trivial ablation evals or running a bunch of task/detector combinations at once.
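
As a rough idea of what "running a bunch of task/detector combinations at once" could look like in a notebook, here is a minimal sketch in plain Python. It does not use cupbearer's actual API: `run_grid`, `toy_evaluate`, and the task and detector entries are all hypothetical placeholders, and the scores are made-up toy values rather than real results.

```python
from itertools import product
from typing import Any, Callable, Dict, Tuple


def run_grid(
    tasks: Dict[str, Any],
    detectors: Dict[str, Any],
    evaluate: Callable[[Any, Any], float],
) -> Dict[Tuple[str, str], float]:
    """Evaluate every (task, detector) pair and collect the results.

    All arguments are hypothetical stand-ins, not cupbearer objects.
    """
    results = {}
    for (task_name, task), (det_name, det) in product(
        tasks.items(), detectors.items()
    ):
        results[(task_name, det_name)] = evaluate(task, det)
    return results


def toy_evaluate(scores, threshold):
    # Placeholder metric: fraction of anomaly scores above the threshold.
    return sum(s > threshold for s in scores) / len(scores)


# Toy placeholders: each "task" is a list of made-up anomaly scores,
# and each "detector" is just a threshold.
tasks = {"task_a": [0.1, 0.9, 0.4], "task_b": [0.7, 0.8, 0.2]}
detectors = {"low_threshold": 0.3, "high_threshold": 0.6}

results = run_grid(tasks, detectors, toy_evaluate)
for (task_name, det_name), value in sorted(results.items()):
    print(f"{task_name} / {det_name}: {value:.2f}")
```

In a real notebook, the dictionaries would hold the library's task and detector objects, and `evaluate` would wrap whatever training and evaluation calls the library provides.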