FabienRoger / control-poison

Done

  • add uncertain protocol
  • Change ref away from davinci

TODOs

  • fix the bug in the highconf protocol that makes it vastly prefer the first answer
  • more epochs/data, so that dumb labels are not so good; maybe always start by fine-tuning on dumb labels to get the right format?
  • better red team spurious cues
  • shut downs
  • add spurious cue detection

Languages

Language:Python 100.0%