Beast code in Giters

Simon Lermen's repositories

redteaming

Red-teaming a simple language model such as GPT-2, based on Anthropic's red-teaming paper

Language: Python · 7 30

exploring_modelgraded_evaluation

Exploring model-graded evaluation

Language: TeX · 4 20

arena-ldn

London in-person exercises

Language: Jupyter Notebook · 2 10

gpt-tools

Tools for the OpenAI API

Language: Python · 1 20

safety_benchmarks

Safety Benchmarks such as Refusal Bench

1 10

SVDInterpretTransformer

Apply SVD to Transformer weights

Language: Jupyter Notebook · 1 20

a3cacerdemo

Language: Python · 020

ACER

Actor-critic with experience replay

Language: Python · License: MIT · 010

al-folio

A beautiful, simple, clean, and responsive Jekyll theme for academics

Language: HTML · License: MIT · 000

aneurysm-segment

Language: Jupyter Notebook · 010

arena

My solutions for the arena course

Language: Jupyter Notebook · 010

PySvelte

A library for bridging Python and HTML/JavaScript (via Svelte) for creating interactive visualizations

Language: HTML · License: Apache-2.0 · 000

ActivationDirectionAnalysis

Language: Python · 000

chat-langchain

Language: Python · 000

DalasNoin

010

DecisionTransformerInterpretability

Interpreting how transformers simulate agents performing RL tasks

Language: Python · License: MIT · 000

diffusion-neo

Language: Jupyter Notebook · 020

GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ

Language: Python · License: Apache-2.0 · 000

langchain

⚡ Building applications with LLMs through composability ⚡

Language: Python · License: MIT · 000

LM-exp

LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces

Language: Jupyter Notebook · 000

Minigrid

Simple and easily configurable grid world environments for reinforcement learning

Language: Python · License: NOASSERTION · 000

mlab

Machine Learning for Alignment Bootcamp

Language: Jupyter Notebook · 000

MLAB-Transformers-From-Scratch

Reimplementing transformers from scratch (from Redwood Research's Machine Learning for Alignment Bootcamp).

Language: Python · 000

python-binance

Binance Exchange API Python implementation for automated trading

Language: Python · License: MIT · 010

reference_chatbot

AI21 Labs implementation of In-Context Retrieval-Augmented Language Models

010

refusal_direction

Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".

Language: Python · License: Apache-2.0 · 000

setup_cloud_machine

010

simple-llama-finetuner

Simple UI for LLaMA Model Finetuning

Language: Jupyter Notebook · 000

TextWorld

TextWorld is a sandbox learning environment for the training and evaluation of reinforcement learning (RL) agents on text-based games.

Language: Jupyter Notebook · License: NOASSERTION · 000

weblm

Drive a browser with a language model

Language: Python · License: MIT · 000