Sebastian Müller (snimu)


Company:BMW

Location:Munich

Twitter:@omouamoua

Sebastian Müller's repositories

rebasin

Apply the methods described in the "Git Re-basin" paper [1] to arbitrary models --- [1] Ainsworth et al. (https://arxiv.org/abs/2209.04836)

Language: Python | License: MIT | Stargazers: 10 | Issues: 2 | Issues: 2
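The core idea behind Git Re-basin is that hidden units can be permuted without changing what a network computes, so two independently trained models can be aligned before merging. A minimal pure-Python sketch of that invariance (illustrative names, not the rebasin API):

```python
# Permuting a hidden layer's neurons, and applying the same permutation
# to the next layer's input weights, leaves the network function
# unchanged -- the invariance that Git Re-basin exploits when aligning
# two models. Toy two-layer ReLU net on plain lists.

def permute_hidden_layer(w1, b1, w2, perm):
    """Permute rows of w1/b1 and columns of w2 by `perm`."""
    w1_p = [w1[p] for p in perm]
    b1_p = [b1[p] for p in perm]
    w2_p = [[row[p] for p in perm] for row in w2]
    return w1_p, b1_p, w2_p

def forward(x, w1, b1, w2):
    """Two-layer net with ReLU: y = w2 @ relu(w1 @ x + b1)."""
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    return [sum(wi * hi for wi, hi in zip(row, h)) for row in w2]

w1 = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
b1 = [0.1, -0.2, 0.0]
w2 = [[1.0, -1.0, 2.0]]
x = [1.0, 1.0]

perm = [2, 0, 1]
w1_p, b1_p, w2_p = permute_hidden_layer(w1, b1, w2, perm)

y1 = forward(x, w1, b1, w2)
y2 = forward(x, w1_p, b1_p, w2_p)
# The permuted network computes the same function (up to float rounding).
print(all(abs(a - b) < 1e-9 for a, b in zip(y1, y2)))  # True
```

The paper's contribution is finding the permutation that best aligns two trained models; this sketch only shows why such a permutation is "free" to apply.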

rebasin-results

Results for snimu/rebasin

Language: Python | License: MIT | Stargazers: 5 | Issues: 1 | Issues: 0

grokfast

Trying out the grokfast algorithm on LLMs

Language: Python | License: Apache-2.0 | Stargazers: 4 | Issues: 0 | Issues: 0
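As I understand the Grokfast-EMA variant (Lee et al., 2024), it keeps an exponential moving average of each gradient and adds an amplified copy of that slow component back onto the raw gradient before the optimizer step. A toy scalar sketch of that filter, not the repository's actual training code:

```python
# Grokfast-EMA sketch: amplify the slow (low-frequency) component of a
# gradient stream via an exponential moving average. `alpha` and `lamb`
# follow the paper's naming; scalars stand in for parameter tensors.

def grokfast_ema(grads, alpha=0.98, lamb=2.0):
    """Filter a stream of scalar gradients, amplifying the slow part."""
    ema = 0.0
    filtered = []
    for g in grads:
        ema = alpha * ema + (1 - alpha) * g
        filtered.append(g + lamb * ema)
    return filtered

# For a constant gradient stream, the filtered gradients grow toward
# g * (1 + lamb) as the EMA converges.
out = grokfast_ema([1.0] * 5)
print(out[0] < out[-1])  # True: the slow component gets amplified over time
```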

kan

Ablate KAN and Fourier KAN vs. normal Linear Layers in LLMs

Language: Python | License: Apache-2.0 | Stargazers: 3 | Issues: 1 | Issues: 0

dspy-redteam-tests

Red-Teaming Language Models with DSPy

Language: Python | Stargazers: 1 | Issues: 0 | Issues: 0

hlb-gpt-cli

CLI controllable version of hlb-gpt by tysam-code

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

attention-experiments

I'm playing around with Attention mechanisms

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

dspy

DSPy: The framework for programming—not prompting—foundation models

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

etbl-vision

Embracing the bitter lesson (vision)

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

gradient-rounding

Round the gradient during LLM training to different degrees; compare "scaling" of rounding to different significant digits to parameter scaling

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0
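The operation being ablated here is rounding a value to a fixed number of significant digits (as opposed to decimal places). A pure-Python stand-in for what would be a PyTorch gradient hook in the actual experiments:

```python
# Round a number to `digits` significant digits -- the operation the
# gradient-rounding experiments apply to gradients at varying precision.
import math

def round_significant(x, digits):
    """Round x to `digits` significant digits."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)

grad = [0.0123456, -4.56789, 0.0]
print([round_significant(g, 3) for g in grad])  # [0.0123, -4.57, 0.0]
```

Note the difference from plain `round(x, 3)`: significant-digit rounding scales with the magnitude of the value, which is what makes it comparable across parameters with very different gradient scales.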

guarantees

Python: guarantee test coverage, and enforce type and runtime guarantees

Language: Python | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

hlb-CIFAR10

Train to 94% on CIFAR-10 in ~6.84 seconds on a single A100, the current world speed record. Or ~95.78% in ~114 seconds (or less!)

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

hlb-gpt

Minimalistic, fast, and experimentation-friendly researcher's toolbench for GPT-like models in ~<365 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in ~138 seconds.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

llm-parameter-stats

How do parameter statistics change over training in LLMs?

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

neuralsort

Sort lists with the help of an ANN to allow maximal parallelism in execution.

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

parameter-checks

Extend typehints to include dynamic checks (that might otherwise be dealt with by assertions) in Python.

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0
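The idea of attaching dynamic checks to parameters can be sketched as a decorator that validates arguments before the call. The `checks` decorator and its usage below are hypothetical illustrations of the concept, not the parameter-checks library's actual API:

```python
# Illustrative sketch: a decorator that runs dynamic checks on named
# arguments, replacing scattered assert statements at function entry.
import functools
import inspect

def checks(**arg_checks):
    """Attach predicate checks to named parameters of a function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = inspect.signature(fn).bind(*args, **kwargs)
            bound.apply_defaults()
            for name, check in arg_checks.items():
                value = bound.arguments[name]
                if not check(value):
                    raise ValueError(f"check failed for {name}={value!r}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@checks(x=lambda v: v > 0)
def sqrt_newton(x, iters=10):
    """Newton's method square root; requires x > 0."""
    guess = x
    for _ in range(iters):
        guess = 0.5 * (guess + x / guess)
    return guess

print(round(sqrt_newton(9.0), 6))  # 3.0
# sqrt_newton(-1.0) would raise ValueError before the body runs.
```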

px4-simulation-ignition

Fix issue #19981 on PX4-Autopilot

Language: C++ | Stargazers: 0 | Issues: 0 | Issues: 0

torch-benchmarks

Performance benchmark for PyTorch models

Language: Python | License: MIT | Stargazers: 0 | Issues: 2 | Issues: 0

hlb-gpt-value-activation

Check how much of a difference activating the value vector makes vs. keeping it linear, as in standard attention

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

llm-small-to-large

1. Train small LLM; 2. Use its outputs on the training data as labels for training large LLM, where their argmax agrees with the training data.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0
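The label-selection step in that pipeline can be sketched with toy lists in place of real model logits: keep the small model's output distribution as a soft label only where its argmax agrees with the ground truth, and fall back to the hard label otherwise. Function names here are illustrative, not the repo's code:

```python
# Use a small model's predictions as soft labels for a larger model,
# but only where the small model's argmax matches the training data;
# elsewhere, fall back to the one-hot ground-truth label.

def build_labels(small_model_probs, ground_truth, vocab_size):
    labels = []
    for probs, target in zip(small_model_probs, ground_truth):
        argmax = max(range(len(probs)), key=probs.__getitem__)
        if argmax == target:
            labels.append(probs)  # soft label from the small model
        else:
            one_hot = [0.0] * vocab_size
            one_hot[target] = 1.0
            labels.append(one_hot)  # hard ground-truth label
    return labels

probs = [[0.7, 0.2, 0.1],   # argmax 0, target 0 -> keep soft label
         [0.1, 0.3, 0.6]]   # argmax 2, target 1 -> fall back to hard label
labels = build_labels(probs, ground_truth=[0, 1], vocab_size=3)
print(labels)  # [[0.7, 0.2, 0.1], [0.0, 1.0, 0.0]]
```

The agreement filter is what distinguishes this from plain distillation: the large model only sees the small model's full distribution where that distribution is at least top-1 correct.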

mask

Some experiments with Attention masks

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

plan-act

A better way for LLMs to plan before acting.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

torch-nested

Easily manipulate torch.Tensors inside highly nested data-structures.

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0
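The problem torch-nested addresses, applying an operation to every tensor buried in a nested structure, can be illustrated with a small recursive helper. Plain numbers stand in for torch.Tensors here, and `nested_map` is an illustrative function, not the torch-nested API:

```python
# Apply a function to every leaf of a deeply nested dict/list/tuple
# structure, preserving the structure itself -- the kind of traversal
# that torch-nested automates for torch.Tensors.

def nested_map(fn, obj):
    if isinstance(obj, dict):
        return {k: nested_map(fn, v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(nested_map(fn, v) for v in obj)
    return fn(obj)  # a leaf: apply the function

data = {"weights": [1, 2, (3, 4)], "bias": 5}
print(nested_map(lambda x: x * 2, data))
# {'weights': [2, 4, (6, 8)], 'bias': 10}
```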

torchinfo

View model summaries in PyTorch!

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

typing-exe

Executable typehints for Python: make assertions about and/or modify parameters & return values

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

ul2

How much information can we extract from one token?

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0