torch-hacker

exploring how to make models that are fast and easy to use on:

status

cog run -p 5000 python -m cog.server.http

python sample.py models/phi2

python -m venv .venv
source .venv/bin/activate
pip install cog

Unfortunately, something is already listening on port 5000, so we need to use a different port:

PORT=4999 python -m cog.server.http
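To check which port is actually free before starting the server, something like the following sketch could help. It is not part of this repo: the helper names `port_is_free` and `first_free_port` are hypothetical, and the check only tests whether a TCP listener accepts connections on localhost.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection,
        # and a non-zero error code otherwise (e.g. connection refused).
        return sock.connect_ex((host, port)) != 0


def first_free_port(candidates):
    """Return the first candidate port with no listener, or None."""
    for port in candidates:
        if port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    # Try the default port first, then fall back to nearby ports.
    print(first_free_port([5000, 4999, 4998]))
```

The printed port can then be passed to the server through the PORT variable, as in the command above.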

We can run a prediction with the Phi-2 language model using this command:

PORT=4999 python sample.py models/phi2
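For reference, a prediction against a locally running Cog HTTP server is a POST to /predictions with a JSON body of the form {"input": {...}}. The sketch below shows roughly what such a request looks like. It is not the actual sample.py, and it makes several assumptions: the `prompt` input name is hypothetical, the function names are made up for illustration, and reading the port from PORT simply mirrors the command above.

```python
import json
import os
import urllib.request
from typing import Optional


def build_prediction_request(inputs: dict, port: Optional[int] = None) -> urllib.request.Request:
    """Build a POST /predictions request for a local Cog server."""
    if port is None:
        # Assumption: same PORT variable the server was started with.
        port = int(os.environ.get("PORT", "5000"))
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"http://localhost:{port}/predictions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(inputs: dict, port: Optional[int] = None) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_prediction_request(inputs, port)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # "prompt" is a hypothetical input name; check the model's schema.
    req = build_prediction_request({"prompt": "Hello"}, port=4999)
    print(req.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect without a running server.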

export REPLICATE_API_KEY=your-api-key

python sample.py anotherjesse/torch-hacker:0894e3d48c047d5cd2578375ab3f90d76ce6e693eac84fc829523d8a42a5a491 models/phi2

put the inner model in its own virtualenv? (especially for macOS, but perhaps for Replicate too)
test the brew/apt mapping/installer on macOS
support Path inputs (whisper, llava, etc.) -- seems to be kinda working (at least with https-based "Paths")
how to "compile" an optimized cog on Replicate
support models that leverage GitHub code bases (for example, moondream)
"vibe" tests for input/outputs?
"streaming" support
support non-torch models