Junghwan Park's starred repositories
ml-mobileclip
This repository contains the official implementation of the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" (CVPR 2024)
sd-forge-layerdiffuse
[WIP] Layer Diffusion for WebUI (via Forge)
Free-GPT-Actions
A listing of free GPT actions available for public use
Beyond-INet
Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"
YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
ai-infra-landscape
This is a landscape of the infrastructure that powers the generative AI ecosystem
llama_parse
Parse files for optimal RAG
professional-programming
A collection of learning resources for curious software engineers
ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
obsidian-web-clipper
Obsidian Web Clipper is a simple browser extension for Obsidian, a popular note-taking application. With this extension, you can quickly capture notes directly from your web browser and save them to your Obsidian vaults.
Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
Video-ChatGPT
"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
GroundingGPT
GroundingGPT: Language-Enhanced Multi-modal Grounding Model
Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection