Repositories under the vlm topic:
Aircraft design optimization made fast through modern automatic differentiation. Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
A reading list for large model safety, security, and privacy.
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model
Ptera Software is a fast, easy-to-use, and open-source software package for analyzing flapping-wing flight.
A curated list of awesome papers on Embodied AI and related research/industry-driven resources.
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
MATLAB implementation to simulate the non-linear dynamics of a fixed-wing unmanned aerial glider. Includes tools to calculate aerodynamic coefficients using a vortex lattice method implementation, and to extract longitudinal and lateral linear systems around the trimmed gliding state.
Famous Vision Language Models and Their Architectures
🧘🏻♂️ KarmaVLM (相生): A family of high-efficiency, powerful visual language models.
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
This repo is a live list of papers on game playing and large multimodal models - "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges".
[ICRA 2024] Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models
A system for prompted weak supervision.
Vortex lattice method for inviscid lifting-surface aerodynamics
Python companion to Low Speed Aerodynamics by Joseph Katz and Allen Plotkin
This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Fluid-Structure Interaction Analysis Using FEM and UVLM
Customizing Visual-Language Foundation Models for Multi-modal Anomaly Detection and Reasoning
Python scripts to use for captioning images with VLMs
Python implementation of a 3D Vortex Lattice Method