thanhtin.nguyen's repositories
owlvit_segment_anything
Combining OwlViT with Segment Anything - Open-vocabulary Detection and Segmentation (Text-conditioned, and Image-conditioned)
Line_Segmentation
Line Segmentation Based on Bivariate Gaussian Statistics and a Distance Metric, and Handwriting Recognition
VLSP_ImageCaptioning
VLSP2021 vieCap4H Challenge: Automatic image caption generation for healthcare domains in Vietnamese
Deforestation_Segmentation
Deforestation Segmentation
Coarse-To-Fine-Fusion-for-Language-Grounding-in-3D-Navigation
Auto Encoder Enhanced Vision Language Navigation in Vizdoom, KBS 2023
Data-structures-And-Algorithms
Data structures and Algorithms
Multigoal_VLN_Vizdoom
Fourier Transform Enhanced Vision Language Multi-goal Navigation
Automate_Research
Trying to automate everything for research
Asymmetric-CL
A PyTorch implementation of an asymmetric version of the (focal) contrastive loss
blogs
Jupyter notebooks that support my graph data science blog posts at https://bratanic-tomaz.medium.com/
CLIPQ
A simple implementation of CLIP that splits an image into quadrants and then gets the embeddings for each quadrant
Elementary-Math-Solving-Zalo-AI-2023
Baseline for ZaloAI Challenge 2023 Elementary Math Solving
Everything-about-LLMs
A work in progress. Trying to write about all interesting or necessary pieces in the current development of LLMs and generative AI. Gradually adding more topics.
ImageBind-LoRA
Fine-tuning "ImageBind One Embedding Space to Bind Them All" with LoRA
ivadomed
Repository on the collaborative IVADO medical imaging project between the Mila and NeuroPoly labs.
kaggle_benetech
Solution for the Benetech - Making Graphs Accessible Kaggle Competition
LaVy
Pioneering in Vietnamese Multimodal Large Language Model
llm-claude-3
LLM plugin for interacting with the Claude 3 family of models
llm-course
Course with a roadmap and notebooks to get into Large Language Models (LLMs).
LoRA-ViT
Low rank adaptation for Vision Transformer
Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
sg2im
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 2018
TSVLC
Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models
Vehicle-CV-ADAS
The project implements FCWS, LDWS, and LKAS functions using only visual sensors, with YOLOv5 / YOLOv5-lite / YOLOv8 and Ultra-Fast-Lane-Detection-v2.