crazyboystop's repositories
david-share
GraphRAG. The open-source GraphRAG project is at https://github.com/microsoft/graphrag. Configuration steps and pitfalls to avoid during setup are under LLMs/graphrag in https://github.com/davidsajare/david-share.git
awesome-public-datasets
A topic-centric list of HQ open datasets.
graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
DeepBI
LLM based data scientist, AI native data application. AI-driven infinite thinking redefines BI.
Financial-Statement-Data-Analysis
Analyzing financial statement data with Python.
opendaylight-controller
Mirror of the OpenDaylight controller gerrit project
supersonic
SuperSonic is the next-generation BI platform that integrates Chat BI (powered by LLM) and Headless BI (powered by semantic layer) paradigms.
datasets
A collection of datasets for ML problem solving
winutils
winutils.exe hadoop.dll and hdfs.dll binaries for hadoop windows
docker-ce-for-win
Bug reports for Docker Desktop for Windows
ckeditor5-vue
Official CKEditor 5 Vue.js component.
ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
AISystem
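AISystem covers AI systems (AI infra): the full-stack, low-level technologies of AI, including AI chips, AI compilers, and AI inference and training frameworks.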
Generate_PPT_using_llama2
Use AI to create your Presentations using llama2-7B
QAnything
Question and Answer based on Anything.
cookbook
Open-source AI cookbook
MNBVC
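MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very-large-scale Chinese corpus, aiming to match the 40T of data used to train chatGPT. The MNBVC dataset covers not only mainstream culture but also niche cultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.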
PowerBI-Developer-Samples
A collection of Power BI samples for developer use.
Awesome-LLMs-Datasets
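Summarize existing representative LLMs text datasets. A recent survey comprehensively reviews the datasets used at each stage of LLMs: 444 datasets, 774.5 TB (with download links). Dataset summary and download links: https://github.com/lmmlzn/Awesome-LLMs-Datasets Paper: https://arxiv.org/pdf/2402.18041.pdf The survey "Datasets for Large Language Models: A Comprehensive Survey", dated 2024-02-28, provides a comprehensive overview of currently available LLM dataset resources, including statistics for 444 datasets covering 8 languages and 32 domains, with more than 774.5 TB of pre-training corpus data.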
data_sciences_campaign
Data Scientist course series
awesome-pretrained-chinese-nlp-models
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. A course on reinforcement learning, machine learning, and large language models.
cupy
NumPy & SciPy for GPU
robotframework-RPA
Generic automation framework for acceptance testing and RPA. robotframework is one of the most professional and advanced open-source RPA tools.
wechaty
Conversational RPA SDK for Chatbot Makers. Join our Discord: https://discord.gg/7q8NBZbQzt
Smart_Construction
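Head, person, and helmet detection on construction sites based on YOLOv5: an object-detection system for recognizing safety helmets and restricted danger zones on construction sites. 🚀😆 Includes a very detailed tutorial on training YOLOv5 on your own dataset. 🚀😆 Visual interface added in 2021.3 ❗❗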
InfoSpider
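INFO-SPIDER is a crawler toolbox 🧰 that brings many data sources together, designed to help users retrieve their own data safely and quickly. The tool's code is open source and its process is transparent. Supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, ** Mobile, ** Unicom, ** Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, Open Source ** blogs, and Jianshu.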
PaddleNLP
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
RoadSafety-Gpt
A large model fine-tuned for the transportation vertical domain
llm-universe
This project is a tutorial on large-model application development for beginner developers. Read it online at https://datawhalechina.github.io/llm-universe/