
Ludwing's starred repositories

build-your-own-x

Master programming by recreating your favorite technologies from scratch.

Language: Markdown · 304840 5415 683

llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

Language: LLVM · NOASSERTION · 28577 585 77042

pydantic

Data validation using Python type hints

Language: Python · MIT · 20841 117 4563

clash-rules

🦄️ 🎃 👻 Clash Premium rule sets (RULE-SET), compatible with ClashX Pro, Clash for Windows, and other clients based on the Clash Premium core.

GPL-3.0 · 18818 101 268

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.

Language: Python · Apache-2.0 · 12048 105 3596

grammars-v4

Grammars written for ANTLR v4; the grammars are expected to be free of actions.

Language: ANTLR · MIT · 10145 227 1447

doccano

Open source annotation tool for machine learning practitioners.

Language: Python · MIT · 9501 133 1524

cozo

A transactional, relational-graph-vector database that uses Datalog for queries. The hippocampus for AI!

Language: Rust · MPL-2.0 · 3375 42 145

roapi

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

Language: Rust · Apache-2.0 · 3198 43 157

pypika

PyPika is a Python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.

Language: Python · Apache-2.0 · 2507 36 433

projects

🪐 End-to-end NLP workflows from prototype to production

Language: Python · MIT · 1315 320

CMeKG_tools

Language: Python · MIT · 1059 9 22

souffle

Soufflé is a variant of Datalog for tool designers crafting analyses in Horn clauses. Soufflé synthesizes a native parallel C++ program from a logic specification.

Language: C++ · UPL-1.0 · 915 41 855

text_analysis_tools

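Chinese text analysis toolkit (including text classification, text clustering, text similarity, keyword extraction, key phrase extraction, sentiment analysis, text error correction, text summarization, topic keywords, synonyms and near-synonyms, and event triple extraction).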

Language: Python · Apache-2.0 · 675 8 5

MarkTool

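DoTAT is a web-based, domain-oriented, general-purpose text annotation tool. It supports large-scale entity annotation, relation annotation, event annotation, text classification, automatic annotation based on dictionary matching and regular-expression matching, and standard-name annotation for normalization. It also supports iterative annotation, nested entity annotation, and nested event annotation. Annotation schemas are customizable and, within the same type of task, can be "created once and reused many times". Hierarchical entity sets expand the number of entity types, and a new, efficient annotation method improves the user experience and annotation efficiency. In addition, the tool adds a review stage that supports consistency checking, automatic merging, and manual adjustment of results from multiple annotators, which improves the accuracy of the annotations.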

Language: Vue · Apache-2.0 · 592 13 18

graphql-compiler

Turn complex GraphQL queries into optimized database queries.

Language: Python · Apache-2.0 · 552 24 166

rule-engine

A lightweight, optionally typed expression language with a custom grammar for matching arbitrary Python objects.

Language: Python · BSD-3-Clause · 461 7 66

OmniEvent

A comprehensive, unified and modular event extraction toolkit.

Language: Python · MIT · 342 10 29

problog

ProbLog is a Probabilistic Logic Programming Language for logic programs with probabilities.

Language: Python · 311 26 89

awesome-ontology

A curated list of ontology things

CC-BY-4.0 · 274 11 1

Web-crawler

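A survey of drug data websites. Uses a web crawler to scrape drug data from 药源网 and build a drug database containing more than 100,000 records on Chinese patent medicines and chemical drugs. Drug data scraped from China's State Food and Drug Administration (国家食品药品监督管理局) is used to correct the 药源网 data. Uses Selenium and other tools to handle anti-scraping measures, and scrapes ICD10 and other data for research use.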

Language: Python · 102 3 2

my-bookshelf

Collection of books/papers that I've read/I'm going to read/I would remember that they exist/It is unlikely that I'll read/I'll never read.

Language: HTML · MIT · 74 60

radb

RA (radb): A relational algebra interpreter over relational databases

Language: Python · NOASSERTION · 62 10 6

Elasticsearch-7.0-Cookbook

Elasticsearch 7.0 Cookbook, Fourth Edition, published by Packt Publishing

Language: Shell · MIT · 54 6 2

nmpa-data

Drug data from China's National Medical Products Administration (NMPA)

Language: C# · 43 1 1

NER-RE

A Named Entity Recognition + Entity Linker + Relation Extraction pipeline built using spaCy v3. Given a text, the pipeline extracts entities as trained, disambiguates them to their normalized forms through an Entity Linker connected to a Knowledge Base, and assigns relations between the entities, if any.

Language: Python · 35 3 7
