jdphilius's starred repositories

milvus

A cloud-native vector database, storage for next-generation AI applications

Language: Go · License: Apache-2.0 · Stargazers: 26648 · Issues: 273 · Issues: 10503

twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.

Language: Python · License: MIT · Stargazers: 15534 · Issues: 323 · Issues: 1173

spark-joy

✨😂 2000+ ways to add design flair, user delight, and whimsy to your product.

doccano

Open source annotation tool for machine learning practitioners.

Language: Python · License: MIT · Stargazers: 8936 · Issues: 128 · Issues: 1489

pycaret

An open-source, low-code machine learning library in Python

Language: Jupyter Notebook · License: MIT · Stargazers: 8385 · Issues: 131 · Issues: 2267

txtai

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

Language: Python · License: Apache-2.0 · Stargazers: 6912 · Issues: 80 · Issues: 659

Memex

Browser extension to curate, annotate, and discuss the most valuable content and ideas on the web, as individuals, teams, and communities.

gpt-code-ui

An open-source implementation of OpenAI's ChatGPT Code Interpreter

Language: Python · License: MIT · Stargazers: 3473 · Issues: 42 · Issues: 29

texthero

Text preprocessing, representation and visualization from zero to hero.

Language: Python · License: MIT · Stargazers: 2862 · Issues: 42 · Issues: 119

tslearn

The machine learning toolkit for time series analysis in Python

Language: Python · License: BSD-2-Clause · Stargazers: 2776 · Issues: 60 · Issues: 311

TextAttack

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP. Documentation: https://textattack.readthedocs.io/en/master/

Language: Python · License: MIT · Stargazers: 2733 · Issues: 36 · Issues: 265

High-Frequency-Trading-Model-with-IB

A high-frequency trading model using Interactive Brokers API with pairs and mean-reversion in Python

Language: Python · License: MIT · Stargazers: 2405 · Issues: 245 · Issues: 19

examples

Jupyter Notebooks to help you get hands-on with Pinecone vector databases

Language: Jupyter Notebook · License: MIT · Stargazers: 2396 · Issues: 51 · Issues: 39

scattertext

Beautiful visualizations of how language differs among document types.

Language: Python · License: Apache-2.0 · Stargazers: 2196 · Issues: 56 · Issues: 100

pigeon

🐦 Quickly annotate data from the comfort of your Jupyter notebook

Language: Python · License: Apache-2.0 · Stargazers: 765 · Issues: 11 · Issues: 11

maubot

A plugin-based Matrix bot system.

Language: Python · License: AGPL-3.0 · Stargazers: 669 · Issues: 17 · Issues: 143

ekphrasis

Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two big corpora (English Wikipedia and 330 million English tweets from Twitter).

Language: Python · License: MIT · Stargazers: 656 · Issues: 18 · Issues: 28
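Ekphrasis splits hashtags into words using word statistics from large corpora. A minimal sketch of that general idea follows: a unigram-model segmentation solved with dynamic programming. This is not ekphrasis's API; the function names are hypothetical and `WORD_COUNTS` holds made-up toy values standing in for real corpus statistics.

```python
import math
from functools import lru_cache

# Hypothetical toy unigram counts; a real tool derives these from large corpora.
WORD_COUNTS = {
    "new": 500, "york": 300, "newyork": 1, "times": 200,
    "machine": 150, "learning": 140, "is": 900, "fun": 120,
}
TOTAL = sum(WORD_COUNTS.values())


def word_logprob(word: str) -> float:
    """Log-probability of a word under the unigram model."""
    if word in WORD_COUNTS:
        return math.log(WORD_COUNTS[word] / TOTAL)
    # Unknown words get a penalty that grows with their length.
    return math.log(10.0 / (TOTAL * 10 ** len(word)))


def segment(text: str) -> list[str]:
    """Split text into the most probable sequence of words."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def best(start: int) -> tuple[float, tuple[str, ...]]:
        # Best (score, words) for the suffix text[start:].
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            score, rest = best(end)
            candidates.append((word_logprob(word) + score, (word,) + rest))
        return max(candidates)

    return list(best(0)[1])


print(segment("#NewYorkTimes".lstrip("#")))  # ['new', 'york', 'times']
```

Memoizing `best` keeps the search quadratic in the hashtag length instead of exponential, which is sufficient for short social-media tokens.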

codequestion

🔎 Semantic search for developers

Language: Python · License: Apache-2.0 · Stargazers: 508 · Issues: 16 · Issues: 30

wtpsplit

Code for "Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation"

Language: Python · License: MIT · Stargazers: 489 · Issues: 10 · Issues: 56

googlesearch

A Python library for scraping the Google search engine.

Language: Python · License: MIT · Stargazers: 393 · Issues: 6 · Issues: 50

AquilaDB

An easy-to-use neural search engine. Index latent vectors along with JSON metadata and do efficient k-NN search.

Language: Python · License: Apache-2.0 · Stargazers: 372 · Issues: 22 · Issues: 41
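AquilaDB's description pairs each latent vector with a JSON metadata document and answers k-nearest-neighbour queries. A minimal sketch of that idea follows, using a brute-force linear scan with Euclidean distance. The class and method names are hypothetical, not AquilaDB's API, and an engine aiming for efficient search would use an approximate index instead of scanning every vector.

```python
import heapq
import json
import math


class ToyVectorIndex:
    """Brute-force k-NN over vectors stored with JSON metadata (illustrative only)."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], dict]] = []

    def add(self, vector: list[float], metadata_json: str) -> None:
        # Keep each vector next to its parsed metadata document.
        self._items.append((vector, json.loads(metadata_json)))

    def search(self, query: list[float], k: int = 3) -> list[tuple[float, dict]]:
        """Return up to k (distance, metadata) pairs closest to the query."""
        scored = ((math.dist(query, vec), meta) for vec, meta in self._items)
        # Sort by distance only, so metadata dicts are never compared.
        return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


index = ToyVectorIndex()
index.add([0.0, 0.0], '{"id": "a"}')
index.add([1.0, 1.0], '{"id": "b"}')
index.add([5.0, 5.0], '{"id": "c"}')
print([meta["id"] for _, meta in index.search([0.9, 1.2], k=2)])  # ['b', 'a']
```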

docTTTTTquery

docTTTTTquery document expansion model

Language: Python · License: Apache-2.0 · Stargazers: 342 · Issues: 15 · Issues: 33

vectorai

Vector AI — A platform for building vector based applications. Encode, query and analyse data using vectors.

Language: Python · License: Apache-2.0 · Stargazers: 305 · Issues: 11 · Issues: 17

textpipe

Textpipe: clean and extract metadata from text

Language: Python · License: MIT · Stargazers: 300 · Issues: 22 · Issues: 40

notebooks

Code examples and Jupyter notebooks for the Cohere Platform

Language: Jupyter Notebook · Stargazers: 262 · Issues: 15 · Issues: 7

sentence-splitter

Text-to-sentence splitter using the heuristic algorithm by Philipp Koehn and Josh Schroeder.

Language: Python · License: NOASSERTION · Stargazers: 216 · Issues: 7 · Issues: 7

doppel-bot

Train a language model to answer Slack messages as you.

Language: Python · License: MIT · Stargazers: 190 · Issues: 4 · Issues: 6

relevanceai

Home of the AI workforce - Multi-agent system, AI agents & tools

Language: Python · License: Apache-2.0 · Stargazers: 95 · Issues: 10 · Issues: 8

sentence-doctor

Many Natural Language Processing tasks rely on sentence boundary detection (SBD). Although amazing libraries like spaCy provide state-of-the-art SBD, they often depend on the output of text extractors (e.g. PDF text extractors or OCR). The quality of these extractors greatly influences the quality of SBD libraries and, as a consequence, the performance of downstream models. To help address this problem, we fine-tuned a T5 model from the Hugging Face Hub that attempts to reconstruct “broken sentences”.

niacin

Enrich your data

Language: Python · License: BSD-3-Clause · Stargazers: 16 · Issues: 3 · Issues: 8