Michael Günther's repositories

postgres-word2vec

Utilities for using word embedding models such as word2vec in a PostgreSQL database

Language: C · MIT · 141 9 7

table-embeddings

Tools for training schema-aware Web table embedding for unsupervised and supervised machine learning on tabular data

Language: Python · MIT · 13 2 1

the-movie-database-import

Script to import data from The Movie Database to PostgreSQL (Dataset URL: https://www.kaggle.com/rounakbanik/the-movies-dataset)

Language: Python · MIT · 10 1 1

postgres-retrofit

Tools to create database-specific text value embeddings from word embedding datasets

Language: Python · MIT · 7 30

dwtc-geo-parser

Language: Python · 3 20

docarray

🧬 The data structure for unstructured multimodal data · Neural Search · Vector Search · Document Store

Language: Python · Apache-2.0 · 100

google-play-dataset-import

Script to import data from a Google Play Store Apps dataset to a PostgreSQL database (Dataset URL: https://www.kaggle.com/lava18/google-play-store-apps)

Language: Python · MIT · 1 10

open-food-facts-postgresql-import

Script to import data from Open Food Facts to PostgreSQL (Dataset URL: https://www.kaggle.com/openfoodfacts/world-food-facts)

Language: Python · MIT · 1 10

fast_minh

Python package for fast MinHash calculation and operations

Language: C++ · Apache-2.0 · 010

mteb

MTEB: Massive Text Embedding Benchmark

Language: Python · Apache-2.0 · 000

NLP-OSS

Democratizing NLP!

CC0-1.0 · 000

SimilarityMeasure

Compute the most similar node for a given node in a graph

Language: C++ · 010

test-gradient-cache

Small test script applying gradient caching (https://github.com/luyug/GradCache) to train a model for a retrieval task on the SciFact dataset (https://allenai.org/data/scifact)

Language: Python · 010