boscoj2008

John Bosco's starred repositories

flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)

Language: Python · NOASSERTION · 13720 201 2290

umap

Uniform Manifold Approximation and Projection

Language: Python · BSD-3-Clause · 7211 128 777

ml-interviews-book

https://huyenchip.com/ml-interviews-book/

Language: HTML · 3238 43 14

AmpliGraph

Python library for Representation Learning on Knowledge Graphs https://docs.ampligraph.org

Language: Python · Apache-2.0 · 2120 66 221

usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

Language: C++ · Apache-2.0 · 1938 23 124

Data-Engineering-Projects

Personal Data Engineering Projects

Language: Jupyter Notebook · 774 80

BERT4doc-Classification

Code and source for the paper "How to Fine-Tune BERT for Text Classification?"

Language: Python · Apache-2.0 · 603 9 21

Data-Engineering-with-Python

Data Engineering with Python, published by Packt

Language: Python · MIT · 578 17 3

DeCLUTR

The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations". Do not hesitate to open an issue if you run into any trouble!

Language: Python · Apache-2.0 · 377 12 83

open_lm

A repository for research on medium sized language models.

Language: Python · MIT · 345 21 60

star-clustering

A clustering algorithm that automatically determines the number of clusters and works without hyperparameter fine-tuning.

Language: Python · Apache-2.0 · 213 5 5

ChatGPT-vs.-BERT

🎁[ChatGPT4NLU] A Comparative Study on ChatGPT and Fine-tuned BERT

Language: Python · 192 5 1

SparseLSH

A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets.

Language: Python · NOASSERTION · 138 9 5

HBMP

Sentence Embeddings in NLI with Iterative Refinement Encoders

Language: Python · MIT · 78 6 9

paris

Hierarchical graph clustering

Language: Jupyter Notebook · 37 20

Ensemble-Clustering-for-Graphs

Code, notebooks and examples with ECG: Ensemble Clustering for Graphs

Language: Jupyter Notebook · MIT · 30 5 2

VGLM

Versatile Generative Language Model

Language: Python · MIT · 26 20

nlp_text_summarization_implementation

Three modules of extractive text summarization, including implementation of Kmeans clustering using BERT sentence embedding

Language: Jupyter Notebook · 1100

deeper-lite

deep entity resolution lite version

Language: Python · 1000

ExCut

Implementation of ExCut: Explainable Embedding-based Clustering over Knowledge Graphs

Language: Python · Apache-2.0 · 10 20

Customer-Segmentation-using-Unsupervised-Learning

This project shows how to perform customer segmentation using Machine Learning algorithms. Four techniques will be presented and compared: KMeans, Agglomerative Clustering, Affinity Propagation and DBSCAN.

Language: Jupyter Notebook · MIT · 8 10

InferSent

Language: Jupyter Notebook · NOASSERTION · 7 30

LinkedInJobAnalytics

• Scraped LinkedIn data using Selenium, cleaned and created schema in Excel.
• Analyzed data using SQL, and presented insights via Power BI dashboard.
• Used natural language processing to improve skill matching feature, and developed Clustering ML Model.
• Developed website using HTML, CSS, and Flask for a user-friendly experience.

Language: Jupyter Notebook · 400

NLP_Determining_Authorship_of_Hebrew_Bible

Identifying authorship of ancient Hebrew texts via word embeddings (skip-gram, LSTM, BERT), unsupervised clustering and evaluation.

Language: Jupyter Notebook · 300

Empirical-Study-of-Entity-Resolution-Using-Word-Embedding

Performed entity resolution/record linkage using different types of word embedding techniques on E-Commerce datasets.

Language: Jupyter Notebook · 300

infersent-train-2021

Contains files and scripts for training the InferSent algorithm

Language: Jupyter Notebook · 2 20

Density-Based-Clustering_method_with_python

The first type of clustering algorithm discussed in this course used the spatial distribution of points to determine cluster centers and membership. The most prominent implementation of this concept is the K-means clustering algorithm. This approach is conceptually simple and often fast; however, it requires knowledge of the number of clusters ahead of time. While there are automated methods for determining k algorithmically, this requirement is still an impediment for some applications. An alternative, density-based clustering technique called Density-Based Spatial Clustering of Applications with Noise (DBSCAN) can be used instead. The DBSCAN algorithm has several advantages over the K-means algorithm. First, DBSCAN automatically determines the number of clusters within a data set. Second, since DBSCAN is a density-based clustering algorithm, the discovered clusters can have arbitrary shapes. On the other hand, since the clusters and their membership are defined by density, the hyperparameters used to specify the target density can dramatically affect the cluster determination. Thus, hyperparameter tuning may be required to achieve optimal results.

Language: Jupyter Notebook · 200
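
The DBSCAN behaviour described in the entry above can be illustrated with a minimal pure-Python sketch. This is not code from the repository: the function names (`region_query`, `dbscan`), the parameter names (`eps`, `min_samples`), and the toy data are assumptions chosen for illustration, and the brute-force neighbour search is only suitable for tiny inputs.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


# Toy data (assumed for illustration): two dense groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]

labels = dbscan(points, eps=1.5, min_samples=3)
n_clusters = len(set(labels) - {NOISE})
print(labels, n_clusters)  # [0, 0, 0, 0, 1, 1, 1, 1, -1] 2

# The same data with a much smaller radius: every point becomes noise.
print(dbscan(points, eps=0.5, min_samples=3))
```

The number of clusters is never passed in; it falls out of the density parameters. The second call shows the sensitivity noted above: shrinking `eps` from 1.5 to 0.5 leaves no point with enough neighbours, so no cluster is found at all.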

ContextualBlocker-for-EM

A Graph-Based Blocking Approach for Entity Matching Using Contrastively Learned Embeddings

Language: Python · 1 20

Combine_BERT_with_GloVe

Combining BERT with Static Word Embedding for Categorizing Social Media

Language: Python · 1 1 3