John Bosco (boscoj2008)



Company: National Institute of Advanced Industrial Science & Technology

Location: Tsukuba, Japan

Home Page: boscoj2008.github.io


John Bosco's starred repositories

flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)

Language: Python | License: NOASSERTION | Stargazers: 13720 | Issues: 201 | Issues: 2290

umap

Uniform Manifold Approximation and Projection

Language: Python | License: BSD-3-Clause | Stargazers: 7211 | Issues: 128 | Issues: 777

ml-interviews-book

https://huyenchip.com/ml-interviews-book/

AmpliGraph

Python library for Representation Learning on Knowledge Graphs https://docs.ampligraph.org

Language: Python | License: Apache-2.0 | Stargazers: 2120 | Issues: 66 | Issues: 221

usearch

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

Language: C++ | License: Apache-2.0 | Stargazers: 1938 | Issues: 23 | Issues: 124

Data-Engineering-Projects

Personal Data Engineering Projects

Language: Jupyter Notebook | Stargazers: 774 | Issues: 8 | Issues: 0

BERT4doc-Classification

Code and source for the paper "How to Fine-Tune BERT for Text Classification?"

Language: Python | License: Apache-2.0 | Stargazers: 603 | Issues: 9 | Issues: 21

Data-Engineering-with-Python

Data Engineering with Python, published by Packt

Language: Python | License: MIT | Stargazers: 578 | Issues: 17 | Issues: 3

DeCLUTR

The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations". Do not hesitate to open an issue if you run into any trouble!

Language: Python | License: Apache-2.0 | Stargazers: 377 | Issues: 12 | Issues: 83

open_lm

A repository for research on medium-sized language models.

Language: Python | License: MIT | Stargazers: 345 | Issues: 21 | Issues: 60

star-clustering

A clustering algorithm that automatically determines the number of clusters and works without hyperparameter fine-tuning.

Language: Python | License: Apache-2.0 | Stargazers: 213 | Issues: 5 | Issues: 5

ChatGPT-vs.-BERT

🎁[ChatGPT4NLU] A Comparative Study on ChatGPT and Fine-tuned BERT

SparseLSH

A Locality Sensitive Hashing (LSH) library with an emphasis on large, high-dimensional datasets.

Language: Python | License: NOASSERTION | Stargazers: 138 | Issues: 9 | Issues: 5

HBMP

Sentence Embeddings in NLI with Iterative Refinement Encoders

Language: Python | License: MIT | Stargazers: 78 | Issues: 6 | Issues: 9

paris

Hierarchical graph clustering

Language: Jupyter Notebook | Stargazers: 37 | Issues: 2 | Issues: 0

Ensemble-Clustering-for-Graphs

Code, notebooks and examples with ECG: Ensemble Clustering for Graphs

Language: Jupyter Notebook | License: MIT | Stargazers: 30 | Issues: 5 | Issues: 2

VGLM

Versatile Generative Language Model

Language: Python | License: MIT | Stargazers: 26 | Issues: 2 | Issues: 0

Language: Jupyter Notebook | Stargazers: 22 | Issues: 2 | Issues: 0

nlp_text_summarization_implementation

Three modules for extractive text summarization, including an implementation of K-means clustering using BERT sentence embeddings

Language: Jupyter Notebook | Stargazers: 11 | Issues: 0 | Issues: 0

deeper-lite

Deep entity resolution, lite version

Language: Python | Stargazers: 10 | Issues: 0 | Issues: 0

ExCut

Implementation of ExCut: Explainable Embedding-based Clustering over Knowledge Graphs

Language: Python | License: Apache-2.0 | Stargazers: 10 | Issues: 2 | Issues: 0

Customer-Segmentation-using-Unsupervised-Learning

This project shows how to perform customer segmentation using machine learning algorithms. Four techniques are presented and compared: KMeans, Agglomerative Clustering, Affinity Propagation, and DBSCAN.

Language: Jupyter Notebook | License: MIT | Stargazers: 8 | Issues: 1 | Issues: 0

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 7 | Issues: 3 | Issues: 0

LinkedInJobAnalytics

• Scraped LinkedIn data using Selenium, cleaned it, and created a schema in Excel.
• Analyzed data using SQL and presented insights via a Power BI dashboard.
• Used natural language processing to improve the skill-matching feature and developed a clustering ML model.
• Developed a website using HTML, CSS, and Flask for a user-friendly experience.

Language: Jupyter Notebook | Stargazers: 4 | Issues: 0 | Issues: 0

NLP_Determining_Authorship_of_Hebrew_Bible

Identifying authorship of ancient Hebrew texts via word embeddings (skip-gram, LSTM, BERT), unsupervised clustering, and evaluation.

Language: Jupyter Notebook | Stargazers: 3 | Issues: 0 | Issues: 0

Empirical-Study-of-Entity-Resolution-Using-Word-Embedding

Performed entity resolution/record linkage using different types of word embedding techniques on E-Commerce datasets.

Language: Jupyter Notebook | Stargazers: 3 | Issues: 0 | Issues: 0

infersent-train-2021

Contains files and scripts for training the InferSent algorithm

Language: Jupyter Notebook | Stargazers: 2 | Issues: 2 | Issues: 0

Density-Based-Clustering_method_with_python

The first type of clustering algorithm discussed in this course used the spatial distribution of points to determine cluster centers and membership. The most prominent implementation of this concept is the K-means clustering algorithm. This approach is conceptually simple and often fast; however, it requires knowing the number of clusters ahead of time. While there are automated methods for determining 𝑘 algorithmically, this requirement is still an impediment for some applications. An alternative, density-based clustering technique called Density-Based Spatial Clustering of Applications with Noise (DBSCAN) can be used instead. The DBSCAN algorithm has several advantages over the K-means algorithm. First, DBSCAN automatically determines the number of clusters within a data set. Second, since DBSCAN is a density-based clustering algorithm, the discovered clusters can have arbitrary shapes. On the other hand, since the clusters and their membership are defined by density, the hyperparameters used to specify the target density can dramatically affect the cluster determination. Thus, hyperparameter tuning may be required to achieve optimal results.

Language: Jupyter Notebook | Stargazers: 2 | Issues: 0 | Issues: 0
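
The two DBSCAN properties named in the Density-Based-Clustering_method_with_python description (the number of clusters is found automatically, and the result depends heavily on the density hyperparameters) can be illustrated with a minimal sketch. This is not code from that repository: it is a small pure-Python DBSCAN written for illustration, and the names `dbscan`, `region_query`, `count_clusters`, `eps`, `min_samples`, and the toy `points` are all assumptions introduced here.

```python
import math

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1  # points[i] is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


def count_clusters(labels):
    """Number of distinct clusters, ignoring noise."""
    return len(set(labels) - {NOISE})


# Toy data (made up for this sketch): two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]

# No cluster count is passed in; only the density parameters change.
for eps in (0.5, 1.5, 20):
    labels = dbscan(points, eps=eps, min_samples=3)
    print(eps, count_clusters(labels), labels)
# eps=0.5 -> 0 clusters (everything is noise)
# eps=1.5 -> 2 clusters, with (50, 50) left as noise
# eps=20  -> 1 cluster (both groups merge), with (50, 50) left as noise
```

The same data yields zero, two, or one cluster depending only on `eps`, which is the tuning sensitivity the description warns about. In practice a library implementation would normally be used instead of this quadratic-time sketch.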

ContextualBlocker-for-EM

A Graph-Based Blocking Approach for Entity Matching Using Contrastively Learned Embeddings

Language: Python | Stargazers: 1 | Issues: 2 | Issues: 0

Combine_BERT_with_GloVe

Combining BERT with Static Word Embedding for Categorizing Social Media