John Bosco's starred repositories
ml-interviews-book
https://huyenchip.com/ml-interviews-book/
AmpliGraph
Python library for Representation Learning on Knowledge Graphs https://docs.ampligraph.org
Data-Engineering-Projects
Personal Data Engineering Projects
BERT4doc-Classification
Code and source for the paper "How to Fine-Tune BERT for Text Classification?"
Data-Engineering-with-Python
Data Engineering with Python, published by Packt
star-clustering
A clustering algorithm that automatically determines the number of clusters and works without hyperparameter fine-tuning.
ChatGPT-vs.-BERT
🎁[ChatGPT4NLU] A Comparative Study on ChatGPT and Fine-tuned BERT
Ensemble-Clustering-for-Graphs
Code, notebooks and examples with ECG: Ensemble Clustering for Graphs
nlp_text_summarization_implementation
Three modules for extractive text summarization, including an implementation of K-means clustering on BERT sentence embeddings
deeper-lite
Deep entity resolution, lite version
Customer-Segmentation-using-Unsupervised-Learning
This project shows how to perform customer segmentation using machine learning algorithms. The techniques presented and compared are KMeans, Agglomerative Clustering, Affinity Propagation, and DBSCAN.
LinkedInJobAnalytics
• Scraped LinkedIn data using Selenium, cleaned it, and created a schema in Excel. • Analyzed the data using SQL and presented insights via a Power BI dashboard. • Used natural language processing to improve the skill-matching feature and developed a clustering ML model. • Developed a website using HTML, CSS, and Flask for a user-friendly experience.
NLP_Determining_Authorship_of_Hebrew_Bible
Identifying authorship of ancient Hebrew texts via word embeddings (skip-gram, LSTM, BERT), unsupervised clustering, and evaluation.
Empirical-Study-of-Entity-Resolution-Using-Word-Embedding
Performed entity resolution/record linkage using different types of word embedding techniques on E-Commerce datasets.
infersent-train-2021
Contains files and scripts for training the InferSent algorithm
Density-Based-Clustering_method_with_python
The first type of clustering algorithm discussed in this course used the spatial distribution of points to determine cluster centers and membership. The most prominent implementation of this concept is the K-means clustering algorithm. This approach is conceptually simple and often fast; however, it requires knowing the number of clusters ahead of time. While there are automated methods for determining 𝑘 algorithmically, this requirement is still an impediment for some applications. A density-based alternative, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), can be used instead. The DBSCAN algorithm has several advantages over the K-means algorithm. First, DBSCAN automatically determines the number of clusters within a data set. Second, because DBSCAN is a density-based clustering algorithm, the discovered clusters can have arbitrary shapes. On the other hand, because the clusters and their membership are defined by density, the hyperparameters that specify the target density can dramatically affect which clusters are found. Thus, hyperparameter tuning may be required to achieve optimal results.
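The description above can be illustrated with a minimal, pure-Python sketch of DBSCAN. This is not the repository's code; the function names, the parameter names `eps` (neighborhood radius) and `min_samples` (minimum neighbors for a dense "core" point), and the toy data are all assumptions chosen for illustration. It shows the number of clusters emerging from the density settings rather than being supplied up front, and how changing `eps` changes the result.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may become a border point later
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also core: keep growing
    return labels


def count_clusters(labels):
    """Number of distinct clusters, ignoring noise."""
    return len(set(labels) - {NOISE})


# Hypothetical toy data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]

for eps in (0.5, 1.5, 20):
    labels = dbscan(points, eps=eps, min_samples=3)
    print(eps, count_clusters(labels), labels)
```

On this toy data, `eps=1.5` yields two clusters plus one noise point, `eps=0.5` marks every point as noise, and `eps=20` merges both groups into a single cluster, which is the hyperparameter sensitivity the description warns about.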
ContextualBlocker-for-EM
A Graph-Based Blocking Approach for Entity Matching Using Contrastively Learned Embeddings
Combine_BERT_with_GloVe
Combining BERT with Static Word Embedding for Categorizing Social Media