Fefe's starred repositories

private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks

Language: Python · License: Apache-2.0 · 5314900

Python-Implementation-of-LSA

A Jupyter notebook implementing Latent Semantic Analysis (a topic-modelling algorithm) in Python.

Language: Jupyter Notebook · License: MIT · 800

Real Yelp review data: cosine-similarity ranking of a query review in a vector space using a TF-IDF model; unigram and bigram language models with linear interpolation smoothing, absolute discounting smoothing and Dirichlet smoothing; perplexity analysis. Six retrieval models are evaluated: Boolean, TF-IDF, Okapi BM25, Pivoted Length Normalization, Jelinek-Mercer smoothing and Dirichlet prior smoothing. The evaluation metrics are Mean Average Precision (MAP), P@K, reciprocal rank and Normalized Discounted Cumulative Gain (NDCG).

Language: Java · 200
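The evaluation metrics named above are short enough to sketch directly. The following is a minimal, standard-library Python sketch, not the repository's Java code; it assumes binary relevance for P@K, average precision and reciprocal rank, and uses the linear-gain, log2(i + 1)-discount form of NDCG, which is only one of several NDCG variants in use. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """P@K: fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Sum of P@i at each rank i holding a relevant document, divided by the
    number of relevant documents (relevant documents never retrieved add 0)."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result (0 if none is retrieved)."""
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0


def ndcg_at_k(ranking: Sequence[str], gains: Dict[str, float], k: int) -> float:
    """NDCG@k with linear gains and a log2(i + 1) rank discount."""
    def dcg(scores):
        return sum(g / math.log2(i + 1) for i, g in enumerate(scores, start=1))
    actual = dcg([gains.get(doc, 0.0) for doc in ranking[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(precision_at_k(ranking, relevant, 2))        # 0.5
print(average_precision(ranking, relevant))        # 0.5
print(reciprocal_rank(ranking, relevant))          # 0.5
print(ndcg_at_k(ranking, {"d1": 2, "d2": 1}, 4))   # about 0.643
```

Graded relevance only matters for NDCG here; the other metrics treat every document in `relevant` as equally relevant.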

BM25

A complete implementation of Okapi BM25 with five evaluation methods (precision, recall, MAP, P at N and NDCG at N), using only standard Python libraries.

Language: Python · License: MIT · 100
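For reference, Okapi BM25 scores a document D against a query by summing, over the query terms t, IDF(t) · f(t, D) · (k1 + 1) / (f(t, D) + k1 · (1 − b + b · |D| / avgdl)), where f(t, D) is the term frequency and avgdl the average document length. Below is a minimal standard-library sketch of that formula, assuming pre-tokenised documents; k1 = 1.2 and b = 0.75 are common default choices assumed here, and the IDF uses the log(1 + (N − n + 0.5) / (n + 0.5)) variant (the one Lucene uses) so very common terms never receive negative weight. The `BM25` class is hypothetical and is not this repository's API.

```python
import math
from collections import Counter
from typing import List, Tuple


class BM25:
    """Okapi BM25 over pre-tokenised documents (hypothetical helper class)."""

    def __init__(self, docs: List[List[str]], k1: float = 1.2, b: float = 0.75):
        self.docs = docs
        self.k1 = k1
        self.b = b
        self.n_docs = len(docs)
        self.avgdl = sum(len(d) for d in docs) / self.n_docs
        self.tf = [Counter(d) for d in docs]
        # Document frequency: in how many documents each term occurs.
        self.df = Counter(term for d in docs for term in set(d))

    def idf(self, term: str) -> float:
        n = self.df.get(term, 0)
        # The "1 +" inside the log keeps the weight non-negative for common terms.
        return math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))

    def score(self, query: List[str], index: int) -> float:
        tf = self.tf[index]
        # Length normalisation: b = 0 ignores length, b = 1 normalises fully.
        norm = self.k1 * (1 - self.b + self.b * len(self.docs[index]) / self.avgdl)
        total = 0.0
        for term in query:
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def rank(self, query: List[str]) -> List[Tuple[int, float]]:
        """(document index, score) pairs, best first."""
        scores = [(i, self.score(query, i)) for i in range(self.n_docs)]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [s.split() for s in [
    "the cat sat on the mat",
    "dogs and cats living together",
    "the quick brown fox",
]]
bm25 = BM25(docs)
print(bm25.rank(["cat", "mat"]))
```

Note that without stemming "cats" does not match "cat", so only the first document scores above zero for this query.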

pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Language: Python · License: Apache-2.0 · 157400

nlp

Natural Language Processing

Language: Python · 9600

search-engine-tfidf

Search engine implemented with the TF-IDF algorithm using Python, Flask and MySQL

Language: Python · 700
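The core ranking step of a TF-IDF search engine can be sketched without the Flask and MySQL layers. This hypothetical in-memory version assumes pre-tokenised documents, length-normalised term frequency and idf = log(N / df), and ranks documents by cosine similarity to the query vector; the project's exact weighting scheme is not stated in its description, so treat these choices as assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: term -> weight


def build_index(docs: List[List[str]]) -> Tuple[List[Vector], Dict[str, float]]:
    """Return one sparse TF-IDF vector per document plus the idf table."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / c) for t, c in df.items()}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append({t: (c / len(d)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def vectorize_query(query: List[str], idf: Dict[str, float]) -> Vector:
    tf = Counter(t for t in query if t in idf)  # drop out-of-vocabulary terms
    return {t: (c / len(query)) * idf[t] for t, c in tf.items()}


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def search(query: List[str], vectors: List[Vector], idf: Dict[str, float]):
    """(document index, cosine score) pairs, best first."""
    q = vectorize_query(query, idf)
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [s.split() for s in [
    "python flask web app",
    "mysql database sql",
    "python search engine",
]]
vectors, idf = build_index(docs)
print(search(["search", "python"], vectors, idf))
```

In a Flask + MySQL deployment, `build_index` would typically run once over the stored documents and the vectors would be cached, so that each request only pays for `search`.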

Intelligent_Document_Finder

Document Search Engine Tool

Language: Python · License: MIT · 7000

Coarse-grained-Sentiment-Analysis-on-Swachh-Bharat-using-Tweets

Language: Jupyter Notebook · 400

TopicBERT

Implementation of the paper "TopicBERT: Topic-aware BERT for Efficient Document Classification", accepted at EMNLP 2020

Language: Python · 4200

Forecast-Daily-Interstate-94-Westbound-Traffic-Volume-for-MN-DoT-ATR-Station-301

Time series forecasting project using SAS

Language: SAS · 100

deep-learning-keras-tf-tutorial

Learn deep learning from scratch with TensorFlow 2.0, Keras and Python through this comprehensive tutorial series for beginners.

Language: Jupyter Notebook · 81100

CamembertForFun

A small sentiment-classification project using CamemBERT, trained on Allociné reviews, with a web-app interface

Language: Python · 200

nyt-article-summarizer

New York Times Article Summarization Tool

Language: Jupyter Notebook · 1600

NLP-image-to-text

Code to extract text from images

Language: Python · License: MIT · 3500

cord19

A repository for the CORD-19 challenge

Language: Jupyter Notebook · License: Apache-2.0 · 3200

annoy

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Language: C++ · License: Apache-2.0 · 1297500

Checkbox-Table-cell-detection-using-OpenCV-Python

Extracts relevant information from unstructured data sources such as OMR sheets, scanned invoices and bills into structured data, using computer vision and natural language processing. The pipeline depends on two primary steps: Optical Character Recognition (OCR) and Document Layout Analysis. OCR detects the text in an image; beyond the raw text, the goal is to recover additional metadata from the document, such as headers, paragraphs, lines, words, tables and key-value pairs.

Language: Jupyter Notebook · 200

Search_Engine_for_Wikipedia

A search engine for the French Wikipedia, implemented from scratch

Language: Jupyter Notebook · 1100

French-Word-Embeddings

French word embeddings learned from TV series subtitles

Language: Jupyter Notebook · License: MIT · 2200

bert_semantic_matching

BERT-based Chinese semantic matching, built on AllenNLP.

Language: Python · 500

lda2vec

Language: Python · License: MIT · 314400

Topic-Modeling-BERT-LDA

Topic modeling with BERT, LDA and clustering: Latent Dirichlet Allocation (LDA) probabilistic topic assignment and pre-trained sentence embeddings from BERT/RoBERTa.

Language: Jupyter Notebook · 4900

TopicModelling-LSA-LDA

Retrieving "topics" (concepts) from a corpus using (1) Latent Dirichlet Allocation (Gensim), with perplexity and coherence scores as the evaluation metrics, and (2) Latent Semantic Analysis using Term Frequency-Inverse Document Frequency (TF-IDF) and truncated Singular Value Decomposition (SVD).

Language: Jupyter Notebook · 1200
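Approach (2) above, TF-IDF followed by truncated SVD, can be illustrated without dependencies. In practice one would typically use scikit-learn's `TfidfVectorizer` and `TruncatedSVD`; this sketch instead finds the top-k singular vectors by power iteration with deflation on the document-document matrix AᵀA, which is only adequate for toy corpora. The weighting (raw counts × log(N / df)), the toy corpus and every function name are assumptions for illustration, not the repository's code.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

Matrix = List[List[float]]


def tfidf_matrix(docs: List[List[str]]) -> Tuple[List[str], Matrix]:
    """Term-document matrix A (rows = terms, columns = documents): tf * log(N/df)."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return vocab, [[d.count(t) * math.log(n / df[t]) for d in docs] for t in vocab]


def matvec(M: Matrix, x: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def gram(A: Matrix) -> Matrix:
    """Document-document matrix A^T A."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]


def top_eigenpairs(M: Matrix, k: int, iters: int = 500, seed: int = 0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    n = len(M)
    M = [row[:] for row in M]  # work on a copy: deflation modifies it
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = matvec(M, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(M, v)))  # Rayleigh quotient, |v| = 1
        pairs.append((lam, v))
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenpair.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsa(docs: List[List[str]], k: int):
    vocab, A = tfidf_matrix(docs)
    pairs = top_eigenpairs(gram(A), k)
    sigmas = [math.sqrt(max(lam, 0.0)) for lam, _ in pairs]  # singular values
    # Document coordinates in the k-dimensional concept space: rows of V * Sigma.
    doc_vecs = [[s * v[j] for s, (_, v) in zip(sigmas, pairs)] for j in range(len(docs))]
    # Term loadings u = A v / sigma; the largest |u| terms describe each concept.
    topics = []
    for s, (_, v) in zip(sigmas, pairs):
        if s == 0.0:
            topics.append([])
            continue
        u = [x / s for x in matvec(A, v)]
        order = sorted(range(len(vocab)), key=lambda i: abs(u[i]), reverse=True)
        topics.append([vocab[i] for i in order[:3]])
    return doc_vecs, topics


def cosine(x: List[float], y: List[float]) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


docs = [s.split() for s in [
    "cat dog pet",
    "dog pet vet",
    "stock market trade",
    "market trade price price",
]]
doc_vecs, topics = lsa(docs, k=2)
print(topics)
print(cosine(doc_vecs[0], doc_vecs[1]), cosine(doc_vecs[0], doc_vecs[2]))
```

On this toy corpus the two concepts separate the "pets" documents from the "markets" documents, so documents from the same theme end up close together in the concept space even when they share few exact words.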

frldj