Fefe's starred repositories
private-gpt
Interact with your documents using the power of GPT, 100% privately, no data leaks
Python-Implementation-of-LSA
A Jupyter notebook implementing Latent Semantic Analysis (a topic modelling algorithm) in Python.
InformationRetrieval
Real Yelp review data; cosine-similarity ranking of query reviews in a vector space (TF-IDF model). Unigram and bigram language models with linear interpolation smoothing, absolute discounting smoothing and Dirichlet smoothing; perplexity analysis. Evaluation of six retrieval models: Boolean, TF-IDF, Okapi BM25, pivoted length normalization, Jelinek-Mercer smoothing and Dirichlet prior smoothing. The evaluation metrics include Mean Average Precision (MAP), P@K, reciprocal rank and Normalized Discounted Cumulative Gain (NDCG).
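Okapi BM25, one of the models compared above, fits in a few lines. This is a minimal sketch, not the repository's code: it assumes pre-tokenized documents, the common defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)). The repository's own parameters and IDF variant are not stated.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    `query` is a list of terms; `docs` is a list of term lists.
    Assumes the non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)).
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # absent terms contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

For example, `bm25_scores(["great", "service"], [["great", "pizza", "great", "service"], ["slow", "service"], ["great", "tacos"]])` ranks the first review highest because it matches both query terms.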
search-engine-tfidf
Search engine implementation with the TF-IDF algorithm using Python + Flask + MySQL
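The TF-IDF ranking at the core of such a search engine can be sketched in plain Python. This is a minimal sketch under stated assumptions: raw term counts for TF, idf = log(N / df), and cosine similarity for ranking. The repository's exact weighting scheme and its Flask/MySQL layers are not reproduced.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document, plus the IDF table."""
    n_docs = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))
    # Assumed IDF: log(N / df); terms present in every document get weight 0.
    idf = {t: math.log(n_docs / df[t]) for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(d).items()} for d in docs]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query, docs):
    """Return document indices ranked by cosine similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the corpus get no weight.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}
    scores = [cosine(q, v) for v in vectors]
    # Highest score first; ties broken by document index.
    return sorted(range(len(docs)), key=lambda i: (-scores[i], i))
```

In a web front end, `search` would run on the tokenized query string and the ranked indices would be mapped back to stored documents.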
Intelligent_Document_Finder
Document Search Engine Tool
Forecast-Daily-Interstate-94-Westbound-Traffic-Volume-for-MN-DoT-ATR-Station-301
Time series forecasting project using SAS
deep-learning-keras-tf-tutorial
Learn deep learning from scratch with TensorFlow 2.0, Keras and Python through this comprehensive tutorial series for beginners.
CamembertForFun
A small sentiment-classification project using CamemBERT trained on Allociné reviews, with a web app interface
nyt-article-summarizer
New York Times Article Summarization Tool
NLP-image-to-text
Code to extract text from images
Checkbox-Table-cell-detection-using-OpenCV-Python
Extracts relevant information from unstructured data sources such as OMR sheets, scanned invoices and bills into structured data, using computer vision and natural language processing. The pipeline depends primarily on two steps: optical character recognition (OCR) and document layout analysis. OCR detects the text in the image; beyond the raw text, the goal is to recover additional metadata from the documents, such as headers, paragraphs, lines, words, tables and key-value pairs.
Search_Engine_for_Wikipedia
A search engine for the French Wikipedia, implemented from scratch
French-Word-Embeddings
French word embeddings built from TV series subtitles
bert_semantic_matching
Chinese semantic matching with BERT, built on AllenNLP.
Topic-Modeling-BERT-LDA
Topic modeling with BERT, LDA and clustering. Latent Dirichlet Allocation (LDA) probabilistic topic assignment and pre-trained sentence embeddings from BERT/RoBERTa.
TopicModelling-LSA-LDA
Retrieving 'topics' (concepts) from a corpus using (1) Latent Dirichlet Allocation (Gensim), with perplexity and coherence scores as evaluation metrics, and (2) Latent Semantic Analysis using term frequency-inverse document frequency (TF-IDF) and truncated singular value decomposition (SVD).
semantic-search-through-wikipedia-with-weaviate
Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine
vector_engine
Build a semantic search engine with Transformers and Faiss
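At its core, an exact (flat) Faiss index such as `IndexFlatIP` ranks stored embeddings by inner product with the query. With L2-normalized vectors, that inner product equals cosine similarity. The sketch below reproduces that brute-force search in plain Python as a stand-in for Faiss. It assumes the embeddings have already been produced by a Transformer encoder. The class name `FlatIPIndex` is hypothetical and is not part of Faiss.

```python
import heapq
import math

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

class FlatIPIndex:
    """Hypothetical brute-force inner-product index, standing in for an exact Faiss index.

    Vectors are normalized on add, so scores are cosine similarities.
    """

    def __init__(self):
        self.vectors = []

    def add(self, vectors):
        self.vectors.extend(normalize(v) for v in vectors)

    def search(self, query, k):
        """Return (scores, ids) of the k stored vectors most similar to `query`."""
        q = normalize(query)
        scored = ((sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(self.vectors))
        top = heapq.nlargest(k, scored)
        return [s for s, _ in top], [i for _, i in top]
```

Exhaustive search like this is exact but scales linearly with the corpus. Faiss also provides approximate indexes that trade some accuracy for speed on large collections.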
Information_retrieval_system
Information retrieval system, Python, text mining, bag of words, web mining, Word2Vec, Jupyter, IPYNB