There are 10 repositories under the bangla-nlp topic.
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla" accepted in Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: NAACL-2022.
This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
Deep learning Bangla resources with TensorFlow
This repository contains the official release of the model "BanglaT5" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla".
Bangla-Bert is a pretrained BERT model for the Bengali language.
✍️ Bengali alphabet (বাংলা বর্ণমালা)
Awesome datasets for Bangla language computing.
Bangla Machine Translator
Bangla news classification and generation
Automatic Context-Sensitive Spelling Correction for Bangla Text Using BERT and Levenshtein Distance
BNLTK (Bangla Natural Language Processing Toolkit): a Python package for NLP in Bangla
Bangla word2vec using the skip-gram approach
This repository contains the code, data, and associated models of the paper titled "BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset", accepted in Proceedings of the Asia-Pacific Chapter of the Association for Computational Linguistics: AACL 2022.
Various Bangla datasets for sentiment analysis of Bangla text
The default autocorrect dictionary bundled with the Avro Bangla keyboard doesn't contain enough words, so this is my approach to enriching it. This file contains the correct spelling of commonly used Bangla words.
A collection of Bangla newspaper and blog crawlers. They can be used to mine Bangla text data for Natural Language Processing tasks.
Chatbot Solution for Resource-Poor Languages. Contains code and data for Journal Article 'Focused domain contextual AI chatbot framework for resource poor languages'.
A collection of Colab trainers for NLP tasks.
Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.
Dataset for Bangla named entity recognition
Bengali News Summarization - BengaliGPT & T5
This module helps analyze Bengali sentences: it can analyze various entities, perform non-contextual PoS tagging, and return the lemmas present in a sentence.
With the rapid growth of the Bangla music industry, a huge volume of Bangla songs is produced every day. An immense number of producers, lyricists, singers and artists are involved in producing songs across different genres. Among the many genres of Bangla music, classical, folk, baul, modern music, Rabindra Sangeet, Nazrul Geeti, film music, rock music and fusion music have gained the highest popularity. Lyricists try to express their feelings and views on any situation or subject through their writing, so each lyricist has their own dictionary of thoughts to put into music lyrics. In this paper, we present “BanglaMusicStylo”, the very first stylometric dataset of Bangla music lyrics. We have collected 2824 Bangla song lyrics by 211 lyricists in digital form, and all the lyrics are stored in text format for further use. This dataset could be used for stylometric analysis such as authorship attribution, linguistic forensics, gender identification from textual data, Bangla music genre classification, vandalism detection, emotion classification, etc. Having identified the significant research opportunities in this area, we have formalized this dataset for use in stylometric analysis.
Code repository for an article series on natural language processing in Bangla
The data and code of 'NERvous About My Health: Constructing a Bengali Medical Named Entity Recognition Dataset', published in the Findings of the Association for Computational Linguistics, EMNLP 2023.
Converts a Bangla numeric string to literal words.
Bangla NLP toolkit: Bangla text normalization, punctuation generation and augmentation for Bangla NLP tasks. This project is also available on PyPI.
The data and code of 'BanglaCHQ-Summ: An Abstractive Summarization Dataset for Medical Queries in Bangla Conversational Speech', published in the Proceedings of the First Workshop on Bangla Language Processing, EMNLP 2023.