There are 10 repositories under the bangla-nlp topic.
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla", accepted in Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: NAACL-2022.
This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
Deep learning Bangla resources with TensorFlow
This repository contains the official release of the model "BanglaT5" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla".
Bangla-Bert is a pretrained BERT model for the Bengali language
✍️ Bengali alphabet (বাংলা বর্ণমালা)
Awesome datasets for Bangla language computing.
Bangla Machine Translator
Nirmol is an open-source dataset and API for detecting Bangla slang words. Detect offensive/bad/slang words in Bangla/Bengali/Banglish sentences. A helpful API and dataset for developers and researchers.
Bangla news classification and generation
Automatic Context-Sensitive Spelling Correction for Bangla Text Using BERT and Levenshtein Distance
BNLTK (Bangla Natural Language Processing Toolkit): a Python package for NLP in Bangla
Bangla word2vec using the skip-gram approach
This repository contains the code, data, and associated models of the paper titled "BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset", accepted in Proceedings of the Asia-Pacific Chapter of the Association for Computational Linguistics: AACL 2022.
A collection of Bangla newspaper and blog crawlers. Can be used to mine Bangla text data for Natural Language Processing tasks.
Various Bangla datasets for sentiment analysis of Bangla text
The default autocorrect dictionary in the Avro Bangla keyboard doesn't contain enough words, so this is my approach to enriching it. This file contains the correct spellings of commonly used Bangla words.
Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.
Chatbot Solution for Resource-Poor Languages. Contains code and data for Journal Article 'Focused domain contextual AI chatbot framework for resource poor languages'.
A collection of Colab trainers for NLP tasks.
Dataset for Bangla named entity recognition
Bengali News Summarization - BengaliGPT & T5
This module helps analyze Bengali sentences. It can analyze various entities, perform non-contextual PoS tagging, and return the lemmas present in a sentence.
This is the official repository of the paper titled "BnPC: A Gold Standard Corpus for Paraphrase Detection in Bangla, and its Evaluation", accepted in The 17th Workshop on Building and Using Comparable Corpora (BUCC 2024) co-located with LREC-COLING 2024. It contains the code and the dataset.
With the rapid growth of the Bangla music industry, a huge volume of Bangla songs is produced every day. A large number of producers, lyricists, singers and artists are involved in producing songs in different genres. Among the many genres of Bangla music, classical, folk, baul, modern music, Rabindra Sangeet, Nazrul Geeti, film music, rock music and fusion music have gained the highest popularity. Lyricists try to express their feelings and views on a situation or subject through their writing. Therefore, each lyricist has their own dictionary of thoughts to put into music lyrics. In this paper, we present “BanglaMusicStylo”, the very first stylometric dataset of Bangla music lyrics. We have collected 2824 Bangla song lyrics by 211 lyricists in digital form. All the lyrics are stored in text format for further use. This dataset could be used for stylometric analysis such as authorship attribution, linguistic forensics, gender identification from textual data, Bangla music genre classification, vandalism detection, emotion classification, etc. Recognizing the significant research opportunities in this area, we have formalized this dataset for use in stylometric analysis.
Code repository for the series of articles on natural language processing in Bangla
The data and code of 'NERvous About My Health: Constructing a Bengali Medical Named Entity Recognition Dataset', published in the Findings of the Association for Computational Linguistics, EMNLP 2023.
Converts a Bangla numeric string to literal words.
Bangla NLP toolkit: Bangla text normalization, punctuation generation and augmentation for Bangla NLP tasks. This project is available on PyPI as well.
This is the official repository containing all the code used to generate the results reported in the paper titled "An Empirical Study on the Characteristics of Bias upon Context Length Variation for Bangla", accepted in Findings of the Association for Computational Linguistics: ACL 2024.