adimyth / iplbot

A retrieval-based chatbot, BotVic. Chat with BotVic about the IPL and have fun.

https://ipl-qa-bot.herokuapp.com/

heroku nlp nltk-python question-answering tf-idf

IPL BOT

A retrieval-based question-answering bot built on the IPL Wikipedia pages. Built using Streamlit and deployed on Heroku - https://ipl-qa-bot.herokuapp.com/.

Libraries Used

Streamlit - For creating the web app
scikit-learn - For fitting a TF-IDF vectorizer
BeautifulSoup, Requests - For fetching and parsing data

How to run the app

Clone the repository and move into it

git clone https://github.com/adimyth/iplbot.git
cd iplbot

Install the requirements

pip install -r requirements.txt

Run the app

streamlit run app.py

How it works

Run extractor.py to extract text from the following Wikipedia pages:
- Indian Premier League
- Category:Indian Premier League coaches
- Mumbai Indians
- Chennai Super Kings
- Kolkata Knight Riders
- Rajasthan Royals
- Sunrisers Hyderabad
- Kings XI Punjab
- Delhi Capitals

This creates a file called ipl.txt containing the text from all of the pages above.
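
The extraction step can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's extractor.py: it uses the standard library's html.parser as a stand-in for BeautifulSoup, and it parses an inline HTML string instead of fetching a Wikipedia page. ParagraphExtractor, extract_text, and sample_html are hypothetical names.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the visible text of every <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False  # True while inside a <p> element
        self._chunks = []           # text pieces of the current paragraph
        self.paragraphs = []        # finished paragraphs, in document order

    def _flush(self):
        # Join the pieces and normalize runs of whitespace to single spaces.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self._in_paragraph:
                self._flush()  # a new <p> implicitly closes an open one
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._chunks.append(data)


def extract_text(html):
    """Return the paragraph text of an HTML document, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


# Inline sample standing in for a downloaded page (assumption: no network call here).
sample_html = (
    "<html><body><h1>Mumbai Indians</h1>"
    "<p>Mumbai Indians are a <b>franchise</b> cricket team.</p>"
    "<p>They play in the IPL.</p></body></html>"
)
corpus_text = extract_text(sample_html)
```

In a real run, the text returned for each page would be appended to ipl.txt.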
Given an input sentence, the generate_response function in bot.py does the following:
- Lowercases the entire string
- Removes punctuation marks
- Tokenizes the text into words
- Lemmatizes each token
- Fits a TF-IDF vectorizer on the sentences from ipl.txt together with the input sentence
- Uses cosine similarity to find the vectors closest to the input
- Sorts the similarities in decreasing order and chooses the first vector
- Gets the corresponding sentence and capitalizes it
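
The steps above can be sketched in plain Python. This is a minimal illustration, not the code in bot.py: it swaps scikit-learn's vectorizer for a small hand-written TF-IDF (using the smoothed IDF formula that scikit-learn documents as its default), skips lemmatization, and uses three stand-in sentences instead of ipl.txt. The names preprocess, tfidf_vectors, cosine, respond, and FALLBACK are all hypothetical.

```python
import math
import string
from collections import Counter

# Hypothetical reply used when no sentence shares a term with the question.
FALLBACK = "Sorry, I don't have an answer for that."


def preprocess(text):
    """Lowercase, strip punctuation, and split on whitespace.

    Lemmatization is omitted in this sketch.
    """
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def tfidf_vectors(token_lists):
    """Return one sparse {term: weight} dict per document."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Raw term count times smoothed inverse document frequency.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def respond(question, sentences):
    """Return the corpus sentence most similar to the question, capitalized."""
    # Vectorize the corpus sentences together with the question.
    token_lists = [preprocess(s) for s in sentences] + [preprocess(question)]
    vectors = tfidf_vectors(token_lists)
    query, corpus = vectors[-1], vectors[:-1]
    scores = [cosine(query, vec) for vec in corpus]
    # Sort sentence indices by similarity, highest first, and take the top one.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = ranked[0]
    if scores[best] == 0.0:
        return FALLBACK
    return sentences[best].capitalize()


# Stand-in corpus (assumption: the real sentences come from ipl.txt).
sentences = [
    "mumbai indians are based in mumbai.",
    "chennai super kings play their home matches in chennai.",
    "kolkata knight riders are based in kolkata.",
]
answer = respond("Where do Chennai Super Kings play?", sentences)
print(answer)
```

Note that str.capitalize uppercases only the first character and lowercases the rest, so team names inside the returned sentence come back in lowercase.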

Please ⭐ the repo and share it

Languages

TeX 73.6%, Python 12.3%, Shell 11.4%, CSS 2.7%