A retrieval-based question-answering bot that answers questions using text from IPL Wikipedia pages. It is built with Streamlit and deployed on Heroku at https://ipl-qa-bot.herokuapp.com/.
- Streamlit - for creating the web app
- scikit-learn - for fitting a TF-IDF vectorizer
- BeautifulSoup, Requests - for extracting and parsing data
Clone the repository
git clone https://github.com/adimyth/iplbot.git
cd iplbot
Install the requirements
pip install -r requirements.txt
Run the app
streamlit run app.py
Run extractor.py to extract text from the following Wikipedia pages:
- Indian Premier League
- Category:Indian Premier League coaches
- Mumbai Indians
- Chennai Super Kings
- Kolkata Knight Riders
- Rajasthan Royals
- Sunrisers Hyderabad
- Kings XI Punjab
- Delhi Capitals

This creates a file called ipl.txt containing the text from all of the pages above.
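A minimal sketch of this extraction step, assuming the script fetches each page with Requests, keeps only the text of `<p>` elements using BeautifulSoup, and writes the result to ipl.txt. The helper names (`page_url`, `html_to_text`, `build_corpus`) and the User-Agent string are hypothetical; the real extractor.py may select and clean the text differently.

```python
import requests
from bs4 import BeautifulSoup

# Page titles as listed above (the actual list lives in extractor.py).
PAGES = [
    "Indian Premier League",
    "Category:Indian Premier League coaches",
    "Mumbai Indians",
    "Chennai Super Kings",
    "Kolkata Knight Riders",
    "Rajasthan Royals",
    "Sunrisers Hyderabad",
    "Kings XI Punjab",
    "Delhi Capitals",
]


def page_url(title: str) -> str:
    """Build an English Wikipedia URL from a page title (hypothetical helper)."""
    return "https://en.wikipedia.org/wiki/" + title.replace(" ", "_")


def html_to_text(html: str) -> str:
    """Join the text of every non-empty <p> element, one paragraph per line."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = (p.get_text(" ", strip=True) for p in soup.find_all("p"))
    return "\n".join(p for p in paragraphs if p)


def build_corpus(titles=PAGES, out_path="ipl.txt") -> None:
    """Download each page, extract its paragraphs and write them all to out_path."""
    texts = []
    for title in titles:
        response = requests.get(
            page_url(title),
            # Hypothetical User-Agent; Wikipedia asks clients to identify themselves.
            headers={"User-Agent": "iplbot-example/0.1"},
            timeout=30,
        )
        response.raise_for_status()
        texts.append(html_to_text(response.text))
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(texts))
```

Calling `build_corpus()` downloads all nine pages and overwrites ipl.txt. Keeping the HTML-to-text step in its own function makes it testable without network access.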
Given an input sentence, the generate_response function in bot.py does the following:
- Lowercases the entire string
- Removes punctuation marks
- Tokenizes it into words
- Lemmatizes each token
- Fits a TF-IDF vectorizer on the sentences produced by these preprocessing steps, together with the input sentence
- Uses cosine similarity to find the two closest vectors
- Sorts the similarity scores in decreasing order and chooses the top-ranked vector
- Gets the corresponding sentence and capitalizes it
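The pipeline above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in bot.py: the corpus is passed in as a list of sentences (for example, ipl.txt split into sentences), tokenization is a plain whitespace split after punctuation removal, and lemmatization is a pluggable `lemmatize` hook that defaults to the identity (a real lemmatizer, such as NLTK's `WordNetLemmatizer().lemmatize`, could be passed in instead). Rather than ranking the input against itself and skipping that self-match, the sketch compares the input only against the corpus rows, so the top-ranked entry is directly the answer. The `FALLBACK` message is hypothetical.

```python
import string

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical reply used when no corpus sentence shares a term with the input.
FALLBACK = "Sorry, I could not find an answer."


def preprocess(text, lemmatize=lambda token: token):
    """Lowercase, strip punctuation, split into words, then lemmatize each word."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [lemmatize(token) for token in text.split()]


def generate_response(user_input, sentences, lemmatize=lambda token: token):
    """Return the corpus sentence most similar to user_input, capitalized."""
    # Vectorize the corpus sentences and the input together; the input is the last row.
    documents = list(sentences) + [user_input]
    vectorizer = TfidfVectorizer(
        tokenizer=lambda doc: preprocess(doc, lemmatize),
        lowercase=False,     # preprocess() already lowercases
        token_pattern=None,  # unused when a custom tokenizer is given
    )
    tfidf = vectorizer.fit_transform(documents)

    # Cosine similarity between the input row and every corpus row (input excluded).
    scores = cosine_similarity(tfidf[-1], tfidf[:-1]).ravel()

    # Sort scores in decreasing order and take the top-ranked sentence.
    best = np.argsort(scores)[::-1][0]
    if scores[best] == 0:
        return FALLBACK
    return sentences[best].capitalize()
```

For example, with the corpus `["The IPL was founded in 2008.", "Chennai Super Kings are based in Chennai."]`, the question "When was the IPL founded?" returns "The ipl was founded in 2008.". Note that `str.capitalize()` also lowercases everything after the first character.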
Please ⭐ the repo and share it