Description: TerminalChat is a chatbot application that uses natural language processing (NLP) and a classification model to predict the tag of a user's question and return an appropriate answer.
Dataframes Summary:
- Question
  - ID - unique identifier of each question
  - Question - the original text of a question
  - Summary - a tag used to classify a question
- Answer
  - ID - unique identifier of each answer (corresponding to a particular question)
  - Answer - the original text of an answer
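A minimal pandas sketch of the two DataFrames summarized above. The column names follow the summary; the rows are hypothetical placeholders, not TerminalChat's real data, and the helper `answer_for` is a hypothetical name. Since the answer table has no tag column, this sketch assumes an answer is looked up through the `ID` it shares with its question.

```python
import pandas as pd

# Hypothetical example rows; the project's real dataset is not shown in this README.
questions = pd.DataFrame({
    "ID": [1, 2, 3],
    "Question": [
        "How do I reset my password?",
        "What are your opening hours?",
        "How can I change my password?",
    ],
    "Summary": ["account", "hours", "account"],  # tag used for classification
})

# Each answer ID matches the ID of the question it answers (assumed one-to-one here).
answers = pd.DataFrame({
    "ID": [1, 2, 3],
    "Answer": [
        "Use the 'Forgot password' link on the login page.",
        "We are open 9am to 5pm, Monday to Friday.",
        "Open Settings, then Security, then Change password.",
    ],
})


def answer_for(question_id: int) -> str:
    """Return the answer text whose ID matches the given question ID."""
    match = answers.loc[answers["ID"] == question_id, "Answer"]
    return match.iloc[0]


print(answer_for(2))  # We are open 9am to 5pm, Monday to Friday.
```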
Dependencies: pandas, NumPy, scikit-learn (sklearn), nltk, re (Python standard library).
How does the process of finding an answer to a question work?
- Pass the user input to the question classification model
- Predict a tag from the input
- Filter the question DataFrame by the predicted tag
- Find the best-matching question using cosine similarity
- Find the answer based on the best-match index and the predicted tag
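A minimal sketch of the answer-lookup steps above, using scikit-learn's `CountVectorizer` and `cosine_similarity`. It assumes the tag has already been predicted (passed in as `predicted_tag`), that questions and answers are joined on a shared `ID`, and that the `Question` column holds already-cleaned text. The data rows, the function name `find_answer`, and the fallback message are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical, already-cleaned example data.
questions = pd.DataFrame({
    "ID": [1, 2, 3],
    "Question": [
        "how do i reset my password",
        "what are your opening hours",
        "how can i change my email",
    ],
    "Summary": ["account", "hours", "account"],
})
answers = pd.DataFrame({
    "ID": [1, 2, 3],
    "Answer": [
        "Use the 'Forgot password' link.",
        "9am to 5pm on weekdays.",
        "Open Settings, then Email.",
    ],
})


def find_answer(user_input: str, predicted_tag: str) -> str:
    # Keep only the questions that carry the predicted tag.
    candidates = questions[questions["Summary"] == predicted_tag]
    if candidates.empty:
        return "Sorry, I have no answer for that yet."  # hypothetical fallback

    # Vectorize the candidates plus the user input with one shared vocabulary.
    vectorizer = CountVectorizer()
    matrix = vectorizer.fit_transform(list(candidates["Question"]) + [user_input])

    # Cosine similarity between the user input (last row) and every candidate row.
    scores = cosine_similarity(matrix[-1:], matrix[:-1])[0]

    # The best-match index selects the question ID, which selects the answer.
    best_id = candidates["ID"].iloc[scores.argmax()]
    return answers.loc[answers["ID"] == best_id, "Answer"].iloc[0]


print(find_answer("how do i reset a forgotten password", "account"))
```

If every score is zero, `argmax` simply returns the first candidate, so a fuller implementation might add a minimum-similarity threshold before answering.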
How is the data prepared and fitted to the classification model?
- Clean and process the text
- Convert the whole corpus (the list of processed questions) into count vectors, used as X
- Label each row with its tag, used as y
- Split the data into training and test sets
- Fit the given classifier on the training data
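The training steps above could look roughly like this with scikit-learn. The README does not name the classifier or say how nltk is used, so this sketch assumes `MultinomialNB` and substitutes a small hard-coded stop-word set for any nltk processing. `clean_text`, `STOP_WORDS`, the training questions, and the tag dictionaries are all hypothetical.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stop-word set standing in for whatever nltk processing the project uses.
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "i", "my", "your",
              "to", "how", "what", "can", "you"}


def clean_text(text: str) -> str:
    """Lowercase, replace non-letters with spaces, and drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(w for w in text.split() if w not in STOP_WORDS)


# Hypothetical training data: question text and its Summary tag.
raw_questions = [
    "How do I reset my password?",
    "I forgot my password, what can I do?",
    "How can I change my account email?",
    "Where do I update my account details?",
    "What are your opening hours?",
    "When is the store open?",
    "Are you open on weekends?",
    "What time do you close today?",
]
tags = ["account"] * 4 + ["hours"] * 4

# Step 1: clean and process the text into the corpus.
corpus = [clean_text(q) for q in raw_questions]

# Vocabulary dictionary of tags: tag -> integer label, plus the reverse mapping.
tag_to_label = {tag: i for i, tag in enumerate(sorted(set(tags)))}
label_to_tag = {i: tag for tag, i in tag_to_label.items()}

# Steps 2-3: count vectors as X, one integer label per row as y.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
y = [tag_to_label[t] for t in tags]

# Step 4: split into training and test sets (stratified so both tags appear in each).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 5: fit the classifier on the training data.
model = MultinomialNB()
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

Following the order in the README, the vectorizer is fitted on the whole corpus before the split, so test questions contribute to the vocabulary; fitting it on the training split only would give a stricter accuracy estimate.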
How does the model predict arbitrary user inputs?
- Check whether the input is valid for prediction
  - If invalid, return an appropriate message
  - If valid, continue to the next step
- Combine the existing corpus and the user input into a "local" corpus
- Convert the whole local corpus into count vectors as X_pred
- Use the trained model to predict on the local corpus and take the last prediction (the user input's row) as y_pred
- Map y_pred to the corresponding tag using the vocabulary dictionary of tags
- Clean the user input
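A sketch of the prediction steps, assuming a scikit-learn `CountVectorizer` plus `MultinomialNB` setup that is retrained here on hypothetical data so the block runs on its own. The validity rule (the input must contain at least one word from the fitted vocabulary), the fallback message, and the names `predict_tag`, `clean_text`, and `FALLBACK` are assumptions, not the project's actual code. The input is cleaned before it joins the local corpus so it is tokenized the same way as the stored questions.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stop words and cleaning, standing in for the project's nltk/re processing.
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "i", "my", "your",
              "to", "how", "what", "can", "you"}


def clean_text(text: str) -> str:
    """Lowercase, replace non-letters with spaces, and drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(w for w in text.split() if w not in STOP_WORDS)


# Hypothetical training data and a model fitted on all of it for brevity.
raw_questions = [
    "How do I reset my password?",
    "I forgot my password, what can I do?",
    "How can I change my account email?",
    "Where do I update my account details?",
    "What are your opening hours?",
    "When is the store open?",
    "Are you open on weekends?",
    "What time do you close today?",
]
tags = ["account"] * 4 + ["hours"] * 4
corpus = [clean_text(q) for q in raw_questions]
tag_to_label = {tag: i for i, tag in enumerate(sorted(set(tags)))}
label_to_tag = {i: tag for tag, i in tag_to_label.items()}

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(corpus),
                            [tag_to_label[t] for t in tags])

FALLBACK = "Sorry, I did not understand that. Could you rephrase your question?"


def predict_tag(user_input: str) -> str:
    cleaned = clean_text(user_input)

    # Validity check (assumed rule): the input needs at least one known word.
    if not any(word in vectorizer.vocabulary_ for word in cleaned.split()):
        return FALLBACK

    # Combine the existing corpus and the cleaned input into a "local" corpus.
    local_corpus = corpus + [cleaned]

    # transform() reuses the fitted vocabulary, so the columns match the model.
    X_pred = vectorizer.transform(local_corpus)

    # The last prediction belongs to the user input's row.
    y_pred = model.predict(X_pred)[-1]
    return label_to_tag[int(y_pred)]


print(predict_tag("What time do you open?"))
```

Because `transform` reuses the vocabulary fitted during training, the prediction for the last row does not depend on the other rows of the local corpus; the combined corpus is kept only to mirror the listed steps, and vectorizing the cleaned input alone would give the same tag.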