20IP / Automatic_Ticket_classify

Automatic Ticket Classify

Table of Contents

  1. General Info
  2. Pipeline
  3. Prepare
  4. Technologies Used
  5. Results
  6. Acknowledgements
  7. Contact

General Information

The data consists of customer complaints in finance and related services. Sorting these complaints (negative or positive feedback, banking issues, debit card issues, and so on) into categories is a necessary task. It should be done automatically and with high accuracy.

Based on these requirements, we divide the complaints into the following 5 types:

  • Credit card / Prepaid card

  • Bank account services

  • Theft/Dispute reporting

  • Mortgages/loans

  • Others 

Using NLP techniques, we classify the complaints through a fixed sequence of steps and train classification algorithms on the result. The model with the best results is then selected.

Pipeline that needs to be performed:

The model is built by following the steps below:

  1. Data loading

  2. Text preprocessing

  3. Exploratory data analysis (EDA)

  4. Feature extraction

  5. Topic modelling 

  6. Model building using supervised learning

  7. Model training and evaluation

  8. Model inference
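As an illustration of step 4 (feature extraction), the sketch below computes TF-IDF weights in plain Python. This README does not say which weighting scheme the notebook uses, so TF-IDF is an assumption here, and the function name `tfidf` and the sample documents are hypothetical. A library vectorizer (for example from scikit-learn, which is listed under Technologies Used) would normally replace this code and may use a slightly different formula.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed formula: tf = count / document length,
    idf = ln(N / document frequency) + 1.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


# Hypothetical, already-cleaned complaints split into tokens.
docs = [
    ["card", "charge", "dispute"],
    ["account", "fee", "charge"],
    ["loan", "payment", "mortgage"],
]
vectors = tfidf(docs)
```

A term that appears in only one complaint ("card") receives a higher weight than a term shared by several complaints ("charge"), which is what makes these features useful for separating topics.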

Prepare the text for topic modeling

  • Make the text lowercase
  • Remove text in square brackets
  • Remove punctuation
  • Remove words containing numbers
  • Remove masking symbols such as XX, digits, and so on
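The cleaning steps above can be sketched with regular expressions. This is a minimal sketch, not the notebook's actual code: the function name `clean_text`, the order of the steps, the choice to replace punctuation with a space, and the sample sentence are all assumptions.

```python
import re
import string

# Character class that matches any ASCII punctuation character.
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")


def clean_text(text):
    """Apply the cleaning steps listed above to one complaint."""
    text = text.lower()                        # make the text lowercase
    text = re.sub(r"\[.*?\]", " ", text)       # remove text in square brackets
    text = re.sub(r"\bx{2,}\b", " ", text)     # remove masks such as xx, xxxx
    text = re.sub(r"\S*\d\S*", " ", text)      # remove words containing digits
    text = PUNCT_RE.sub(" ", text)             # replace punctuation with a space
    return " ".join(text.split())              # collapse repeated whitespace


# Hypothetical complaint text.
sample = "I was charged $250 on XX/XX/2019 [redacted] for my Credit-Card!!"
cleaned = clean_text(sample)
print(cleaned)  # i was charged on for my credit card
```

Masks are removed before digit-containing words so that a masked date such as XX/XX/2019 disappears completely instead of leaving fragments behind.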

After cleaning, perform the following operations:

  • Lemmatize the texts
  • Extract the POS tags of the lemmatized text and remove all words whose tag is not in ['JJ', 'NN', 'NNP', 'UH', 'VB', 'VBG', 'WDT']. This variant is denoted throughout as multiPOS
  • Extract the POS tags of the lemmatized text and remove all words whose tag is not NN (tag == "NN"). This variant is denoted throughout as uniquePOS
  • Dataset: provided by the Upgrad educational institution. You can consult and download it from this link.
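The two POS filters (multiPOS and uniquePOS) can be sketched as a single function applied to (lemma, tag) pairs. The pairs are written by hand here so the example runs without a language model; with spaCy they would typically come from `token.lemma_` and `token.tag_`. The names `filter_by_pos`, `MULTI_POS`, `UNIQUE_POS`, and the sample tokens are hypothetical.

```python
# Tag sets taken from the list above.
MULTI_POS = {"JJ", "NN", "NNP", "UH", "VB", "VBG", "WDT"}
UNIQUE_POS = {"NN"}


def filter_by_pos(tagged_lemmas, keep_tags):
    """Keep only the lemmas whose POS tag is in keep_tags."""
    return " ".join(lemma for lemma, tag in tagged_lemmas if tag in keep_tags)


# Hand-tagged example: (lemma, fine-grained POS tag).
tagged = [
    ("the", "DT"),
    ("bank", "NN"),
    ("charge", "VBD"),
    ("unauthorized", "JJ"),
    ("fee", "NN"),
]

multi_pos_text = filter_by_pos(tagged, MULTI_POS)
unique_pos_text = filter_by_pos(tagged, UNIQUE_POS)
print(multi_pos_text)   # bank unauthorized fee
print(unique_pos_text)  # bank fee
```

Keeping only nouns (uniquePOS) gives a smaller, more topic-focused vocabulary, while multiPOS retains adjectives and some verb forms that may carry additional signal.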

Technologies Used

  • This project uses the following libraries. You can install them at the versions noted in the requirement.txt file.

  • Pandas version 1.4.2
  • Numpy version 1.22.3
  • spacy version 3.4.3
  • scikit-learn version 0.23.1
  • wordcloud version 1.8.2.2
  • seaborn version 0.11.2
  • swifter version 1.3.4

Results

  • The performance results and accuracy of the models are listed in the table below

Acknowledgements

Contact

Created by:
Pham Van Thai: phamthai.ats@gmail.com
Feel free to contact us!

  • We place no license constraints on the use of our results. You can use them for free.
