To develop the plagiarism checker, I'm following a microservice architecture. The backend has five types of service.
- Data Scraper Service: To collect data from different sources, we need a service that is responsible only for web scraping and for sending the scraped data on for classification.
- Data Classification Service: This is where we will develop our incremental model to classify the collected data.
- Input Data Classification Service: This service will classify user input and determine which family it belongs to.
- Main Server: Responsible for accepting data from the client/user.
- Plagiarism Service: Here we will implement multiple plagiarism-checking algorithms and analyze their performance.
Here I've used three different approaches for checking plagiarism:
- Cosine Similarity
- Jaccard Similarity
- BERT
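The first two measures in the list can be sketched in a few lines of plain JavaScript. This is only a minimal illustration, not the Plagiarism Service's actual code: the function names (`tokenize`, `termFrequencies`, `cosineSimilarity`, `jaccardSimilarity`) are hypothetical, and it assumes a simple lowercase word tokenizer and raw term counts rather than any weighting such as TF-IDF.

```javascript
// Assumption: split on anything that is not a letter or digit, after lowercasing.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Count how often each token occurs.
function termFrequencies(tokens) {
  const counts = new Map();
  for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1);
  return counts;
}

// Cosine similarity between the term-count vectors of two texts.
function cosineSimilarity(a, b) {
  const fa = termFrequencies(tokenize(a));
  const fb = termFrequencies(tokenize(b));
  let dot = 0;
  for (const [term, count] of fa) dot += count * (fb.get(term) || 0);
  const norm = (f) => Math.sqrt([...f.values()].reduce((s, c) => s + c * c, 0));
  const denom = norm(fa) * norm(fb);
  return denom === 0 ? 0 : dot / denom; // empty text has no direction, so return 0
}

// Jaccard similarity: shared distinct tokens divided by all distinct tokens.
function jaccardSimilarity(a, b) {
  const sa = new Set(tokenize(a));
  const sb = new Set(tokenize(b));
  if (sa.size === 0 && sb.size === 0) return 0;
  let intersection = 0;
  for (const t of sa) if (sb.has(t)) intersection++;
  return intersection / (sa.size + sb.size - intersection);
}
```

For example, `jaccardSimilarity("the cat sat", "the cat ran")` shares two of four distinct words and so gives 0.5. Cosine similarity takes word counts into account, while Jaccard only looks at which words are present.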
For the approach labelled BERT, we've used the Universal Sentence Encoder (a sentence-embedding model that is separate from BERT itself), which encodes text into 512-dimensional embeddings. The model runs through tensorflow/tfjs-node, which provides native TensorFlow execution for backend JavaScript applications under the Node.js runtime.
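A rough sketch of how the embedding-based comparison might look in Node.js is shown below. It assumes the npm packages `@tensorflow/tfjs-node` and `@tensorflow-models/universal-sentence-encoder` are installed and that the model can be downloaded on first use. The function names `cosineOfVectors` and `embeddingSimilarity` are hypothetical, and scoring the two embeddings with cosine similarity is an assumption rather than something stated above.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineOfVectors(u, v) {
  let dot = 0, nu = 0, nv = 0;
  for (let i = 0; i < u.length; i++) {
    dot += u[i] * v[i];
    nu += u[i] * u[i];
    nv += v[i] * v[i];
  }
  const denom = Math.sqrt(nu) * Math.sqrt(nv);
  return denom === 0 ? 0 : dot / denom;
}

// Embed two texts with the Universal Sentence Encoder and compare them.
async function embeddingSimilarity(textA, textB) {
  // Required lazily so cosineOfVectors stays usable without TensorFlow installed.
  require('@tensorflow/tfjs-node'); // registers the native TensorFlow backend
  const use = require('@tensorflow-models/universal-sentence-encoder');

  const model = await use.load(); // fetches the model weights on first use
  const embeddings = await model.embed([textA, textB]); // 2-D tensor, one row per text
  const [vecA, vecB] = await embeddings.array();
  embeddings.dispose(); // free the tensor memory
  return cosineOfVectors(vecA, vecB);
}
```

In a real service the model would normally be loaded once at startup and reused across requests, since loading it on every call is slow.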