LoRA-SpamFilter

A project aimed at filtering spam messages in SMS with miniature language models and part-of-speech tagging techniques. Work with 'gemma:7b' or 'qwen:14b'.

Note: In the Chinese usage scenario, you need to use the ‘qwen:14b’ model, and in the English scenario, you need to use ‘gemma:7b’

Consequent

🕐 Under the conditions of more than ten text messages and repeated 5 times, a single message will be responded to in seconds.
🚦 Achieved 100% accuracy for For English.
😊 The Chinese recognition logic is still being fine-tuned, so stay tuned.

install

Install 'ollama' on its official page and pip:

pip install ollama

Download Gemma/QWen Full version with ollama:

ollama pull gemma:7b
ollama pull qwen:14b

Clone this repo:

git clone https://github.com/Gloridust/LoRA-SpamFilter.git

custom model:

ollama create gemma-7b-spam -f ./modelfile_en
ollama create qwen-14b-spam -f ./modelfile_cn

Edit code: If you'd like to use with Chinese, you have to edit the code in 'spam_detector.py':

# modelname = 'gemma-7b-spam'
  modelname = 'qwen-14b-spam'

Run and try it:

cd ./LoRA-SpamFilter
python start.py

Then you can input SMS to test it.

Test

Run 'python ./test.py' can test and count the accuracy of results. The program will run the input SMS multiple times and count the results and determine the reasons for each time.

API

I believe that 'start.py' is the best demo to show how to use 'spam_detector.py'. Just:

from spam_detector import detect_spam
is_spam,reason = detect_spam(sms_content)

'detect_spam' will return two things:

is_spam: If it is spam?
reason:Why?

Usually, in scenarios other than debugging, we only use the first parameter 'is_spam'. So you can use it like this:

is_spam, _ = detect_spam(sms_content)

Enjoy it!

About

A project aimed at filtering spam messages in SMS with miniature language models and part-of-speech tagging techniques. 利用微型语言模型和词性标记技术过滤短信中的垃圾邮件。

Mozilla Public License 2.0

Languages

Language:Python 100.0%