A project that filters SMS spam using small language models and part-of-speech tagging techniques. It works with 'gemma:7b' or 'qwen:14b'.
Note: Use the 'qwen:14b' model for Chinese messages and the 'gemma:7b' model for English messages.
🕐 In a test of more than ten text messages, each repeated 5 times, a single message was answered within seconds.
🚦 Achieved 100% accuracy for English.
😊 The Chinese recognition logic is still being fine-tuned, so stay tuned.
- Install Ollama from its official page, then install the 'ollama' Python package with pip:
pip install ollama
- Download the full versions of Gemma and Qwen with ollama:
ollama pull gemma:7b
ollama pull qwen:14b
- Clone this repo:
git clone https://github.com/Gloridust/LoRA-SpamFilter.git
- Create the custom models from the modelfiles in the repo:
ollama create gemma-7b-spam -f ./modelfile_en
ollama create qwen-14b-spam -f ./modelfile_cn
- Edit the code: to use it with Chinese, edit 'spam_detector.py' so that the Qwen model is selected:
# modelname = 'gemma-7b-spam'
modelname = 'qwen-14b-spam'
- Run and try it:
cd ./LoRA-SpamFilter
python start.py
You can then enter an SMS message to test it.
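Instead of commenting lines in and out, the language switch from the setup steps could be written as a small lookup. This is only a sketch: `MODEL_BY_LANGUAGE` and `pick_model` are hypothetical names that do not exist in the repository, and the language codes "en" and "cn" are an assumption. Only the two model names come from the steps above.

```python
# Hypothetical helper, not part of the repository.
# Maps an assumed language code to the custom model created with `ollama create`.
MODEL_BY_LANGUAGE = {
    "en": "gemma-7b-spam",
    "cn": "qwen-14b-spam",
}


def pick_model(language: str) -> str:
    """Return the custom model name for a language code, or raise ValueError."""
    try:
        return MODEL_BY_LANGUAGE[language.lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {language!r}") from None


# Equivalent to uncommenting the Qwen line in 'spam_detector.py'.
modelname = pick_model("cn")
```

Failing loudly on an unknown code avoids silently sending Chinese text to the English model.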
Run 'python ./test.py' to test the detector and measure its accuracy. The program classifies the input SMS multiple times, counts the results, and records the reason given on each run.
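The repeated-run idea can be sketched roughly as follows. This is not the code in 'test.py': `tally_runs`, `accuracy`, and the `detector` parameter are hypothetical names, and taking the majority vote as the final verdict is an assumption. Any callable that returns an `(is_spam, reason)` pair, such as `detect_spam`, can be passed as the detector.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Assumed shape of a detector: takes the SMS text, returns (is_spam, reason).
Detector = Callable[[str], Tuple[bool, str]]


def tally_runs(detector: Detector, sms: str, runs: int = 5) -> Tuple[bool, Counter, List[str]]:
    """Classify one SMS `runs` times and return (majority verdict, vote counts, reasons)."""
    votes: Counter = Counter()
    reasons: List[str] = []
    for _ in range(runs):
        is_spam, reason = detector(sms)
        votes[bool(is_spam)] += 1
        reasons.append(reason)
    # Assumption: a strict majority of "spam" votes decides the verdict.
    majority = votes[True] > votes[False]
    return majority, votes, reasons


def accuracy(detector: Detector, labelled: Sequence[Tuple[str, bool]], runs: int = 5) -> float:
    """Fraction of labelled messages whose majority verdict matches the label."""
    if not labelled:
        return 0.0
    correct = sum(
        tally_runs(detector, sms, runs)[0] == label for sms, label in labelled
    )
    return correct / len(labelled)
```

Keeping the per-run reasons next to the vote counts makes it easier to see why the model disagreed with itself on a given message.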
I believe 'start.py' is the best demo of how to use 'spam_detector.py'. Just import 'detect_spam' and call it:
from spam_detector import detect_spam
is_spam, reason = detect_spam(sms_content)
'detect_spam' returns two values:
- is_spam: whether the message is spam.
- reason: why it was classified that way.
Usually, outside of debugging, only the first return value, 'is_spam', is needed, so you can call it like this:
is_spam, _ = detect_spam(sms_content)
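As a slightly larger illustration of the same call pattern, the sketch below filters a list of messages and keeps only the ones not flagged as spam. `filter_inbox` is a hypothetical helper, not part of the repository. It takes the detector as an argument, so `detect_spam` can be passed in directly, assuming it returns an `(is_spam, reason)` pair as described above.

```python
from typing import Callable, Iterable, List, Tuple


def filter_inbox(
    messages: Iterable[str],
    detector: Callable[[str], Tuple[bool, str]],
) -> List[str]:
    """Return the messages that the detector does not flag as spam."""
    kept: List[str] = []
    for sms in messages:
        # The reason is ignored here, just like in the one-line call above.
        is_spam, _ = detector(sms)
        if not is_spam:
            kept.append(sms)
    return kept


# With the real detector (requires the custom Ollama model to be available):
#   from spam_detector import detect_spam
#   clean = filter_inbox(inbox, detect_spam)
```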
Enjoy it!