man3kin3ko/SimpleAntifraud

adversarial-machine-learning antifraud speech-recognition machine-learning security

Simple Antifraud

This is a simplified voice antifraud system created as part of bachelor's thesis at Moscow Polytechnic University. The system is based on a pre-trained DeepSpeech model, Naive Bayes classifier and TF-IDF vectorizer.

Project was done to illustrate the impact of performing adversarial attacks on this type of systems so it should not be used in production. Even if you think that DeepSpeech is protected enough, the classifier is vulnerable to the Bayesian poisoning itself.

This is some kind of Damn-Vulnerable Service so you can get a flag if you will properly abuse it.

Project structure

checkpoints contains .ckpt files of pretrained DeepSpeech models. Pretrained models can be found here.
training includes notebook with data preparation and fitting for NB Classifier and vectorizer.
pickles folder are used to store them.
example.ipynb can be used as a quick-start guide.

Installation

Install deepspeech.pytorch:

git clone https://github.com/SeanNaren/deepspeech.pytorch
cd deepspeech.pytorch
pip install -r requirements.txt
pip install -e .

Clone this repository and run within it to install remaining dependencies:

pip install -r requirements.txt

Mitigations

The robustness of original LibriSpeech model can be increased using adversarial retraining with gaussian data augmentation. The example models can be found in Releases. You can also try to use another controls, described here.

To retrain a model with a new data original trainig script can be used. Simply replace

model = DeepSpeech(
        labels=labels,
        model_cfg=cfg.model,
        optim_cfg=cfg.optim,
        precision=cfg.trainer.precision,
        spect_cfg=cfg.data.spect
    )

with

    model = DeepSpeech.load_from_checkpoint(
        cfg.checkpoint.filepath,
        freeze=True,
        learning_rate=0.0001
    )

so you can retrain it like python3 train.py checkpoint.filepath=/path/to/file.ckpt.

About

Voice antifraud system that is vulnerable to adversarial attacks. Bachelor's thesis

adversarial-machine-learning antifraud speech-recognition machine-learning security

Languages

Language:Jupyter Notebook 79.3%Language:Python 20.3%Language:Shell 0.5%