anonymization binary-classification distilbert-model doxing named-entity-recognition natural-language-processing nudging twitter

Prevention and Anonymization of Dox Content on Twitter

This work was done as part of the course IST-597 Fundamentals of Privacy at Penn State.

This repository contains the Python code, the trained model, and a demo of the tool I developed.

Demo 1 - Classification of a Tweet to identify doxing
Demo 2 - Nudging the user to not post the Tweet that contains sensitive content
Demo 3 - Anonymizing the Tweet content if the user ignores the nudge.

Summary:

Trained a Machine Learning model for automatic detection of doxed data on Twitter.
Developed a nudge-based technology to alert the doxer that the tweet contains private information.
Created a prototype that adds noise to tweets containing doxed information in real time by applying a hashing algorithm.

Part I: Detection of doxed content

Using Twitter's streaming API, we collected 2000 tweets. We supplied the following keywords to the search API: IP address, SSN, Social Security Number, and SSA. We then manually annotated the dataset for doxing. We excluded tweets whose IP addresses were invalid (e.g., 8.780.255.255) or were local or well-known public addresses (e.g., 127.0.0.1, 192.168.x.x, 8.8.8.8). We model the problem as a binary classification task with two classes: doxed tweets and non-doxed (benign) tweets. We used Hugging Face's implementation of DistilBERT for sequence classification for this task.
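
The IP address exclusion rule described above can be sketched with Python's standard ipaddress module. This is only an illustration of the filtering idea, not the code used in the notebook: the names is_reportable_ip, has_reportable_ip and WELL_KNOWN_PUBLIC are hypothetical, and the list of well-known public addresses is an assumption.

```python
import ipaddress
import re

# Dotted-quad candidates; validity is checked separately by ipaddress.
IPV4_CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

# Assumed list of well-known public resolvers that do not identify a person.
WELL_KNOWN_PUBLIC = {"8.8.8.8", "8.8.4.4", "1.1.1.1"}


def is_reportable_ip(candidate: str) -> bool:
    """True if the string is a valid, non-local, not well-known IPv4 address."""
    try:
        address = ipaddress.IPv4Address(candidate)
    except ValueError:
        return False  # invalid, e.g. 8.780.255.255
    if address.is_loopback or address.is_private:
        return False  # local, e.g. 127.0.0.1 or 192.168.1.10
    return str(address) not in WELL_KNOWN_PUBLIC


def has_reportable_ip(text: str) -> bool:
    """True if the tweet text mentions at least one reportable IPv4 address."""
    return any(is_reportable_ip(c) for c in IPV4_CANDIDATE.findall(text))
```

A tweet that mentions only 8.8.8.8 or 127.0.0.1 would be dropped by has_reportable_ip, while SSN tweets would need a separate check that is not shown here.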

Table 1: The results of binary classification task generated using DistilBERT based embeddings.

Part II: Real-time Nudging

We present the prototype of this nudging mechanism in the figures below. Figure 1 is the initial prompt from the system, which emulates the Twitter platform and contains the text box for composing the tweet. If a tweet is classified as non-doxing, the sequence terminates. However, if our machine learning model identifies it as a doxed tweet, the author is nudged. Figure 2 displays the nudging message and the text box where the author enters their choice on whether to proceed. Figure 3 shows the system's response if the author answers N.
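
The control flow above can be sketched as a small function. This is a minimal sketch under stated assumptions, not the prototype's actual code: handle_draft, its parameters and the wording of NUDGE_MESSAGE are hypothetical, a benign tweet is assumed to be posted unchanged, and any answer other than N is assumed to mean "proceed".

```python
from typing import Callable, Optional

# Hypothetical wording; the prototype's actual message is shown in Figure 2.
NUDGE_MESSAGE = "This tweet appears to contain private information. Post anyway? (Y/N) "


def handle_draft(
    draft: str,
    is_doxing: Callable[[str], bool],   # e.g. a wrapper around the classifier
    ask: Callable[[str], str],          # e.g. the built-in input()
    anonymize: Callable[[str], str],    # the Part III anonymizer
) -> Optional[str]:
    """Return the text to post, or None if the author discards the draft."""
    if not is_doxing(draft):
        return draft  # benign tweet: no nudge (assumed posted unchanged)
    answer = ask(NUDGE_MESSAGE).strip().upper()
    if answer == "N":
        return None  # author reconsidered and discarded the draft
    return anonymize(draft)  # author ignored the nudge: post anonymized text
```

Passing the classifier, the prompt and the anonymizer as callables keeps the flow testable without loading the model; in an interactive session, ask could simply be input.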

Figure 1: The assumed landing page of Twitter.

Figure 2: The nudge shared with the user to reconsider the content of the tweet.

Figure 3: A response from the system if the user discards the draft.

Part III: Data Anonymization

We anonymize the tweet if the author decides to post it anyway. We leverage regular expressions and spaCy's Named Entity Recognition (NER) API. We wrote two regular expressions to extract IP addresses and URLs. We observed that in some cases the sensitive information is not written explicitly in the tweet but is accessible through a URL, so we add noise to URLs as well. We also observed that, along with IP addresses and SSNs, tweets contain other types of personal information (e.g., full name, location coordinates). Although our model is not trained to detect such PII, we attempt to anonymize it by extracting entities with a pre-trained NER model provided by spaCy. We mask all the entities identified by the NER model.
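
The regex part of this step might look like the sketch below. It is illustrative only: the two patterns, the helper names and the choice of a truncated SHA-256 digest as the replacement are assumptions, since the README says only that a hashing algorithm is used. The NER step is omitted here; entity spans returned by the NER model could be masked in the same way.

```python
import hashlib
import re

# Assumed patterns; the notebook's actual regexes may differ.
URL_PATTERN = re.compile(r"https?://\S+")
IPV4_PATTERN = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def _mask(match: re.Match) -> str:
    """Replace the matched text with a short bracketed hash of itself."""
    digest = hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()
    return f"[{digest[:8]}]"


def anonymize(text: str) -> str:
    """Mask URLs and IPv4 addresses in a tweet."""
    # URLs first, so an IP address embedded in a URL is masked with the URL.
    text = URL_PATTERN.sub(_mask, text)
    return IPV4_PATTERN.sub(_mask, text)
```

Note that an unsalted hash of an IPv4 address can be reversed by brute force because the input space is small, so a keyed hash or a fixed placeholder would be a safer choice outside a prototype.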

Figure 4: Anonymized response from the system if the user ignores the nudge.

Note:

Due to privacy concerns, I cannot share the dataset, which contains users' SSNs and IP addresses.
I acknowledge that the inter-annotator reliability of our annotation process is difficult to establish, and in future work, every tweet should receive at least three annotations.

Acknowledgement: I thank Younes Karimi, a PhD candidate in the Information Systems and Technology Department of Penn State, for his input.

About

This repository contains my work on the prevention and anonymization of dox content on Twitter. It contains Python code and a demo of the proposed solution.


MIT License

Languages

Language:Jupyter Notebook 100.0%