stephenhaunts / ProfanityDetector

This is a simple library for detecting profanities within a text string.

If a profanity has a 1 on the end it isn't detected

DavidJBerman opened this issue

    var mf = samueljacksonsfavoriteword + "1"; // "mtherf**er1"
    bool isProfanity = ProfanityFilter.IsProfanity(); // isProfanity is true
    var censored = ProfanityFilter.CensorString(mf);
    bool areSame = (censored == mf); // Returns true.

CensorString does not censor m**********r out of the string, even though ProfanityFilter recognizes the word as a profanity.

Hi, thanks for this. I will try and get this resolved this week.

This is a tricky one. The logic it falls into is the mitigation for the Scunthorpe problem, which is a common problem for profanity detectors. The word Scunthorpe (a town in the UK) is not a profanity, but because it contains the word c*nt it normally gets flagged. One common way of resolving this is to whitelist Scunthorpe, but that isn't a great solution, as you then have to try to whitelist every case like this.

My library tries to be a bit more intelligent about it. It will detect the word c*nt, then look at the whole word that surrounds it and check whether that whole word is profane. In this example, the surrounding word is Scunthorpe, which is not rude. The town of Penistone has the same problem.

In your example of motherf*cker1, it is the same issue. The logic detects motherf*cker, then looks for the surrounding word, which is motherf*cker1, and that word is not profane in terms of the profanity list.
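The surrounding-word check described above can be sketched roughly as follows. This is an illustration only and not the library's actual code: the class `ScunthorpeSketch`, the methods `EnclosingWord` and `ShouldCensor`, and the deliberately mild one-word list are all hypothetical. It assumes that a "word" is a run of letters and digits and that only the first match is examined.

```csharp
using System;
using System.Collections.Generic;

public static class ScunthorpeSketch
{
    // Hypothetical, deliberately mild word list used only for this illustration.
    private static readonly HashSet<string> Profanities =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "hell" };

    // Expands the match at [start, start + length) to the whole run of
    // letters and digits that surrounds it.
    public static string EnclosingWord(string text, int start, int length)
    {
        int left = start;
        while (left > 0 && char.IsLetterOrDigit(text[left - 1])) left--;

        int right = start + length;
        while (right < text.Length && char.IsLetterOrDigit(text[right])) right++;

        return text.Substring(left, right - left);
    }

    // True only when the word enclosing the first match is itself on the list.
    public static bool ShouldCensor(string text, string profanity)
    {
        int index = text.IndexOf(profanity, StringComparison.OrdinalIgnoreCase);
        if (index < 0) return false;

        return Profanities.Contains(EnclosingWord(text, index, profanity.Length));
    }

    public static void Main()
    {
        Console.WriteLine(ShouldCensor("go to hell", "hell"));   // True
        Console.WriteLine(ShouldCensor("a sea shell", "hell"));  // False: enclosing word is "shell"
        Console.WriteLine(ShouldCensor("go to hell1", "hell"));  // False: enclosing word is "hell1"
    }
}
```

The last call shows the reported behaviour: the trailing digit becomes part of the enclosing word, so the lookup against the list fails in the same way it does for an innocent word.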

In the spirit of the solution, this is "as designed" behaviour, but I also see that from your point of view we have a rogue motherf*cker running around, which isn't great.

This one needs a little more thought to solve it in a way that doesn't completely stink.

I have checked in a fix and updated the NuGet package to version 0.1.4.

This was an odd one to fix as technically the side effect you were seeing was as designed, but your use case was also valid. So a bug that's not a bug that needs to be treated as a bug.

I have made the fix user selectable for the moment as I am still trying to decide how much it smells.

For the moment, I have added a new overload to CensorString that takes a bool to ignore numbers in the string, as demonstrated in the following unit test.

    [TestMethod]
    public void CensoredStringReturnsCensoredStringMotherfucker()
    {
        var filter = new ProfanityFilter();

        var censored = filter.CensorString("You are a motherfucker1", '*', true);
        var result = "You are a *************";

        Assert.AreEqual(result, censored); // MSTest argument order is (expected, actual)
    }

I have tested lots of edge cases around it too such as:

    [TestMethod]
    public void CensoredStringReturnsCensoredStringMotherfucker11()
    {
        var filter = new ProfanityFilter();

        var censored = filter.CensorString("You are a motherfucker1 and a 'fucking twat3'.", '*', true);
        var result = "You are a ************* and a '******* *****'.";

        Assert.AreEqual(result, censored); // MSTest argument order is (expected, actual)
    }

I class this as a temporary fix at the moment until I decide the best thing to do with it. What are your thoughts?

Thanks

Steve

When I say it ignores numbers, it is a little smarter than that. It only ignores numbers that are joined to a word, so you can still have standalone numbers in a sentence, as illustrated in the following test.

    [TestMethod]
    public void CensoredStringReturnsCensoredStringMotherfucker12()
    {
        var filter = new ProfanityFilter();

        var censored = filter.CensorString("I've had 10 beers, and you are a motherfucker1 and a 'fucking twat3'.", '*', true);
        var result = "I've had 10 beers, and you are a ************* and a '******* *****'.";

        Assert.AreEqual(result, censored); // MSTest argument order is (expected, actual)
    }
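
For readers who want a feel for the "ignore numbers joined to a word" behaviour, here is a minimal standalone sketch. It is not how the library implements the overload; the class `IgnoreNumbersSketch`, the method `Censor`, and the mild two-word list are hypothetical. It assumes that a token is a run of letters and digits, and it strips every digit in a token, not only trailing ones, which may be broader than what the library does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class IgnoreNumbersSketch
{
    // Hypothetical, deliberately mild word list used only for this illustration.
    private static readonly HashSet<string> Profanities =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "hell", "darn" };

    public static string Censor(string text, char mask = '*')
    {
        // Each match is one run of letters and digits, e.g. "hell1" or "10".
        return Regex.Replace(text, @"[\p{L}\p{Nd}]+", match =>
        {
            // Drop the digits before the lookup. A purely numeric token such
            // as "10" becomes empty and therefore never matches the list.
            string lettersOnly =
                new string(match.Value.Where(c => !char.IsDigit(c)).ToArray());

            return Profanities.Contains(lettersOnly)
                ? new string(mask, match.Value.Length) // mask the attached digits too
                : match.Value;
        });
    }

    public static void Main()
    {
        Console.WriteLine(Censor("I've had 10 beers, what the hell1 and 'darn3'."));
        // I've had 10 beers, what the ***** and '*****'.
    }
}
```

As in the tests above, the masked run has the same length as the original token, so the attached digit is hidden along with the word while the standalone "10" is left alone.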