Yelp / detect-secrets

Hello! I was trying out the DiscordBotTokenDetector and noticed it wasn't flagging some tokens (i.e. producing false negatives).

Examples:

OTUyNED5MDk2MTMxNzc2MkEz.YjESug.UNf-1GhsIG8zWT409q2C7Bh_zWQ
MTAyOTQ4MTN5OTU5MTDwMEcxNg.GSwJyi.sbaw8msOR3Wi6vPUzeIWy_P0vJbB0UuRVjH8l8

(These tokens are slightly fudged and I've also invalidated the unfudged versions, so there's no danger in sharing them here. 😉)

Expected Behavior:

These Discord bot tokens are flagged as such by detect-secrets.

Actual Behavior:

detect-secrets does not recognize them as Discord bot tokens.

Root Cause:

detect-secrets/detect_secrets/plugins/discord.py

Lines 14 to 16 in 0dcd54c

    # Discord Bot Token ([M|N]XXXXXXXXXXXXXXXXXXXXXXX.XXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXX)
    # Reference: https://discord.com/developers/docs/reference#authentication
    re.compile(r'[MN][a-zA-Z\d_-]{23}\.[a-zA-Z\d_-]{6}\.[a-zA-Z\d_-]{27}'),

This regex is a bit too restrictive. Specifically, it only recognizes M or N as a valid first character, and it limits the following substring to 23 characters. From what I've observed recently, Discord bot tokens can sometimes begin with O, and can have a substring of up to length 25 following the first character.

Proposed Solution:

At the bare minimum, I would suggest widening the regex as follows, to handle the specific false negatives mentioned above:

    re.compile(r'[MNO][a-zA-Z\d_-]{23,25}\.[a-zA-Z\d_-]{6}\.[a-zA-Z\d_-]{27}')

(Note: The segment of the token matched by [a-zA-Z\d_-]{27} may actually be longer than 27 characters, as is the case in the second example token above. It isn't strictly necessary to account for this in the regex, since it's sufficient to match a substring of that segment.)
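To make the difference concrete, here is a minimal standalone sketch that runs both regexes against the two example tokens above. It uses plain `re.search` rather than the plugin machinery, and `OLD`, `WIDENED`, `EXAMPLES`, and `is_flagged` are illustrative names that do not exist in detect-secrets.

```python
import re

# Current regex from detect_secrets/plugins/discord.py (commit 0dcd54c).
OLD = re.compile(r'[MN][a-zA-Z\d_-]{23}\.[a-zA-Z\d_-]{6}\.[a-zA-Z\d_-]{27}')

# "Bare minimum" widened regex proposed in this issue.
WIDENED = re.compile(r'[MNO][a-zA-Z\d_-]{23,25}\.[a-zA-Z\d_-]{6}\.[a-zA-Z\d_-]{27}')

# The two fudged example tokens from the issue description.
EXAMPLES = [
    'OTUyNED5MDk2MTMxNzc2MkEz.YjESug.UNf-1GhsIG8zWT409q2C7Bh_zWQ',
    'MTAyOTQ4MTN5OTU5MTDwMEcxNg.GSwJyi.sbaw8msOR3Wi6vPUzeIWy_P0vJbB0UuRVjH8l8',
]


def is_flagged(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


for token in EXAMPLES:
    # Show only a short prefix of each token alongside both results.
    print(token[:8] + '...', 'old:', is_flagged(OLD, token),
          'widened:', is_flagged(WIDENED, token))
```

The first token is missed by the old regex because it starts with `O`, and the second because 25 characters follow the leading `M`. For the second token, the widened regex matches only the first 27 characters of the final segment, which is enough to flag it.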

Additional test cases should also be included to properly capture the correct behavior.

Optional Enhancements:

Improving Readability: Add the re.ASCII flag to the regex and replace each instance of [a-zA-Z\d_-] with [\w-].
Future-Proofing: Remove the [MNO] restriction to prevent more FNs if/when Discord "runs out" of tokens that begin with O.
Fuzz Testing: Generate random token values each test run, to simulate "user inputs" and ensure they're properly detected.

Here are some examples from my own Discord bot token library to illustrate what the regex and token randomizer could look like.
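The linked examples aren't reproduced in this thread. As a rough, independent sketch (not the code from that library), the three enhancements could look something like the following. `READABLE`, `RELAXED`, `random_segment`, and `random_token` are hypothetical names, and the segment lengths are simply the ones assumed by the widened regex above.

```python
import random
import re
import string

# Readability variant: under re.ASCII, \w means exactly [a-zA-Z0-9_].
READABLE = re.compile(r'[MNO][\w-]{23,25}\.[\w-]{6}\.[\w-]{27}', re.ASCII)

# Future-proofed variant: no restriction on the first character, so the
# first segment is simply 24 to 26 characters long.
RELAXED = re.compile(r'[\w-]{24,26}\.[\w-]{6}\.[\w-]{27}', re.ASCII)

# Characters allowed in each segment (never includes the '.' separator).
TOKEN_CHARS = string.ascii_letters + string.digits + '_-'


def random_segment(rng, length):
    """Return a random string of the given length drawn from TOKEN_CHARS."""
    return ''.join(rng.choice(TOKEN_CHARS) for _ in range(length))


def random_token(rng, first_chars='MNO'):
    """Hypothetical helper: builds a token-shaped string, not a real token."""
    head = rng.choice(first_chars) + random_segment(rng, rng.randint(23, 25))
    return '.'.join([head, random_segment(rng, 6), random_segment(rng, 27)])


rng = random.Random()  # unseeded, so each run exercises different values
for _ in range(1000):
    token = random_token(rng)
    # Include the token in the failure message so a failing case can be replayed.
    assert READABLE.fullmatch(token), token
    assert RELAXED.fullmatch(token), token
```

Dropping the `[MNO]` prefix presumably trades some precision for recall, since any string with the right segment lengths would then match. Whether that trade-off suits this project is a call for the maintainers.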

Action Items

Create test cases to expose current incorrect behavior
Open PR containing new test cases + "bare minimum" fix: #628
Ensure all CI checks are green so that the PR can be merged (Note: Approval to run workflows was granted after 22 days.)
~~Determine if "optional enhancements" are appropriate for this project, and open a new PR if so~~ (Update: Probably not worthwhile, as this project doesn't seem to be actively maintained. Clarification from maintainer(s) is welcome.)

Hi @nuztalgia,

I am the original contributor of this plugin... Nice catch! When I decided to propose this change, I checked the official documentation and some personal tokens, and in both cases the regex matched. In many cases, it's better to be more restrictive and detect some of them than to not have the plugin at all 😆

Reading your issue description, it makes sense, so thanks for improving it!

Regards!

@syn-4ck - Thanks so much for the nice comment and for contributing the plugin! I absolutely agree with you ❤️ What's interesting is that your original regex worked perfectly for all of the older bot tokens I had saved. The specific "false negative" examples in the issue description are based on tokens that were generated in the past month or so (if I'm remembering correctly). So it could very well have been a recent change on Discord's end that made this change necessary 😄

DiscordBotTokenDetector failing to detect some Discord bot tokens