berzerk0 / Probable-Wordlists

Version 2 is live! Wordlists sorted by probability originally created for password generation and testing - make sure your passwords aren't popular!

Suggestion: Human passwords only

Ho52198 opened this issue · comments

Hello again :)
I think there is a way to produce a wordlist of human-chosen passwords only, but I am sure it will be tricky :).

What I mean - we already have plenty of human words + names + city names, etc.
We already know how people put 4 instead of A, 1 instead of i, etc.
If you search the current biggest file for all human words (case-insensitive, with a number-replacement option) and extract all matches, you will get (still ordered by probability) all the passwords that for sure were NOT generated by a random password generator.
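A rough sketch of the search described above, in Python. The substitution table, the function names, and the sample data are all assumptions made for illustration; real-world leetspeak is broader than this table, and nothing here comes from the Probable-Wordlists project itself.

```python
# Assumed digit/symbol-to-letter table; extend it to taste.
LEET_MAP = str.maketrans({
    "4": "a", "@": "a",
    "1": "i",
    "3": "e",
    "0": "o",
    "5": "s", "$": "s",
    "7": "t",
})


def normalize(password):
    """Lowercase the password and undo the common substitutions."""
    return password.lower().translate(LEET_MAP)


def contains_word(password, words):
    """True if any dictionary word appears inside the normalized password."""
    norm = normalize(password)
    return any(word in norm for word in words)


def filter_human(passwords, words):
    """Yield matching passwords in their original order and spelling."""
    for pw in passwords:
        if contains_word(pw, words):
            yield pw


if __name__ == "__main__":
    sample = ["P4ssw0rd1", "x9$kQ2!z", "pine4pple99"]
    for pw in filter_human(sample, ["password", "apple"]):
        print(pw)
```

Because the input is only filtered, never re-sorted, the output keeps the probability order of the source file. The naive `any(...)` loop checks every word against every password, so it is only practical for small dictionaries.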

I am sure there are people who can provide a good analysis of what a word is in general - there are specific patterns that can be found only in human words, no matter the language. This way all kinds of slang, jargon, and offensive street words can be included, and for some funny reason, they are a HUGE % of all passwords :)

I believe this new list will be far more probable, especially for WPA.

In addition:
The dictionary file should contain only words of 3+ or 4+ characters, because 2-3 character strings are too common a combination and there would be too many false positives.

You can safely remove words that contain each other from the dictionary, like apple and pineapple. You need only the shortest possible word - in this case, apple. When you filter by apple you will find all passwords containing pineapple anyway, but if you look for pineapple too, this will slow you down and generate duplicates.
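The dictionary clean-up could look something like the following Python sketch. The function name and the `min_len=4` default are assumptions (the post leaves the choice between 3+ and 4+ open), and the containment check is quadratic in the worst case, so treat it as an illustration rather than something tuned for a large dictionary.

```python
def prune_dictionary(words, min_len=4):
    """Drop short words and any word that contains a shorter kept word."""
    # Deduplicate, lowercase, and apply the minimum length.
    candidates = {w.lower() for w in words if len(w) >= min_len}
    kept = []
    # Visit shortest words first so "apple" is kept before "pineapple" is seen.
    for word in sorted(candidates, key=lambda w: (len(w), w)):
        if not any(short in word for short in kept):
            kept.append(word)
    return kept


if __name__ == "__main__":
    print(prune_dictionary(["pineapple", "apple", "app", "applesauce", "banana"]))
```

With the sample input, "app" is dropped for being too short, and "pineapple" and "applesauce" are dropped because "apple" already covers them.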

This way the dictionary file will be reduced dramatically in size, which will speed up the process. I can do something like this, but grep is not the most appropriate command when you operate on billions of lines :). No idea how to do it time-effectively :)
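One possible way around the grep problem, sketched in Python under a few assumptions: the dictionary fits in memory as a set, the password file is streamed line by line, and all names are made up for the example. Instead of testing every word against every password, each password's substrings are looked up in the set, so the cost per password depends on its length and on the number of distinct word lengths, not on the size of the dictionary.

```python
def build_index(words):
    """Return the word set plus the sorted list of distinct word lengths."""
    word_set = {w.lower() for w in words}
    lengths = sorted({len(w) for w in word_set})
    return word_set, lengths


def has_word(text, word_set, lengths):
    """Check every substring whose length matches some dictionary word."""
    n = len(text)
    for length in lengths:
        if length > n:
            break  # remaining lengths are longer still
        for start in range(n - length + 1):
            if text[start:start + length] in word_set:
                return True
    return False


def stream_matches(lines, word_set, lengths):
    """Yield matching passwords one at a time, preserving input order."""
    for line in lines:
        pw = line.rstrip("\n")
        if has_word(pw.lower(), word_set, lengths):
            yield pw


if __name__ == "__main__":
    word_set, lengths = build_index(["apple", "love"])
    sample = ["ILoveYou\n", "zq8!xk\n", "pineapple1\n"]
    for pw in stream_matches(sample, word_set, lengths):
        print(pw)
```

`stream_matches` accepts any iterable of lines, so an open file object can be passed in directly without loading the whole list. A multi-pattern matcher such as Aho-Corasick would be another option; this version only relies on the standard library.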

commented

@Ho52198 These are interesting ideas, but they would be a completely different project.

Are you talking about generating a big list of words that belong to spoken languages?
If so, then I have something just like that coming up in Rev 2, an International Dictionary. This will contain words from Latin-based alphabets that are likely to be seeds for passwords. That being said, they are not considered passwords themselves, per se. They might be useful for guessing at passwords, but the lines created could not be used as source material for the Probable-Wordlists unless they were confirmed to be used in some sort of leak.

We can make predictions on what is most probably a password, and they might even be pretty good. However, the goal of my project isn't to make a list of what are "probably" passwords. This project uses passwords found repeatedly in leaks, whether or not they fit my ideas of what a password looks like. Again, creating password candidates based on patterns is a good way of guessing at passwords, but these guesses would not be added to the Probable-Wordlists unless they were found in a leak.

It sounds like you are interested in generating guesses at passwords. The files in this project may provide a good source for that. I recommend Netmux's Hash Crack for learning more about this.

In addition, check out my other project, Bull's Eye Wordlist Generator - BEWGor to generate password seeds.

Basically my idea is to extract from the current actual password list only the passwords that contain real words. Sorry if it was not clear.

Generating passwords is a different story :)

Most WPA passwords are something memorable - a word or name with some numbers and symbols around it. Usually people pick something easy to remember after all :). This will exclude all randomly generated passwords, which are usually used for other types of accounts but not for wifi.

commented

@Ho52198, what you are theorizing sounds reasonable. However, it is an assumption that I cannot support with any data. The leaks used in generation of these files are mostly the result of database dumps, which are totally different beasts than Wifi passwords. Wifi passwords are meant to be shared, whereas most account passwords are not.

I can't say with any certainty that account password trends are inherently different from Wifi password trends. But I can say that it is a lot easier to find dumps of databases containing thousands of passwords than it is to find aggregated lists that have to be generated one WLAN at a time.

I'll be including hashcat rules and masks in Rev 2 which can be used to make password candidates like you are describing.