Document the encoding used when hashing strings
veigaribo opened this issue
The README mentions
Per bcrypt implementation, only the first 72 bytes of a string are used. Any extra bytes are ignored when matching passwords. Note that this is not the first 72 characters. It is possible for a string to contain less than 72 characters, while taking up more than 72 bytes (e.g. a UTF-8 encoded string containing emojis).
But that is of little value, because this library accepts JavaScript strings as parameters (e.g. for hash). So how are those strings represented as bytes? Does it use UTF-8, like in the example? My investigation and tests suggest so.
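To illustrate the difference between characters and bytes that the README excerpt describes, here is a minimal sketch that assumes the string is encoded as UTF-8. It uses only Node's built-in Buffer; the helper names and the constant are hypothetical and are not part of this library's API.

```javascript
// Hypothetical helpers for illustration; not part of the bcrypt library.
const BCRYPT_MAX_BYTES = 72;

// Number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(str) {
  return Buffer.byteLength(str, 'utf8');
}

// True if the string is longer than 72 bytes, assuming UTF-8 encoding,
// meaning the bytes past the 72nd would be ignored.
function exceedsBcryptLimit(str) {
  return utf8ByteLength(str) > BCRYPT_MAX_BYTES;
}

const ascii = 'a'.repeat(72);            // 72 UTF-16 code units, 72 bytes
const emoji = '\u{1F600}'.repeat(20);    // 20 emojis, 4 bytes each in UTF-8

console.log(ascii.length, utf8ByteLength(ascii), exceedsBcryptLimit(ascii)); // 72 72 false
console.log(emoji.length, utf8ByteLength(emoji), exceedsBcryptLimit(emoji)); // 40 80 true
```

The second string has a `length` of only 40, yet it takes 80 bytes in UTF-8, so under this assumption its last 8 bytes (the final two emojis) would not affect the hash.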
This is very important information that I can't find written anywhere. The README even seems to dodge the question, since it also mentions:
Compatibility with hashes generated by other languages is not 100% guaranteed due to difference in character encodings. However, it should not be an issue for most cases.
Again keeping it vague.
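For context on why encodings matter for cross-language compatibility, the sketch below shows one string producing different byte sequences under different encodings. It is an illustration using Node's built-in Buffer only; `hexBytes` is a hypothetical helper, and nothing here is taken from this library's code.

```javascript
// Illustration only: the same string maps to different bytes per encoding.
function hexBytes(str, encoding) {
  return Buffer.from(str, encoding).toString('hex');
}

const s = '\u00e9'; // "é", U+00E9

console.log(hexBytes(s, 'utf8'));    // c3a9
console.log(hexBytes(s, 'latin1'));  // e9
console.log(hexBytes(s, 'utf16le')); // e900
```

A bcrypt implementation in another language that is fed the Latin-1 or UTF-16 bytes of the same password would hash different input, so its hash would not match one computed from the UTF-8 bytes. Pure ASCII passwords are identical in UTF-8 and Latin-1, which is presumably why the README says most cases are unaffected.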
These are the lines I think are the most relevant and confirm that UTF-8 is being used, in case it saves time for someone.
node.bcrypt.js/src/bcrypt_node.cc, line 170 in 2a3c445
https://github.com/nodejs/node-addon-api/blob/7e1aa06132558fcc3de4ef5f4f6b84ff10c32502/napi-inl.h#L1105
Yes, we use UTF-8. You are welcome to add an entry to the README to make it explicit.