Samsung / CredSweeper

CredSweeper is a tool for detecting credentials in directories or files. It can help users detect unwanted exposure of credentials (such as tokens, passwords, API keys, etc.) in advance. By scanning lines, filtering, and optionally using an AI model, CredSweeper reports lines with possible credentials, where each line is, and expected type o


Too much info from `WARNING | util | UnicodeError`

meanrin opened this issue · comments

Running the scanner on a repo that contains binary files can easily flood the CLI output with messages like these:

2021-12-07 16:17:11,870 | WARNING | util | UnicodeError: Can't read content from "/mnt/data/datasets/validate_repos/node/test/fixtures/wpt/encoding/legacy-mb-tchinese/big5/big5_errors.html" as utf16.
2021-12-07 16:17:11,903 | WARNING | util | UnicodeError: Can't read content from "/mnt/data/datasets/validate_repos/node/test/fixtures/wpt/encoding/legacy-mb-tchinese/big5/big5_chars-csbig5.html" as utf8.
2021-12-07 16:17:11,903 | WARNING | util | UnicodeError: Can't read content from "/mnt/data/datasets/validate_repos/node/test/fixtures/wpt/encoding/legacy-mb-tchinese/big5/big5_chars-csbig5.html" as utf16.
2021-12-07 16:17:11,918 | WARNING | util | UnicodeError: Can't read content from "/mnt/data/datasets/validate_repos/node/test/wasi/wasm/create_symlink.wasm" as utf8.
2021-12-07 16:17:11,930 | WARNING | util | UnicodeError: Can't read content from "/mnt/data/datasets/validate_repos/node/test/wasi/wasm/create_symlink.wasm" as utf16.
2021-12-07 16:17:11,960 | WARNING | util | UnicodeError: Can't read content from "/mnt/data/datasets/validate_repos/node/test/fixtures/wpt/encoding/resources/two-boms-utf-16be.html" as utf8.
2021-12-07 16:17:11,960 | WARNING | util | UnicodeError: Can't read content from "/mnt/data/datasets/validate_repos/node/test/fixtures/wpt/encoding/resources/two-boms-utf-16le.html" as utf8.
2021-12-07 16:17:11,964 | WARNING | util | UnicodeError: Can't read content from "/mnt/data/datasets/validate_repos/node/test/fixtures/wpt/encoding/resources/utf-32-big-endian-bom.html" as utf8.

These messages are not very useful, especially because UTF-16 files always produce one (for the failed utf8 attempt) even though they are then read correctly.

I propose changing the log level of this message to INFO or DEBUG:

logging.warning(f"UnicodeError: Can't read content from \"{path}\" as {encoding}.")

so that it does not spam the output during a scan with the default (WARNING) log level.
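To make the proposal concrete, here is a minimal sketch of an encoding-fallback reader that logs failed decode attempts at INFO instead of WARNING. It is not CredSweeper's actual code: the function name `read_text_with_fallback`, the logger name `util_sketch`, and the `("utf8", "utf16")` order (inferred from the log output above) are all assumptions made for illustration.

```python
import logging
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("util_sketch")


def read_text_with_fallback(
    path: str, encodings: Sequence[str] = ("utf8", "utf16")
) -> Optional[str]:
    """Try each encoding in order; return the decoded text or None.

    Hypothetical helper: a failed attempt is logged at INFO, so it stays
    hidden when the log level is WARNING.
    """
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeError:
            # Same message as in the issue, but at INFO instead of WARNING.
            logger.info(
                'UnicodeError: Can\'t read content from "%s" as %s.',
                path,
                encoding,
            )
    return None
```

With the log level set to WARNING, a UTF-16 file is decoded on the second attempt without printing anything, and the per-file messages remain available by lowering the level to INFO.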

I agree that the log level should be changed.