Using std::isprint to determine what counts as a string may miss some strings
axhlzy opened this issue
Referenced code: line 479 at commit 807b257.

I propose amending that line to:

```cpp
if (!std::isprint(static_cast<unsigned char>(c)) && c != '\n') {
```
This is just something I happened to come across, and the point is to show that there is a problem here. It's up to you whether to accept those special characters, or to add a method that extracts strings in a broader sense.
![1715099758554](https://private-user-images.githubusercontent.com/20512058/328604325-339056fb-fa33-4566-b9cd-1240f2928c85.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4NjcxODMsIm5iZiI6MTcyMTg2Njg4MywicGF0aCI6Ii8yMDUxMjA1OC8zMjg2MDQzMjUtMzM5MDU2ZmItZmEzMy00NTY2LWI5Y2QtMTI0MGYyOTI4Yzg1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzI1VDAwMjEyM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTgxYmYyNDNjMjk5OGQ2NTE4ODM5YzAyNDIxNjAzODcyZWI1NjJlM2U3MGE3ZTMyZTQxNzhlYmRkNzZkODA1YjAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.gwoC0TsbPNzXMeKnna4q6ies_fgLCcEjwEprMReQRJk)
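To make the proposed change concrete, here is a minimal sketch comparing a plain `std::isprint` check with the variant that also accepts `'\n'`. `is_printable_string` and its `allow_newline` flag are hypothetical names, not part of LIEF. It assumes the default "C" locale, where `std::isprint` accepts only bytes 0x20 to 0x7E, so tabs and (typically) the bytes of UTF-8 multibyte sequences are still rejected even with the amendment.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not a LIEF API): returns true if every byte of `s`
// passes std::isprint, optionally also accepting '\n'.
// allow_newline == false reproduces a plain std::isprint filter;
// allow_newline == true corresponds to the proposed `c != '\n'` exception.
bool is_printable_string(const std::string &s, bool allow_newline) {
  for (char c : s) {
    // Cast to unsigned char: passing a negative char to isprint is undefined.
    const auto uc = static_cast<unsigned char>(c);
    if (std::isprint(uc)) {
      continue;
    }
    if (allow_newline && c == '\n') {
      continue;
    }
    return false;
  }
  return true;
}
```

With the plain check, a string such as `"line1\nline2"` is discarded; with the newline exception it is kept.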
I'd also suggest returning the start address of each string along with the string itself, for example by collecting them into a map keyed by address:
```cpp
#include <LIEF/ELF.hpp>

#include <cctype>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Collect the NUL-terminated strings of .rodata, keyed by the virtual address
// of their first byte, keeping only those made of printable characters or '\n'.
std::map<uint64_t, std::string> get_rodata_strings(const std::string &elf_file_path) {
  std::map<uint64_t, std::string> string_map;

  auto binary = LIEF::ELF::Parser::parse(elf_file_path);
  if (!binary) {
    std::cerr << "Failed to parse " << elf_file_path << std::endl;
    return string_map;
  }
  const auto *rodata_section = binary->get_section(".rodata");
  if (!rodata_section) {
    std::cerr << "No .rodata section found in the ELF file" << std::endl;
    return string_map;
  }

  const uint64_t rodata_start = rodata_section->virtual_address();
  auto rodata_data = rodata_section->content();
  const uint8_t *data = rodata_data.data();
  const std::size_t size = rodata_data.size();

  // Split the section into runs of non-NUL bytes. The inner loop is bounded
  // by `size`, so a last string without a terminating NUL cannot cause a
  // read past the end of the section.
  std::size_t offset = 0;
  while (offset < size) {
    if (data[offset] == '\0') {
      ++offset;
      continue;
    }
    const std::size_t start = offset;
    while (offset < size && data[offset] != '\0') {
      ++offset;
    }
    string_map.emplace(rodata_start + start,
                       std::string(reinterpret_cast<const char *>(data + start), offset - start));
  }

  // Keep only the entries whose bytes are all printable or '\n'.
  std::map<uint64_t, std::string> filtered;
  for (const auto &[address, str] : string_map) {
    bool is_string = true;
    for (char c : str) {
      // A character that is neither printable nor a newline marks a non-string
      if (!std::isprint(static_cast<unsigned char>(c)) && c != '\n') {
        is_string = false;
        break;
      }
    }
    if (is_string) {
      filtered.emplace(address, str);
    }
  }
  return filtered;
}
```
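The address-keyed splitting step can be tried without LIEF. The sketch below is a hypothetical, LIEF-free version of that loop: `scan_strings` is an illustrative name, the byte vector stands in for the content of `.rodata`, and `base` stands in for its virtual address. Each run of non-NUL bytes becomes one entry keyed by the address of its first byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: split `data` into runs of non-NUL bytes and key each
// run by `base` plus its starting offset. A trailing run without a NUL is
// still reported, and the scan never reads past data.size().
std::map<uint64_t, std::string> scan_strings(const std::vector<uint8_t> &data,
                                             uint64_t base) {
  std::map<uint64_t, std::string> out;
  std::size_t i = 0;
  while (i < data.size()) {
    if (data[i] == 0) {
      ++i;  // skip NUL separators
      continue;
    }
    const std::size_t start = i;
    while (i < data.size() && data[i] != 0) {
      ++i;
    }
    out.emplace(base + start,
                std::string(data.begin() + start, data.begin() + i));
  }
  return out;
}
```

For example, the bytes `h i \0 \0 a b c` with `base = 0x1000` yield `"hi"` at `0x1000` and `"abc"` at `0x1004`.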
Hi @axhlzy
Thank you for taking the time to raise this proposal. I prefer to keep the current behavior (which does not accept `\n`) since it is closer to the behavior of `strings`.
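For comparison, here is a rough approximation of the default character test of GNU `strings`. This is an assumption-based sketch, not binutils code: as documented, GNU `strings` by default accepts printable ASCII plus tab, requires runs of at least 4 characters (the `-n` default), and only includes other whitespace such as newlines when `-w`/`--include-all-whitespace` is given. `strings_would_accept` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical approximation of GNU strings' default filter: every byte must
// be printable ASCII (0x20..0x7E) or '\t', and the run must be at least
// `min_len` characters long (strings' -n default is 4). '\n' is rejected.
bool strings_would_accept(const std::string &s, std::size_t min_len = 4) {
  if (s.size() < min_len) {
    return false;
  }
  for (char c : s) {
    const auto uc = static_cast<unsigned char>(c);
    const bool ok = (uc >= 0x20 && uc <= 0x7E) || c == '\t';
    if (!ok) {
      return false;
    }
  }
  return true;
}
```

Under this approximation a newline splits a string rather than being part of it, which is the behavior the current LIEF check stays close to.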
Regarding the address resolution, it is interesting, but I prefer to avoid introducing this logic in LIEF, which is first and foremost focused on executable formats.
Thanks!