lief-project / LIEF

LIEF - Library to Instrument Executable Formats

Home Page:https://lief-project.github.io/

Using std::isprint to detect strings may miss some of them

axhlzy opened this issue · comments

if (std::isprint(c) == 0) {

I propose changing it to:

if (!std::isprint(static_cast<unsigned char>(c)) && c != '\n') {

This is only a suggestion based on something I happened to come across; it is meant to show that there is a problem here. It is up to you whether to accept additional special characters such as '\n', or to add a separate method that extracts strings in a broader sense.
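To make the difference between the two checks concrete, here is a minimal sketch that applies each predicate to a whole candidate string. The helper names `all_printable` and `all_printable_or_newline` are hypothetical and are not part of LIEF. The cast to `unsigned char` matters on its own: passing a negative `char` value to `std::isprint` is undefined behavior.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: mirrors the current check, accepting a candidate
// only if every byte is printable according to std::isprint.
bool all_printable(const std::string& s) {
  for (char c : s) {
    // Cast first: std::isprint requires a value representable as unsigned char.
    if (std::isprint(static_cast<unsigned char>(c)) == 0) return false;
  }
  return !s.empty();
}

// Hypothetical helper: mirrors the proposed check, which additionally
// tolerates '\n' inside a candidate string.
bool all_printable_or_newline(const std::string& s) {
  for (char c : s) {
    if (!std::isprint(static_cast<unsigned char>(c)) && c != '\n') return false;
  }
  return !s.empty();
}
```

Under the default "C" locale, a multi-line message such as "usage:\n  -h" is rejected by the first helper and accepted by the second, which is the kind of string the stricter check may miss.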

There is also a related suggestion: the extracted strings should come with the start address of each string. For example:

        auto binary = LIEF::ELF::Parser::parse(elf_file_path);

        const auto &segments = binary->segments();

        auto rodata_section = binary->get_section(".rodata");
        if (!rodata_section) {
            std::cerr << "No .rodata section found in the ELF file" << std::endl;
            return 1;
        }

        const uint64_t rodata_start = rodata_section->virtual_address();
        const uint64_t rodata_size = rodata_section->size();
        auto rodata_data = rodata_section->content();

        std::map<uint64_t, std::string> string_map;

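        // Use the size of the data actually returned, and raw pointers into it
        const uint8_t *data_begin = rodata_data.data();
        const uint8_t *data_end = data_begin + rodata_data.size();

        for (uint64_t offset = 0; offset < rodata_data.size(); ++offset) {
            const uint8_t *current_ptr = data_begin + offset;

            if (*current_ptr != '\0') {
                const char *string_start = reinterpret_cast<const char *>(current_ptr);
                const char *string_end = string_start;
                // Stop at the NUL terminator or at the end of the section data,
                // so an unterminated trailing string cannot run past the buffer
                while (string_end != reinterpret_cast<const char *>(data_end) && *string_end != '\0') {
                    ++string_end;
                }

                if (string_end != string_start) {
                    std::string str(string_start, string_end - string_start);
                    string_map.emplace(rodata_start + offset, str);
                }

                // Skip past the string; the loop's ++offset then skips the terminator
                offset += string_end - string_start;
            }
        }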

        std::map<uint64_t, std::string> string_map_temp;
        for (const auto &entry : string_map) {
            const std::string &str = entry.second;
            bool is_string = true;
            for (char c : str) {
                // Reject the candidate if it contains a character that is neither printable nor a newline
                if (!std::isprint(static_cast<unsigned char>(c)) && c != '\n') {
                    is_string = false;
                    break;
                }
            }
            if (is_string) {
                string_map_temp.emplace(entry.first, str);
            }
        }

        string_map = string_map_temp;
        
        // ..........
        // return this string_map

Hi @axhlzy
Thank you for taking the time to raise this proposal. I actually prefer to keep the current behavior (which does not include \n), since it is closer to the behavior of the strings utility.

Regarding the address resolution: it is interesting, but I prefer to avoid introducing this logic in LIEF, which is first and foremost focused on executable formats.

Thanks!
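Since the address resolution discussed above stays outside LIEF, it can live in user code. The following is a minimal sketch under stated assumptions, not LIEF's implementation: `extract_strings_with_addresses` is a hypothetical helper, the printable-only check matches the current behavior described in this thread, and the default minimum length of 4 is an assumption borrowed from the default of the strings utility.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, not part of LIEF: scan a byte buffer for runs of
// printable characters and key each run by base_address + start offset.
std::map<uint64_t, std::string> extract_strings_with_addresses(
    const std::vector<uint8_t>& data, uint64_t base_address,
    std::size_t min_length = 4) {  // assumed default, see lead-in
  std::map<uint64_t, std::string> result;
  std::string current;    // run of printable bytes being accumulated
  std::size_t start = 0;  // offset where the current run began

  // Store the current run if it is long enough, then reset it.
  auto flush = [&]() {
    if (current.size() >= min_length) {
      result.emplace(base_address + start, current);
    }
    current.clear();
  };

  for (std::size_t i = 0; i < data.size(); ++i) {
    if (std::isprint(data[i]) != 0) {  // uint8_t is always in the valid range
      if (current.empty()) start = i;
      current.push_back(static_cast<char>(data[i]));
    } else {
      flush();  // any non-printable byte (including NUL) ends the run
    }
  }
  flush();  // handle a run that reaches the end of the buffer
  return result;
}
```

With LIEF, the buffer and base address would presumably come from the section's content and virtual address, as in the snippet in the original post; copying the content into a `std::vector<uint8_t>` first keeps the helper independent of the LIEF version in use.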