forrestthewoods / lib_fts

single-file public domain libraries

fts_fuzzy_match: fts_fuzzy_match.h (C++): Ensure proper match arrays initialization

octavian-nita opened this issue

After compiling the C++ matcher using gcc (which comes with the latest version of Code::Blocks) and running the resulting program, I noticed that the final matches array contains (at some positions, different from the match locations) random uint8_t values.

I also noticed that some match arrays don't seem to be initialized: uint8_t matches[256];, uint8_t bestRecursiveMatches[256]; and uint8_t recursiveMatches[256];. It's been a long while since I last touched C/C++, but I think the idea is that automatic (local) arrays get allocated but are not automatically initialized to 0 or whatever (as opposed to arrays with static storage duration, which are zero-initialized). I may, of course, be mistaken... :)
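A minimal sketch of the fix being suggested, assuming the arrays are plain local declarations as quoted above: adding "= {}" value-initializes every element to zero. The helper names count_nonzero and nonzero_after_value_init are made up for illustration and are not part of lib_fts.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: counts the non-zero entries in the first n elements of a.
int count_nonzero(const uint8_t* a, int n)
{
    return static_cast<int>(
        std::count_if(a, a + n, [](uint8_t v) { return v != 0; }));
}

// Declares a local array with an empty initializer and reports how many
// entries are non-zero. Without "= {}" the contents would be indeterminate
// and reading them would be undefined behavior.
int nonzero_after_value_init()
{
    uint8_t matches[256] = {};  // all 256 elements are zero
    return count_nonzero(matches, 256);
}
```

With the initializer in place, positions that the matcher never writes read back as 0 instead of whatever happened to be on the stack.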

The code:

#include <iostream>
#include <iomanip>
#include <cstdlib>

using namespace std;

#define FTS_FUZZY_MATCH_IMPLEMENTATION
#include "fts_fuzzy_match.h"

using namespace fts;

void match(const char *pattern, const char *s)
{
    int score = 0, max_matches = 64;  // initialize score: it may not be written when there is no match
    uint8_t *matches = (uint8_t *) calloc(max_matches, sizeof(uint8_t));

    bool matched = fuzzy_match(pattern, s, score, matches, max_matches);
    cout << boolalpha << "Matches: " << matched << "; score: " << score << endl;
    for (int i = 0; i < max_matches; i++) {
        if (i % 15 == 0) {
            cout << endl;
        }
        cout << setw(5) << right << unsigned(matches[i]);
    }

    cout << endl;
    free(matches);  // release the buffer allocated with calloc
}

int main()
{
    char const *s1 = "MockAI.h";
    char const *s2 = "MacroCallback.cpp";
    char const *s3 = "MockGameplayTasks.h";
    char const *s4 = "MovieSceneColorTrack.cpp";

    char const *pattern = "Mock";

    match(pattern, s1);
    match(pattern, s2);
    match(pattern, s3);
    match(pattern, s4);

    return 0;
}

I'm running into a similar thing. Normally I get 0 for the remaining non-match entries, but in another case I get 204s.

One way to address it would be to add a num_matches out parameter that lets the API user know how far to read in the resulting matches array. (Or simply return matches as a vector, which would also remove the need for a maximum match count.)
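To make the vector idea concrete, here is a rough sketch of the API shape. Note that simple_match is a made-up stand-in, not fts::fuzzy_match: it is a greedy, case-insensitive subsequence matcher with no scoring, used only so the example is self-contained. The point is that matches.size() tells the caller exactly how many entries are valid.

```cpp
#include <cctype>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for fts::fuzzy_match (no scoring, greedy matching).
// Returns true when every pattern character is found, in order, in str.
// On return, matches holds the indices found so far, so matches.size()
// is the number of valid entries and there is nothing stale to read.
bool simple_match(const char* pattern, const char* str,
                  std::vector<uint8_t>& matches)
{
    matches.clear();
    // Stop at index 256 so every stored index fits in a uint8_t.
    for (int i = 0; *pattern != '\0' && str[i] != '\0' && i < 256; ++i) {
        const int p = std::tolower(static_cast<unsigned char>(*pattern));
        const int c = std::tolower(static_cast<unsigned char>(str[i]));
        if (p == c) {
            matches.push_back(static_cast<uint8_t>(i));
            ++pattern;
        }
    }
    return *pattern == '\0';
}
```

A caller would then loop over matches directly instead of over a fixed max_matches, so uninitialized tail entries never come into play.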

The number of matches is equal to the length of the pattern if fuzzy_match() returns true. Otherwise fewer matches are stored than the length of the pattern, and the match is considered invalid anyway, IMHO.
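Under that assumption, a caller can work out how many entries to read without any API change. The helper below is hypothetical (valid_match_count is not part of lib_fts) and simply encodes the rule stated above, clamped to the buffer size to be safe.

```cpp
#include <algorithm>
#include <cstring>

// Hypothetical helper, not part of lib_fts.
// matched:     the value returned by fuzzy_match()
// pattern:     the pattern that was passed to fuzzy_match()
// max_matches: the capacity of the matches buffer
// Returns how many leading entries of the matches buffer are meaningful,
// assuming a successful match stores one index per pattern character.
int valid_match_count(bool matched, const char* pattern, int max_matches)
{
    if (!matched) {
        return 0;  // a failed match leaves nothing worth reading
    }
    const int pattern_len = static_cast<int>(std::strlen(pattern));
    return std::min(pattern_len, max_matches);
}
```

In the test program above, the print loop could run to valid_match_count(matched, pattern, max_matches) instead of max_matches, which would skip the garbage entries entirely.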