Wordle Solver, Tutorial, Testing & Benchmarks

High level goals:

Build something! Brute force first
Create a language agnostic tutorial so others can build a solver
Showcase Rust tho, live code the solver on YouTube (https://www.youtube.com/@robertking)
Create some tests so others can test
Create some benchmarks so we can compare
Let people find optimal initial guesses that use words they know.
Optimise based on average expected number of guesses
Higher quality code than previous terminal game wordle project (https://github.com/robert-king/rust-wordle)

Tutorial Steps and Hints:

Prerequisite:

This isn't easy, so bring some tenacity. Feel free to look at src/simple.rs for hints to unblock you, or watch the YouTube video of RustyRob coding it up first and then try to repeat it.
You should be comfortable with recursion and some casework
Understand taking the average value from a set of possible outcomes.
Understand taking the minimum value of a set of possible options and use that as an outcome.
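
To make the last two prerequisites concrete, here is a small sketch. The helper names average and minimum and the numbers are made up for illustration; they are not taken from the repository.

```rust
/// Average of a set of equally likely outcomes.
fn average(outcomes: &[f64]) -> f64 {
    outcomes.iter().sum::<f64>() / outcomes.len() as f64
}

/// Smallest value among the available options.
fn minimum(options: &[f64]) -> f64 {
    options.iter().copied().fold(f64::INFINITY, f64::min)
}

fn main() {
    // Hypothetical data: guesses needed for each possible answer,
    // for two different opening words.
    let opener_a = [1.0, 2.0, 2.0, 3.0];
    let opener_b = [1.0, 3.0, 3.0, 3.0];

    // Each opener is scored by its average; we then keep the lowest score.
    let averages = [average(&opener_a), average(&opener_b)];
    println!("averages: {:?}, best: {}", averages, minimum(&averages));
}
```

The solver repeats this pattern at every level: average over the possible answers, then take the minimum over the possible guesses.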

Steps: (see hints in the next section)

Think of a few approaches and write them down.
Create a method called evaluate_guess that returns the average number of guesses required for a word. How does it calculate the average? What helper functions does it need to call?
Create fn is_valid(w: &str, guess: &str, ans: &str) -> bool. The logic can be difficult, so to simplify it, only consider words that don't have duplicate characters. Later, filter your word list to exclude words with duplicate characters.
Create evaluate_next_guess(words: Vec<&'static str>) -> f64
Test evaluate_guess() on a few examples. The average score of a guess should be between ~2.0 and 5.0, depending on how many initial words you have in your input.
Think of some ways to improve performance
Improve accuracy by allowing duplicate characters and by allowing guesses from a larger list of words
Optimise performance further using a flame graph.
When optimising, create a new file called fast.rs with a struct called FastSolver that houses the cache and other structures used for optimisation.
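
As a starting point for the is_valid step, here is a minimal sketch under the no-duplicate-letters simplification. It assumes 5-letter lowercase ASCII words and reads the signature as "could w still be the answer, given the colours that guess would receive if ans were the answer?". That reading is an assumption; src/simple.rs may do it differently.

```rust
/// Returns true if `w` is still a possible answer after guessing `guess`
/// when the real answer is `ans`.
/// Assumption: 5-letter lowercase ASCII words with no repeated letters.
fn is_valid(w: &str, guess: &str, ans: &str) -> bool {
    let (w, ans) = (w.as_bytes(), ans.as_bytes());
    guess.bytes().enumerate().all(|(i, g)| {
        if ans[i] == g {
            // Green: the candidate must have the same letter in the same place.
            w[i] == g
        } else if ans.contains(&g) {
            // Orange: the letter is present, but not in this position.
            w.contains(&g) && w[i] != g
        } else {
            // Black: the letter does not appear at all.
            !w.contains(&g)
        }
    })
}

fn main() {
    // Which starter words survive if we guess "trace" and the answer is "cigar"?
    for w in ["cigar", "rebut", "blush", "focal", "trace"] {
        println!("{w}: {}", is_valid(w, "trace", "cigar"));
    }
}
```

Once duplicate letters are allowed, the orange and black cases need letter counts, so this version would no longer be correct.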

Hints:

Use src/words.rs and src/simple.rs as a guide, as well as the YouTube video.

Go with a brute force approach (using recursion). We can optimise later with caching, since there aren't many states: each guess has a limited number of colourings (3^5 = 243).
Use the signature evaluate_guess(guess: &str, words: &Vec<&'static str>) -> f64 and return the average number of guesses across all possible answers from words. If an answer is equal to the guess, it should contribute 1.0/words.len() towards the average. Otherwise, use that answer to narrow down the list of candidate words with a helper is_valid(word, guess, ans), then call another helper, e.g. next_guess_avg = evaluate_next_guess(narrowed_words). Don't worry about the 5 guess limit, although you could return infinity if you reached a certain depth. We didn't bother.
For each of the 5 characters in guess, is it orange, green or black? In each of these three cases, what does it tell us about whether the current word is valid or not? N.B. in the orange case, it tells us at least two things :) (it depends on whether you're allowing duplicate characters or not)
In evaluate_next_guess, try each possible guess, see which one is best, and return its average as the best average. This can call our previous method evaluate_guess().
You should start out with just a few words for testing and slowly increase the number of words to see where performance degrades. You can use ["cigar", "rebut", "blush", "focal", "trace"] to start with.
The easiest way is to memoize evaluate_next_guess using a HashMap, e.g. evaluate_next_guess(words: Vec<&'static str>, cache: &mut HashMap<Vec<&'static str>, f64>) -> f64, starting with a lookup such as: if let Some(&ans) = cache.get(&words) { return ans; }
See the source code for the two lists of words. One is a list of valid answers; the other contains more obscure words that can be used as guesses. However, I advise you to use words that are part of your own vocabulary, as they're the most useful for your game play.
Let's discuss ways to optimise together. Here are a few ideas I have: reduce allocations, e.g. don't allocate more memory when caching or when filtering the words. Prune out bad words early, e.g. run your algorithm on a small subset of words and remove words that did badly, since they will never be good guesses. (this technique is similar to simulated annealing?)
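
The hints above can be put together roughly as follows. This is a sketch, not the code from src/simple.rs. It assumes 5-letter ASCII words with no repeated letters, it only tries guesses from the remaining candidates, and it threads the cache through evaluate_guess as an extra parameter. The pattern helper and the Cache alias are made-up names; is_valid is written here as a comparison of colourings, which is one of several ways to express it.

```rust
use std::collections::HashMap;

/// Hypothetical alias for the memoization table described in the hints.
type Cache = HashMap<Vec<&'static str>, f64>;

/// Colouring of `guess` against `ans`, packed in base 3
/// (0 = black, 1 = orange, 2 = green), so the result is below 3^5 = 243.
/// Assumption: 5-letter ASCII words with no repeated letters.
fn pattern(guess: &str, ans: &str) -> u8 {
    let ans = ans.as_bytes();
    guess.bytes().enumerate().fold(0, |acc, (i, g)| {
        let colour = if ans[i] == g { 2 } else if ans.contains(&g) { 1 } else { 0 };
        acc * 3 + colour
    })
}

/// `w` stays a candidate if it would produce the same colouring as `ans`.
fn is_valid(w: &str, guess: &str, ans: &str) -> bool {
    pattern(guess, w) == pattern(guess, ans)
}

/// Average number of guesses needed if we play `guess` now.
fn evaluate_guess(guess: &str, words: &Vec<&'static str>, cache: &mut Cache) -> f64 {
    let mut total = 0.0;
    for &ans in words {
        if ans == guess {
            total += 1.0; // solved with this guess
        } else {
            // Keep only the words consistent with the colouring we would see.
            let narrowed: Vec<&'static str> =
                words.iter().copied().filter(|w| is_valid(w, guess, ans)).collect();
            total += 1.0 + evaluate_next_guess(narrowed, cache);
        }
    }
    total / words.len() as f64
}

/// Best average over all guesses taken from the remaining candidates.
fn evaluate_next_guess(words: Vec<&'static str>, cache: &mut Cache) -> f64 {
    if let Some(&ans) = cache.get(&words) {
        return ans;
    }
    let mut best = f64::INFINITY;
    for &guess in &words {
        best = best.min(evaluate_guess(guess, &words, cache));
    }
    cache.insert(words, best);
    best
}

fn main() {
    let words = vec!["cigar", "rebut", "blush", "focal", "trace"];
    let mut cache = Cache::new();
    for &guess in &words {
        println!("{guess}: {:.3}", evaluate_guess(guess, &words, &mut cache));
    }
}
```

Because a wrong guess is never consistent with its own colouring, every narrowed list is strictly shorter than the one before, so the recursion terminates. Filtering preserves the original word order, which keeps the Vec keys consistent between calls.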

Testing:

Run cargo test and use fn test_simple() to check your logic.
more test cases are welcome! :)

Benchmarking:

You must use this command to allow printing: cargo test -- --nocapture
Help is welcome here. For now, test_simple contains a benchmark that uses std::time::Instant.
It prints a line such as: Elapsed for 80 words: 625.51ms
try "cargo install flamegraph" and "cargo flamegraph" (https://github.com/flamegraph-rs/flamegraph for help)
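
For reference, timing a piece of work with std::time::Instant can look like the sketch below. The timed helper and the placeholder workload are made up for illustration; the real benchmark lives in test_simple and may be structured differently.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

fn main() {
    let words = ["cigar", "rebut", "blush", "focal", "trace"];

    // Placeholder workload: in a real benchmark this would call the solver.
    let (total_len, elapsed) = timed(|| words.iter().map(|w| w.len()).sum::<usize>());

    println!("Total letters: {total_len}");
    println!("Elapsed for {} words: {:.2?}", words.len(), elapsed);
}
```

A single run like this is noisy, so treat the numbers as rough comparisons between versions rather than precise measurements.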

Pull Requests:

are most welcome :)

Flamegraph iterations:

flamegraph1.svg: no optimisations
flamegraph2.svg: reduce allocations by using a buffer pool and using u64 instead of vec as hash key for memoization.
flamegraph3.svg: precompute is_valid for all triples.
flamegraph4.svg: run flamegraph in dev mode
flamegraph5.svg: add rayon and anneal
flamegraph6.svg: make is_valid 5x faster (we could still make get_valid_cache() another 5x faster, e.g. by divide and conquer: split into words that have 'a' and words that don't, and so on)
flamegraph7.svg: use rayon to make get_valid_cache() faster, tidy anneal code
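
To illustrate the idea behind flamegraph2.svg, here is one possible way to use a u64 instead of a Vec as the memoization key: give every word an index and store the candidate set as a bitmask. This is a guess at the technique, not a description of fast.rs, and it only works as written for up to 64 words. The names mask_of and indices_of are hypothetical.

```rust
use std::collections::HashMap;

/// Packs a set of word indices (each below 64) into one u64, one bit per word.
fn mask_of(indices: &[usize]) -> u64 {
    indices.iter().fold(0, |mask, &i| mask | (1u64 << i))
}

/// Recovers the word indices stored in a mask, in ascending order.
fn indices_of(mut mask: u64) -> Vec<usize> {
    let mut out = Vec::new();
    while mask != 0 {
        out.push(mask.trailing_zeros() as usize);
        mask &= mask - 1; // clear the lowest set bit
    }
    out
}

fn main() {
    let words = ["cigar", "rebut", "blush", "focal", "trace"];
    let mut cache: HashMap<u64, f64> = HashMap::new();

    // Candidate set {"cigar", "trace"} becomes a single integer key.
    let key = mask_of(&[0, 4]);
    // With two candidates left, guessing one of them averages (1 + 2) / 2.
    cache.insert(key, 1.5);

    let names: Vec<&str> = indices_of(key).into_iter().map(|i| words[i]).collect();
    println!("{key:#b} -> {names:?} -> {:?}", cache.get(&key));
}
```

Hashing and comparing a single integer is cheaper than hashing a Vec of string slices, and the key needs no heap allocation.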

Results:

If we disregard uncommon words (since you're unlikely to use those), and if we filter out words with duplicate characters (just to simplify the is_valid logic slightly), then we get the following:

Runtime, 8 seconds:
(3.5140485312899146, "tripe")
(3.517241379310358, "trace")
(3.5185185185185435, "crate")
(3.5217113665389674, "slant")
(3.526181353767567, "leant")

Note that without filtering out uncommon words, and with duplicate characters allowed, it would take longer to run. The following results should be expected in that case:
SALET 3.42117
REAST 3.42246
TRACE 3.42376
CRATE 3.42376
SLATE 3.42462
See https://auction-upload-files.s3.amazonaws.com/Wordle_Paper_Final.pdf for details.

robert-king / rust-wordle-solver