Settings Guessr 🤔

Context

By default, every field is considered searchable. Ex: URLs, prices, dates.
By default, no field is considered filterable. Ex: users will not be able to sort by price.
Users raised issues related to indexing performance, and best practices with regards to how the index settings may impact the experience
Limited knowledge of the different index settings lead to user confusion, anger, and sometimes violence

Rules to detect common field values, such as URLs, UUIDs
Field name lookups to understand the usage of each field, such as id, datetimes, and so on
Entropy used to measure the amount of randomness of certain values over the whole dataset
- if we observe a small amount of different values in the entire dataset, the field is probably a "category" (color, genres, ...), and thus should be filterable
- if we observe a large amount of different values, the field is likely to represent unique content, thus should be searchable

Base library in Rust
- can be shipped on the web (fast client-side validation with WASM)
- can be embedded inside of Meilisearch in the future
- directly embedded in the Cloud 🦭

Users do not need to be aware of new settings and features enable via settings: we will offer them the best experience by default.

dataset	count	default	generated	Faster by
All cities	1M	40.02s	37.38s	6.6%
Hacker News	1M	184.51s	170.99s	7.33%
Movies	32k	4.12s	3.72s	9.71%
Magic The Gathering card game	74k	35.71s	19.91s	44.25%
Twitch messages	5M	248.58s	111.80s	55.03%
Music Brainz	100k	83.80s	35.86s	57.21%

MrBeast 🫶

A tool that guess your settings by using the dataset

Language:Rust 79.8%Language:CSS 11.3%Language:HTML 8.9%