Support for statistics of discrete numerical data
BeHalcyon opened this issue
Hi. The architecture of TFDV has been very useful to me. I have the following needs:
- Compute unique-value statistics for discrete numerical features. The existing functions only compute unique values for string features.
- Calculate the frequency distribution of discrete numerical data.
- Calculate the replication rate of several feature values in a set of data.
Where should I modify the code?
You can follow the Custom Data Validation guide, which explains how to implement custom data validation.
Thank you!
Sorry, my question is about how to extend the statistics, not the validation. As I understand it, custom data validation is for extending the monitoring indicators.
Hi - If you set is_categorical to true in the schema for a given numeric feature, and pass that schema when you generate statistics with TFDV, TFDV will calculate top-k and unique stats for that feature.
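For reference, here is a rough plain-Python sketch of what unique and top-k stats amount to for a discrete numeric feature, plus one possible reading of the "replication rate" from the original question (the fraction of rows that repeat an earlier row across the chosen features). This is not TFDV's implementation. The function names `discrete_stats` and `duplication_rate` and the sample data are made up for illustration.

```python
from collections import Counter


def discrete_stats(values, k=3):
    """Unique count, top-k values and relative frequencies for discrete values.

    Hypothetical helper for illustration only; not part of TFDV.
    """
    counts = Counter(values)
    total = len(values)
    return {
        "unique": len(counts),                 # number of distinct values
        "top_k": counts.most_common(k),        # (value, count), most frequent first
        "frequencies": {v: c / total for v, c in counts.items()},
    }


def duplication_rate(rows):
    """Fraction of rows that repeat an earlier row (assumed meaning of 'replication rate')."""
    rows = [tuple(r) for r in rows]
    if not rows:
        return 0.0
    return 1 - len(set(rows)) / len(rows)


# Made-up sample data: an integer feature and a few (feature_a, feature_b) rows.
ratings = [1, 2, 2, 3, 3, 3]
stats = discrete_stats(ratings, k=2)
rate = duplication_rate([(1, "a"), (1, "a"), (2, "b"), (2, "c")])
```

With the sample data above, `stats["unique"]` is 3, `stats["top_k"]` is `[(3, 3), (2, 2)]`, and `rate` is 0.25 because one of the four rows repeats an earlier one.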
Closing this due to inactivity. Please take a look at the answers provided above, and feel free to reopen and comment if you still have questions. Thank you!