tensorflow / data-validation

Library for exploring and validating machine learning data

Support for statistics of discrete numerical data

BeHalcyon opened this issue

Hi. The architecture of TFDV has been very helpful to me. I have the following needs:

  • Compute statistics of unique values for discrete numerical features. The existing functions only support calculating unique values for string features.
  • Calculate the frequency distribution of discrete numerical data.
  • Calculate the replication rate of several feature values in a set of data.

Where should I modify the code?
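To make the three requested quantities concrete, here is a minimal plain-Python sketch that does not use TFDV. The function name `discrete_stats`, the sample `ratings` list, and the reading of "replication rate" as the share of values that repeat an already-seen value are all assumptions made for illustration.

```python
from collections import Counter


def discrete_stats(values, top_k=3):
    """Summarize a list of discrete numeric values (hypothetical helper)."""
    counts = Counter(values)
    total = len(values)
    unique = len(counts)
    return {
        # Number of distinct values.
        "unique": unique,
        # The top_k most frequent values with their counts.
        "top_k": counts.most_common(top_k),
        # Relative frequency of each distinct value.
        "frequencies": {v: c / total for v, c in counts.items()},
        # Assumed meaning of "replication rate": share of repeated values.
        "duplication_rate": (total - unique) / total if total else 0.0,
    }


ratings = [1, 2, 2, 3, 3, 3, 5]  # made-up sample data
stats = discrete_stats(ratings)
print(stats)
```

The unique count and top-k part of this summary is what the thread is asking TFDV to produce for numeric features.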

@BeHalcyon,

You can follow the Custom Data Validation guide, which explains how to implement custom data validation.

Thank you!

Sorry, my question is about how to extend the statistics, not the validation. As I understand it, custom data validation is used for extending the monitoring indicators.

Hi - If you set is_categorical to true in the schema for a given numeric feature, and pass that schema when you generate statistics with TFDV, TFDV will calculate top-k and unique stats for that feature.

Closing this due to inactivity. Please take a look at the answers provided above, and feel free to reopen the issue and post your comments if you still have questions. Thank you!