Jensen-Shannon implementation
Alpha009 opened this issue
What is taken as input here to compute the Jensen-Shannon divergence?
Is it the probabilities of the (numerical) pandas column, or the column's probability density function?
For example, in this code:
tfdv.get_feature(schema1, 'duration').drift_comparator.jensen_shannon_divergence.threshold = 0.01
What is the duration column converted into before the JS divergence value is computed?
Sorry for the delay on this. We use the standard histogram and calculate the JSD as shown here:
Please feel free to reopen if more information is needed.
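For anyone reading later, the histogram-based approach described in the reply above can be sketched roughly as follows. This is only an illustration with NumPy, not TFDV's actual code: the function name `jensen_shannon_divergence`, the default of 10 equal-width bins shared by both samples, and the base-2 logarithm are all assumptions made for the example.

```python
import numpy as np


def jensen_shannon_divergence(baseline, current, bins=10):
    """Histogram-based JSD between two numeric samples (illustrative sketch).

    Assumptions, not taken from TFDV: equal-width bins spanning both samples,
    and log base 2, so the result lies in [0, 1].
    """
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)

    # Use the same bin edges for both samples so the histograms are comparable.
    lo = min(baseline.min(), current.min())
    hi = max(baseline.max(), current.max())
    if hi == lo:
        hi = lo + 1.0  # avoid a zero-width range when all values are equal
    edges = np.linspace(lo, hi, bins + 1)

    # Turn raw counts into discrete probability distributions.
    p, _ = np.histogram(baseline, bins=edges)
    q, _ = np.histogram(current, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Terms with a == 0 contribute 0; b > 0 wherever a > 0 because b is m.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    old = rng.normal(loc=100.0, scale=10.0, size=1000)
    new = rng.normal(loc=110.0, scale=10.0, size=1000)
    print(jensen_shannon_divergence(old, new))
```

In this sketch the column is never treated as a continuous density: the raw values are bucketed, the bucket counts are normalized to probabilities, and the divergence is computed on those two discrete distributions. A value such as the `0.01` threshold in the question would then be compared against the returned number.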