tensorflow / data-validation

What does TFDV take as input when computing the Jensen-Shannon divergence?
Is it the probabilities for the (numerical) pandas column, or the probability density function of the column?

For example, in this code:

tfdv.get_feature(schema1, 'duration').drift_comparator.jensen_shannon_divergence.threshold = 0.01

What is the duration column converted into before the JS divergence value is computed?

@Alpha009, thanks for bringing this up.
I think we need the PDF of the 'duration' column before computing the JS divergence value.
Let me forward this to @caveness.

Sorry for the delay on this. We use the standard histogram and calculate the JSD as shown here:

data-validation/tensorflow_data_validation/anomalies/metrics.cc

Line 266 in 9fbc050

// JSD(P||Q) = (D(P||M) + D(Q||M))/2

Please feel free to reopen if more information is needed.
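
For anyone reading this later, here is a minimal sketch of the idea in plain Python. It is not TFDV's code from metrics.cc, only an illustration of the formula quoted above, where M = (P + Q)/2 and D is the KL divergence. The assumptions are mine: both columns are bucketed into the same equal-width bins, the bin count defaults to 10, and logs are base 2. The function names are hypothetical, and how TFDV actually builds and aligns its histograms may differ.

```python
import math


def histogram(values, lo, hi, bins):
    """Count values into `bins` equal-width buckets spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # The maximum value is placed in the last bucket.
        idx = min(int((v - lo) / width), bins - 1) if width > 0 else 0
        counts[idx] += 1
    return counts


def kl_divergence(p, q):
    """D(P||Q) in bits; buckets where p is zero contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(current, previous, bins=10):
    """JSD(P||Q) = (D(P||M) + D(Q||M)) / 2 over shared histogram buckets.

    Assumption: both inputs are non-empty lists of numbers, and the shared
    bucket range is the combined min/max of the two inputs.
    """
    lo = min(min(current), min(previous))
    hi = max(max(current), max(previous))
    p_counts = histogram(current, lo, hi, bins)
    q_counts = histogram(previous, lo, hi, bins)
    # Normalize counts so each histogram sums to 1.
    p = [c / len(current) for c in p_counts]
    q = [c / len(previous) for c in q_counts]
    # M is the bucket-wise average of the two distributions.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return (kl_divergence(p, m) + kl_divergence(q, m)) / 2
```

So the input in this sketch is the per-bucket probability mass of each column, not a continuous density. With base-2 logs the value lies between 0 (identical histograms) and 1 (no overlapping buckets), which is the scale a threshold such as 0.01 would be compared against.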
