eltonlaw / impyute

Data imputations library to preprocess datasets with missing data

Home Page: http://impyute.readthedocs.io/

There is a bug in EM code

ahmedhshahin opened this issue

Hello,

Thanks for the repo.
I believe there is a bug in the current implementation of EM, in this line in particular.

It calculates the signed relative difference between the previous and current predictions. However, it should calculate the absolute value of that relative difference, i.e.
delta = np.abs((col[x_i] - previous) / previous)

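The current implementation reports convergence too early whenever the current value is less than the previous one (for a positive previous value): the signed difference is then negative, so it is always below the 10% threshold, no matter how far apart the two values are.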

For example, if:
col[x_i] = 1
previous = 1000
delta = -0.999 # delta < 0.1 is True, so convergence is assumed despite the huge difference
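
To make the difference concrete, here is a minimal, self-contained sketch comparing the two convergence checks on the numbers above. The function names `signed_delta` and `abs_delta` and the `tol` variable are hypothetical and are not taken from impyute; the 0.1 tolerance is the 10% threshold mentioned above. Plain Python `abs` stands in for `np.abs`, since the values here are scalars.

```python
def signed_delta(current, previous):
    # Signed relative change: negative whenever current < previous (previous > 0).
    return (current - previous) / previous


def abs_delta(current, previous):
    # Magnitude of the relative change: never negative.
    return abs((current - previous) / previous)


tol = 0.1  # assumed 10% convergence threshold
current, previous = 1.0, 1000.0

# Signed check: the negative value is below tol, so it looks converged.
print(signed_delta(current, previous) < tol)

# Absolute check: the change is about 99.9%, so it is not converged.
print(abs_delta(current, previous) < tol)
```

With these inputs the signed check passes while the absolute check does not, which is the premature convergence described above. Taking the absolute value of the whole quotient, rather than only the numerator, also covers the case where `previous` is negative.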

Best regards