eltonlaw / impyute

Data imputations library to preprocess datasets with missing data

Home Page:http://impyute.readthedocs.io/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Multivariate Imputation by Chained Equations is going to return only mean value of the Column.

PavanTejaDokku opened this issue · comments

Issue:
I
n the module "impyute.mice" we were expecting to return the imputed value for each column based on the linear equation converged for the column. In contrast we are getting only the mean values of the column.

Reason:
Here we have a logic of entering into loop only if we satisfy below condition, however we have a glitch in the condition which is making us to skip it and return the mean values imputed data set straight away.

Condition Failing:
converged = [False] * len(null_xyv)
while all(converged):
.....................
....................

Resolution:

converged = [False] * len(null_xyv)
while not(all(converged)):
.....................
....................

woah thanks for catching that, do you want to send in a PR?

sure, will let u know when I am ready

PavanTejaDokku#1
Raised pull request, Let me know if u need more details.

Fantastic, thanks for doing that. It seems you uncovered some other issues with the implementation as well. Nice find on the zero division and the absolute value of the delta. The PDF explaining the changes which was very helpful. Before I merge this though, there are a few changes I would like you to make:

  1. Remove the first 6 lines of the commit with the @author stuff, I'd prefer not to have that in the source code. Please add yourself to the AUTHORS.rst instead. This, as is everything is, is up for discussion, I don't know... what are your thoughts. We can also have credits in the README or something like that.
  2. With the docstrings please follow the numpydoc specification. The documentation for readthedocs is generated from the docstrings and it looks for numpydoc format.
  3. In general the use of globals() is frowned upon, it pollutes the global namespace and can cause unexpected issues down the line. For more info, check out this [stack overflow discussion] (https://stackoverflow.com/questions/19158339/why-are-global-variables-evil). But why are you
  4. At line 76, what was the rationale behind swapping out the for i, x_i, y_i, value in enumerate(null_xyv): with for i, z in enumerate(null_xyv): and manually unwrapping it. They should be equivalent right? And commenting out the #temp.append([x_i, y_i, new_value]) would just leave the temp variable as an empty list making null_xyv an empty list at the end of the first inner loop null_xyv = temp
  5. Lines 92-93: The else: converged = False lines are unneccesary I think.
  6. Could you update the unit tests for the function with the values you have in the PDF? To run the unit tests you will need to run make test in the console and have Docker installed. I can help guide you through the process if you can't figure it out.

By the way, the pull request is only on your fork at the moment. I can't merge it unless you send it back to this repo. Check out Part 2 of the accepted answer of this stack overflow discussion on how to send your pull request back to the original repo. Also there's a CONTRIBUTING guide here

Thanks again for all your work, I really appreciate it!

Submitted pull request, please let me know the feedback.