Weighted histogram differs from numpy unless the data is explicity copied.
LucasCampos opened this issue · comments
First, thanks for the great library. I suspect I am hitting a memory issue. Unless I explictly copy the arrays before passing them to histogram2D, the results differ from numpy. So far, I only saw this happen when I am using the weighted version of the 2Dhistogram. Code and output follows
#! /usr/bin/env python
import numpy as np
import fast_histogram as ft
print(f"Numpy version {np.__version__}")
print(f"Fast-histogram version {ft.__version__}")
# Initial data
#######################################################
nbins=100
max_val=17.64
min_val=0.0
set_range=[[min_val, max_val],[min_val, max_val]]
with np.load("data.npz") as d:
pot = d["pot"]
displ_new = d["displ_new"]
bond_ind = d["bond_ind"]
# Without copy
print("NOT EXPLICITLY COPYING THE DATA")
pot_array = pot[:, bond_ind[0,0], bond_ind[0,1]]
arr1 = displ_new[:,bond_ind[0,0]]
arr2 = displ_new[:,bond_ind[0,1]]
#Run the histograms
h_np = np.histogram2d(arr1, arr2, nbins, range=set_range, weights=pot_array)[0]
h_ft = ft.histogram2d(arr1, arr2, nbins, range=set_range, weights=pot_array)
print("===FAST-HIST===")
print(h_ft)
print("===NUMPY===")
print(h_np)
# With copy
print("EXPLICITLY COPYING THE DATA")
pot_array = pot[:, bond_ind[0,0], bond_ind[0,1]].copy()
arr1 = displ_new[:,bond_ind[0,0]].copy()
arr2 = displ_new[:,bond_ind[0,1]].copy()
#Run the histograms
h_np = np.histogram2d(arr1, arr2, nbins, range=set_range, weights=pot_array)[0]
h_ft = ft.histogram2d(arr1, arr2, nbins, range=set_range, weights=pot_array)
print("===FAST-HIST===")
print(h_ft)
print("===NUMPY===")
print(h_np)
with output
Numpy version 1.18.1
Fast-histogram version 0.8
NOT EXPLICITLY COPYING THE DATA
===FAST-HIST===
[[0. 0. 0. ... 0. 0. 0. ]
[0.0425809 0. 0. ... 0. 0. 0. ]
[0. 0. 0.01108671 ... 0. 0. 0. ]
...
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]]
===NUMPY===
[[0. 0. 0. ... 0. 0. 0. ]
[0.00524697 0.11627543 0.00218829 ... 0. 0. 0. ]
[0. 0.0233833 0.08906353 ... 0. 0. 0. ]
...
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]]
EXPLICITLY COPYING THE DATA
===FAST-HIST===
[[0. 0. 0. ... 0. 0. 0. ]
[0.00524697 0.11627543 0.00218829 ... 0. 0. 0. ]
[0. 0.0233833 0.08906353 ... 0. 0. 0. ]
...
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]]
===NUMPY===
[[0. 0. 0. ... 0. 0. 0. ]
[0.00524697 0.11627543 0.00218829 ... 0. 0. 0. ]
[0. 0.0233833 0.08906353 ... 0. 0. 0. ]
...
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]]
The data file necessary to run the code is attached. Due to github restrictions, I had to zip it first.
data.npz.zip
Thanks for reporting this, this definitely seems like a bug! I'm going to be able to address this in the next few weeks but if anyone has a chance to investigate and open a pull request, I'll happily review it!
Glancing at the code, I think the issue is that we currently assume all arrays have the same memory layout since we use a single NpyIter_AdvancedNew
for all arrays. This should clearly be generalized to avoid the kind of issue described above.
I found the issue, PR forthcoming
@LucasCampos - could you check whether things work fine with the latest developer version?
Hey! Thanks for suck a quick reponse. I can confirm that the new code works locally as well!
Numpy version 1.18.1
Fast-histogram version 0.9.dev2+g82f140b
NOT EXPLICITLY COPYING THE DATA
===FAST-HIST===
[[0. 0. 0. ... 0. 0. 0. ]
[0.00524697 0.11627543 0.00218829 ... 0. 0. 0. ]
[0. 0.0233833 0.08906353 ... 0. 0. 0. ]
...
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]]
===NUMPY===
[[0. 0. 0. ... 0. 0. 0. ]
[0.00524697 0.11627543 0.00218829 ... 0. 0. 0. ]
[0. 0.0233833 0.08906353 ... 0. 0. 0. ]
...
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]]
EXPLICITLY COPYING THE DATA
===FAST-HIST===
[[0. 0. 0. ... 0. 0. 0. ]
[0.00524697 0.11627543 0.00218829 ... 0. 0. 0. ]
[0. 0.0233833 0.08906353 ... 0. 0. 0. ]
...
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]]
===NUMPY===
[[0. 0. 0. ... 0. 0. 0. ]
[0.00524697 0.11627543 0.00218829 ... 0. 0. 0. ]
[0. 0.0233833 0.08906353 ... 0. 0. 0. ]
...
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]
[0. 0. 0. ... 0. 0. 0. ]]
@LucasCampos - thanks! I'll make a new release.