astrofrog / fast-histogram

:zap: Fast 1D and 2D histogram functions in Python :zap:

Weighted histogram differs from numpy unless the data is explicitly copied.

LucasCampos opened this issue · comments

First, thanks for the great library. I suspect I am hitting a memory issue. Unless I explicitly copy the arrays before passing them to histogram2d, the results differ from numpy. So far, I have only seen this happen with the weighted version of histogram2d. Code and output follow.

#! /usr/bin/env python
import numpy as np
import fast_histogram as ft

print(f"Numpy version {np.__version__}")
print(f"Fast-histogram version {ft.__version__}")

# Initial data
#######################################################
nbins=100
max_val=17.64
min_val=0.0
set_range=[[min_val, max_val],[min_val, max_val]]
with np.load("data.npz") as d:
    pot = d["pot"]
    displ_new = d["displ_new"]
    bond_ind = d["bond_ind"]


# Without copy
print("NOT EXPLICITLY COPYING THE DATA")
pot_array = pot[:, bond_ind[0,0], bond_ind[0,1]]
arr1 = displ_new[:,bond_ind[0,0]]
arr2 = displ_new[:,bond_ind[0,1]]

#Run the histograms
h_np = np.histogram2d(arr1, arr2, nbins, range=set_range, weights=pot_array)[0]
h_ft = ft.histogram2d(arr1, arr2, nbins, range=set_range, weights=pot_array)

print("===FAST-HIST===")
print(h_ft)
print("===NUMPY===")
print(h_np)


# With copy
print("EXPLICITLY COPYING THE DATA")
pot_array = pot[:, bond_ind[0,0], bond_ind[0,1]].copy()
arr1 = displ_new[:,bond_ind[0,0]].copy()
arr2 = displ_new[:,bond_ind[0,1]].copy()

#Run the histograms
h_np = np.histogram2d(arr1, arr2, nbins, range=set_range, weights=pot_array)[0]
h_ft = ft.histogram2d(arr1, arr2, nbins, range=set_range, weights=pot_array)

print("===FAST-HIST===")
print(h_ft)
print("===NUMPY===")
print(h_np)

with output

Numpy version 1.18.1
Fast-histogram version 0.8
NOT EXPLICITLY COPYING THE DATA
===FAST-HIST===
[[0.         0.         0.         ... 0.         0.         0.        ]
 [0.0425809  0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.01108671 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]
===NUMPY===
[[0.         0.         0.         ... 0.         0.         0.        ]
 [0.00524697 0.11627543 0.00218829 ... 0.         0.         0.        ]
 [0.         0.0233833  0.08906353 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]
EXPLICITLY COPYING THE DATA
===FAST-HIST===
[[0.         0.         0.         ... 0.         0.         0.        ]
 [0.00524697 0.11627543 0.00218829 ... 0.         0.         0.        ]
 [0.         0.0233833  0.08906353 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]
===NUMPY===
[[0.         0.         0.         ... 0.         0.         0.        ]
 [0.00524697 0.11627543 0.00218829 ... 0.         0.         0.        ]
 [0.         0.0233833  0.08906353 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]

The data file needed to run the code is attached. Due to GitHub restrictions, I had to zip it first.
data.npz.zip

Thanks for reporting this; it definitely seems like a bug! I'm going to be able to address this in the next few weeks, but if anyone has a chance to investigate and open a pull request, I'll happily review it!

Glancing at the code, I think the issue is that we currently assume all arrays have the same memory layout since we use a single NpyIter_AdvancedNew for all arrays. This should clearly be generalized to avoid the kind of issue described above.
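To illustrate the layout mismatch described above, here is a minimal NumPy-only sketch. The array shapes and random contents are assumptions standing in for the attached data.npz; only the indexing pattern is taken from the report. It shows that the sliced coordinate arrays and the sliced weights are non-contiguous views with different strides, and that an explicit copy gives all three the same layout.

```python
import numpy as np

# Hypothetical stand-ins for the arrays in data.npz (shapes are assumed,
# inferred only from how the report indexes them).
rng = np.random.default_rng(0)
n_frames, n_atoms = 1000, 8
pot = rng.random((n_frames, n_atoms, n_atoms))   # indexed as pot[:, a, b]
displ_new = rng.random((n_frames, n_atoms))      # indexed as displ_new[:, a]

i, j = 0, 1  # plays the role of bond_ind[0, 0] and bond_ind[0, 1]

# Basic slicing returns views into the parent arrays, not copies.
weights = pot[:, i, j]
x = displ_new[:, i]
y = displ_new[:, j]

# The views step through memory with different strides (in bytes).
for name, arr in [("x", x), ("y", y), ("weights", weights)]:
    print(name, "strides:", arr.strides,
          "C-contiguous:", arr.flags["C_CONTIGUOUS"])

# Copying (as in the report) or np.ascontiguousarray gives every array
# the same densely packed layout without changing the values.
x_c, y_c, w_c = (np.ascontiguousarray(a) for a in (x, y, weights))
print("after copy:", x_c.strides, y_c.strides, w_c.strides)
```

With float64 data and these assumed shapes, the coordinate views advance by one row of displ_new per element while the weights view advances by one full matrix of pot, so code that assumes a single shared layout for all inputs would read the wrong weights.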

I found the issue, PR forthcoming

@LucasCampos - could you check whether things work fine with the latest developer version?

Hey! Thanks for such a quick response. I can confirm that the new code works locally as well!

Numpy version 1.18.1
Fast-histogram version 0.9.dev2+g82f140b
NOT EXPLICITLY COPYING THE DATA
===FAST-HIST===
[[0.         0.         0.         ... 0.         0.         0.        ]
 [0.00524697 0.11627543 0.00218829 ... 0.         0.         0.        ]
 [0.         0.0233833  0.08906353 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]
===NUMPY===
[[0.         0.         0.         ... 0.         0.         0.        ]
 [0.00524697 0.11627543 0.00218829 ... 0.         0.         0.        ]
 [0.         0.0233833  0.08906353 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]
EXPLICITLY COPYING THE DATA
===FAST-HIST===
[[0.         0.         0.         ... 0.         0.         0.        ]
 [0.00524697 0.11627543 0.00218829 ... 0.         0.         0.        ]
 [0.         0.0233833  0.08906353 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]
===NUMPY===
[[0.         0.         0.         ... 0.         0.         0.        ]
 [0.00524697 0.11627543 0.00218829 ... 0.         0.         0.        ]
 [0.         0.0233833  0.08906353 ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]

@LucasCampos - thanks! I'll make a new release.