vc1492a / PyNomaly

Anomaly detection using LoOP: Local Outlier Probabilities, a local density based outlier detection method providing an outlier score in the range of [0,1].

parallelize

maxcw opened this issue

It would be great to have an option for embarrassingly parallel computation, especially when all N^2 pairwise distances are calculated.

@maxcw Thanks for opening the issue, I agree that it would be a nice option to provide parallelism as part of the available options for computation. I believe this is available via numba, the JIT-compilation library that's an option when using PyNomaly.

Since parallel computation is an option when using numba, it may be fairly straightforward to try the following change. Specifically, take this line:

compute = numba.jit(self._compute_distance_and_neighbor_matrix,

and pass the parameter parallel in the following way:

# parallel is a boolean parameter set earlier, e.g.
parallel = True
compute = numba.jit(self._compute_distance_and_neighbor_matrix,
                    cache=True, parallel=parallel) if self.use_numba else \
    self._compute_distance_and_neighbor_matrix

I'll mark this as an enhancement to take a look at for a future release (or please feel free to try it yourself and submit a PR). Thanks!
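For reference, here is a minimal sketch of what a parallel pairwise distance computation could look like with numba. It is not PyNomaly's actual code: the function name pairwise_distances and the sample points are hypothetical. It assumes numba's documented behavior that parallel=True applies in nopython mode and that explicit loops are only split across threads when written with prange, so the flag alone may not change much. The try/except fallback is also an assumption, added so the sketch still runs (serially) without numba installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption: fall back to plain Python so the sketch runs without numba.
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=True)
def pairwise_distances(data):
    """Return the full N x N Euclidean distance matrix for data (N x D).

    Hypothetical helper for illustration, not part of PyNomaly.
    """
    n, d = data.shape
    out = np.zeros((n, n))
    # Each row i is independent of every other row, so the outer loop is the
    # embarrassingly parallel part; prange lets numba split it across threads.
    for i in prange(n):
        for j in range(n):
            acc = 0.0
            for k in range(d):
                diff = data[i, k] - data[j, k]
                acc += diff * diff
            out[i, j] = np.sqrt(acc)
    return out


points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
dist = pairwise_distances(points)
```

Each thread writes only to its own rows of out, so the loop needs no locking. Whether this is actually faster depends on N, the number of cores, and the one-time compilation cost, so it is worth measuring before adopting.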

Work on this issue can now be tracked in #43.

It may be helpful to use a profiler like pyinstrument to gauge the effect of specific code changes.
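A rough sketch of how such a measurement might be set up is shown here. The workload function is a hypothetical stand-in for the code path under test, and profile_call is a made-up helper name. The sketch uses pyinstrument's Profiler with start(), stop() and output_text(); as an assumption for portability, it falls back to a plain wall-clock timing if pyinstrument is not installed.

```python
import time

try:
    from pyinstrument import Profiler  # third-party: pip install pyinstrument
except ImportError:
    Profiler = None  # assumption: degrade to simple timing when unavailable


def workload(n=200):
    """Hypothetical stand-in for the code path being measured."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += abs(i - j) ** 0.5
    return total


def profile_call(func, *args):
    """Run func once and return (result, report text)."""
    if Profiler is None:
        start = time.perf_counter()
        result = func(*args)
        elapsed = time.perf_counter() - start
        return result, f"{func.__name__}: {elapsed:.4f} s (wall clock)"
    profiler = Profiler()
    profiler.start()
    result = func(*args)
    profiler.stop()
    return result, profiler.output_text()


result, report = profile_call(workload, 200)
print(report)
```

Running the same call before and after a change (for example, with and without the parallel flag) gives two reports to compare. For numba-compiled functions, it makes sense to call the function once beforehand so that compilation time is not counted in the measurement.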

Implemented in the branch feature/numba parallel, but performance did not improve.