UCBerkeleySETI / blimpy

Breakthrough Listen I/O Methods for Python

Home Page: https://blimpy.readthedocs.io


Negative selection size on laptop with little free memory

telegraphic opened this issue · comments

Madeleine will be adding some more details; here's the bit that I think needs to change:

# Calculate the max data array size from available memory

Here is the error I'm getting when I create a FindDoppler() object: "blimpy.io.base_reader WARNING Selection size is 0.06 GB, which exceeds the memory usage limit of -0.4209861755371094 GB. Keeping data on disk." The same message appears when I try to run the search routine on the object.

@madeleine-king

It would be useful to know the following:

  • Make and model of the laptop? See the bottom of the machine for the model code.
  • How old?
  • How much total RAM does your laptop have?
  • On a quiet machine, how much RAM was available while the messages appeared?
  • O/S: Windows, Linux, or MacOS?

Please do not reproduce this with an Internet browser and Jupyter. Use the turboSETI command line or the app I already shared with you. A reboot before running will give you a clean RAM baseline.

My machine has 16 GB of RAM, so it's difficult to reproduce anything like this, but I am inspired to develop a memory-occupier C app that allocates a specified block of RAM and just sits idle until killed. I will also desk-check the code @telegraphic highlighted.
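Such an occupier can be sketched in a few lines, shown here in Python rather than C for brevity (a hypothetical standalone script, not an existing tool; the command-line interface is my own choice):

```python
import sys
import time

def occupy(n_mb: int) -> bytearray:
    """Allocate n_mb megabytes of real, writable memory and return it.

    bytearray zero-fills its buffer, so the memory is actually written,
    not merely reserved by the OS.
    """
    return bytearray(n_mb * 1024 * 1024)

if __name__ == "__main__" and len(sys.argv) > 1:
    n_mb = int(sys.argv[1])
    block = occupy(n_mb)
    print(f"Holding {n_mb} MB; kill this process to release it.")
    while True:
        time.sleep(60)  # sit idle until killed
```

Run it with the desired size in MB (e.g. `python occupy.py 4096`) in one terminal, then reproduce the turboSETI run in another.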

IMPORTANT too: In spite of the open issue, can you get turboSETI or the app I gave you to run successfully?

  • I've got a 2020 MacBook Air, Model A2337, with 8 GB of total RAM, running macOS
  • 2 GB RAM looks to be available while the messages appear.

After this message comes up, turboSETI hits an error and doesn't run successfully (same with the app).

Here's what happens when running turboSETI and it sends the memory usage limit message: memoryError.txt

It doesn't want to allocate more RAM when there is less than 1 GB available, and it looks like you have around 600 MB of RAM available, according to what this Python process sees. This is just a warning, not an error: the base_reader object is supposed to heuristically guess whether it can load the data into memory, so blimpy is working as intended AFAICT. The actual stack trace comes from turbo_seti/find_doppler/data_handler.py, line 281, in load_data, so I would look there.

The message itself is confusing:
"blimpy.io.base_reader WARNING Selection size is 0.06 GB, which exceeds the memory usage limit of -0.4209861755371094 GB"

A negative "memory usage limit"?
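One way a negative limit arises is if the limit is computed as available RAM minus a fixed reserve: when less than the reserve is free, the subtraction goes negative and every positive selection size "exceeds" it. A sketch of that style of check (the 1 GB reserve value and the function names are my assumptions for illustration, not quotes of base_reader):

```python
GB = 1024 ** 3
RESERVE_GB = 1.0  # assumed "keep 1 GB free" reserve

def max_load_gb(available_bytes: int) -> float:
    """Memory-usage limit in GB: available RAM minus a fixed reserve.

    With less than the reserve available, this goes negative, which is
    how the warning can report a limit like -0.42 GB.
    """
    return available_bytes / GB - RESERVE_GB

def should_load(selection_gb: float, available_bytes: int) -> bool:
    # Any positive selection exceeds a negative limit, so the data
    # stays on disk no matter how small the selection is.
    return selection_gb <= max_load_gb(available_bytes)

# With ~0.6 GB available, as in the report above:
print(max_load_gb(int(0.6 * GB)))        # roughly -0.4
print(should_load(0.06, int(0.6 * GB)))  # False: kept on disk
```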

Also, we can do a better job of diagnosis than this in blimpy or turbo_seti:

blimpy.io.base_reader WARNING  Selection size is 0.06 GB, which exceeds the memory usage limit of -0.20230865478515625 GB. Keeping data on disk.
/Users/madeleineking/opt/anaconda3/envs/turboSETI/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3419: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/Users/madeleineking/opt/anaconda3/envs/turboSETI/lib/python3.9/site-packages/numpy/core/_methods.py:188: RuntimeWarning: invalid value encountered in true_divide
  ret = ret.dtype.type(ret / rcount)
root            ERROR    tuple index out of range
Traceback (most recent call last):
  File "/Users/madeleineking/opt/anaconda3/envs/turboSETI/lib/python3.9/site-packages/turbo_seti/find_doppler/seti_event.py", line 135, in exec
    find_seti_event.search(n_partitions=args.n_parallel,
  File "/Users/madeleineking/opt/anaconda3/envs/turboSETI/lib/python3.9/site-packages/turbo_seti/find_doppler/find_doppler.py", line 205, in search
    search_coarse_channel(dl, self, dataloader=sched, filewriter=filewriter, logwriter=logwriter)
  File "/Users/madeleineking/opt/anaconda3/envs/turboSETI/lib/python3.9/site-packages/turbo_seti/find_doppler/find_doppler.py", line 280, in search_coarse_channel
    data_obj, spectra, drift_indices = dataloader.get()
  File "/Users/madeleineking/opt/anaconda3/envs/turboSETI/lib/python3.9/site-packages/turbo_seti/find_doppler/kernels/Scheduler/__init__.py", line 36, in get
    result = f.result()
  File "/Users/madeleineking/opt/anaconda3/envs/turboSETI/lib/python3.9/concurrent/futures/_base.py", line 445, in result
    return self.__get_result()
  File "/Users/madeleineking/opt/anaconda3/envs/turboSETI/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/Users/madeleineking/opt/anaconda3/envs/turboSETI/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/madeleineking/opt/anaconda3/envs/turboSETI/lib/python3.9/site-packages/turbo_seti/find_doppler/find_doppler.py", line 229, in load_data
    spectra, drift_indices = data_obj.load_data()
  File "/Users/madeleineking/opt/anaconda3/envs/turboSETI/lib/python3.9/site-packages/turbo_seti/find_doppler/data_handler.py", line 281, in load_data
    if spectra.shape[0] != self.tsteps:
IndexError: tuple index out of range
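For what it's worth, the two numpy warnings and the final IndexError fit together: the mean of an empty array is a 0-d NaN scalar, whose shape tuple is empty, so indexing shape[0] on it raises exactly this error. An illustration (my reconstruction of the failure mode, not the actual data_handler.py code):

```python
import warnings
import numpy as np

empty = np.empty((0, 16))            # e.g. a selection that never got loaded
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # suppress "Mean of empty slice"
    spectra = np.mean(empty)         # mean over all axes -> 0-d NaN scalar

print(spectra.shape)  # () -- an empty shape tuple

try:
    spectra.shape[0]
except IndexError as e:
    print(e)  # tuple index out of range
```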

My guess is that the blimpy library does this "maybe load data" behavior, which some clients use, but turboSETI just ignores that and expects the data to always get loaded into memory. We could just drop the "keep 1 GB free" logic and be willing to allocate as much memory as is available; I bet that would solve the problem in practice. That is my recommended solution, but I think someone wanted more complicated heuristics for some reason.
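Concretely, the proposed change reduces the check to comparing the selection size against what is actually available (a hypothetical sketch with made-up function names, not the real base_reader code):

```python
GB = 1024 ** 3

def can_load(selection_bytes: int, available_bytes: int) -> bool:
    """Proposed check: load whenever the selection fits in the RAM
    actually available, with no fixed 'keep 1 GB free' reserve."""
    return selection_bytes <= available_bytes

# A 0.06 GB selection with 0.6 GB available now loads instead of
# being kept on disk by a negative limit:
print(can_load(int(0.06 * GB), int(0.6 * GB)))  # True
```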

Fixed in PR #209.