thepetabyteproject / your

Your Unified Reader

Home Page:https://thepetabyteproject.github.io/your/

your_candmaker.py not parsing cands.csv properly with pandas

aaronpearlman opened this issue

Describe the bug

cands.csv is not properly parsed using pandas.read_csv in the line: cand_pars = pd.read_csv(values.cand_param_file)

The first entry is treated as a header and ignored. Something like this could be better:

cand_pars = pd.read_csv(values.cand_param_file, names=["file", "snr", "stime", "dm", "width", "label"])

The keys "file", "snr", "width", "dm", "label", "stime", "chan_mask_path", and "num_files" are undefined otherwise. This creates problems later on in the code.

See:

process_list.append(
    [row['file'], row['snr'], 2 ** row['width'], row['dm'], row['label'], row['stime'],
     row['chan_mask_path'], row['num_files'], values, gpu_id])
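
To make the header behaviour concrete, here is a minimal sketch using an in-memory file. The file names and numbers are made up for illustration and are not taken from a real cands.csv. It shows that pd.read_csv treats the first row as the header by default, and that passing names= keeps every row as data. Note that the six names suggested above do not include chan_mask_path or num_files, so those two lookups would still fail with this change alone.

```python
import io

import pandas as pd

# Hypothetical headerless candidate list (values invented for illustration).
raw = "a.fil,12.5,1.0,56.7,3,1\nb.fil,9.8,2.5,120.0,2,0\n"

# Default behaviour: the first row is consumed as the header, so one
# candidate is lost and there is no column called "file".
no_names = pd.read_csv(io.StringIO(raw))

# With explicit names, both rows are kept as data.
cols = ["file", "snr", "stime", "dm", "width", "label"]
with_names = pd.read_csv(io.StringIO(raw), names=cols)

print(len(no_names), "file" in no_names.columns)
print(len(with_names), list(with_names.columns))
```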

To Reproduce

Traceback (most recent call last):
  File "/home/pearlman/miniconda3/envs/fetch/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4736, in get_value
    return libindex.get_value_box(s, key)
  File "pandas/_libs/index.pyx", line 51, in pandas._libs.index.get_value_box
  File "pandas/_libs/index.pyx", line 47, in pandas._libs.index.get_value_at
  File "pandas/_libs/util.pxd", line 98, in pandas._libs.util.get_value_at
  File "pandas/_libs/util.pxd", line 83, in pandas._libs.util.validate_indexer
TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/pearlman/.local/bin/your_candmaker.py", line 4, in <module>
    __import__('pkg_resources').run_script('your==0.4.9', 'your_candmaker.py')
  File "/home/pearlman/miniconda3/envs/fetch/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/pearlman/miniconda3/envs/fetch/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1453, in run_script
    exec(code, namespace, namespace)
  File "/home/pearlman/.local/lib/python3.7/site-packages/your-0.4.9-py3.7.egg/EGG-INFO/scripts/your_candmaker.py", line 176, in <module>
    [row['file'], row['snr'], 2 ** row['width'], row['dm'], row['label'], row['stime'],
  File "/home/pearlman/miniconda3/envs/fetch/lib/python3.7/site-packages/pandas/core/series.py", line 1068, in __getitem__
    result = self.index.get_value(self, key)
  File "/home/pearlman/miniconda3/envs/fetch/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4744, in get_value
    raise e1
  File "/home/pearlman/miniconda3/envs/fetch/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4730, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas/_libs/index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'file'

Expected behavior

your_candmaker.py should parse the cands.csv and generate .h5 files exactly like candmaker.py.

Unlike the previous candmaker, your_candmaker.py needs CSV files with headers. You can use candcsvmaker.py to make your candidate CSV files.

Ah, didn't realize this version required a csv file in a slightly different format. My candidate list doesn't come from heimdall, so I'll hack something up myself.
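
For a candidate list that does not come from heimdall, one possible approach is to write the header row yourself. The sketch below uses only the standard library. The column names are the keys that the your_candmaker.py snippet above looks up in each row; the helper name write_cands, the column order, and all the values (including the empty chan_mask_path and num_files of 1) are assumptions for illustration, not the format that candcsvmaker.py is confirmed to produce. Since the code computes 2 ** row['width'], the width column presumably holds the log2 of the width.

```python
import csv
import io

# Keys that your_candmaker.py reads from each row, per the snippet in this issue.
FIELDS = ["file", "snr", "stime", "dm", "width", "label",
          "chan_mask_path", "num_files"]


def write_cands(candidates, handle):
    """Write candidate dicts to `handle` as CSV, starting with a header row.

    Hypothetical helper, not part of the `your` package.
    """
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for cand in candidates:
        writer.writerow(cand)


# Placeholder candidate; every value here is invented.
example = [{"file": "a.fil", "snr": 12.5, "stime": 1.0, "dm": 56.7,
            "width": 3, "label": 1, "chan_mask_path": "", "num_files": 1}]

buffer = io.StringIO()
write_cands(example, buffer)
header_line = buffer.getvalue().splitlines()[0]
print(header_line)
```

To write to disk instead, pass a file opened with open("cands.csv", "w", newline="") in place of the StringIO buffer.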

A couple of other questions/suggestions:

It seems that the GPU version of candmaker does not work if the number of channels in the filterbank is < 256. In that case, IndexError(f"GPU candmaker will not work if nchans < 256.") gets thrown. Can this be fixed? I don't remember having this issue with the CPU version.

Also, I suggest adding to pysigproc.py:

_type['nsamples'] = 'int'

I had some issues with read_header trying to parse this field, and this fixed that problem.

We presently don't plan to make the GPU version work for nchans < 256, because with a small number of channels it's fairly fast on the CPU (unless your tsamp is very small). However, if you plan to modify it, feel free to send a PR.

We have your_object.your_header.nspectra instead of _type['nsamples'].