your_candmaker.py not parsing cands.csv properly with pandas
aaronpearlman opened this issue
Describe the bug
cands.csv is not parsed properly by pandas.read_csv in this line:
cand_pars = pd.read_csv(values.cand_param_file)
The first row is treated as a header, so the first candidate is lost. Something like this could work better:
cand_pars = pd.read_csv(values.cand_param_file, names=["file", "snr", "stime", "dm", "width", "label"])
The keys "file", "snr", "width", "dm", "label", "stime", "chan_mask_path", and "num_files" are undefined otherwise. This creates problems later on in the code.
See:
process_list.append(
[row['file'], row['snr'], 2 ** row['width'], row['dm'], row['label'], row['stime'],
row['chan_mask_path'], row['num_files'], values, gpu_id])
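A minimal sketch of the difference, using two made-up candidate rows with no header line. It assumes the six-column order from the names= list above. Even with names=, the loop still reads row['chan_mask_path'] and row['num_files'], so the defaults filled in here (no mask, one file) are placeholders chosen for illustration, not values taken from your.

```python
import io

import pandas as pd

# Two candidate rows with NO header line (made-up values, for illustration only).
RAW = """data.fil,12.5,3.2,56.7,4,1
data.fil,9.1,10.4,120.0,2,0
"""

# Default behaviour (header=0): the first row becomes the column labels,
# so only one candidate is left and keys like 'file' do not exist.
implicit = pd.read_csv(io.StringIO(RAW))

# Passing names= makes pandas treat the file as headerless (header=None).
COLUMNS = ["file", "snr", "stime", "dm", "width", "label"]
explicit = pd.read_csv(io.StringIO(RAW), names=COLUMNS)

# The six names above do not cover every key the loop reads.
# These defaults are assumptions, not documented values.
explicit["chan_mask_path"] = None
explicit["num_files"] = 1
```

With both extra columns present, every row['...'] lookup in the process_list.append(...) call can be satisfied.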
To Reproduce
Traceback (most recent call last):
File "/home/pearlman/miniconda3/envs/fetch/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4736, in get_value
return libindex.get_value_box(s, key)
File "pandas/_libs/index.pyx", line 51, in pandas._libs.index.get_value_box
File "pandas/_libs/index.pyx", line 47, in pandas._libs.index.get_value_at
File "pandas/_libs/util.pxd", line 98, in pandas._libs.util.get_value_at
File "pandas/_libs/util.pxd", line 83, in pandas._libs.util.validate_indexer
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/pearlman/.local/bin/your_candmaker.py", line 4, in <module>
__import__('pkg_resources').run_script('your==0.4.9', 'your_candmaker.py')
File "/home/pearlman/miniconda3/envs/fetch/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/pearlman/miniconda3/envs/fetch/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1453, in run_script
exec(code, namespace, namespace)
File "/home/pearlman/.local/lib/python3.7/site-packages/your-0.4.9-py3.7.egg/EGG-INFO/scripts/your_candmaker.py", line 176, in <module>
[row['file'], row['snr'], 2 ** row['width'], row['dm'], row['label'], row['stime'],
File "/home/pearlman/miniconda3/envs/fetch/lib/python3.7/site-packages/pandas/core/series.py", line 1068, in __getitem__
result = self.index.get_value(self, key)
File "/home/pearlman/miniconda3/envs/fetch/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4744, in get_value
raise e1
File "/home/pearlman/miniconda3/envs/fetch/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 4730, in get_value
return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
File "pandas/_libs/index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 88, in pandas._libs.index.IndexEngine.get_value
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'file'
Expected behavior
your_candmaker.py should parse cands.csv and generate .h5 files just as candmaker.py does.
Unlike the previous candmaker, your_candmaker.py needs CSV files with headers. You can use candcsvmaker.py to make your candidate CSV files.
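Since the script requires a header row, a quick pre-flight check can report missing columns up front instead of failing with a KeyError inside the loop. This is a sketch with a hypothetical helper, missing_columns; the required names are simply the keys read in the process_list.append(...) call from the issue.

```python
import csv
import io

# Keys read from each row by the process_list.append(...) call in the issue.
REQUIRED_COLUMNS = ("file", "snr", "stime", "dm", "width", "label",
                    "chan_mask_path", "num_files")


def missing_columns(csv_text):
    """Return required column names absent from the CSV header (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # first line, or [] for an empty file
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

A headerless file reports every column as missing, because its first line holds data values rather than names.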
Ah, didn't realize this version required a csv file in a slightly different format. My candidate list doesn't come from heimdall, so I'll hack something up myself.
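For a candidate list that doesn't come from heimdall, one way to hack it up is to write the rows out with a header using the standard csv module. This is a sketch under assumptions: the header names are the keys your_candmaker.py reads, but the column order and the placeholder values for chan_mask_path and num_files are guesses, not the format candcsvmaker.py produces.

```python
import csv
import io

# Header names follow the keys your_candmaker.py reads from each row;
# the column order is an assumption.
FIELDS = ["file", "snr", "stime", "dm", "width", "label",
          "chan_mask_path", "num_files"]


def to_headered_csv(candidates, chan_mask_path="", num_files=1):
    """Write (file, snr, stime, dm, width, label) tuples as CSV with a header row.

    chan_mask_path and num_files are placeholders (hypothetical defaults),
    not values documented by your.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for fil, snr, stime, dm, width, label in candidates:
        writer.writerow({
            "file": fil, "snr": snr, "stime": stime, "dm": dm,
            "width": width, "label": label,
            "chan_mask_path": chan_mask_path, "num_files": num_files,
        })
    return buf.getvalue()
```

Writing the result to a file and passing that path as the candidate file would give pandas a proper header row to read.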
A couple of other questions/suggestions:
It seems that the GPU version of candmaker does not work if the filterbank has fewer than 256 channels; in that case it raises IndexError(f"GPU candmaker will not work if nchans < 256."). Can this be fixed? I don't remember having this issue with the CPU version.
Also, I suggest adding this entry to pysigproc.py:
_type['nsamples'] = 'int'
read_header had trouble parsing this field without it, and adding the entry fixed the problem.
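Why a missing type entry breaks parsing: in a SIGPROC header each keyword is a length-prefixed string followed by a value whose size depends on its type (4-byte int, 8-byte double, or another length-prefixed string), so a reader that doesn't know a keyword's type can't tell how many bytes to consume. Below is a minimal sketch of such a reader, assuming little-endian byte order; KEYWORD_TYPES is a hypothetical subset in the spirit of pysigproc's _type dict, not its actual contents, and the KeyError is this sketch's behaviour, not necessarily pysigproc's.

```python
import io
import struct

# Hypothetical keyword -> type table; 'nsamples' is the entry suggested above.
KEYWORD_TYPES = {
    "nchans": "int",
    "nbits": "int",
    "tsamp": "double",
    "source_name": "str",
    "nsamples": "int",
}


def _read_string(f):
    # SIGPROC strings: a 4-byte int length followed by that many bytes.
    (length,) = struct.unpack("<i", f.read(4))
    return f.read(length).decode("ascii")


def read_header(f):
    """Parse a SIGPROC header from a binary file object (little-endian assumed)."""
    if _read_string(f) != "HEADER_START":
        raise ValueError("not a SIGPROC header")
    header = {}
    while True:
        key = _read_string(f)
        if key == "HEADER_END":
            return header
        kind = KEYWORD_TYPES.get(key)
        if kind is None:
            # Without a type we don't know the value's size, so everything
            # after this keyword would be misread.
            raise KeyError(f"unknown header keyword: {key}")
        if kind == "int":
            (header[key],) = struct.unpack("<i", f.read(4))
        elif kind == "double":
            (header[key],) = struct.unpack("<d", f.read(8))
        else:
            header[key] = _read_string(f)
```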
We presently don't plan to support nchans < 256 in the GPU version, because with a small number of channels it's fairly fast on the CPU (unless your tsamp is very small). However, if you plan to modify it, feel free to send a PR.
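If someone does take this on, one option is to route small-channel data to the CPU path instead of raising. The helper below, select_device, is purely hypothetical and not part of your; it only encodes the nchans < 256 limit discussed here, and treating a gpu_id of None or a negative number as "no GPU requested" is an assumed convention.

```python
def select_device(nchans, gpu_id=None, min_gpu_chans=256):
    """Return 'gpu' or 'cpu' for processing a candidate (hypothetical helper).

    Falls back to the CPU path when nchans is below the GPU limit,
    rather than raising IndexError. A gpu_id of None or < 0 is assumed
    to mean no GPU was requested.
    """
    if gpu_id is None or gpu_id < 0:
        return "cpu"
    if nchans < min_gpu_chans:
        return "cpu"
    return "gpu"
```

Falling back rather than failing matches the reasoning above: with few channels the CPU path is usually fast enough.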
We have your_object.your_header.nspectra instead of _type['nsamples'].
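When a filterbank header lacks nsamples, the number of spectra can be derived from the data size: each spectrum holds nifs × nchans samples of nbits bits. The sketch below shows that arithmetic under the assumption of a single contiguous data block after the header; whether your computes your_header.nspectra exactly this way is an assumption, and nspectra_from_size is a hypothetical name.

```python
def nspectra_from_size(file_size, header_size, nchans, nbits, nifs=1):
    """Number of spectra (time samples) in a SIGPROC filterbank file.

    Assumes the data section is nspectra * nifs * nchans samples of
    nbits bits each, stored contiguously after the header.
    """
    data_bits = (file_size - header_size) * 8
    bits_per_spectrum = nifs * nchans * nbits
    if data_bits % bits_per_spectrum:
        raise ValueError("data size is not a whole number of spectra")
    return data_bits // bits_per_spectrum
```

For example, a 300-byte header followed by 64,000 bytes of 8-bit, 64-channel data gives 64,000 × 8 / (64 × 8) = 1000 spectra.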