Extracts features from PHCX and PFD pulsar candidate files. Not to be confused with the PulsarFeatureLab, which is used for feature extraction and experimentation.
This is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
It is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
See http://www.gnu.org/licenses/ for more license details.
Author: Rob Lyon
Contact: rob@scienceguyrob.com or robert.lyon@postgrad.manchester.ac.uk
Web: http://www.scienceguyrob.com
- Overview
This script extracts feature data from pulsar candidates. These features are used as the inputs to machine learning classification algorithms. The code can extract two different types of features:
i. 22 Scores described in Sam Bates' thesis, "Surveys Of The Galactic Plane For Pulsars" 2011.
The scores generated are as follows:
Number | Description of feature | Type |
---|---|---|
1 | Chi squared value from fitting sine curve to pulse profile. | Sinusoid Fitting |
2 | Chi squared value from fitting sine-squared curve to pulse profile. | Sinusoid Fitting |
3 | Number of peaks the program identifies in the pulse profile - 1. | Pulse Profile Tests |
4 | Sum over residuals. | Pulse Profile Tests |
5 | Distance between expectation values of Gaussian and fixed Gaussian fits to profile histogram. | Gaussian Fitting |
6 | Ratio of the maximum values of Gaussian and fixed Gaussian fits to profile histogram. | Gaussian Fitting |
7 | Distance between expectation values of derivative histogram and profile histogram. | Gaussian Fitting |
8 | Full-width-half-maximum (FWHM) of Gaussian fit to pulse profile. | Gaussian Fitting |
9 | Chi squared value from Gaussian fit to pulse profile. | Gaussian Fitting |
10 | Smallest FWHM of double-Gaussian fit to pulse profile. | Gaussian Fitting |
11 | Chi squared value from double Gaussian fit to pulse profile. | Gaussian Fitting |
12 | Best period. | Candidate Parameters |
13 | Best SNR value. | Candidate Parameters |
14 | Best DM value. | Candidate Parameters |
15 | Best pulse width (originally reported as duty cycle (pulse width / period)). | Candidate Parameters |
16 | SNR / SQRT( (P-W)/W ). | Dispersion Measure (DM) Curve Fitting |
17 | Difference between fitting factor, Prop, and 1. | Dispersion Measure (DM) Curve Fitting |
18 | Difference between best DM value and optimised DM value from fit, mod(DMfit - DMbest). | Dispersion Measure (DM) Curve Fitting |
19 | Chi squared value from DM curve fit. | Dispersion Measure (DM) Curve Fitting |
20 | RMS of peak positions in all sub-bands. | Sub-band Scores |
21 | Average correlation coefficient for each pair of sub-bands. | Sub-band Scores |
22 | Sum of correlation coefficients between sub-bands and profile. | Sub-band Scores |
ii. 8 Scores described in my own paper, "Fifty Years of Pulsar Candidate Selection: From simple filters to a new principled real-time classification approach".
Number | Description of feature |
---|---|
1 | Mean of the integrated profile. |
2 | Standard deviation of the integrated profile. |
3 | Excess kurtosis of the integrated profile. |
4 | Skewness of the integrated profile. |
5 | Mean of the DM-SNR curve. |
6 | Standard deviation of the DM-SNR curve. |
7 | Excess kurtosis of the DM-SNR curve. |
8 | Skewness of the DM-SNR curve. |
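The eight features above are simple summary statistics, so they can be illustrated with a short sketch. The code below is not taken from ScoreGenerator.py: the function names `summary_statistics` and `eight_features` are hypothetical, it uses only the Python standard library rather than SciPy or NumPy, and it assumes population (divide-by-n) moments and a value of 0.0 for skewness and kurtosis when the input is perfectly flat. The tool itself may use different conventions.

```python
import math


def summary_statistics(values):
    """Return (mean, standard deviation, excess kurtosis, skewness).

    Assumption: population moments (dividing by n), which may differ
    from the estimator used by the real tool.
    """
    n = len(values)
    if n == 0:
        raise ValueError("at least one value is required")

    mean = sum(values) / n

    # Second, third and fourth central moments.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n

    std = math.sqrt(m2)

    if m2 == 0:
        # Flat input: higher moments are undefined, so this sketch
        # returns 0.0 for both (an assumption, not the tool's rule).
        return mean, std, 0.0, 0.0

    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return mean, std, excess_kurtosis, skewness


def eight_features(profile, dm_snr_curve):
    """Build the 8-element feature list in the order of the table above:
    four statistics of the integrated profile, then four of the DM-SNR curve.
    """
    return list(summary_statistics(profile)) + list(summary_statistics(dm_snr_curve))


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_profile = [0.0, 0.1, 0.2, 3.0, 9.0, 3.0, 0.2, 0.1]
    toy_dm_curve = [1.0, 2.0, 5.0, 9.0, 5.0, 2.0, 1.0]
    print(eight_features(toy_profile, toy_dm_curve))
```

The feature order follows the table (mean, standard deviation, excess kurtosis, skewness), first for the integrated profile and then for the DM-SNR curve.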
-
Requirements
The PulsarFeatureExtractor files have the following system requirements:
Python 2.4 or later, SciPy, NumPy, and the [matplotlib library](http://matplotlib.org/).
-
Usage
The main application script ScoreGenerator.py can be executed via:
python ScoreGenerator.py
The script accepts a number of arguments. Two of these are required for it to execute; the rest are optional.
Required Arguments
Flag | Type | Description |
---|---|---|
-c | string | Path to the directory containing PHCX or PFD candidates to extract features from. |
-o | string | Full path to the output file to write extracted feature data to. |
Optional Arguments
Flag | Type | Description |
---|---|---|
--pfd | boolean | Flag which indicates that ONLY .pfd files are to be processed. |
--phx | boolean | Flag which indicates that ONLY HTRU .phcx files are to be processed. |
--superb | boolean | Flag which indicates that ONLY SUPERB .phcx files are to be processed. |
--arff | boolean | Flag which indicates that feature data should be written to an ARFF file. |
--profile | boolean | Flag which indicates that profile data, rather than score data, should be generated as features. |
--dmprof | boolean | Flag which indicates that DM and profile data should be extracted as features. |
-v | boolean | Verbose debugging flag. |
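As an illustration of how the required and optional arguments combine, the sketch below assembles a command line from Python. The helper names `build_command` and `as_shell_string` and the example paths are hypothetical; only the flags come from the tables above. Whether a given combination of optional flags is accepted by the script is an assumption that should be checked against ScoreGenerator.py itself.

```python
import shlex


def build_command(candidate_dir, output_file, flags=()):
    """Return the argument list for one ScoreGenerator.py run.

    candidate_dir and output_file map to the required -c and -o
    arguments; flags holds any optional flags from the table above.
    """
    command = ["python", "ScoreGenerator.py", "-c", candidate_dir, "-o", output_file]
    command.extend(flags)
    return command


def as_shell_string(command):
    """Render the argument list as a shell-safe string for display."""
    return " ".join(shlex.quote(part) for part in command)


if __name__ == "__main__":
    # Hypothetical paths, used only for illustration.
    cmd = build_command("/data/candidates", "/data/features.arff", ["--pfd", "--arff"])
    print(as_shell_string(cmd))
    # To actually run it, pass cmd to subprocess.call(cmd) from the
    # directory that contains ScoreGenerator.py.
```

Building the command as a list rather than a single string keeps paths containing spaces intact when the list is handed to `subprocess`.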
-
Citing this work
Please use the following citation if you make use of this tool:
@misc{PulsarFeatureExtractor, author = {Lyon, R. J.}, title = {{Pulsar Feature Extractor}}, affiliation = {University of Manchester}, month = {November}, year = {2014}, howpublished = {World Wide Web Accessed (19/11/2014), \newline \url{https://github.com/scienceguyrob/PulsarFeatureExtractor}}, notes = {Accessed 19/11/2014} }
-
Acknowledgements
This work was supported by grant EP/I028099/1 for the University of Manchester Centre for Doctoral Training in Computer Science, from the UK Engineering and Physical Sciences Research Council (EPSRC).