scienceguyrob / PulsarFeatureExtractor

Extracts eight features from PHCX and PFD pulsar candidate files.

PulsarFeatureExtractor

Extracts features from PHCX and PFD pulsar candidate files. Not to be confused with the PulsarFeatureLab, which is used for feature extraction and experimentation.

This is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

It is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

See http://www.gnu.org/licenses/ for more license details.

Author: Rob Lyon

Contact: rob@scienceguyrob.com or robert.lyon@postgrad.manchester.ac.uk

Web: http://www.scienceguyrob.com

  1. Overview

A script that extracts feature data from pulsar candidates. These features are used as inputs to machine learning classification algorithms. The code can extract two different types of features:

i. 22 scores described in Sam Bates' thesis, "Surveys Of The Galactic Plane For Pulsars" (2011). The scores generated are as follows:

| Number | Description of feature | Type |
| --- | --- | --- |
| 1 | Chi squared value from fitting sine curve to pulse profile. | Sinusoid Fitting |
| 2 | Chi squared value from fitting sine-squared curve to pulse profile. | Sinusoid Fitting |
| 3 | Number of peaks the program identifies in the pulse profile - 1. | Pulse Profile Tests |
| 4 | Sum over residuals. | Pulse Profile Tests |
| 5 | Distance between expectation values of Gaussian and fixed Gaussian fits to profile histogram. | Gaussian Fitting |
| 6 | Ratio of the maximum values of Gaussian and fixed Gaussian fits to profile histogram. | Gaussian Fitting |
| 7 | Distance between expectation values of derivative histogram and profile histogram. | Gaussian Fitting |
| 8 | Full-width-half-maximum (FWHM) of Gaussian fit to pulse profile. | Gaussian Fitting |
| 9 | Chi squared value from Gaussian fit to pulse profile. | Gaussian Fitting |
| 10 | Smallest FWHM of double-Gaussian fit to pulse profile. | Gaussian Fitting |
| 11 | Chi squared value from double Gaussian fit to pulse profile. | Gaussian Fitting |
| 12 | Best period. | Candidate Parameters |
| 13 | Best SNR value. | Candidate Parameters |
| 14 | Best DM value. | Candidate Parameters |
| 15 | Best pulse width (originally reported as Duty cycle (pulse width / period)). | Candidate Parameters |
| 16 | SNR / SQRT( (P-W)/W ). | Dispersion Measure (DM) Curve Fitting |
| 17 | Difference between fitting factor, Prop, and 1. | Dispersion Measure (DM) Curve Fitting |
| 18 | Difference between best DM value and optimised DM value from fit, mod(DMfit - DMbest). | Dispersion Measure (DM) Curve Fitting |
| 19 | Chi squared value from DM curve fit. | Dispersion Measure (DM) Curve Fitting |
| 20 | RMS of peak positions in all sub-bands. | Sub-band Scores |
| 21 | Average correlation coefficient for each pair of sub-bands. | Sub-band Scores |
| 22 | Sum of correlation coefficients between sub-bands and profile. | Sub-band Scores |

ii. 8 scores described in my own paper, "Fifty Years of Pulsar Candidate Selection: From simple filters to a new principled real-time classification approach".

| Number | Description of feature |
| --- | --- |
| 1 | Mean of the integrated profile. |
| 2 | Standard deviation of the integrated profile. |
| 3 | Excess kurtosis of the integrated profile. |
| 4 | Skewness of the integrated profile. |
| 5 | Mean of the DM-SNR curve. |
| 6 | Standard deviation of the DM-SNR curve. |
| 7 | Excess kurtosis of the DM-SNR curve. |
| 8 | Skewness of the DM-SNR curve. |

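
As a rough illustration of what the eight statistics in the second feature set mean, the sketch below computes them in plain Python. It is not the code used by ScoreGenerator.py: the function names `moments` and `eight_features` are hypothetical, the input lists are toy data, and the use of population (1/n) estimators is an assumption, since this README does not state which estimators the tool applies.

```python
import math


def moments(values):
    """Return (mean, std, excess kurtosis, skewness) of a sequence."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with population (1/n) normalisation (an assumption).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    if m2 == 0:
        # Flat input: the shape statistics are undefined, so report 0.0 here.
        return mean, 0.0, 0.0, 0.0
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    skewness = m3 / m2 ** 1.5
    return mean, std, excess_kurtosis, skewness


def eight_features(profile, dm_snr_curve):
    """Concatenate the four statistics of each curve, in table order."""
    return list(moments(profile)) + list(moments(dm_snr_curve))


# Toy data, not taken from a real candidate file.
profile = [0.0, 0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 0.0]
dm_snr_curve = [1.0, 2.0, 3.0, 2.0, 1.0]
features = eight_features(profile, dm_snr_curve)
print(features)
```

With their default arguments, `scipy.stats.skew` and `scipy.stats.kurtosis` compute the same biased estimators, so they could replace the hand-written moments where SciPy is installed.
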
  2. Requirements

    The PulsarFeatureExtractor files have the following system requirements:

    Python 2.4 or later, SciPy, NumPy, and the matplotlib library (http://matplotlib.org/).

  3. Usage

The main application script ScoreGenerator.py can be executed via:

python ScoreGenerator.py

The script takes two required arguments and a set of optional flags, all listed below.

Required Arguments

| Flag | Type | Description |
| --- | --- | --- |
| -c | string | Path to the directory containing PHCX or PFD candidates to extract features from. |
| -o | string | Full path to the output file to write extracted feature data to. |

Optional Arguments

| Flag | Type | Description |
| --- | --- | --- |
| --pfd | boolean | Flag which indicates that ONLY .pfd files are to be processed. |
| --phx | boolean | Flag which indicates that ONLY HTRU .phcx files are to be processed. |
| --superb | boolean | Flag which indicates that ONLY SUPERB .phcx files are to be processed. |
| --arff | boolean | Flag which indicates that feature data should be written to an ARFF file. |
| --profile | boolean | Flag which indicates that profile, rather than score data should be generated as features. |
| --dmprof | boolean | Flag which indicates that DM and profile data should be extracted as features. |
| -v | boolean | Verbose debugging flag. |

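
To show how the required and optional flags combine, here is a minimal sketch that assembles a command line for ScoreGenerator.py from Python. The helper `build_command` and both paths are hypothetical placeholders; only the flags themselves come from the tables above, and the sketch assumes `python` on the PATH is an interpreter the script supports.

```python
import subprocess


def build_command(candidate_dir, output_file, pfd_only=False, arff=False, verbose=False):
    """Assemble the argument list for one ScoreGenerator.py run."""
    # -c and -o are the two required arguments.
    cmd = ["python", "ScoreGenerator.py", "-c", candidate_dir, "-o", output_file]
    if pfd_only:
        cmd.append("--pfd")   # process .pfd files only
    if arff:
        cmd.append("--arff")  # write the features to an ARFF file
    if verbose:
        cmd.append("-v")      # verbose debugging output
    return cmd


def run(cmd):
    """Run the assembled command and return its exit status."""
    return subprocess.call(cmd)


# Placeholder paths: replace them with real locations before calling run(cmd).
cmd = build_command("/path/to/candidates", "/path/to/output.arff",
                    pfd_only=True, arff=True)
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when a path contains spaces.
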
  4. Citing this work

    Please use the following citation if you make use of this tool:

    @misc{PulsarFeatureExtractor,
      author = {Lyon, R. J.},
      title = {{Pulsar Feature Extractor}},
      affiliation = {University of Manchester},
      month = {November},
      year = {2014},
      howpublished = {World Wide Web Accessed (19/11/2014), \newline \url{https://github.com/scienceguyrob/PulsarFeatureExtractor}},
      notes = {Accessed 19/11/2014}
    }

  5. Acknowledgements

    This work was supported by grant EP/I028099/1 for the University of Manchester Centre for Doctoral Training in Computer Science, from the UK Engineering and Physical Sciences Research Council (EPSRC).
