A set of python scripts used to match pulsar candidates (in PFD or PHCX format) to known pulsar sources. It compares each user supplied candidate against known sources in the ANTF pulsar catalog.
http://www.atnf.csiro.au/people/pulsar/psrcat/
To do so the ATNF catalog must first be parsed. However the catalog can be downloaded in many formats. The scripts supplied will parse the candidates output from the ANTF web interface standard format, and in the "Long with errors" format, provided you have save this output to a file passed to these scripts.
This is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Its distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
See http://www.gnu.org/licenses/ for more license details.
Author: Rob Lyon
Contact: rob@scienceguyrob.com or robert.lyon@postgrad.manchester.ac.uk
Web: http://www.scienceguyrob.com
-
Overview
Matches pulsar candidates against known sources from the ANTF pulsar catalog.
The ANTF catalog file contains a number of known sources. Each source has a number of parameters, though the exact number of parameters varies from source to source. The format of the catalog file is as follows (as structured from the start of the file):
PSRJ J0007+7303 aaa+09c
RAJ 00:07:01.7 2 awd+12
DECJ +73:03:07.4 8 awd+12
F0 3.165827392 3 awd+12
F1 -3.6120E-12 5 awd+12
F2 4.1E-23 7 awd+12
F3 5.4E-30 9 awd+12
@-----------------------------------------------------------------PSRB B0021-72G rlm+95
PSRJ J0024-7204G
RAJ 00:24:07.9587 3 fck+03
DECJ -72:04:39.6911 7 fck+03
PMRA 4.2 14 fck+03
PMDEC -3.7 12 fck+03
F0 247.50152509652 2 fck+03@-----------------------------------------------------------------
. . .
The scripts can read this format.
Alternatively, they can also read the "Long with errors" format produced by the web interface.
-
Requirements
The KnownSourceMatcher scripts have the following system requirements:
Python 2.4 or later. SciPy NumPy [matplotlib library] (http://matplotlib.org/)
-
Usage
The application script Matcher.py can be executed via:
python Matcher.py
The script accepts a number of arguments. It requires three of these to execute, and accepts another three as optional.
Required Arguments
Flag | Type | Description |
---|---|---|
−p | string | Path to the directory containing PHCX or PFD candidates to match to known sources. |
−o | string | Full path to the output file to write any matches found to. |
--psrcat | string | Path to the file containing the ANTF pulsar catalog data. |
Optional Arguments
Flag | Type | Description |
---|---|---|
−−v | boolean | Flag that when provided, put the application into validation mode. |
−i | boolean | Puts the application into interactive mode, allowing matching via the terminal. |
-v | boolean | Verbose debugging flag. |
-
Matching Function
The following conditions must hold before a candidate is considered a match for a known source:
-
The candidate period must fall within a user specified range of the known source period. This range is a percentage, i.e. the candidate period must be no greater than, and no less than m % of the known source period. i.e. given m = 5%, if a known source period is 1, then a candidate will only match if its period is in the range 1.05 - 0.95. The default accuracy level is 0.5%.
-
The candidate DM must fall within the same range as above. The default DM accuracy is 5%.
-
The angular separation in degrees between the known source and candidate, must be less than a user specified radius (default is 1 degree).
All these matching settings can be altered in the settings file described below.
-
-
Settings File
The settings for the matching procedure are stored in a file, Settings.txt. This can be modified to tune the matching procedure to give better results. Here is an example of a valid settings file:
accuracy=1.0 radius=2.5 padding=36000
The padding setting is useful for altering the precision of the thresholded comparison function described below.
-
How It Works
There are two options for candidate comparison to known sources. The first is fine if you only have a small number of candidates to compare. The second is better when you have many candidates you wish to compare.
i) NAIVE COMPARISON
It works simply by comparing each candidate to every known source in the pulsar catalog. Given N known sources and M candidates, this will require N x M comparisons. With M =11,000,000 and N = 2008, this equates to 2.2088 x 10^10 or 22,088,000,000 comparisons.
ii) THRESHOLDED COMPARISON:
Compares candidates to known sources using a divide and conquer approach. Instead of looping through all the known sources in the catalog, this recursive procedure compares candidates to known sources which are similar. Similarity is determined using a "sort" attribute computed for each candidate. The "sort" attribute is the RAJ converted to seconds, added to the DECJ converted to seconds. Any two sources which are nearby in the sky should have very similar sort attribute values. i.e. a contrived example: Source 1 RAJ = 00:00:01 DECJ = 00:00:00 Sort attribute = 1 Source 2 RAJ = 00:00:01 DECJ = 00:00:01 Sort attribute = 2 We can use this to compare candidates extremely quickly since it allows known sources to be sorted. Then each new user supplied candidate can be compared to these sorted sources quickly. Example of how this algorithm works: We have a candidate with a sortAttribute == 1. We want to know which known sources from the pulsar catalog are most similar to it. Lets suppose we have 100 known sources against which to compare, which are ordered according to their sort attribute. The first known source has a sort attribute = 1 and the last a sort attribute = 100. So the known source at position 1 has a sort attribute = 1. .. The known source at position 50 has a sort attribute = 50. .. The known source at position 100 has a sort attribute = 100. Here's a simple visualisation. Start Midpoint End | | | V V V ______________________________________ | | | | | | | | | | | | | | | | | | | | -------------------------------------- 0 n-1 If the candidate sort attribute is less than the midpoint sort attribute, then we only need to search: Start Midpoint | | V V ____________________ | | | | | | | | | | | --------------------- 0 (n-1)/2 else we search the other half: Midpoint End | | V V __________________ | | | | | | | | | | ------------------- (n-1)/2 n-1 As the sort attribute here = 1, which is less than the midpoint, we search: Start Midpoint End | | | V V V ____________________ | | | | | | | | | | | --------------------- 0 (n-1)/4 (n-1)/2 This procedure repeats, splitting each time until a single known source closest to the candidate is found. Then it compares the candidate to that known source, and to those immediately to the left and right of it. How many it is compared to, to the left and right of the known source, is determined by the padding parameter stored in the settings file. Given N known sources and M candidates, this will require WORST case N x M comparisons. The worst case only occurs if all the known sources have nearly the same RAJ and DECJ as each candidate being compared. The chances of this happening are as close to zero. The AVERAGE case is M + log2(N) + C , where C is a number of comparisons made between known sources and candidates which have similar RAJ and DECJ values. The true value of C is dependent on the padding parameter supplied by the user and the similarity of the known sources and candidates. i.e. Given M = 11,000,000 and N = 2008 and C=50 (overestimate!), there would be 561,000,011 comparisons, compared to 22,088,000,000 comparisons for the naive approach. To illustrate this difference, comparing 1300 candidates to 2008 known sources using the naive approach took ~5 minutes. This optimized approach took only 10 seconds. The expected numbers of comparisons given varying C and M = 11,000,000 and N = 2008: C = 1 -> ~ 22,000,000 BEST CASE C = 10 -> ~ 121,000,011 C = 20 -> ~ 231,000,011 C = 50 -> ~ 561,000,011 C = 100 -> ~ 1,111,000,011 C = 1000 -> ~ 11,011,000,010 C = 2008 -> ~ 22,099,000,010 WORST CASE only marginally worse than Naive Case = 22,088,000,000
-
Citing this work
Please use the following citation if you make use of tool:
@misc{KnownSourceMatcher, author = {Lyon, R. J.}, title = {{Known Source Matcher}}, affiliation = {University of Manchester}, month = {November}, year = {2014}, howpublished = {World Wide Web Accessed (19/11/2014), \newline \url{https://github.com/scienceguyrob/KnownSourceMatcher}}, notes = {Accessed 19/11/2014} }
-
Acknowledgements
This work was supported by grant EP/I028099/1 for the University of Manchester Centre for Doctoral Training in Computer Science, from the UK Engineering and Physical Sciences Research Council (EPSRC).