racs-historical-transients

A module to perform transient searching between SUMSS/NVSS and RACS. Creates a crossmatch catalogue for SUMSS/NVSS -> ASKAP sources, produces diagnostic plots and also includes the ability to create postage stamps of each crossmatch. A Django webserver is also included that will allow you to explore the results and mark candidates for further investigation.

A note

This module was born out of an initial diagnostic script and has grown and grown. As the author, I can only describe the state as 'terrible'. So if you are looking to try and decipher this or use a bit of it I can only apologise; please contact me if you do find yourself here. At this point I would scrap most of it and start again. I had also not quite let go of traditional for loops when using pandas, so the performance could also be improved. It's a similar story for the web_server: while it works, there is a bit of duplication in the code and some hacky ways around things to make it do what I wanted. At least I made it to Python 3, so there's that.

Dependencies

Module is for Python 3 only (I have tested up to v3.8.1).

The only non-Python dependencies are postgres and the postgres plugin Q3C: https://github.com/segasai/q3c (everything else should be installed by pip install). However, these are only required for the website. You only need to install Q3C into postgres, as the Django migration will do the Q3C setup for you.

CASA is required if you wish the pipeline to produce the convolved image.

Warning Python 3 & django-keyboard-shortcuts

On first use django-keyboard-shortcuts will fail with Python 3, though the fix is quite simple. It is a rogue comma in the __init__.py. Edit this as mentioned in this Stack Overflow post and it will work.

SUMSS and NVSS mosaics

Currently you also need a local copy of the SUMSS and NVSS mosaic images. I never got around to pulling a large image directly from SkyView, but through testing this is probably possible.

Installation

I recommend installing the module into a new Python environment using, for example, conda or virtualenv.

To install using pip:

pip install git+https://github.com/ajstewart/askap-image-diagnostic.git

Or you can clone the git repository and, from the repository root, install using python setup.py install or pip install . (note the trailing dot).

Creating a Database for the Website

This script was intended to be run on the ada machine, which has an installation of postgresql available. To create a database run:

createdb <db name> e.g. createdb racs (if you get a denied message contact the system administrator)

This will create an empty database with the chosen name. Make sure to note down the database settings (port, user, name) for use with the pipeline options. If the pipeline is run with the db inject option turned on without first initialising the tables, then the tables will be newly created. The easiest way to initialise the tables is by setting up the website (see Installation of the Website).

Installation of the Website

Included in the repository is web_server, which is a basic website built using Django to allow the user to explore the results in a convenient way and for other users to give feedback on the crossmatching.

To install, copy the web_server directory to a location where you wish to host the website from and cd into the web_server directory.

From here rename the web_server/settings.py.template to web_server/settings.py and edit the file with the correct database information as above.
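
The database information goes into Django's standard DATABASES setting. The following is only a sketch of what that block typically looks like for postgres: the database name racs matches the createdb example above, while the user and password are hypothetical placeholders, and the template shipped in web_server may arrange things differently.

```python
# Sketch of the database section of web_server/settings.py.
# Every value here is a placeholder (assumption); substitute the settings
# you noted down when creating the database.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",  # Django's built-in PostgreSQL backend
        "NAME": "racs",            # name passed to createdb
        "USER": "racs_user",       # hypothetical database user
        "PASSWORD": "change-me",   # placeholder password
        "HOST": "localhost",
        "PORT": "5432",            # default PostgreSQL port
    }
}
```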

Now run the migrations, which will create the tables in the database:

python manage.py makemigrations
python manage.py migrate

Also make a note of the install directory, specifically the /static/media/ directory, as this is used by the pipeline (option --website-media-dir).

Now the server can be launched (in the example below port 8005 is used):

python manage.py runserver 0.0.0.0:8005

Slack Integration

The website is able to send messages to Slack. To set this up you need a Bot API token from Slack for your app and the ID of the channel to send to. See here for more information.

What does the Pipeline do?

See PIPELINE.md.

Usage

The built pipeline script, available from the command line, is processASKAPimage.py.

By default, which means no askap or sumss csv files are provided, aegean will be run on the ASKAP image to extract a source catalogue and the SUMSS catalogue will be automatically fetched from Vizier. The SUMSS catalogue will be trimmed to only those sources that fall within the image area.

More than one image can be passed through the processing script at once. However, the manual csv inputs currently do not support multiple entries, so let the script do the source finding and SUMSS fetching automatically if you want to run more than one image through.

A range of options exist to influence processing:

usage: processASKAPimage.py [-h] [-c FILE] [--output-tag OUTPUT_TAG] [--log-level {WARNING,INFO,DEBUG}]
                            [--nice NICE] [--clobber CLOBBER] [--sumss-only SUMSS_ONLY] [--nvss-only NVSS_ONLY]
                            [--weight-crop WEIGHT_CROP] [--weight-crop-value WEIGHT_CROP_VALUE]
                            [--weight-crop-image WEIGHT_CROP_IMAGE] [--convolve CONVOLVE]
                            [--convolved-image CONVOLVED_IMAGE]
                            [--convolved-non-conv-askap-csv CONVOLVED_NON_CONV_ASKAP_CSV]
                            [--convolved-non-conv-askap-islands-csv CONVOLVED_NON_CONV_ASKAP_ISLANDS_CSV]
                            [--sourcefinder {aegean,pybdsf,selavy}] [--frequency FREQUENCY] [--askap-csv ASKAP_CSV]
                            [--askap-islands-csv ASKAP_ISLANDS_CSV] [--sumss-csv SUMSS_CSV] [--nvss-csv NVSS_CSV]
                            [--askap-csv-format {aegean,selavy}] [--remove-extended REMOVE_EXTENDED]
                            [--askap-ext-thresh ASKAP_EXT_THRESH] [--sumss-ext-thresh SUMSS_EXT_THRESH]
                            [--nvss-ext-thresh NVSS_EXT_THRESH] [--use-all-fits USE_ALL_FITS]
                            [--write-ann WRITE_ANN] [--produce-overlays PRODUCE_OVERLAYS]
                            [--boundary-value {nan,zero}] [--askap-flux-error ASKAP_FLUX_ERROR]
                            [--diagnostic-max-separation DIAGNOSTIC_MAX_SEPARATION]
                            [--transient-max-separation TRANSIENT_MAX_SEPARATION] [--postage-stamps POSTAGE_STAMPS]
                            [--postage-stamp-selection {all,transients}]
                            [--postage-stamp-ncores POSTAGE_STAMP_NCORES]
                            [--postage-stamp-radius POSTAGE_STAMP_RADIUS]
                            [--postage-stamp-zscale-contrast POSTAGE_STAMP_ZSCALE_CONTRAST]
                            [--sumss-mosaic-dir SUMSS_MOSAIC_DIR] [--nvss-mosaic-dir NVSS_MOSAIC_DIR]
                            [--aegean-settings-config AEGEAN_SETTINGS_CONFIG]
                            [--pybdsf-settings-config PYBDSF_SETTINGS_CONFIG]
                            [--selavy-settings-config SELAVY_SETTINGS_CONFIG] [--transients TRANSIENTS]
                            [--transients-askap-snr-thresh TRANSIENTS_ASKAP_SNR_THRESH]
                            [--transients-large-flux-ratio-thresh TRANSIENTS_LARGE_FLUX_RATIO_THRESH]
                            [--db-inject DB_INJECT] [--db-engine DB_ENGINE] [--db-username DB_USERNAME]
                            [--db-password DB_PASSWORD] [--db-host DB_HOST] [--db-port DB_PORT]
                            [--db-database DB_DATABASE] [--db-tag DB_TAG] [--website-media-dir WEBSITE_MEDIA_DIR]
                            images [images ...]

positional arguments:
  images                Define the images to process

optional arguments:
  -h, --help            show this help message and exit
  -c FILE, --conf_file FILE
                        Specify config file (default: None)
  --output-tag OUTPUT_TAG
                        Add a tag to the output name. (default: )
  --log-level {WARNING,INFO,DEBUG}
                        Set the logging level. (default: INFO)
  --nice NICE           Set the 'nice' level of processes. (default: 10)
  --clobber CLOBBER     Overwrite output if already exists. (default: False)
  --sumss-only SUMSS_ONLY
                        Only use SUMSS in the image analysis. (default: False)
  --nvss-only NVSS_ONLY
                        Only use NVSS in the image analysis. (default: False)
  --weight-crop WEIGHT_CROP
                        Crop image using the weights image. (default: False)
  --weight-crop-value WEIGHT_CROP_VALUE
                        Define the minimum normalised value from the weights image to crop to. (default: 0.04)
  --weight-crop-image WEIGHT_CROP_IMAGE
                        Define the weights image to use. (default: weights.fits)
  --convolve CONVOLVE   Convolve the image using CASA to SUMSS resolution for crossmatching. (default: False)
  --convolved-image CONVOLVED_IMAGE
                        Define a convolved image that has already been produced. (default: None)
  --convolved-non-conv-askap-csv CONVOLVED_NON_CONV_ASKAP_CSV
                        Define the unconvolved catalogue to use when using convolved mode, otherwise it will be
                        generated automatically (if aegaen or pybdsf) (default: None)
  --convolved-non-conv-askap-islands-csv CONVOLVED_NON_CONV_ASKAP_ISLANDS_CSV
                        Define the unconvolved island catalogue to use when using convolved mode, otherwise it will
                        be generated automatically (if aegaen or pybdsf) (default: None)
  --sourcefinder {aegean,pybdsf,selavy}
                        Select which sourcefinder to use (default: aegean)
  --frequency FREQUENCY
                        Provide the frequency of the image in Hz. Use if 'RESTFRQ' is not in the header (default:
                        99)
  --askap-csv ASKAP_CSV
                        Manually define a aegean format csv file containing the extracted sources to use for the
                        ASKAP image. (default: None)
  --askap-islands-csv ASKAP_ISLANDS_CSV
                        Manually define a csv file containing the extracted islands to use for the ASKAP image.
                        (default: None)
  --sumss-csv SUMSS_CSV
                        Manually provide the SUMSS catalog csv. (default: None)
  --nvss-csv NVSS_CSV   Manually provide the NVSS catalog csv. (default: None)
  --askap-csv-format {aegean,selavy}
                        Define which source finder provided the ASKAP catalog (currently only supports aegean).
                        (default: aegean)
  --remove-extended REMOVE_EXTENDED
                        Remove perceived extended sources from the catalogues. Uses the following arguments 'askap-
                        ext-thresh' and 'sumss-ext-thresh' to set the threshold. (default: False)
  --askap-ext-thresh ASKAP_EXT_THRESH
                        Define the maximum scaling threshold of the size of the ASKAP source compared to the PSF.
                        Used to exclude extended sources. Only 1 axis has to exceed. (default: 1.2)
  --sumss-ext-thresh SUMSS_EXT_THRESH
                        Define the maximum scaling threshold of the size of the SUMSS source compared to the PSF.
                        Use to exclude extended sources. Only 1 axis has to exceed. (default: 1.2)
  --nvss-ext-thresh NVSS_EXT_THRESH
                        Define the maximum scaling threshold of the size of the NVSS source compared to the PSF. Use
                        to exclude extended sources. Only 1 axis has to exceed. (default: 1.2)
  --use-all-fits USE_ALL_FITS
                        Use all the fits from Aegean ignoring all flags. Default only those with flag '0' are used.
                        (default: False)
  --write-ann WRITE_ANN
                        Create kvis annotation files of the catalogues. (default: False)
  --produce-overlays PRODUCE_OVERLAYS
                        Create overlay figures of the sources on the ASKAP image. (default: True)
  --boundary-value {nan,zero}
                        Define whether the out-of-bounds value in the ASKAP FITS is 'nan' or 'zero'. (default: nan)
  --askap-flux-error ASKAP_FLUX_ERROR
                        Percentage error to apply to flux errors. (default: 0.0)
  --diagnostic-max-separation DIAGNOSTIC_MAX_SEPARATION
                        Maximum crossmatch distance (in arcsec) to be consdiered when creating the diagnostic plots.
                        (default: 5.0)
  --transient-max-separation TRANSIENT_MAX_SEPARATION
                        Maximum crossmatch distance (in arcsec) to be consdiered when searching for transients.
                        (default: 45.0)
  --postage-stamps POSTAGE_STAMPS
                        Produce postage stamp plots of the cross matched sources within the max separation.
                        (default: False)
  --postage-stamp-selection {all,transients}
                        Select which postage stamps to create. (default: all)
  --postage-stamp-ncores POSTAGE_STAMP_NCORES
                        Select how many cores to use when creating the postage stamps. (default: 6)
  --postage-stamp-radius POSTAGE_STAMP_RADIUS
                        Select the radius of the postage stamp cutouts (arcmin). (default: 13.0)
  --postage-stamp-zscale-contrast POSTAGE_STAMP_ZSCALE_CONTRAST
                        Select the ZScale contrast to use in the postage stamps. (default: 0.2)
  --sumss-mosaic-dir SUMSS_MOSAIC_DIR
                        Directory containing the SUMSS survey mosaic image files. (default: None)
  --nvss-mosaic-dir NVSS_MOSAIC_DIR
                        Directory containing the NVSS survey mosaic image files. (default: None)
  --aegean-settings-config AEGEAN_SETTINGS_CONFIG
                        Select a config file containing the Aegean settings to be used (instead of defaults if none
                        provided). (default: None)
  --pybdsf-settings-config PYBDSF_SETTINGS_CONFIG
                        Select a config file containing the PyBDSF settings to be used (instead of defaults if none
                        provided). (default: None)
  --selavy-settings-config SELAVY_SETTINGS_CONFIG
                        Select a config file containing the Selavy settings to be used (instead of defaults if none
                        provided). (default: None)
  --transients TRANSIENTS
                        Perform a transient search analysis using the crossmatch data. Requires '--max-separation'
                        to be defined. (default: False)
  --transients-askap-snr-thresh TRANSIENTS_ASKAP_SNR_THRESH
                        Define the threshold for which ASKAP sources are considered to not have a SUMSS match baseed
                        upon the estimated SUMSS SNR if the source was placed in the SUMSS image. (default: 5.0)
  --transients-large-flux-ratio-thresh TRANSIENTS_LARGE_FLUX_RATIO_THRESH
                        Define the threshold for which sources are considered to have a large flux ratio. Median
                        value +/- threshold x std. (default: 3.0)
  --db-inject DB_INJECT
                        Turn databse injection on or off. (default: True)
  --db-engine DB_ENGINE
                        Define the database engine. (default: postgresql)
  --db-username DB_USERNAME
                        Define the username to use for the database (default: postgres)
  --db-password DB_PASSWORD
                        Define the password to use for the database (default: postgres)
  --db-host DB_HOST     Define the host for the databse. (default: localhost)
  --db-port DB_PORT     Define the port for the databse. (default: 5432)
  --db-database DB_DATABASE
                        Define the name of the database. (default: postgres)
  --db-tag DB_TAG       The description field in the databased attached to the image. (default: RACS Analysis)
  --website-media-dir WEBSITE_MEDIA_DIR
                        Copy the image directory directly to the static media directory of the website. (default:
                        none)

These options can be entered using a ConfigParser configuration file:

[GENERAL]
output_tag=askap_racs_analysis
log_level=INFO
nice=10
clobber=True

[ANALYSIS]
sumss_only=true
nvss_only=false
frequency=864e6
weight_crop=True
weight_crop_value=0.04
weight_crop_image=../path/to/weight_cropped_image.fits
convolve=True
convolved_image=../path/to/convolved_image.fits
convolved_non_conv_askap_csv=../path/to/preconvolved_catalog.csv
convolved_non_conv_askap_islands_csv=/path/to/preconvolved_islands_catalog.csv
sourcefinder=aegean
# aegean_settings_config=None
# pybdsf_settings_config=None
# selavy_settings_config=None
boundary_value=nan
askap_flux_error=0.1

[CATALOGUES]
askap_csv=/path/to/askap_catalog.csv
askap_islands_csv=/path/to/askap_islands_catalog.csv
# sumss_csv=None
# nvss_csv=None
askap_csv_format=aegean
write_ann=True

[CROSSMATCHING]
diagnostic_max_separation=5.0
transient_max_separation=45.0
remove_extended=True
askap_ext_thresh=1.3
sumss_ext_thresh=1.5
nvss_ext_thresh=1.2
use_all_fits=False

[TRANSIENTS]
transients=True
transients_askap_sumss_snr_thresh=5.0
transients_large_flux_ratio_thresh=2.0

[POSTAGESTAMPS]
postage_stamps=False
postage_stamp_selection=all
postage_stamp_ncores=6
postage_stamp_radius=13.0
postage_stamp_zscale_contrast=0.25
sumss_mosaic_dir=/directory/where/sumss/mosaics/are/kept
nvss_mosaic_dir=/directory/where/nvss/mosaics/are/kept

[DATABASE]
db_inject=true
db_engine=postgresql
db_username=user
db_host=localhost
db_port=5432
db_database=RACS
db_tag=Tag to add to pipeline run
website_media_dir=/path/to/the/website/media/dir
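
For anyone unfamiliar with the format, the sketch below shows how a file laid out like this is read with Python's standard configparser module. It only illustrates the file format; how processASKAPimage.py itself merges these values with the command line options is not shown here, and the short EXAMPLE string is an excerpt made up from the listing above.

```python
import configparser

# A short excerpt of the parset above, inlined so the example is self-contained.
EXAMPLE = """
[CROSSMATCHING]
diagnostic_max_separation=5.0
transient_max_separation=45.0
remove_extended=True

[POSTAGESTAMPS]
postage_stamps=False
postage_stamp_ncores=6
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE)  # parser.read("myparset.in") would read a file instead

# ConfigParser stores every value as a string; the typed getters convert them.
max_sep = parser.getfloat("CROSSMATCHING", "transient_max_separation")
remove_extended = parser.getboolean("CROSSMATCHING", "remove_extended")
ncores = parser.getint("POSTAGESTAMPS", "postage_stamp_ncores")

print(max_sep, remove_extended, ncores)
```

Lines starting with # are treated as comments, which is why the commented-out entries in the listing are simply ignored.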

Weight Cropping

If the weights.XX.fits file is available and supplied then this can be used to trim the image to remove the edges, leaving just the cleaner part of the image. By default this is set to cut to a value of 0.04 of the maximum weight value.
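
As a rough illustration of the idea (not the pipeline's actual code), the sketch below blanks every pixel whose normalised weight is under the cut and trims the array to what is left. The function name weight_crop and the use of plain numpy arrays instead of FITS files are assumptions made for the example.

```python
import numpy as np

def weight_crop(image, weights, min_fraction=0.04):
    """Blank pixels below min_fraction of the maximum weight, then trim the edges."""
    norm = weights / np.nanmax(weights)   # normalise weights to a maximum of 1
    keep = norm >= min_fraction           # NaN weights compare as False
    if not keep.any():
        raise ValueError("no pixels above the weight threshold")
    blanked = np.where(keep, image, np.nan)
    # Bounding box of the rows and columns that still contain kept pixels.
    rows = np.flatnonzero(keep.any(axis=1))
    cols = np.flatnonzero(keep.any(axis=0))
    return blanked[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Toy data: a 5x5 image whose outer ring has very low weight.
weights = np.full((5, 5), 0.01)
weights[1:4, 1:4] = 1.0
image = np.arange(25.0).reshape(5, 5)
cropped = weight_crop(image, weights)
print(cropped.shape)
```

A real image would also need its WCS header updated after trimming, which this sketch leaves out.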

Image Convolving

The pipeline is able to convolve the supplied ASKAP image to the SUMSS or NVSS resolution. If the image has already been convolved it can be supplied to the pipeline using the convolved_image argument (make sure to also set convolve=True). In the case of convolving to SUMSS, the target beam size follows the 45 x 45 cosec |dec| arcsec convention. Crossmatching is then done against the convolved image.

If convolving is used then the non-convolved image will also be analysed. Sources will be extracted by the pipeline using Aegean, or a catalogue can be supplied using the convolved_non_conv_askap_csv argument. This is used when searching for transient sources.
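
To make the SUMSS beam convention concrete, here is a small sketch that evaluates 45 x 45 cosec |dec| arcsec for a given declination. The function name is made up for the example and the pipeline may compute the target beam differently.

```python
import math

def sumss_beam_arcsec(dec_deg):
    """Return (major, minor) axes of the SUMSS beam in arcsec at declination dec_deg."""
    if dec_deg == 0:
        raise ValueError("cosec|dec| is undefined at dec = 0")
    minor = 45.0
    # cosec|dec| = 1 / sin|dec|, so the beam elongates towards the equator.
    major = 45.0 / abs(math.sin(math.radians(dec_deg)))
    return major, minor

for dec in (-90.0, -60.0, -30.0):
    major, minor = sumss_beam_arcsec(dec)
    print(f"dec={dec:+.0f}: {major:.1f} x {minor:.1f} arcsec")
```

At a declination of -30 degrees the major axis is already twice the minor axis, which is why the target resolution has to be worked out per image.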

Image Diagnostic Plots

These plots are produced using only sources that have a crossmatch distance <= the user-defined max separation (--diagnostic-max-separation), i.e. good matches. If --remove-extended is enabled, extended sources are also removed from the list of crossmatches used to produce the diagnostic plots.
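
The selection described here can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, and "exceed" in the option help is read as strictly greater than the threshold times the beam axis.

```python
def is_extended(major, minor, beam_major, beam_minor, thresh=1.2):
    """True when either fitted axis exceeds thresh times the matching beam axis."""
    # Only one axis has to exceed the limit for the source to count as extended.
    return major > thresh * beam_major or minor > thresh * beam_minor

def use_in_diagnostics(separation, extended, max_separation=5.0, remove_extended=True):
    """Decide whether a crossmatch contributes to the diagnostic plots."""
    if separation > max_separation:   # not a good match
        return False
    return not (remove_extended and extended)

# A source of 60 x 46 arcsec against a 45 x 45 arcsec beam is extended (60 > 54).
print(is_extended(60.0, 46.0, 45.0, 45.0))
print(use_in_diagnostics(3.0, extended=True))
```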

Note the source numbers plot does not make these exclusions.

Transient Searching

The pipeline works by matching each SUMSS source in the image with the nearest ASKAP source extracted.

Good matches are those with a separation <= the max separation defined by the user (--transient-max-separation). Anything with a larger separation is considered to have no match. This provides 3 different sub-types of crossmatch:

  • No ASKAP Match to SUMSS - This defines a SUMSS source that has no ASKAP source matched to it within the max separation limit.
  • No SUMSS Match to ASKAP - This defines an ASKAP source that has not been matched to a SUMSS source within the max separation limit AND has an integrated flux density such that it would be at least a 5 sigma detection in the SUMSS image (this does not yet account for spectral index).
  • Good Matches - The sources that are defined as being a good match (including the large ratio sources).
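
The three sub-types can be sketched with numpy as follows. This is a simplified illustration and not the pipeline's implementation: the function names are hypothetical, a single SUMSS rms value stands in for the local noise estimate, and the brute-force separation matrix only suits small catalogues.

```python
import numpy as np

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = (np.radians(x) for x in (ra1, dec1, ra2, dec2))
    a = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(a))) * 3600.0

def classify(sumss, askap, askap_flux, sumss_rms, max_sep=45.0, snr_thresh=5.0):
    """sumss and askap are (N, 2) arrays of RA, Dec in degrees."""
    sumss, askap = np.asarray(sumss, float), np.asarray(askap, float)
    # Separation of every SUMSS source (rows) to every ASKAP source (columns).
    sep = angular_sep_arcsec(sumss[:, None, 0], sumss[:, None, 1],
                             askap[None, :, 0], askap[None, :, 1])
    nearest = sep.argmin(axis=1)
    nearest_sep = sep[np.arange(len(sumss)), nearest]
    good = nearest_sep <= max_sep          # good matches
    no_askap_match = ~good                 # SUMSS sources without an ASKAP match
    matched = np.zeros(len(askap), dtype=bool)
    matched[nearest[good]] = True
    # Unmatched ASKAP sources that should have been detectable in SUMSS.
    bright_enough = np.asarray(askap_flux, float) / sumss_rms >= snr_thresh
    no_sumss_match = ~matched & bright_enough
    return good, no_askap_match, no_sumss_match

sumss = [[10.0, -40.0], [10.5, -40.0]]
askap = [[10.001, -40.0], [11.0, -41.0], [12.0, -41.0]]
good, no_askap, no_sumss = classify(sumss, askap, [0.05, 0.02, 0.002], sumss_rms=0.001)
print(good, no_askap, no_sumss)
```

For full-size catalogues, astropy's SkyCoord.match_to_catalog_sky does the nearest-neighbour step far more efficiently than the dense matrix used here.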

From here, forced extractions are performed using Aegean wherever a source has not been found. This enables the flux ratio to be computed for every crossmatch, no matter the sub-type. Transient candidates are those sources which have a flux ratio >= 2.0.
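
A minimal sketch of the flux ratio cut follows. The text does not say which way round the ratio is taken, so this example assumes a symmetric ratio (brighter over fainter) so that both brightening and fading sources are caught; that choice, the function names and the lack of any frequency scaling between the surveys are all assumptions.

```python
import numpy as np

def flux_ratio(flux_a, flux_b):
    """Ratio of the brighter to the fainter measurement (always >= 1).

    Assumes both fluxes are positive; forced extractions can return values
    at or below zero, which would need handling separately.
    """
    flux_a = np.asarray(flux_a, dtype=float)
    flux_b = np.asarray(flux_b, dtype=float)
    return np.maximum(flux_a, flux_b) / np.minimum(flux_a, flux_b)

def transient_candidates(sumss_flux, askap_flux, ratio_thresh=2.0):
    """Boolean mask of crossmatches whose flux ratio reaches the threshold."""
    return flux_ratio(sumss_flux, askap_flux) >= ratio_thresh

sumss_flux = [10.0, 10.0, 30.0]   # toy values, arbitrary units
askap_flux = [10.5, 25.0, 10.0]
mask = transient_candidates(sumss_flux, askap_flux)
print(mask)
```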

Output

In the top level directory will be:

  • log files.
  • png files of diagnostic plots.
  • csv files of askap catalogue, sumss sources and the complete crossmatching result.

Two directories may also be present:

  • postage-stamps - this will store all the postage stamp images which will be sorted into good and bad matches.
  • transients - this will store all the csv files of categorised transient candidates and in sub-directories will also be copies of the postage stamps if these have been created (renamed for the respective transient source and category).

Example

Input: An ASKAP image called image.askap.mosaic.restored.fits. The pixels outside of the image area are NaNs. Our database is on the localhost with the username of user123, on the default port and is called racstest. Our website media directory is /my/website/static/media

Want: To crossmatch the ASKAP image with SUMSS and use only matches that are <= 20 arcsec to perform the analysis. Allow the script to automatically build the catalogues and remove extended sources when creating the diagnostic plots. In this case, these are defined as sources that have one axis that is more than 1.4 times the associated beam size axis. Also want to produce postage stamp images of the crossmatches along with producing kvis annotation files, and finally perform a transient search. We will tag the image in the database with "first test".

Command:

processASKAPimage.py image.askap.mosaic.restored.fits --remove-extended True --askap-ext-thresh 1.4 --sumss-ext-thresh 1.4 --transient-max-separation 20.0 --postage-stamps True --sumss-mosaic-dir /path/to/sumss_mosaics_dir --write-ann True --transients True --db-username user123 --db-database racstest --db-tag "first test" --website-media-dir /my/website/static/media

Or we can configure a parset file to run the same thing:

[GENERAL]
output_tag=example
log_level=INFO
nice=10
clobber=True

[ANALYSIS]
frequency=864e6
weight_crop=False
weight_crop_value=0.04
weight_crop_image=../path/to/weight_cropped_image.fits
convolve=False
convolved_image=../path/to/convolved_image.fits
convolved_non_conv_askap_csv=../path/to/preconvolved_catalog.csv
sourcefinder=aegean
# aegean_settings_config=None
# pybdsf_settings_config=None
# selavy_settings_config=None
boundary_value=nan
askap_flux_error=0.1

[CATALOGUES]
askap_csv=/path/to/askap_catalog.csv
# sumss_csv=None
# nvss_csv=None
askap_csv_format=aegean
write_ann=True

[CROSSMATCHING]
diagnostic_max_separation=5.0
transient_max_separation=20.0
remove_extended=True
askap_ext_thresh=1.4
sumss_ext_thresh=1.4
nvss_ext_thresh=1.2
use_all_fits=False

[TRANSIENTS]
transients=True
transients_askap_sumss_snr_thresh=5.0
transients_large_flux_ratio_thresh=2.0

[POSTAGESTAMPS]
postage_stamps=True
postage_stamp_selection=all
postage_stamp_ncores=2
postage_stamp_radius=13.0
postage_stamp_zscale_contrast=0.25
sumss_mosaic_dir=/directory/where/sumss/mosaics/are/kept
nvss_mosaic_dir=/directory/where/nvss/mosaics/are/kept

[DATABASE]
db_engine=postgresql
db_username=user123
db_host=localhost
db_port=5432
db_database=racstest
db_tag=first test
website_media_dir=/path/to/the/website/media/dir

And then run the pipeline like so:

processASKAPimage.py -c myparset.in image.askap.mosaic.restored.fits

Output: The results will be placed in image.askap.mosaic.restored_results.

Aegean settings

The default aegean settings are:

cores=1
maxsummits=5
seedclip=5
floodclip=4
nocov=True

These can be changed by providing a config file and supplying it to the argument --aegean-settings-config. There should be a standard ConfigParser header [aegean]. E.g.:

[aegean]
cores=12
maxsummits=5
seedclip=6
floodclip=4
autoload=True

To deactivate a setting remove it from the config file.
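
One reading of this behaviour is that a supplied config file replaces the defaults outright, so a setting left out of the file (nocov in the example above) is simply not passed on. The sketch below illustrates that reading with Python's configparser; the function name and the decision to replace the defaults instead of merging with them are assumptions, not a description of the pipeline's code.

```python
import configparser

# Defaults as listed above, kept as strings the way ConfigParser stores values.
DEFAULT_AEGEAN_SETTINGS = {
    "cores": "1",
    "maxsummits": "5",
    "seedclip": "5",
    "floodclip": "4",
    "nocov": "True",
}

def load_aegean_settings(config_text=None):
    """Return the [aegean] section if a config is supplied, else the defaults."""
    if config_text is None:
        return dict(DEFAULT_AEGEAN_SETTINGS)
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Only what is written in the file is used, so omitted settings are off.
    return dict(parser["aegean"])

custom = """
[aegean]
cores=12
maxsummits=5
seedclip=6
floodclip=4
autoload=True
"""

settings = load_aegean_settings(custom)
print(settings)
```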