An investigation of machine-learning algorithms for the estimation of galaxy redshift

Western Sydney University thesis: Master's thesis

Abstract

The next wave of large radio telescopes is being commissioned, with plans to observe deeper and over wider areas than ever before. The Evolutionary Map of the Universe (EMU) project is expected to increase the number of known radio galaxies from ∼2.5 million to ∼70 million, allowing for statistical studies of unprecedented size in the radio regime. However, most of the studies planned by the EMU project require redshift estimates. While these redshifts do not need to be measured to high resolution and can be roughly binned, they do require a low rate of outliers. Even with recent advancements in multi-object spectroscopy, spectroscopic redshifts will only be possible for a small fraction of sources. The majority of the newly discovered radio sources will have limited multi-wavelength photometry, whereas traditional photometric template fitting methods require high-quality, complete multi-wavelength photometry. Previous research has used machine learning (ML) to estimate redshift, but has primarily focused on trying to match the best results provided by photometric template fitting, using the best and most complete data available. For the most part, the datasets used are not radio-selected (radio-selected datasets typically fail under photometric template fitting methods) and are limited in redshift. While ML techniques have proved to be effective, most have not been conclusively tested on radio-selected datasets at the higher redshift ranges expected from the EMU project. In this thesis, I examine the utility of the k-Nearest Neighbours (kNN) and Random Forest (RF) regression and classification algorithms for estimating the redshift of a source from its features. The kNN tests include using five different distance metrics.
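As a rough illustration of kNN regression with an interchangeable distance metric, the sketch below estimates a redshift as the mean redshift of the k nearest training sources. It is a minimal toy under stated assumptions, not the implementation used in the thesis: the function names, the toy feature vectors, and the choice of Euclidean and Manhattan distances are all hypothetical, and the abstract does not list which five metrics were tested.

```python
import math
from statistics import mean


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute per-feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_regress(train_X, train_z, query, k=3, metric=euclidean):
    """Estimate a redshift as the mean redshift of the k nearest
    training sources, with 'nearest' defined by the supplied metric."""
    order = sorted(range(len(train_X)),
                   key=lambda i: metric(train_X[i], query))
    return mean(train_z[i] for i in order[:k])


# Hypothetical toy data: two features per source (not real photometry).
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_z = [0.1, 0.2, 0.3, 2.0]
query = [0.2, 0.1]

z_euclid = knn_regress(train_X, train_z, query, k=3, metric=euclidean)
z_manhat = knn_regress(train_X, train_z, query, k=3, metric=manhattan)
z_nearest = knn_regress(train_X, train_z, query, k=1)
```

Passing the metric as a function keeps the neighbour search unchanged while the notion of distance is swapped, which is the sense in which several metrics can be compared on the same data.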
I use a radio-selected dataset built from the Australia Telescope Large Area Survey (ATLAS) 1.4 GHz radio survey, which was completed in anticipation of the EMU project and has been observed to around the depth of the EMU project. The 1.4 GHz flux measured by ATLAS was combined with Infrared (IR) fluxes from the Spitzer Wide-area Infrared Extragalactic Survey (SWIRE), optical magnitudes from the Dark Energy Survey (DES), and spectroscopic redshift measurements from the Australian Dark Energy Survey (OzDES). Based on the combined multi-wavelength catalogue, I create three datasets. Dataset A consists of all sources with a spectroscopic redshift, including sources with missing observations, whose missing values are filled with the mean of that feature across the entire dataset. Dataset B is a subset of Dataset A, with the sources lacking complete multi-wavelength photometry removed. Dataset C is a subset of Dataset B, with the sources removed that have optical or IR photometry below the detection limits of all-sky surveys. To test the generalisation of the algorithms across the sky, I use three different training and test sets. Set 1 uses a training set randomly selected from the dataset. Set 2 uses a training set made up entirely of sources from the European Large Area ISO Survey-South 1 (ELAIS-S1) field, with the test set made up of sources from the Extended Chandra Deep Field South (eCDFS) field. Set 3 uses a training set made up entirely of sources from the eCDFS field, with the test set made up of sources from the ELAIS-S1 field. This thesis shows that traditionally simple ML algorithms like kNN and RF can provide acceptable redshift estimates on radio-selected data, with the best results coming from redshifts binned to a lower resolution. By extending the algorithms to suit the data, the kNN classification algorithm using the Large Margin Nearest Neighbour (LMNN) learned distance metric provided a decrease in the number of outliers, reaching an η0.15 outlier rate of ∼5%, with an accuracy of σ∆z/(1+zspec) ≈ 0.09.
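The two figures of merit quoted above can be sketched as follows. The abstract does not spell out their definitions, so this assumes the common convention: an outlier is a source whose normalised residual |zpred − zspec|/(1+zspec) exceeds 0.15, and σ is the standard deviation of the normalised residuals. The function names and the toy redshifts are hypothetical.

```python
from statistics import pstdev


def normalised_residuals(z_spec, z_pred):
    """Residuals (z_pred - z_spec) / (1 + z_spec) for each source."""
    return [(p - s) / (1.0 + s) for s, p in zip(z_spec, z_pred)]


def outlier_rate(z_spec, z_pred, threshold=0.15):
    """Fraction of sources whose absolute normalised residual
    exceeds the threshold (assumed definition of eta_0.15)."""
    res = normalised_residuals(z_spec, z_pred)
    return sum(abs(r) > threshold for r in res) / len(res)


def residual_scatter(z_spec, z_pred):
    """Population standard deviation of the normalised residuals
    (assumed definition of sigma)."""
    return pstdev(normalised_residuals(z_spec, z_pred))


# Hypothetical toy values, not results from the thesis.
z_spec = [0.5, 1.0, 1.5, 2.0]
z_pred = [0.5, 1.1, 1.5, 3.0]

eta = outlier_rate(z_spec, z_pred)        # one of the four sources is an outlier
sigma = residual_scatter(z_spec, z_pred)
```

Dividing by 1+zspec keeps a fixed absolute error from being penalised more heavily at low redshift than at high redshift, which is why both measures are usually reported on the normalised residual.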
Once completed, the EMU project is expected to have optical and IR counterparts for ≈ 40% of the 70 million detected radio galaxies. By 2020, this is expected to increase to ≈ 70% of the galaxies detected by the EMU project. This thesis shows that the EMU project can be provided with reliable redshifts for ≈ 95% of sources with optical and IR photometry: ∼ 27 million sources when the EMU project is completed, increasing to ∼ 47 million sources by 2020. This will enable many of the key science goals of the EMU project to be completed.
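The source counts quoted above follow from multiplying the stated fractions together. The short check below reproduces that arithmetic; the variable and function names are illustrative, and the ≈ 95% figure is read here as the complement of the ∼5% outlier rate, which is an interpretation rather than something the abstract states outright.

```python
TOTAL_SOURCES = 70e6       # radio galaxies expected from EMU
RELIABLE_FRACTION = 0.95   # assumed: 1 minus the ~5% outlier rate


def sources_with_reliable_redshift(counterpart_fraction):
    """Sources that have optical/IR counterparts and a reliable estimate."""
    return TOTAL_SOURCES * counterpart_fraction * RELIABLE_FRACTION


at_completion = sources_with_reliable_redshift(0.40)  # about 26.6 million
by_2020 = sources_with_reliable_redshift(0.70)        # about 46.6 million
```

Rounded to the nearest million, these give the ∼ 27 million and ∼ 47 million sources stated in the text.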
Date of Award: 2018
Original language: English

Keywords

  • red shift
  • astronomical photometry
  • radio telescopes
  • machine learning
  • algorithms
