InScida workshop organised at AIMS, Muizenburg

My current research is focused on building emulators for weak lensing. Traditional Markov Chain Monte Carlo (MCMC) methods require many forward simulations, which are generally expensive to compute, in order to obtain reliable posterior densities of not only cosmological parameters but also systematic parameters. While neural networks have been explored in the earlier days, my main focus has been on Gaussian Processes, which allow for the full propagation of the uncertainty throughout the likelihood analysis. In addition, I have also been working on the MOPED (Heavens et al. 2000) compression algorithm to show that we can use the Gaussian Process emulator combined to recover the full posterior distribution of cosmological and nuisance parameters.

On a separate note, I am also interested in Machine Learning and Deep Learning (from a probabilistic perspective). I believe uncertainty quantification is important and will be a central focus in the next decade of research in Machine Learning. Moreover, while 'curve-fitting' techniques have been very successful over the past two decades, there is still the requirement to understand and interpret these algorithms.

# Publications

Below is an updated list of publications in which I was involved:

1. Fast Approximate Model for the 3D Matter Power Spectrum
Arrykrishna Mootoovaloo, Andrew Jaffe, Alan Heavens, Florent Leclercq
Accepted for publication in NeurIPS 2021

2. Many Bayesian inference problems in cosmology involve complex models. Despite the fact that these models have been meticulously designed, they can lead to intractable likelihood and each forward simulation itself can be computationally expensive, thus making the inverse problem of learning the model parameters a challenging task. In this paper, we develop an approximate model for the 3D matter power spectrum, $P_{\delta}(k,z)$, which is a central quantity in a weak lensing analysis. An important output of this approximate model, often referred to as surrogate model or emulator, are the first and second derivatives with respect to the input cosmological parameters. Without the emulator, the calculation of the derivatives requires multiple calls of the simulator, that is, the accurate Boltzmann solver, CLASS. We illustrate the application of the emulator in the calculation of different weak lensing and intrinsic alignment power spectra and we also demonstrate its performance on a toy simulated weak lensing dataset.

3. Kernel-Based Emulator for the 3D Matter Power Spectrum from CLASS
Arrykrishna Mootoovaloo, Andrew Jaffe, Alan Heavens, Florent Leclercq
Astronomy and Computing, arXiv:2105.02256, Code, Documentation

4. The 3D matter power spectrum, $P_{\delta}(k,z)$ is a fundamental quantity in the analysis of cosmological data such as large-scale structure, 21cm observations, and weak lensing. Existing computer models (Boltzmann codes) such as CLASS can provide it at the expense of immoderate computational cost. In this paper, we propose a fast Bayesian method to generate the 3D matter power spectrum, for a given set of wavenumbers, $k$ and redshifts, $z$. Our code allows one to calculate the following quantities: the linear matter power spectrum at a given redshift (the default is set to 0); the non-linear 3D matter power spectrum with/without baryon feedback; the weak lensing power spectrum. The gradient of the 3D matter power spectrum with respect to the input cosmological parameters is also returned and this is useful for Hamiltonian Monte Carlo samplers. The derivatives are also useful for Fisher matrix calculations. In our application, the emulator is accurate when evaluated at a set of cosmological parameters, drawn from the prior, with the fractional uncertainty, $\Delta P_{\delta}/P_{\delta}$ centered on 0. It is also $\sim 300$ times faster compared to CLASS, hence making the emulator amenable to sampling cosmological and nuisance parameters in a Monte Carlo routine. In addition, once the 3D matter power spectrum is calculated, it can be used with a specific redshift distribution, $n(z)$ to calculate the weak lensing and intrinsic alignment power spectra, which can then be used to derive constraints on cosmological parameters in a weak lensing data analysis problem. The software ($\texttt{emuPK}$) can be trained with any set of points and is distributed on Github, and comes with a pre-trained set of Gaussian Process (GP) models, based on 1000 Latin Hypercube (LH) samples, which follow roughly the current priors for current weak lensing analyses.

5. Data Augmentation in a Hierarchical-Based Classification scheme for Variable Stars
Zafiirah Hosenie, Robert Lyon, Benjamin Stappers, Arrykrishna Mootoovaloo, Vanessa McBrides
Accepted for publication in NeurIPS 2020

6. The accurate automated classification of variable stars into their respective sub-types is difficult. Machine learning based solutions often fall foul of the imbalanced learning problem, which causes poor generalisation performance in practice, especially on rare variable star sub-types. We attempted to overcome such deficiencies via the development of a hierarchical machine learning classifier. This ‘algorithm-level’ approach to tackling imbalance, yielded promising results on Catalina Real-Time Survey (CRTS) data. We attempt to further improve hierarchical classification performance by applying ‘data-level’ approaches to directly augment the training data so that they better describe under-represented classes. We apply and report results for three data augmentation methods in particular: Randomly Augmented Sampled Light curves from magnitude Error (RASLE), augmenting light curves with Gaussian Process modelling (GpFit) and the Synthetic Minority Over-sampling Technique (SMOTE). When combining the ‘algorithm-level’ (i.e. the hierarchical scheme) together with the ‘data-level’ approach, we further improve variable star classification accuracy by 1-4%. We found that a higher classification rate is obtained when using GpFit in the hierarchical model.

7. Parameter Inference for Weak Lensing using Gaussian Processes and MOPED
Arrykrishna Mootoovaloo, Alan Heavens, Andrew Jaffe, Florent Leclercq
MNRAS, arXiv:2005.06551, Code

8. In this paper, we propose a Gaussian Process (GP) emulator for the calculation of a) tomographic weak lensing band-power spectra, and b) coefficients of summary data massively compressed with the MOPED algorithm. In the former case cosmological parameter inference is accelerated by a factor of $\sim 10$-$30$ compared to explicit calls to the Boltzmann solver CLASS when applied to KiDS-450 weak lensing data. Much larger gains will come with future data, where with MOPED compression, the speed up can be up to a factor of $\sim 10^3$ when the common Limber approximation is used. Furthermore, the GP opens up the possibility of dropping the Limber approximation, without which the theoretical calculations may be unfeasibly slow. A potential advantage of GPs is that an error on the emulated function can be computed and this uncertainty incorporated into the likelihood. If speed is of the essence, then the mean of the Gaussian Process can be used and the uncertainty ignored. We compute the Kullback-Leibler divergence between the emulator likelihood and the CLASS likelihood, and on the basis of this and from analysing the uncertainties on the parameters, we find that in this case, the inclusion of the GP uncertainty does not justify the extra computational expense in the test application. For future weak lensing surveys such as Euclid and Legacy Survey of Space and Time (LSST), the number of summary statistics will be large, up to $\sim 10^{4}$. The speed of MOPED is determined by the number of parameters, not the number of summary data, so the gains are very large. In the non-Limber case, the speed-up can be a factor of $\sim 10^5$, provided that a fast way to compute the theoretical MOPED coefficients is available. The GP presented here provides such a fast mechanism and enables MOPED to be employed.

9. Gaussian Processes and MOPED Compression for Weak Lensing
Arrykrishna Mootoovaloo, Alan Heavens, Andrew Jaffe, Florent Leclercq
ICLR, Video, Paper, Code

10. In this paper, we propose a Gaussian Process (GP) emulator for tomographic weak lensing band power spectra, which speeds up cosmological parameter inference by a factor of $\sim$ 12 − 30 when applied to the KiDS-450 data. The emulator is built upon 8 parameters (6 cosmological and 2 systematic) using 1000 Latin Hypercube training samples. Since the prediction from the emulator is Gaussian distributed, we can also include the theoretical uncertainty in the likelihood function using the predictive uncertainty to marginalise over the latent function. Including the GP uncertainty is more demanding since it requires an $\mathcal{O}(N^{2})$ operation, for train each band power at every step in the MCMC compared to $\mathcal{O}(N_{\textrm{train}})$ for the mean alone, where $N_{\textrm{train}}$ is the number of training points in the emulator. If we use the exact solver, that is CLASS, each likelihood computation takes ∼ 0.65 seconds while each likelihood evaluation with the mean and uncertainty of the GP is $\sim$ 0.03 and $\sim$ 0.09 seconds respectively. The algorithm can be sped further when using MOPED compression algorithm which eliminates the $\mathcal{O}(N^{3})$ cost in the likelihood evaluation, where $N$ is the number of data points. Finally, with the emulator, it is realistic to drop the Limber approximation and train on the very slow but accurate full integrals.

11. Imbalance Learning for Variable Star Classification
Zafiirah Hosenie, Robert Lyon, Benjamin Stappers, Arrykrishna Mootoovaloo, Vanessa McBride
MNRAS, arXiv:2002.12386, Code

12. The accurate automated classification of variable stars into their respective sub-types is difficult. Machine learning based solutions often fall foul of the imbalanced learning problem, which causes poor generalisation performance in practice, especially on rare variable star sub-types. In previous work, we attempted to overcome such deficiencies via the development of a hierarchical machine learning classifier. This 'algorithm-level' approach to tackling imbalance, yielded promising results on Catalina Real-Time Survey (CRTS) data, outperforming the binary and multi-class classification schemes previously applied in this area. In this work, we attempt to further improve hierarchical classification performance by applying 'data-level' approaches to directly augment the training data so that they better describe under-represented classes. We apply and report results for three data augmentation methods in particular: Randomly Augmented Sampled Light curves from magnitude Error (𝚁𝙰𝚂𝙻𝙴), augmenting light curves with Gaussian Process modelling (𝙶𝚙𝙵𝚒𝚝) and the Synthetic Minority Over-sampling Technique (𝚂𝙼𝙾𝚃𝙴). When combining the 'algorithm-level' (i.e. the hierarchical scheme) together with the 'data-level' approach, we further improve variable star classification accuracy by 1-4%. We found that a higher classification rate is obtained when using 𝙶𝚙𝙵𝚒𝚝 in the hierarchical model. Further improvement of the metric scores requires a better standard set of correctly identified variable stars and, perhaps enhanced features are needed.

13. Comparing Multi-class, Binary and Hierarchical Machine Learning Classification schemes for variable stars
Zafiirah Hosenie, Robert Lyon, Benjamin Stappers, Arrykrishna Mootoovaloo
MNRAS, arXiv:1907.08189

14. Upcoming synoptic surveys are set to generate an unprecedented amount of data. This requires an automatic framework that can quickly and efficiently provide classification labels for several new object classification challenges. Using data describing 11 types of variable stars from the Catalina Real-Time Transient Surveys (CRTS), we illustrate how to capture the most important information from computed features and describe detailed methods of how to robustly use Information Theory for feature selection and evaluation. We apply three Machine Learning (ML) algorithms and demonstrate how to optimize these classifiers via cross-validation techniques. For the CRTS dataset, we find that the Random Forest (RF) classifier performs best in terms of balanced-accuracy and geometric means. We demonstrate substantially improved classification results by converting the multi-class problem into a binary classification task, achieving a balanced-accuracy rate of $\sim$99 per cent for the classification of $\delta$-Scuti and Anomalous Cepheids (ACEP). Additionally, we describe how classification performance can be improved via converting a 'flat-multi-class' problem into a hierarchical taxonomy. We develop a new hierarchical structure and propose a new set of classification features, enabling the accurate identification of subtypes of cepheids, RR Lyrae and eclipsing binary stars in CRTS data.

15. Marginal Likelihoods from Monte Carlo Markov Chains
Alan Heavens, Yabebal Fantaye, Arrykrishna Mootoovaloo, Hans Eggers, Zafiirah Hosenie, Steve Kroon, Elena Sellentin
arXiv:1704.03472, Code

16. In this paper, we present a method for computing the marginal likelihood, also known as the model likelihood or Bayesian evidence, from Markov Chain Monte Carlo (MCMC), or other sampled posterior distributions. In order to do this, one needs to be able to estimate the density of points in parameter space, and this can be challenging in high numbers of dimensions. Here we present a Bayesian analysis, where we obtain the posterior for the marginal likelihood, using kth nearest-neighbour distances in parameter space, using the Mahalanobis distance metric, under the assumption that the points in the chain (thinned if required) are independent. We generalise the algorithm to apply to importance-sampled chains, where each point is assigned a weight. We illustrate this with an idealised posterior of known form with an analytic marginal likelihood, and show that for chains of length $∼10^{5}$ points, the technique is effective for parameter spaces with up to ∼20 dimensions. We also argue that $k=1$ is the optimal choice, and discuss failure modes for the algorithm. In a companion paper (Heavens et al. 2017) we apply the technique to the main MCMC chains from the 2015 Planck analysis of cosmic background radiation data, to infer that quantitatively the simplest 6-parameter flat $\Lambda$CDM standard model of cosmology is preferred over all extensions considered.

17. No evidence for extensions to the standard cosmological model
Alan Heavens, Yabebal Fantaye, Elena Sellentin, Hans Eggers, Zafiirah Hosenie, Steve Kroon, Arrykrishna Mootoovaloo
PRL, arXiv:1704.03467, Code

18. We compute the Bayesian Evidence for models considered in the main analysis of Planck cosmic microwave background data. By utilising carefully-defined nearest-neighbour distances in parameter space, we reuse the Monte Carlo Markov Chains already produced for parameter inference to compute Bayes factors $B$ for many different model-dataset combinations. Standard 6-parameter flat $\Lambda$CDM model is favoured over all other models considered, with curvature being mildly favoured only when CMB lensing is not included. Many alternative models are strongly disfavoured by the data, including primordial correlated isocurvature models (ln$B$=−7.8), non-zero scalar-to-tensor ratio (ln$B$=−4.3), running of the spectral index (ln$B$=−4.7), curvature (ln$B$=−3.6), non-standard numbers of neutrinos (ln$B$=−3.1), non-standard neutrino masses (ln$B$=−3.2), non-standard lensing potential (ln$B$=−4.6), evolving dark energy (ln$B$=−3.2), sterile neutrinos (ln$B$=−6.9), and extra sterile neutrinos with a non-zero scalar-to-tensor ratio (ln$B$=−10.8). Other models are less strongly disfavoured with respect to flat $\Lambda$CDM. As with all analyses based on Bayesian Evidence, the final numbers depend on the widths of the parameter priors. We adopt the priors used in the Planck analysis, while performing a prior sensitivity analysis. Our quantitative conclusion is that extensions beyond the standard cosmological model are disfavoured by Planck data. Only when newer Hubble constant measurements are included does ΛCDM become disfavoured, and only mildly, compared with a dynamical dark energy model (ln$B\sim$+2).

19. Bayes Factors via Savage-Dickey Supermodels
Arrykrishna Mootoovaloo, Bruce A. Bassett, Martin Kunz
arXiv:1609.02186

20. We outline a new method to compute the Bayes Factor for model selection which bypasses the Bayesian Evidence. Our method combines multiple models into a single, nested, Supermodel using one or more hyperparameters. Since the models are now nested the Bayes Factors between the models can be efficiently computed using the Savage-Dickey Density Ratio (SDDR). In this way model selection becomes a problem of parameter estimation. We consider two ways of constructing the supermodel in detail: one based on combined models, and a second based on combined likelihoods. We report on these two approaches for a Gaussian linear model for which the Bayesian evidence can be calculated analytically and a toy nonlinear problem. Unlike the combined model approach, where a standard Monte Carlo Markov Chain (MCMC) struggles, the combined-likelihood approach fares much better in providing a reliable estimate of the log-Bayes Factor. This scheme potentially opens the way to computationally efficient ways to compute Bayes Factors in high dimensions that exploit the good scaling properties of MCMC, as compared to methods such as nested sampling that fail for high dimensions.