: InScida workshop organised at AIMS, Muizenburg

My current research is focused on building emulators for weak lensing. Traditional Markov Chain Monte Carlo (MCMC) methods require many forward simulations, which are generally expensive to compute, in order to obtain reliable posterior densities of not only cosmological parameters but also systematic parameters. While neural networks have been explored in the earlier days, my main focus has been on Gaussian Processes, which allow for the full propagation of the uncertainty throughout the likelihood analysis. In addition, I have also been working on the MOPED (Heavens et al. 2000) compression algorithm to show that we can use the Gaussian Process emulator combined to recover the full posterior distribution of cosmological and nuisance parameters.

On a separate note, I am also interested in Machine Learning and Deep Learning (from a probabilistic perspective). I believe uncertainty quantification is important and will be a central focus in the next decade of research in Machine Learning. Moreover, while 'curve-fitting' techniques have been very successful over the past two decades, there is still the requirement to understand and interpret these algorithms.

Publications

Below is an updated list of publications in which I was involved:

emuflow: Normalising Flows for Joint Cosmological Analysis
A. Mootoovaloo, C. García-García, D. Alonso, J. Ruiz-Zapatero
arXiv:2409.01407, MNRAS, NeurIPS 2024

Given the growth in the variety and precision of astronomical datasets of interest for cosmology, the best cosmological constraints are invariably obtained by combining data from different experiments. At the likelihood level, one complication in doing so is the need to marginalise over large-dimensional parameter models describing the data of each experiment. These include both the relatively small number of cosmological parameters of interest and a large number of "nuisance" parameters. Sampling over the joint parameter space for multiple experiments can thus become a very computationally expensive operation. This can be significantly simplified if one could sample directly from the marginal cosmological posterior distribution of preceding experiments, depending only on the common set of cosmological parameters. In this paper, we show that this can be achieved by emulating marginal posterior distributions via normalising flows. The resulting trained normalising flow models can be used to efficiently combine cosmological constraints from independent datasets without increasing the dimensionality of the parameter space under study. We show that the method is able to accurately describe the posterior distribution of real cosmological datasets, as well as the joint distribution of different datasets, even when significant tension exists between experiments. The resulting joint constraints can be obtained in a fraction of the time it would take to combine the same datasets at the level of their likelihoods. We construct normalising flow models for a set of public cosmological datasets of general interests and make them available, together with the software used to train them, and to exploit them in cosmological parameter inference.

Assessment of Gradient-Based Samplers in Standard Cosmological Likelihoods
A. Mootoovaloo, J. Ruiz-Zapatero, C. García-García, D. Alonso
arXiv:2406.04725, MNRAS, NeurIPS 2024

We assess the usefulness of gradient-based samplers, such as the No-U-Turn Sampler (NUTS), by comparison with traditional Metropolis-Hastings algorithms, in tomographic $3\times 2$ point analyses. Specifically, we use the DES Year 1 data and a simulated future LSST-like survey as representative examples of these studies, containing a significant number of nuisance parameters (20 and 32, respectively) that affect the performance of rejection-based samplers. To do so, we implement a differentiable forward model using JAX-COSMO (Campagne et al. 2023), and we use it to derive parameter constraints from both datasets using the NUTS algorithm as implemented in §4, and the Metropolis-Hastings algorithm as implemented in Cobaya (Lewis 2013). When quantified in terms of the number of effective number of samples taken per likelihood evaluation, we find a relative efficiency gain of $\mathcal{O}(10)$ in favour of NUTS. However, this efficiency is reduced to a factor $\sim 2$ when quantified in terms of computational time, since we find the cost of the gradient computation (needed by NUTS) relative to the likelihood to be $\sim 4.5$ times larger for both experiments. We validate these results making use of analytical multi-variate distributions (a multivariate Gaussian and a Rosenbrock distribution) with increasing dimensionality. Based on these results, we conclude that gradient-based samplers such as NUTS can be leveraged to sample high dimensional parameter spaces in Cosmology, although the efficiency improvement is relatively mild for moderate $(\mathcal{O}(50))$ dimension numbers, typical of tomographic large-scale structure analyses.

LimberJack.jl: auto-differentiable methods for angular power spectra analyses
J. Ruiz-Zapatero, D. Alonso, C. García-García, A. Nicola, A. Mootoovaloo, J. M. Sullivan, M. Bonici, P. G. Ferreira
OJA, arXiv:2310.08306

We present LimberJack.jl, a fully auto-differentiable code for cosmological analyses of 2 point auto- and cross-correlation measurements from galaxy clustering, CMB lensing and weak lensing data written in Julia. Using Julia's auto-differentiation ecosystem, LimberJack.jl can obtain gradients for its outputs up to an order of magnitude faster than traditional finite difference methods. This makes LimberJack.jl greatly synergistic with gradient-based sampling methods, such as Hamiltonian Monte Carlo, capable of efficiently exploring parameter spaces with hundreds of dimensions. We first prove LimberJack.jl's reliability by reanalysing the DES Y1 $3\times 2$-point data. We then showcase its capabilities by using a $\mathcal{O}(100)$ parameters Gaussian Process to reconstruct the cosmic growth from a combination of DES Y1 galaxy clustering and weak lensing data, eBOSS QSO's, CMB lensing and redshift-space distortions. Our Gaussian process reconstruction of the growth factor is statistically consistent with the $\Lambda$CDM Planck 2018 prediction at all redshifts. Moreover, we show that the addition of RSD data is extremely beneficial to this type of analysis, reducing the uncertainty in the reconstructed growth factor by $20\%$ on average across redshift. LimberJack.jl is a fully open-source project available on Julia's general repository of packages and GitHub.

Extreme data compression for Bayesian model comparison
Alan F. Heavens, Arrykrishna Mootoovaloo, Roberto Trotta, Elena Sellentin
JCAP, arXiv:2306.15998

We develop extreme data compression for use in Bayesian model comparison via the MOPED algorithm, as well as more general score compression. We find that Bayes factors from data compressed with the MOPED algorithm are identical to those from their uncompressed datasets when the models are linear and the errors Gaussian. In other nonlinear cases, whether nested or not, we find negligible differences in the Bayes factors, and show this explicitly for the Pantheon-SH0ES supernova dataset. We also investigate the sampling properties of the Bayesian Evidence as a frequentist statistic, and find that extreme data compression reduces the sampling variance of the Evidence, but has no impact on the sampling distribution of Bayes factors. Since model comparison can be a very computationally-intensive task, MOPED extreme data compression may present significant advantages in computational time.

Analytical marginalisation over photometric redshift uncertainties in cosmic shear analyses
Jaime Ruiz-Zapatero, Boryana Hadzhiyska, David Alonso, Pedro G. Ferreira, Carlos García-García, Arrykrishna Mootoovaloo
MNRAS, arXiv:2301.11978

As the statistical power of imaging surveys grows, it is crucial to account for all systematic uncertainties. This is normally done by constructing a model of these uncertainties and then marginalizing over the additional model parameters. The resulting high dimensionality of the total parameter spaces makes inferring the cosmological parameters significantly more costly using traditional Monte-Carlo sampling methods. A particularly relevant example is the redshift distribution, $p(z)$, of the source samples, which may require tens of parameters to describe fully. However, relatively tight priors can be usually placed on these parameters through calibration of the associated systematics. In this paper we show, quantitatively, that a linearisation of the theoretical prediction with respect to these calibratable systematic parameters allows us to analytically marginalise over these extra parameters, leading to a factor $\sim 30$ reduction in the time needed for parameter inference, while accurately recovering the same posterior distributions for the cosmological parameters that would be obtained through a full numerical marginalisation over $160\; p(z)$ parameters. We demonstrate that this is feasible not only with current data and current achievable calibration priors but also for future Stage-IV datasets.

Fast Approximate Model for the 3D Matter Power Spectrum
Arrykrishna Mootoovaloo, Andrew Jaffe, Alan Heavens, Florent Leclercq
Accepted for publication in NeurIPS 2021

Many Bayesian inference problems in cosmology involve complex models. Despite the fact that these models have been meticulously designed, they can lead to intractable likelihood and each forward simulation itself can be computationally expensive, thus making the inverse problem of learning the model parameters a challenging task. In this paper, we develop an approximate model for the 3D matter power spectrum, $P_{\delta}(k,z)$, which is a central quantity in a weak lensing analysis. An important output of this approximate model, often referred to as surrogate model or emulator, are the first and second derivatives with respect to the input cosmological parameters. Without the emulator, the calculation of the derivatives requires multiple calls of the simulator, that is, the accurate Boltzmann solver, CLASS. We illustrate the application of the emulator in the calculation of different weak lensing and intrinsic alignment power spectra and we also demonstrate its performance on a toy simulated weak lensing dataset.

Kernel-Based Emulator for the 3D Matter Power Spectrum from CLASS
Arrykrishna Mootoovaloo, Andrew Jaffe, Alan Heavens, Florent Leclercq
Astronomy and Computing, arXiv:2105.02256, Code, Documentation

The 3D matter power spectrum, $P_{\delta}(k,z)$ is a fundamental quantity in the analysis of cosmological data such as large-scale structure, 21cm observations, and weak lensing. Existing computer models (Boltzmann codes) such as CLASS can provide it at the expense of immoderate computational cost. In this paper, we propose a fast Bayesian method to generate the 3D matter power spectrum, for a given set of wavenumbers, $k$ and redshifts, $z$. Our code allows one to calculate the following quantities: the linear matter power spectrum at a given redshift (the default is set to 0); the non-linear 3D matter power spectrum with/without baryon feedback; the weak lensing power spectrum. The gradient of the 3D matter power spectrum with respect to the input cosmological parameters is also returned and this is useful for Hamiltonian Monte Carlo samplers. The derivatives are also useful for Fisher matrix calculations. In our application, the emulator is accurate when evaluated at a set of cosmological parameters, drawn from the prior, with the fractional uncertainty, $\Delta P_{\delta}/P_{\delta}$ centered on 0. It is also $\sim 300$ times faster compared to CLASS, hence making the emulator amenable to sampling cosmological and nuisance parameters in a Monte Carlo routine. In addition, once the 3D matter power spectrum is calculated, it can be used with a specific redshift distribution, $n(z)$ to calculate the weak lensing and intrinsic alignment power spectra, which can then be used to derive constraints on cosmological parameters in a weak lensing data analysis problem. The software ($\texttt{emuPK}$) can be trained with any set of points and is distributed on Github, and comes with a pre-trained set of Gaussian Process (GP) models, based on 1000 Latin Hypercube (LH) samples, which follow roughly the current priors for current weak lensing analyses.

Data Augmentation in a Hierarchical-Based Classification scheme for Variable Stars
Zafiirah Hosenie, Robert Lyon, Benjamin Stappers, Arrykrishna Mootoovaloo, Vanessa McBrides
Accepted for publication in NeurIPS 2020

The accurate automated classification of variable stars into their respective sub-types is difficult. Machine learning based solutions often fall foul of the imbalanced learning problem, which causes poor generalisation performance in practice, especially on rare variable star sub-types. We attempted to overcome such deficiencies via the development of a hierarchical machine learning classifier. This ‘algorithm-level’ approach to tackling imbalance, yielded promising results on Catalina Real-Time Survey (CRTS) data. We attempt to further improve hierarchical classification performance by applying ‘data-level’ approaches to directly augment the training data so that they better describe under-represented classes. We apply and report results for three data augmentation methods in particular: Randomly Augmented Sampled Light curves from magnitude Error (RASLE), augmenting light curves with Gaussian Process modelling (GpFit) and the Synthetic Minority Over-sampling Technique (SMOTE). When combining the ‘algorithm-level’ (i.e. the hierarchical scheme) together with the ‘data-level’ approach, we further improve variable star classification accuracy by 1-4%. We found that a higher classification rate is obtained when using GpFit in the hierarchical model.

Parameter Inference for Weak Lensing using Gaussian Processes and MOPED
Arrykrishna Mootoovaloo, Alan Heavens, Andrew Jaffe, Florent Leclercq
MNRAS, arXiv:2005.06551, Code

In this paper, we propose a Gaussian Process (GP) emulator for the calculation of a) tomographic weak lensing band-power spectra, and b) coefficients of summary data massively compressed with the MOPED algorithm. In the former case cosmological parameter inference is accelerated by a factor of $\sim 10$-$30$ compared to explicit calls to the Boltzmann solver CLASS when applied to KiDS-450 weak lensing data. Much larger gains will come with future data, where with MOPED compression, the speed up can be up to a factor of $\sim 10^3$ when the common Limber approximation is used. Furthermore, the GP opens up the possibility of dropping the Limber approximation, without which the theoretical calculations may be unfeasibly slow. A potential advantage of GPs is that an error on the emulated function can be computed and this uncertainty incorporated into the likelihood. If speed is of the essence, then the mean of the Gaussian Process can be used and the uncertainty ignored. We compute the Kullback-Leibler divergence between the emulator likelihood and the CLASS likelihood, and on the basis of this and from analysing the uncertainties on the parameters, we find that in this case, the inclusion of the GP uncertainty does not justify the extra computational expense in the test application. For future weak lensing surveys such as Euclid and Legacy Survey of Space and Time (LSST), the number of summary statistics will be large, up to $\sim 10^{4}$. The speed of MOPED is determined by the number of parameters, not the number of summary data, so the gains are very large. In the non-Limber case, the speed-up can be a factor of $\sim 10^5$, provided that a fast way to compute the theoretical MOPED coefficients is available. The GP presented here provides such a fast mechanism and enables MOPED to be employed.

Gaussian Processes and MOPED Compression for Weak Lensing
Arrykrishna Mootoovaloo, Alan Heavens, Andrew Jaffe, Florent Leclercq
ICLR, Video, Paper, Code

In this paper, we propose a Gaussian Process (GP) emulator for tomographic weak lensing band power spectra, which speeds up cosmological parameter inference by a factor of $\sim$ 12 − 30 when applied to the KiDS-450 data. The emulator is built upon 8 parameters (6 cosmological and 2 systematic) using 1000 Latin Hypercube training samples. Since the prediction from the emulator is Gaussian distributed, we can also include the theoretical uncertainty in the likelihood function using the predictive uncertainty to marginalise over the latent function. Including the GP uncertainty is more demanding since it requires an $\mathcal{O}(N^{2})$ operation, for train each band power at every step in the MCMC compared to $\mathcal{O}(N_{\textrm{train}})$ for the mean alone, where $N_{\textrm{train}}$ is the number of training points in the emulator. If we use the exact solver, that is CLASS, each likelihood computation takes ∼ 0.65 seconds while each likelihood evaluation with the mean and uncertainty of the GP is $\sim$ 0.03 and $\sim$ 0.09 seconds respectively. The algorithm can be sped further when using MOPED compression algorithm which eliminates the $\mathcal{O}(N^{3})$ cost in the likelihood evaluation, where $N$ is the number of data points. Finally, with the emulator, it is realistic to drop the Limber approximation and train on the very slow but accurate full integrals.

Imbalance Learning for Variable Star Classification
Zafiirah Hosenie, Robert Lyon, Benjamin Stappers, Arrykrishna Mootoovaloo, Vanessa McBride
MNRAS, arXiv:2002.12386, Code

The accurate automated classification of variable stars into their respective sub-types is difficult. Machine learning based solutions often fall foul of the imbalanced learning problem, which causes poor generalisation performance in practice, especially on rare variable star sub-types. In previous work, we attempted to overcome such deficiencies via the development of a hierarchical machine learning classifier. This 'algorithm-level' approach to tackling imbalance, yielded promising results on Catalina Real-Time Survey (CRTS) data, outperforming the binary and multi-class classification schemes previously applied in this area. In this work, we attempt to further improve hierarchical classification performance by applying 'data-level' approaches to directly augment the training data so that they better describe under-represented classes. We apply and report results for three data augmentation methods in particular: Randomly Augmented Sampled Light curves from magnitude Error (𝚁𝙰𝚂𝙻𝙴), augmenting light curves with Gaussian Process modelling (𝙶𝚙𝙵𝚒𝚝) and the Synthetic Minority Over-sampling Technique (𝚂𝙼𝙾𝚃𝙴). When combining the 'algorithm-level' (i.e. the hierarchical scheme) together with the 'data-level' approach, we further improve variable star classification accuracy by 1-4%. We found that a higher classification rate is obtained when using 𝙶𝚙𝙵𝚒𝚝 in the hierarchical model. Further improvement of the metric scores requires a better standard set of correctly identified variable stars and, perhaps enhanced features are needed.

Comparing Multi-class, Binary and Hierarchical Machine Learning Classification schemes for variable stars
Zafiirah Hosenie, Robert Lyon, Benjamin Stappers, Arrykrishna Mootoovaloo
MNRAS, arXiv:1907.08189

Upcoming synoptic surveys are set to generate an unprecedented amount of data. This requires an automatic framework that can quickly and efficiently provide classification labels for several new object classification challenges. Using data describing 11 types of variable stars from the Catalina Real-Time Transient Surveys (CRTS), we illustrate how to capture the most important information from computed features and describe detailed methods of how to robustly use Information Theory for feature selection and evaluation. We apply three Machine Learning (ML) algorithms and demonstrate how to optimize these classifiers via cross-validation techniques. For the CRTS dataset, we find that the Random Forest (RF) classifier performs best in terms of balanced-accuracy and geometric means. We demonstrate substantially improved classification results by converting the multi-class problem into a binary classification task, achieving a balanced-accuracy rate of $\sim$99 per cent for the classification of $\delta$-Scuti and Anomalous Cepheids (ACEP). Additionally, we describe how classification performance can be improved via converting a 'flat-multi-class' problem into a hierarchical taxonomy. We develop a new hierarchical structure and propose a new set of classification features, enabling the accurate identification of subtypes of cepheids, RR Lyrae and eclipsing binary stars in CRTS data.

Marginal Likelihoods from Monte Carlo Markov Chains
Alan Heavens, Yabebal Fantaye, Arrykrishna Mootoovaloo, Hans Eggers, Zafiirah Hosenie, Steve Kroon, Elena Sellentin
arXiv:1704.03472, Code

In this paper, we present a method for computing the marginal likelihood, also known as the model likelihood or Bayesian evidence, from Markov Chain Monte Carlo (MCMC), or other sampled posterior distributions. In order to do this, one needs to be able to estimate the density of points in parameter space, and this can be challenging in high numbers of dimensions. Here we present a Bayesian analysis, where we obtain the posterior for the marginal likelihood, using kth nearest-neighbour distances in parameter space, using the Mahalanobis distance metric, under the assumption that the points in the chain (thinned if required) are independent. We generalise the algorithm to apply to importance-sampled chains, where each point is assigned a weight. We illustrate this with an idealised posterior of known form with an analytic marginal likelihood, and show that for chains of length $∼10^{5}$ points, the technique is effective for parameter spaces with up to ∼20 dimensions. We also argue that $k=1$ is the optimal choice, and discuss failure modes for the algorithm. In a companion paper (Heavens et al. 2017) we apply the technique to the main MCMC chains from the 2015 Planck analysis of cosmic background radiation data, to infer that quantitatively the simplest 6-parameter flat $\Lambda$CDM standard model of cosmology is preferred over all extensions considered.

No evidence for extensions to the standard cosmological model
Alan Heavens, Yabebal Fantaye, Elena Sellentin, Hans Eggers, Zafiirah Hosenie, Steve Kroon, Arrykrishna Mootoovaloo
PRL, arXiv:1704.03467, Code

We compute the Bayesian Evidence for models considered in the main analysis of Planck cosmic microwave background data. By utilising carefully-defined nearest-neighbour distances in parameter space, we reuse the Monte Carlo Markov Chains already produced for parameter inference to compute Bayes factors $B$ for many different model-dataset combinations. Standard 6-parameter flat $\Lambda$CDM model is favoured over all other models considered, with curvature being mildly favoured only when CMB lensing is not included. Many alternative models are strongly disfavoured by the data, including primordial correlated isocurvature models (ln$B$=−7.8), non-zero scalar-to-tensor ratio (ln$B$=−4.3), running of the spectral index (ln$B$=−4.7), curvature (ln$B$=−3.6), non-standard numbers of neutrinos (ln$B$=−3.1), non-standard neutrino masses (ln$B$=−3.2), non-standard lensing potential (ln$B$=−4.6), evolving dark energy (ln$B$=−3.2), sterile neutrinos (ln$B$=−6.9), and extra sterile neutrinos with a non-zero scalar-to-tensor ratio (ln$B$=−10.8). Other models are less strongly disfavoured with respect to flat $\Lambda$CDM. As with all analyses based on Bayesian Evidence, the final numbers depend on the widths of the parameter priors. We adopt the priors used in the Planck analysis, while performing a prior sensitivity analysis. Our quantitative conclusion is that extensions beyond the standard cosmological model are disfavoured by Planck data. Only when newer Hubble constant measurements are included does ΛCDM become disfavoured, and only mildly, compared with a dynamical dark energy model (ln$B\sim$+2).

Bayes Factors via Savage-Dickey Supermodels
Arrykrishna Mootoovaloo, Bruce A. Bassett, Martin Kunz
arXiv:1609.02186

We outline a new method to compute the Bayes Factor for model selection which bypasses the Bayesian Evidence. Our method combines multiple models into a single, nested, Supermodel using one or more hyperparameters. Since the models are now nested the Bayes Factors between the models can be efficiently computed using the Savage-Dickey Density Ratio (SDDR). In this way model selection becomes a problem of parameter estimation. We consider two ways of constructing the supermodel in detail: one based on combined models, and a second based on combined likelihoods. We report on these two approaches for a Gaussian linear model for which the Bayesian evidence can be calculated analytically and a toy nonlinear problem. Unlike the combined model approach, where a standard Monte Carlo Markov Chain (MCMC) struggles, the combined-likelihood approach fares much better in providing a reliable estimate of the log-Bayes Factor. This scheme potentially opens the way to computationally efficient ways to compute Bayes Factors in high dimensions that exploit the good scaling properties of MCMC, as compared to methods such as nested sampling that fail for high dimensions.