Alert Classification for the ALeRCE Broker System: The Anomaly Detector

Pérez-Carrasco et al. 2023, AJ

Astronomical broker systems, such as Automatic Learning for the Rapid Classification of Events (ALeRCE), are currently analyzing hundreds of thousands of alerts per night, opening up an opportunity to automatically detect anomalous unknown sources. In this work, we present the ALeRCE anomaly detector, composed of three outlier detection algorithms that aim to find transient, periodic, and stochastic anomalous sources within the Zwicky Transient Facility data stream. Our experimental framework consists of cross-validating six anomaly detection algorithms for each of these three classes using the ALeRCE light-curve features. Following the ALeRCE taxonomy, we consider four transient subclasses, five stochastic subclasses, and six periodic subclasses. We evaluate each algorithm by considering each subclass as the anomaly class. For transient and periodic sources the best performance is obtained by a modified version of the deep support vector data description neural network, while for stochastic sources the best results are obtained by calculating the reconstruction error of an autoencoder neural network. Including a visual inspection step for the 10 most promising candidates for each of the 15 ALeRCE subclasses, we detect 31 bogus candidates (i.e., those with photometry or processing issues) and seven potential astrophysical outliers that require follow-up observations for further analysis. The code and the data needed to reproduce our results are publicly available at
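
The evaluation above treats each subclass in turn as the anomaly class, with the remaining subclasses acting as inliers. The sketch below shows that loop. Several choices in it are illustrative assumptions and are not the ALeRCE code: the area under the ROC curve as the figure of merit, the function names, and the toy distance-to-mean detector. For brevity, the inliers are also scored on the same data used to fit the detector, with no separate test split.

```python
import numpy as np


def auroc(scores, is_anomaly):
    """ROC AUC via the Mann-Whitney U statistic; higher score = more anomalous."""
    scores = np.asarray(scores, dtype=float)
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    pos, neg = scores[is_anomaly], scores[~is_anomaly]
    # Count anomaly/inlier pairs that are ranked correctly; ties count one half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def distance_to_mean(train, data):
    """Toy detector (hypothetical): Euclidean distance to the inlier mean."""
    return np.linalg.norm(data - train.mean(axis=0), axis=1)


def evaluate_each_subclass(features, labels, fit_and_score):
    """Hold out each subclass as the anomaly class and report its AUROC."""
    results = {}
    for anomaly_class in np.unique(labels):
        is_anomaly = labels == anomaly_class
        # Fit only on the other subclasses, then score every object.
        scores = fit_and_score(features[~is_anomaly], features)
        results[anomaly_class] = auroc(scores, is_anomaly)
    return results


features = np.array([[0.0], [0.1], [0.2], [10.0], [10.1]])
labels = np.array(["a", "a", "b", "c", "c"])
print(evaluate_each_subclass(features, labels, distance_to_mean))
```

Any detector can be plugged in through `fit_and_score`, which keeps the cross-validation loop independent of the six algorithms being compared.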

Multi-Class Deep SVDD: Anomaly Detection Approach in Astronomy with Distinct Inlier Categories

Perez-Carrasco et al. 2023, ICML

With the increasing volume of astronomical data generated by modern survey telescopes, automated pipelines and machine learning techniques have become crucial for analyzing and extracting knowledge from these datasets. Anomaly detection, i.e., the task of identifying irregular or unexpected patterns in the data, is a complex challenge in astronomy. In this paper, we propose Multi-Class Deep Support Vector Data Description (MCDSVDD), an extension of the state-of-the-art anomaly detection algorithm One-Class Deep SVDD, specifically designed to handle different inlier categories with distinct data distributions. MCDSVDD uses a neural network to map the data into hyperspheres, where each hypersphere represents a specific inlier category. The distance of each sample from the centers of these hyperspheres determines the anomaly score. We evaluate the effectiveness of MCDSVDD by comparing its performance with several anomaly detection algorithms on a large dataset of astronomical light-curves obtained from the Zwicky Transient Facility. Our results demonstrate the efficacy of MCDSVDD in detecting anomalous sources while leveraging the presence of different inlier categories. The code and the data needed to reproduce our results are publicly available.
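
The hypersphere idea can be made concrete on embeddings that are assumed to come from an already trained network (the network itself is omitted here). The sketch below makes three assumptions, none taken from the paper's code:

- Each category's center is the mean embedding of that category, mirroring the common Deep SVDD practice of fixing the center from network outputs.
- The training objective pulls each sample toward its own category's center.
- At test time, when the category is unknown, the anomaly score is the squared distance to the nearest center.

```python
import numpy as np


def class_centers(embeddings, labels):
    """One hypersphere center per inlier category: the mean embedding of that class."""
    classes = np.unique(labels)
    centers = np.stack([embeddings[labels == k].mean(axis=0) for k in classes])
    return classes, centers


def mcdsvdd_objective(embeddings, labels, classes, centers):
    """Mean squared distance of each training embedding to its own class center."""
    index = np.searchsorted(classes, labels)  # classes is sorted by np.unique
    return float(((embeddings - centers[index]) ** 2).sum(axis=1).mean())


def anomaly_score(embeddings, centers):
    """Squared distance to the nearest center (assumed scoring rule)."""
    d2 = ((embeddings[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1)


emb = np.array([[0.0, 0.0], [0.0, 0.2], [5.0, 5.0], [5.0, 5.2]])
lab = np.array([0, 0, 1, 1])
classes, centers = class_centers(emb, lab)
print(anomaly_score(np.array([[0.0, 0.1], [10.0, -10.0]]), centers))
```

In training, the network weights would be updated to reduce `mcdsvdd_objective`. Points far from every category's center then receive large scores.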

Multi-scale stamps for real-time classification of alert streams

Reyes-Jainaga et al. 2023, AJ

In recent years, automatic classifiers of image cutouts (also called "stamps") have proven to be key for fast supernova discovery. The Vera C. Rubin Observatory will distribute about ten million alerts with their respective stamps each night, enabling the discovery of approximately one million supernovae each year. A growing source of confusion for these classifiers is the presence of satellite glints, sequences of point-like sources produced by rotating satellites or debris. The currently planned Rubin stamps will be smaller than the typical separation between these point sources. Thus, a stamp with a larger field of view could enable the automatic identification of these sources. However, the distribution of larger stamps would be limited by network bandwidth restrictions. We evaluate the impact of using image stamps of different angular sizes and resolutions for the fast classification of events (AGNs, asteroids, bogus, satellites, SNe, and variable stars), using data from the Zwicky Transient Facility. We compare four scenarios: three with the same number of pixels (small field of view with high resolution, large field of view with low resolution, and a multi-scale proposal) and a scenario with the full stamp that has a larger field of view and higher resolution. Compared to small field of view stamps, our multi-scale strategy reduces misclassifications of satellites as asteroids or supernovae, performing on par with high-resolution stamps that are 15 times larger in data volume. We encourage Rubin and its Science Collaborations to consider the benefits of implementing multi-scale stamps as a possible update to the alert specification.
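
One way to widen the field of view without growing the pixel count per image is to stack views of the same cutout at progressively coarser pixel scales. The sketch assumes three things: a square single-band cutout centered on the candidate, factor-of-two steps between levels, and block averaging. It does not reproduce the exact multi-scale layout evaluated in the paper, and the name `multiscale_stamp` is hypothetical.

```python
import numpy as np


def multiscale_stamp(image, size, levels):
    """Return a (levels, size, size) stack of progressively coarser, wider views.

    Level l crops a centered square of size * 2**l pixels and block-averages it
    by a factor 2**l, so every level has the same number of pixels.
    """
    image = np.asarray(image, dtype=float)
    ny, nx = image.shape
    views = []
    for level in range(levels):
        factor = 2 ** level
        side = size * factor
        y0, x0 = ny // 2 - side // 2, nx // 2 - side // 2
        if y0 < 0 or x0 < 0 or y0 + side > ny or x0 + side > nx:
            raise ValueError(f"level {level} needs {side}x{side} pixels")
        crop = image[y0:y0 + side, x0:x0 + side]
        # Average each factor x factor block into one coarser pixel.
        views.append(crop.reshape(size, factor, size, factor).mean(axis=(1, 3)))
    return np.stack(views)


img = np.arange(64 * 64, dtype=float).reshape(64, 64)
print(multiscale_stamp(img, 16, 3).shape)
```

The stack above carries 3 × 16² pixels, while a single full-resolution stamp covering the widest (64 × 64 pixel) field needs 64² pixels. That pixel budget is the quantity the bandwidth argument turns on.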

Persistent and occasional: Searching for the variable population of the ZTF/4MOST sky using ZTF Data Release 11

Sánchez-Sáez et al. 2023, A&A

Aims: We present a variability-, color-, and morphology-based classifier designed to identify multiple classes of transients and persistently variable and non-variable sources from the Zwicky Transient Facility (ZTF) Data Release 11 (DR11) light curves of extended and point sources. The main motivation to develop this model was to identify active galactic nuclei (AGN) at different redshift ranges to be observed by the 4MOST Chilean AGN/Galaxy Evolution Survey (ChANGES). That being said, it also serves as a more general time-domain astronomy study.

Methods: The model uses nine colors computed from CatWISE and Pan-STARRS1 (PS1), a morphology score from PS1, and 61 single-band variability features computed from the ZTF DR11 g and r light curves. We trained two versions of the model, one for each ZTF band, since ZTF DR11 treats the light curves observed in a particular combination of field, filter, and charge-coupled device (CCD) quadrant independently. We used a hierarchical local classifier per parent node approach, where each node is composed of a balanced random forest model. We adopted a taxonomy with 17 classes: non-variable stars, non-variable galaxies, three transients (SNIa, SN-other, and CV/Nova), five classes of stochastic variables (lowz-AGN, midz-AGN, highz-AGN, Blazar, and YSO), and seven classes of periodic variables (LPV, EA, EB/EW, DSCT, RRL, CEP, and Periodic-other).
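
In a balanced random forest (Chen, Liaw & Breiman 2004), each tree is grown on a bootstrap sample that draws equally many examples from every class, which counters class imbalance. The sketch shows only that sampling step, not the full forest or the authors' code. In a local-classifier-per-parent-node hierarchy, one such forest would be trained at each internal node on the training examples that fall under it. The function name and toy labels are illustrative.

```python
import numpy as np


def balanced_bootstrap_indices(labels, rng):
    """Row indices for one tree: draw n_min examples with replacement from each class."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()  # size of the rarest class at this node
    picks = [rng.choice(np.flatnonzero(labels == k), size=n_min, replace=True)
             for k in classes]
    return np.concatenate(picks)


rng = np.random.default_rng(0)
labels = np.array(["RRL"] * 100 + ["CEP"] * 5)
idx = balanced_bootstrap_indices(labels, rng)
print(np.unique(labels[idx], return_counts=True))
```

Each tree therefore sees the rare class as often as the common one. Different trees still see different majority-class examples.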

Results: The macro-averaged precision, recall, and F1-score are 0.61, 0.75, and 0.62 for the g-band model, and 0.60, 0.74, and 0.61 for the r-band model. When the four AGN classes (lowz-AGN, midz-AGN, highz-AGN, and Blazar) are grouped into one single class, its precision, recall, and F1-score are 1.00, 0.95, and 0.97, respectively, for both the g and r bands. This demonstrates the good performance of the model in classifying AGN candidates. We applied the model to all the sources in the ZTF/4MOST overlapping sky (−28° ≤ Dec ≤ 8.5°), avoiding ZTF fields that cover the Galactic bulge (|gal_b| ≤ 9° and gal_l ≤ 50°). This area includes 86 576 577 light curves in the g band and 140 409 824 in the r band with 20 or more observations and with an average magnitude in the corresponding band lower than 20.5. Only 0.73% of the g-band light curves and 2.62% of the r-band light curves were classified as stochastic, periodic, or transient with high probability (Pinit ≥ 0.9). Even though the metrics obtained for the two models are similar, we find that, in general, more reliable results are obtained when using the g-band model. With it, we identified 384 242 AGN candidates (including low-, mid-, and high-redshift AGN and Blazars), 287 156 of which have Pinit ≥ 0.9.
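
Macro averages weight every class equally, however rare it is. The sketch below uses one common definition, which is assumed here rather than taken from the paper:

- Per-class precision, recall, and F1 are averaged without class weights.
- A class that is never predicted gets precision 0.

```python
import numpy as np


def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall and F1 (unweighted mean over classes)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = []
    for k in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        per_class.append((p, r, f1))
    # Each class counts equally in the average.
    p_macro, r_macro, f1_macro = np.mean(per_class, axis=0)
    return float(p_macro), float(r_macro), float(f1_macro)


print(macro_scores([0, 0, 1, 1], [0, 1, 1, 1]))
```

Because macro F1 averages the per-class F1 scores, it need not equal the harmonic mean of the macro precision and recall. For the g-band model, 2 × 0.61 × 0.75 / (0.61 + 0.75) ≈ 0.67, while the reported macro F1 is 0.62.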

Multiwavelength monitoring of the nucleus in PBC J2333.9-2343: the giant radio galaxy with a blazar-like core

Hernández-García et al. 2023, MNRAS

PBC J2333.9-2343 is a giant radio galaxy at z = 0.047 with a bright central core associated with a blazar nucleus. If the nuclear blazar jet is a new phase of the jet activity, then the small orientation angle suggests a dramatic change of the jet direction. We present observations obtained between September 2018 and January 2019 (cadence longer than three days) with Effelsberg, SMARTS-1.3m, ZTF, ATLAS, Swift, and Fermi-LAT, and between April and July 2019 (daily cadence) with SMARTS-1.3m and ATLAS. Large flux increases (more than a factor of two) are observed on timescales shorter than a month, which are interpreted as flaring events. The cross-correlation between the SMARTS-1.3m NIR and optical monitoring shows no significant time lag within the measured errors. A comparison of the optical variability properties of non-blazar and blazar AGNs shows that PBC J2333.9-2343 has properties more comparable to the latter. The SED of the nucleus shows two peaks, which were fitted with a one-zone leptonic model. Our data and modelling show that the high-energy peak is dominated by External Compton emission from the dusty torus, with a mild contribution from Inverse Compton emission from the jet. The derived jet angle of 3 degrees is also typical of a blazar. Therefore, we confirm the presence of a blazar-like core in the center of this giant radio galaxy, likely a Flat Spectrum Radio Quasar with peculiar properties.

Extending time-series models for irregular observational gaps with a moving average structure for astronomical sequences

Ojeda et al. 2023, RAS Techniques and Instruments

In this study, we introduce a novel moving-average (MA) model for analyzing stationary time series observed irregularly in time. The process is strictly stationary and ergodic under normality and weakly stationary when normality is not assumed. Maximum likelihood (ML) estimation can be efficiently carried out through a Kalman algorithm obtained from the state-space representation of the model. The Kalman algorithm has order O(n) (where n is the number of observations in the sequence), from which it is possible to efficiently generate parameter estimators, linear predictors, and their mean-squared errors. Two procedures were developed for assessing parameter estimation errors: one based on the Hessian of the likelihood function and another based on the bootstrap method. The behaviour of these estimators was assessed through Monte Carlo experiments. Both methods give accurate estimates, even with a relatively small number of observations. Moreover, it is shown that for non-Gaussian data, specifically for the Student's t and generalized error distributions, the parameters of the model can be estimated precisely by ML. The proposed model is compared to continuous autoregressive moving average (CARMA) models, showing better performance when the MA parameter is negative or close to one. We illustrate the implementation of the proposed model with light curves of variable stars from the OGLE and HIPPARCOS surveys and stochastic objects from the Zwicky Transient Facility. The results suggest that the irregular MA model is a suitable alternative for modelling astronomical light curves, particularly when they have negative autocorrelation.
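
The O(n) likelihood comes from running a Kalman filter on a state-space form of the model. The sketch below does this for a first-order MA process, y_t = ε_t + θ_t ε_{t−1}, with state (ε_t, ε_{t−1}). The per-observation loading θ_t is where a gap-dependent coefficient could enter in the irregular setting. The paper's exact parameterization is not reproduced, so treat `ma_loglik` as an illustrative assumption that is exact only for the evenly spaced MA(1) case with constant θ.

```python
import numpy as np


def ma_loglik(y, theta, sigma):
    """Gaussian log-likelihood of y_t = eps_t + theta_t * eps_{t-1} via a Kalman filter.

    theta may be a scalar or one loading per observation; eps ~ N(0, sigma**2).
    """
    y = np.asarray(y, dtype=float)
    theta = np.broadcast_to(np.asarray(theta, dtype=float), y.shape)
    T = np.array([[0.0, 0.0], [1.0, 0.0]])          # moves eps_t into the lag slot
    RQR = np.array([[sigma**2, 0.0], [0.0, 0.0]])   # fresh shock enters slot 0
    a = np.zeros(2)                                 # predicted state mean
    P = sigma**2 * np.eye(2)                        # predicted state covariance
    loglik = 0.0
    for t in range(y.size):
        Z = np.array([1.0, theta[t]])
        v = y[t] - Z @ a                            # innovation
        PZ = P @ Z
        F = Z @ PZ                                  # innovation variance
        loglik -= 0.5 * (np.log(2 * np.pi * F) + v * v / F)
        a = a + PZ * (v / F)                        # filtered state
        P = P - np.outer(PZ, PZ) / F
        a = T @ a                                   # one-step-ahead prediction
        P = T @ P @ T.T + RQR
    return loglik


rng = np.random.default_rng(1)
y = rng.normal(size=40)
print(ma_loglik(y, 0.6, 1.3))
```

Each step costs a constant amount of work, so the total cost is O(n). For constant θ, the result matches the exact Gaussian log-likelihood built from the tridiagonal MA(1) covariance matrix, which costs O(n³) if evaluated densely.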

Improving the selection of changing-look AGNs through multiwavelength photometric variability

López-Navas et al. 2023, MNRAS

We present second-epoch optical spectra for 30 changing-look (CL) candidates found by searching for Type 1 optical variability in a sample of active galactic nuclei (AGNs) spectroscopically classified as Type 2. We use a random-forest-based light-curve classifier and spectroscopic follow-up, confirming 50 per cent of candidates as turning-on CLs. To improve this selection method and to better understand the nature of the not-confirmed CL candidates, we perform a multiwavelength variability analysis including optical, mid-infrared (MIR), and X-ray data, and compare the results from the confirmed and not-confirmed CLs identified in this work. We find that most of the not-confirmed CLs are consistent with weak Type 1s dominated by host-galaxy contributions, showing weaker optical and MIR variability. In contrast, the confirmed CLs present stronger optical fluctuations and show a long-term (five- to ten-year) increase in their MIR fluxes and W1-W2 colour. In the 0.2-2.3 keV band, at least four out of 11 CLs with available SRG/eROSITA detections have increased their flux in comparison with archival upper limits. These common features allow us to select the most promising CLs from our list of candidates, leading to nine sources with multiwavelength photometric properties similar to those of our CL sample. The use of machine learning algorithms with optical and MIR light curves will be very useful to identify CLs in future large-scale surveys.

The Type 1 and Type 2 AGN dichotomy according to their ZTF optical variability

López-Navas et al. 2023, MNRAS

The few optical variability studies of spectrally classified Type 2 active galactic nuclei (AGNs) have led to the discovery of anomalous objects that are incompatible with the simplest unified models (UMs). This paper explores different variability features that allow us to distinguish between obscured Type 2 AGNs and variable, unobscured Type 1s. We systematically analyse the 2.5-yr-long Zwicky Transient Facility light curves of ~15 000 AGNs from the Sloan Digital Sky Survey Data Release 16, which are generally considered Type 2s due to the absence of strong broad emission lines (BELs). Consistent with the expectations from the UM, the variability features are distributed differently for distinct populations, with spectrally classified weak Type 1s showing variances one order of magnitude larger than those of the Type 2s. We find that the parameters given by the damped random walk model lead to broader Hα equivalent width for objects with τg > 16 d and long-term structure function SF∞,g > 0.07 mag. By limiting the variability features, we find that ~11 per cent of Type 2 sources show evidence for optical variations. A detailed spectral analysis of the most variable sources (~1 per cent of the Type 2 sample) leads to the discovery of misclassified Type 1s with weak BELs and changing-state candidates. This work presents one of the largest systematic investigations of Type 2 AGN optical variability to date, in preparation for future large photometric surveys.
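
The damped random walk (DRW) is an Ornstein-Uhlenbeck process described by a damping timescale τ and an asymptotic structure function SF∞. The sketch below assumes the common convention SF∞ = √2 σ, where σ is the stationary standard deviation. It samples a DRW exactly at irregular epochs and applies the τ and SF∞ thresholds quoted above. Using those thresholds as a hard selection, and the function names, are illustrative assumptions; the paper fitted the DRW parameters rather than simulating them.

```python
import numpy as np


def simulate_drw(times, tau, sf_inf, mean_mag, rng):
    """Damped random walk (Ornstein-Uhlenbeck process) sampled exactly at sorted epochs.

    tau is the damping timescale (same units as times); the stationary standard
    deviation is sf_inf / sqrt(2), i.e. SF_inf = sqrt(2) * sigma.
    """
    times = np.asarray(times, dtype=float)
    sigma = sf_inf / np.sqrt(2.0)
    z = rng.normal(size=times.size)
    x = np.empty(times.size)
    x[0] = sigma * z[0]  # start from the stationary distribution
    for i in range(1, times.size):
        decay = np.exp(-(times[i] - times[i - 1]) / tau)
        # Exact conditional mean and variance of the OU process after a gap.
        x[i] = decay * x[i - 1] + sigma * np.sqrt(1.0 - decay**2) * z[i]
    return mean_mag + x


def passes_drw_cuts(tau_g, sf_inf_g, tau_min=16.0, sf_min=0.07):
    """True where both DRW parameters exceed the thresholds (days, mag)."""
    return (np.asarray(tau_g) > tau_min) & (np.asarray(sf_inf_g) > sf_min)


rng = np.random.default_rng(42)
epochs = np.sort(rng.uniform(0.0, 900.0, 300))  # ~2.5 yr of irregular epochs
lc = simulate_drw(epochs, tau=100.0, sf_inf=0.2, mean_mag=19.0, rng=rng)
print(passes_drw_cuts([100.0, 10.0], [0.2, 0.2]))
```

Because each step uses the exact transition of the process, long seasonal gaps need no interpolation: the correlation simply decays as exp(−Δt/τ).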

Deep Attention-Based Supernovae Classification of Multi-Band Light-Curves

Pimentel, Estévez, Förster 2023, AJ

In astronomical surveys, such as the Zwicky Transient Facility (ZTF), supernovae (SNe) are relatively uncommon objects compared to other classes of variable events. Along with this scarcity, the processing of multi-band light-curves is a challenging task due to the highly irregular cadence, long time gaps, missing values, low number of observations, etc. These issues are particularly detrimental for the analysis of transient events with SN-like light-curves. In this work, we offer three main contributions. First, based on temporal modulation and attention mechanisms, we propose a Deep Attention model called TimeModAttn to classify multi-band light-curves of different SN types, avoiding photometric or hand-crafted feature computations, missing-value assumptions, and explicit imputation and interpolation methods. Second, we propose a model for the synthetic generation of SN multi-band light-curves based on the Supernova Parametric Model (SPM). This allows us to increase the number of samples and the diversity of the cadence. The TimeModAttn model is first pre-trained using synthetic light-curves in a semi-supervised learning scheme. Then, a fine-tuning process is performed for domain adaptation. The proposed TimeModAttn model outperformed a Random Forest classifier, increasing the balanced F1-score from ≈0.525 to ≈0.596. The TimeModAttn model also outperformed other Deep Learning models, based on Recurrent Neural Networks (RNNs), in two scenarios: late-classification and early-classification. Finally, we conduct interpretability experiments. High attention scores are obtained for observations earlier than and close to the SN brightness peaks, which are supported by an early and highly expressive learned temporal modulation.
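
A parametric supernova model of this kind combines a sigmoid rise, a linear plateau, and an exponential decline that meet at t1 = t0 + γ, in the spirit of the Villar et al. (2019) form. The sketch below makes several assumptions:

- β is parameterized as the fractional drop over the plateau.
- The generator uses uniformly random epochs and Gaussian noise.
- The function names are hypothetical.

The exact SPM used for the synthetic light curves may differ in detail.

```python
import numpy as np


def spm_flux(t, A, beta, t0, gamma, tau_rise, tau_fall):
    """Piecewise SN light-curve model: sigmoid rise, linear plateau, exponential decline.

    beta is the fractional drop over the plateau of length gamma, so the two
    branches meet continuously at t1 = t0 + gamma.
    """
    t = np.asarray(t, dtype=float)
    t1 = t0 + gamma
    # Numerically stable logistic 1 / (1 + exp(-(t - t0) / tau_rise)).
    rise = 0.5 * (1.0 + np.tanh((t - t0) / (2.0 * tau_rise)))
    plateau = A * (1.0 - beta * (t - t0) / gamma)
    # Clamp at zero so the unused branch never overflows for t < t1.
    decline = A * (1.0 - beta) * np.exp(-np.maximum(t - t1, 0.0) / tau_fall)
    return np.where(t < t1, plateau, decline) * rise


def synthetic_light_curve(rng, n_obs, t_min, t_max, params, noise_sigma):
    """Hypothetical generator: random irregular epochs plus Gaussian flux noise."""
    times = np.sort(rng.uniform(t_min, t_max, n_obs))
    flux = spm_flux(times, **params) + rng.normal(0.0, noise_sigma, n_obs)
    return times, flux


p = dict(A=1.0, beta=0.3, t0=0.0, gamma=20.0, tau_rise=3.0, tau_fall=25.0)
times, flux = synthetic_light_curve(np.random.default_rng(0), 30, -20.0, 120.0, p, 0.01)
print(times[:3], flux[:3])
```

Drawing new parameters and new epoch sets for each sample is what lets synthetic data diversify the cadence seen during pre-training.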

DELIGHT: Deep Learning Identification of Galaxy Hosts of Transients using Multiresolution Images

Förster et al. 2022, AJ

We present DELIGHT, or Deep Learning Identification of Galaxy Hosts of Transients, a new algorithm designed to automatically and in real time identify the host galaxies of extragalactic transients. The proposed algorithm receives as input compact, multiresolution images centered at the position of a transient candidate and outputs two-dimensional offset vectors that connect the transient with the center of its predicted host. The multiresolution input consists of a set of images with the same number of pixels, but with progressively larger pixel sizes and fields of view. A sample of 16,791 galaxies visually identified by the Automatic Learning for the Rapid Classification of Events broker team was used to train a convolutional neural network regression model. We show that this method is able to correctly identify both relatively large (10″ < r < 60″) and small (r ≤ 10″) apparent size host galaxies using much less information (32 kB) than with a large, single-resolution image (920 kB). The proposed method has fewer catastrophic errors in recovering the position, and is more complete and has less contamination (<0.86%) in recovering the crossmatched redshift, than other state-of-the-art methods. The more efficient representation provided by multiresolution input images could allow for the identification of transient host galaxies in real time, if adopted in alert streams from the new generation of large-etendue telescopes such as the Vera C. Rubin Observatory.
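
DELIGHT predicts an offset vector from the transient to its host. Turning such a vector into host sky coordinates requires the image orientation and pixel scale. The sketch assumes the offset has already been converted to arcseconds east and north, which the abstract does not state. It applies the flat-sky (small-angle) approximation; `host_position` is a hypothetical helper, not part of DELIGHT.

```python
import numpy as np


def host_position(ra_deg, dec_deg, east_arcsec, north_arcsec):
    """Apply a small (east, north) offset in arcsec to a transient position.

    Flat-sky approximation: an eastward offset spans more right ascension
    away from the equator, hence the 1 / cos(dec) factor.
    """
    dec_host = dec_deg + north_arcsec / 3600.0
    ra_host = ra_deg + east_arcsec / 3600.0 / np.cos(np.radians(dec_deg))
    return ra_host % 360.0, dec_host


print(host_position(150.0, 2.0, 4.5, -3.0))
```

For the arcsecond-to-arcminute offsets typical of host galaxies, the approximation error is negligible except very close to the celestial poles.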

Improving Astronomical Time-series Classification via Data Augmentation with Generative Adversarial Networks

García-Jara, Protopapas, Estévez 2022, ApJ

Due to the latest advances in technology, telescopes with significant sky coverage will produce millions of astronomical alerts per night that must be classified both rapidly and automatically. Currently, classification consists of supervised machine-learning algorithms whose performance is limited by the number of existing annotations of astronomical objects and their highly imbalanced class distributions. In this work, we propose a data augmentation methodology based on generative adversarial networks (GANs) to generate a variety of synthetic light curves from variable stars. Our novel contributions, consisting of a resampling technique and an evaluation metric, can assess the quality of generative models in unbalanced data sets and identify GAN-overfitting cases that the Fréchet inception distance does not reveal. We applied our proposed model to two data sets taken from the Catalina and Zwicky Transient Facility surveys. The classification accuracy of variable stars is improved significantly when training with synthetic data and testing with real data with respect to the case of using only real data.
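
The Fréchet inception distance (FID) mentioned above is the squared Fréchet distance between two Gaussians fitted to embeddings of real and generated samples. The sketch computes that distance for arbitrary feature vectors. Which embedding to use for light curves is left open here, and the authors' resampling-based metric is not reproduced.

```python
import numpy as np


def frechet_distance(x, y):
    """Squared Fréchet distance between Gaussians fitted to x and y.

    x and y are (n_samples, n_features) arrays of embeddings.
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    # Tr((C_x C_y)^(1/2)) is the sum of square roots of the eigenvalues of C_x C_y,
    # which are real and non-negative for covariance matrices.
    eig = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    return float(np.sum((mu_x - mu_y) ** 2) + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
fake = rng.normal(loc=0.5, size=(500, 4))
print(frechet_distance(real, fake))
```

The distance depends only on means and covariances. A generator that memorizes its training light curves can therefore score well, which is the overfitting blind spot the abstract points to.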

Confirming new changing-look AGNs discovered through optical variability using a random forest-based light-curve classifier

López-Navas et al. 2022, MNRAS

Determining the frequency and duration of changing-look (CL) active galactic nuclei (AGNs) phenomena, where the optical broad emission lines appear or disappear, is crucial to understand the evolution of the accretion flow around supermassive black holes. We present a strategy to select new CL candidates starting from a spectroscopic type 2 AGN sample and searching for current type 1 photometric variability. We use the publicly available Zwicky Transient Facility alert stream and the Automatic Learning for the Rapid Classification of Events light-curve classifier to produce a list of CL candidates with a highly automated algorithm, resulting in 60 candidates. Visual inspection reduced the sample to 30. We performed new spectroscopic observations of six candidates of our clean sample, without further refinement, finding the appearance of clear broad Balmer lines in four of them and tentative evidence of type changes in the remaining two, which suggests a promising success rate of ≥66 per cent for this CL selection method.

Searching for changing-state AGNs in massive datasets -- I: applying deep learning and anomaly detection techniques to find AGNs with anomalous variability behaviours

Sánchez-Sáez et al. 2021, AJ

The classic classification scheme for Active Galactic Nuclei (AGNs) was recently challenged by the discovery of the so-called changing-state (changing-look) AGNs (CSAGNs). The physical mechanism behind this phenomenon is still a matter of open debate, and the samples are too small and too serendipitous to provide robust answers. To tackle this problem, we need methods that are able to detect AGNs in the act of changing state. Here we present an anomaly detection (AD) technique designed to identify AGN light curves with anomalous behaviors in massive datasets. The main aim of this technique is to identify CSAGNs at different stages of the transition, but it can also be used for more general purposes, such as cleaning massive datasets for AGN variability analyses. We used light curves from the Zwicky Transient Facility data release 5 (ZTF DR5), containing a sample of 230,451 AGNs of different classes. The ZTF DR5 light curves were modeled with a Variational Recurrent Autoencoder (VRAE) architecture, which allowed us to obtain a set of attributes from the VRAE latent space that describe the general behaviour of our sample. These attributes were then used as features for an Isolation Forest (IF) algorithm, an anomaly detector for "one-class" problems. We used the VRAE reconstruction errors and the IF anomaly score to select a sample of 8,809 anomalies. These anomalies are dominated by bogus candidates, but we were able to identify 75 promising CSAGN candidates.
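
An Isolation Forest scores a point by how few random axis-aligned splits it takes to isolate it. The sketch below is a minimal NumPy re-implementation of the score from Liu, Ting & Zhou (2008), written for intuition rather than taken from the authors' pipeline. A toy two-dimensional Gaussian stands in for the VRAE latent attributes; in practice a library implementation such as scikit-learn's `IsolationForest` would be used.

```python
import numpy as np


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    harmonic = np.sum(1.0 / np.arange(1, n))  # H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(X, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    n = X.shape[0]
    if depth >= max_depth or n <= 1:
        return ("leaf", n)
    feature = rng.integers(X.shape[1])
    lo, hi = X[:, feature].min(), X[:, feature].max()
    if lo == hi:
        return ("leaf", n)
    split = rng.uniform(lo, hi)
    left = X[:, feature] < split
    return ("node", feature, split,
            build_tree(X[left], depth + 1, max_depth, rng),
            build_tree(X[~left], depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x is isolated, plus c(n) for leaves left unsplit."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, feature, split, left, right = tree
    return path_length(x, left if x[feature] < split else right, depth + 1)


def isolation_scores(X, n_trees=100, sample_size=256, seed=0):
    """Scores in (0, 1]; values near 1 mark easily isolated (anomalous) rows."""
    rng = np.random.default_rng(seed)
    psi = min(sample_size, X.shape[0])
    max_depth = int(np.ceil(np.log2(psi)))
    trees = [build_tree(X[rng.choice(X.shape[0], psi, replace=False)], 0, max_depth, rng)
             for _ in range(n_trees)]
    mean_path = np.array([np.mean([path_length(x, t) for t in trees]) for x in X])
    return 2.0 ** (-mean_path / c_factor(psi))


X = np.vstack([np.random.default_rng(7).normal(size=(200, 2)), [[20.0, 20.0]]])
scores = isolation_scores(X)
print(scores[-1], scores[:-1].max())
```

The method needs only inlier-dominated data and no anomaly labels, which fits the "one-class" setting described above.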

Alert Classification for the ALeRCE Broker System: The Real-time Stamp Classifier

Carrasco-Davis et al. 2021, AJ

We present a real-time stamp classifier of astronomical events for the ALeRCE (Automatic Learning for the Rapid Classification of Events) broker. The classifier is based on a convolutional neural network with an architecture designed to exploit rotational invariance of the images, and trained on alerts ingested from the Zwicky Transient Facility (ZTF). Using only the science, reference and difference images of the first detection as inputs, along with the metadata of the alert as features, the classifier is able to correctly classify alerts from active galactic nuclei, supernovae (SNe), variable stars, asteroids and bogus classes, with high accuracy (∼94%) on a balanced test set. To find and analyze SN candidates selected by our classifier from the ZTF alert stream, we designed and deployed a visualization tool called SN Hunter, where relevant information about each possible SN is displayed for the experts to choose among candidates to report to the Transient Name Server database. We have reported 3060 SN candidates to date (9.2 candidates per day on average), of which 394 have been confirmed spectroscopically. Our ability to report objects using only a single detection means that 92% of the reported SNe were reported within one day after their first detection. ALeRCE has only reported candidates not otherwise detected or selected by other groups, therefore adding new early transients to the bulk of objects available for early follow-up. Our work represents an important milestone toward rapid alert classification with the next generation of large-etendue telescopes, such as the Vera C. Rubin Observatory's Legacy Survey of Space and Time.
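
One standard way to obtain exact invariance to 90-degree rotations is cyclic pooling:

1. Apply the same feature extractor to the four rotated copies of the stamp.
2. Pool the four results.

The sketch makes three assumptions. It pools by averaging, which the abstract does not specify. A three-channel array stands in for the science, reference, and difference images. The placeholder `features_fn` takes the place of the convolutional layers.

```python
import numpy as np


def rotation_pooled(features_fn, stamp):
    """Average features_fn over the four 90-degree rotations of an (H, W, C) stamp.

    Rotating the input only permutes the four views, so the average is
    invariant to 90-degree rotations whatever features_fn computes.
    """
    views = [np.rot90(stamp, k, axes=(0, 1)) for k in range(4)]
    return np.mean([features_fn(v) for v in views], axis=0)


def first_row(s):
    """Deliberately orientation-dependent placeholder feature extractor."""
    return s[0, :5].ravel()


stamp = np.random.default_rng(0).normal(size=(21, 21, 3))
print(np.allclose(rotation_pooled(first_row, stamp),
                  rotation_pooled(first_row, np.rot90(stamp))))
```

Pooling after a shared extractor keeps the parameter count of a single network while removing any dependence on the orientation of the stamp.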

The Automatic Learning for the Rapid Classification of Events (ALeRCE) Alert Broker

Förster et al. 2021, AJ

We introduce the Automatic Learning for the Rapid Classification of Events (ALeRCE) broker, an astronomical alert broker designed to provide a rapid and self-consistent classification of large-etendue telescope alert streams, such as that provided by the Zwicky Transient Facility (ZTF) and, in the future, the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). ALeRCE is a Chilean-led broker run by an interdisciplinary team of astronomers and engineers, working to become intermediaries between survey and follow-up facilities. ALeRCE uses a pipeline which includes the real-time ingestion, aggregation, cross-matching, machine learning (ML) classification, and visualization of the ZTF alert stream. We use two classifiers: a stamp-based classifier, designed for rapid classification, and a light-curve-based classifier, which uses the multi-band flux evolution to achieve a more refined classification. We describe in detail our pipeline, data products, tools and services, which are made public for the community (see Since we began operating our real-time ML classification of the ZTF alert stream in early 2019, we have grown a large community of active users around the globe. We describe our results to date, including the real-time processing of 9.7 × 10⁷ alerts, the stamp classification of 1.9 × 10⁷ objects, the light curve classification of 8.5 × 10⁵ objects, the report of 3088 supernova candidates, and different experiments using LSST-like alert streams. Finally, we discuss the challenges ahead to go from a single stream of alerts such as ZTF to a multi-stream ecosystem dominated by LSST.
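
The cross-matching stage associates each alert position with sources in external catalogues; the abstract does not describe how. The sketch below is a brute-force positional match using the haversine separation. The function names and the 2-arcsecond radius in the usage are illustrative assumptions. At stream scale, a spatial index (for example, sky pixelization or a k-d tree) would replace the all-pairs comparison.

```python
import numpy as np


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees using the haversine formula."""
    ra1, dec1, ra2, dec2 = (np.radians(a) for a in (ra1, dec1, ra2, dec2))
    h = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0))))


def crossmatch(ra, dec, cat_ra, cat_dec, radius_arcsec):
    """Index of the nearest catalogue source within radius_arcsec, or -1 if none."""
    sep = angular_separation_deg(ra[:, None], dec[:, None],
                                 cat_ra[None, :], cat_dec[None, :]) * 3600.0
    nearest = sep.argmin(axis=1)
    within = sep[np.arange(ra.size), nearest] <= radius_arcsec
    return np.where(within, nearest, -1)


alerts_ra, alerts_dec = np.array([10.0, 50.0]), np.array([0.0, 0.0])
cat_ra, cat_dec = np.array([10.0 + 1 / 3600, 30.0]), np.array([0.0, 0.0])
print(crossmatch(alerts_ra, alerts_dec, cat_ra, cat_dec, 2.0))
```

The haversine form stays accurate for the arcsecond-scale separations that matter here, where the plain spherical law of cosines loses precision.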

Alert Classification for the ALeRCE Broker System: The Light Curve Classifier

Sánchez-Sáez et al. 2021, AJ

We present the first version of the ALeRCE (Automatic Learning for the Rapid Classification of Events) broker light curve classifier. ALeRCE is currently processing the Zwicky Transient Facility (ZTF) alert stream, in preparation for the Vera C. Rubin Observatory. The ALeRCE light curve classifier uses variability features computed from the ZTF alert stream, and colors obtained from AllWISE and ZTF photometry. We apply a Balanced Hierarchical Random Forest algorithm with a two-level scheme, where the top level classifies each source as periodic, stochastic, or transient, and the bottom level further resolves each hierarchical class, yielding a total of 15 classes. This classifier represents the first attempt to classify multiple classes of stochastic variables (including nucleus- and host-dominated active galactic nuclei, blazars, young stellar objects, and cataclysmic variables) in addition to different classes of periodic and transient sources, using real data. We created a labeled set using various public catalogs (such as the Catalina Surveys and Gaia DR2 variable stars catalogs, and the Million Quasars catalog), and we classify all objects with ≥6 g-band or ≥6 r-band detections in ZTF (868,371 sources as of 2020/06/09), providing updated classifications for sources with new alerts every day. For the top level we obtain macro-averaged precision and recall scores of 0.96 and 0.99, respectively, and for the bottom level we obtain macro-averaged precision and recall scores of 0.57 and 0.76, respectively. Updated classifications from the light curve classifier can be found at
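
A two-level scheme yields one probability per final class once the levels are combined. The sketch assumes each class probability is the top-level group probability times the bottom-level conditional probability. This is a common choice for such hierarchies and is assumed here rather than quoted from the abstract. The dictionary layout and the toy subset of class names are illustrative.

```python
def combine_levels(top_probs, bottom_probs):
    """Leaf probability = P(top-level group) * P(leaf class | group)."""
    combined = {}
    for group, p_group in top_probs.items():
        for leaf, p_leaf in bottom_probs[group].items():
            combined[leaf] = p_group * p_leaf
    return combined


top = {"Periodic": 0.2, "Stochastic": 0.7, "Transient": 0.1}
bottom = {
    "Periodic": {"RRL": 0.9, "LPV": 0.1},
    "Stochastic": {"AGN": 0.6, "Blazar": 0.3, "YSO": 0.1},
    "Transient": {"SNIa": 0.5, "SNII": 0.5},
}
probs = combine_levels(top, bottom)
print(max(probs, key=probs.get))
```

With this rule, a confident bottom-level model cannot outweigh a low top-level probability. In the toy example, RRL has the highest conditional probability (0.9) but ends at 0.18, below AGN at 0.42.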