Publications


The Type 1 and Type 2 AGN dichotomy according to their ZTF optical variability

López-Navas et al. 2022, MNRAS

The scarce optical variability studies in spectrally classified Type 2 active galactic nuclei (AGNs) have led to the discovery of anomalous objects that are incompatible with the simplest unified models (UM). This paper explores different variability features that allow us to separate obscured Type 2 AGNs from the variable, unobscured Type 1s. We systematically analyse the 2.5-year-long Zwicky Transient Facility light curves of ~15000 AGNs from the Sloan Digital Sky Survey Data Release 16, which are generally considered Type 2s due to the absence of strong broad emission lines (BELs). Consistent with the expectations from the UM, the variability features are distributed differently for distinct populations, with spectrally classified weak Type 1s showing variances 1 order of magnitude larger than the Type 2s. We find that the parameters given by the damped random walk model lead to broader Hα equivalent width for objects with τ_g > 16 d and long-term structure function SF_∞,g > 0.07 mag. By limiting the variability features, we find that ~11 per cent of Type 2 sources show evidence for optical variations. A detailed spectral analysis of the most variable sources (~1 per cent of the Type 2 sample) leads to the discovery of misclassified Type 1s with weak BELs and changing-state candidates. This work presents one of the largest systematic investigations of Type 2 AGN optical variability to date, in preparation for future large photometric surveys.
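As a rough illustration of the two damped random walk (DRW) quantities quoted above, the sketch below evaluates the standard DRW structure function, SF(Δt) = SF_∞ · sqrt(1 − exp(−Δt/τ)), and encodes the two g-band thresholds from the abstract as a simple cut. The function names and default arguments are hypothetical, and this is not the selection pipeline used in the paper.

```python
import math


def drw_structure_function(dt_days, tau_days, sf_inf_mag):
    """Standard DRW structure function: SF_inf * sqrt(1 - exp(-dt / tau))."""
    return sf_inf_mag * math.sqrt(1.0 - math.exp(-dt_days / tau_days))


def passes_variability_cut(tau_g_days, sf_inf_g_mag, tau_min=16.0, sf_min=0.07):
    """True when both g-band DRW parameters exceed the quoted thresholds.

    Assumption: the thresholds (16 d, 0.07 mag) are taken directly from the
    abstract; how they are combined with other features is not specified there.
    """
    return tau_g_days > tau_min and sf_inf_g_mag > sf_min


# Example: a source with tau_g = 30 d and SF_inf,g = 0.10 mag passes the cut.
print(passes_variability_cut(30.0, 0.10))
```

At time lags much longer than τ the structure function saturates at SF_∞, which is why SF_∞ is described as the long-term structure function.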


Deep Attention-Based Supernovae Classification of Multi-Band Light-Curves

Pimentel, Estévez, Förster 2022, AJ

In astronomical surveys, such as the Zwicky Transient Facility (ZTF), supernovae (SNe) are relatively uncommon objects compared to other classes of variable events. Along with this scarcity, the processing of multi-band light-curves is a challenging task due to the highly irregular cadence, long time gaps, missing-values, low number of observations, etc. These issues are particularly detrimental for the analysis of transient events with SN-like light-curves. In this work, we offer three main contributions. First, based on temporal modulation and attention mechanisms, we propose a Deep Attention model called TimeModAttn to classify multi-band light-curves of different SN types, avoiding photometric or hand-crafted feature computations, missing-values assumptions, and explicit imputation and interpolation methods. Second, we propose a model for the synthetic generation of SN multi-band light-curves based on the Supernova Parametric Model (SPM). This allows us to increase the number of samples and the diversity of the cadence. The TimeModAttn model is first pre-trained using synthetic light-curves in a semi-supervised learning scheme. Then, a fine-tuning process is performed for domain adaptation. The proposed TimeModAttn model outperformed a Random Forest classifier, increasing the balanced-F1 score from ≈.525 to ≈.596. The TimeModAttn model also outperformed other Deep Learning models, based on Recurrent Neural Networks (RNNs), in two scenarios: late-classification and early-classification. Finally, we conduct interpretability experiments. High attention scores are obtained for observations earlier than and close to the SN brightness peaks, which are supported by an early and highly expressive learned temporal modulation.


DELIGHT: Deep Learning Identification of Galaxy Hosts of Transients using Multiresolution Images

Förster et al. 2022, AJ

We present DELIGHT, or Deep Learning Identification of Galaxy Hosts of Transients, a new algorithm designed to automatically and in real time identify the host galaxies of extragalactic transients. The proposed algorithm receives as input compact, multiresolution images centered at the position of a transient candidate and outputs two-dimensional offset vectors that connect the transient with the center of its predicted host. The multiresolution input consists of a set of images with the same number of pixels, but with progressively larger pixel sizes and fields of view. A sample of 16,791 galaxies visually identified by the Automatic Learning for the Rapid Classification of Events broker team was used to train a convolutional neural network regression model. We show that this method is able to correctly identify both relatively large (10″ < r < 60″) and small (r ≤ 10″) apparent size host galaxies using much less information (32 kB) than with a large, single-resolution image (920 kB). The proposed method has fewer catastrophic errors in recovering the position and is more complete and has less contamination (<0.86%) recovering the crossmatched redshift than other state-of-the-art methods. The more efficient representation provided by multiresolution input images could allow for the identification of transient host galaxies in real time, if adopted in alert streams from the new generation of large-etendue telescopes such as the Vera C. Rubin Observatory.
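The multiresolution input described above (same pixel count per image, progressively larger pixel size and field of view) can be sketched as follows. This is a minimal pure-Python illustration, not the DELIGHT implementation: the function name, the factor of two between levels, and the use of block averaging are all assumptions made for the example.

```python
def multiresolution_stack(image, size, levels):
    """Build `levels` images of size x size pixels centered on a square input.

    Level k covers a centered window of size * 2**k input pixels and is
    block-averaged by a factor 2**k, so every level has the same pixel count
    but a larger pixel size and field of view (assumed scheme, for illustration).
    """
    n = len(image)
    stack = []
    for k in range(levels):
        factor = 2 ** k
        width = size * factor
        if width > n:
            raise ValueError("input image too small for the requested levels")
        start = (n - width) // 2  # top-left corner of the centered window
        level = []
        for i in range(size):
            row = []
            for j in range(size):
                r0 = start + i * factor
                c0 = start + j * factor
                block = [image[r][c]
                         for r in range(r0, r0 + factor)
                         for c in range(c0, c0 + factor)]
                row.append(sum(block) / len(block))
            level.append(row)
        stack.append(level)
    return stack


# Example: an 8x8 image reduced to three 2x2 levels.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
stack = multiresolution_stack(image, size=2, levels=3)
print(len(stack), len(stack[0]), len(stack[0][0]))
```

The stack stores `levels * size * size` values instead of the full input, which is the kind of saving the abstract quantifies as 32 kB versus 920 kB.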


Improving Astronomical Time-series Classification via Data Augmentation with Generative Adversarial Networks

García-Jara, Protopapas, Estévez 2022, ApJ

Due to the latest advances in technology, telescopes with significant sky coverage will produce millions of astronomical alerts per night that must be classified both rapidly and automatically. Currently, classification consists of supervised machine-learning algorithms whose performance is limited by the number of existing annotations of astronomical objects and their highly imbalanced class distributions. In this work, we propose a data augmentation methodology based on generative adversarial networks (GANs) to generate a variety of synthetic light curves from variable stars. Our novel contributions, consisting of a resampling technique and an evaluation metric, can assess the quality of generative models in unbalanced data sets and identify GAN-overfitting cases that the Fréchet inception distance does not reveal. We applied our proposed model to two data sets taken from the Catalina and Zwicky Transient Facility surveys. The classification accuracy of variable stars is improved significantly when training with synthetic data and testing with real data with respect to the case of using only real data.


Confirming new changing-look AGNs discovered through optical variability using a random forest-based light-curve classifier

López-Navas et al. 2022, MNRAS

Determining the frequency and duration of changing-look (CL) active galactic nuclei (AGNs) phenomena, where the optical broad emission lines appear or disappear, is crucial to understand the evolution of the accretion flow around supermassive black holes. We present a strategy to select new CL candidates starting from a spectroscopic type 2 AGN sample and searching for current type 1 photometric variability. We use the publicly available Zwicky Transient Facility alert stream and the Automatic Learning for the Rapid Classification of Events light-curve classifier to produce a list of CL candidates with a highly automated algorithm, resulting in 60 candidates. Visual inspection reduced the sample to 30. We performed new spectroscopic observations of six candidates of our clean sample, without further refinement, finding the appearance of clear broad Balmer lines in four of them and tentative evidence of type changes in the remaining two, which suggests a promising success rate of ≥66 per cent for this CL selection method.


Searching for Changing-state AGNs in Massive Data Sets. I. Applying Deep Learning and Anomaly-detection Techniques to Find AGNs with Anomalous Variability Behaviors

Sánchez-Sáez et al. 2021, AJ

The classic classification scheme for active galactic nuclei (AGNs) was recently challenged by the discovery of the so-called changing-state (changing-look) AGNs (CSAGNs). The physical mechanism behind this phenomenon is still a matter of open debate and the samples are too small and of serendipitous nature to provide robust answers. In order to tackle this problem, we need to design methods that are able to detect AGNs right in the act of changing state. Here we present an anomaly-detection technique designed to identify AGN light curves with anomalous behaviors in massive data sets. The main aim of this technique is to identify CSAGNs at different stages of the transition, but it can also be used for more general purposes, such as cleaning massive data sets for AGN variability analyses. We used light curves from the Zwicky Transient Facility data release 5 (ZTF DR5), containing a sample of 230,451 AGNs of different classes. The ZTF DR5 light curves were modeled with a Variational Recurrent Autoencoder (VRAE) architecture, which allowed us to obtain a set of attributes from the VRAE latent space that describe the general behavior of our sample. These attributes were then used as features for an Isolation Forest (IF) algorithm, which is an anomaly detector for a "one class" kind of problem. We used the VRAE reconstruction errors and the IF anomaly score to select a sample of 8809 anomalies. These anomalies are dominated by bogus candidates, but we were able to identify 75 promising CSAGN candidates.


Alert Classification for the ALeRCE Broker System: The Real-time Stamp Classifier

Carrasco-Davis et al. 2021, AJ

We present a real-time stamp classifier of astronomical events for the ALeRCE (Automatic Learning for the Rapid Classification of Events) broker. The classifier is based on a convolutional neural network with an architecture designed to exploit rotational invariance of the images, and trained on alerts ingested from the Zwicky Transient Facility (ZTF). Using only the science, reference and difference images of the first detection as inputs, along with the metadata of the alert as features, the classifier is able to correctly classify alerts from active galactic nuclei, supernovae (SNe), variable stars, asteroids and bogus classes, with high accuracy (∼94%) in a balanced test set. In order to find and analyze SN candidates selected by our classifier from the ZTF alert stream, we designed and deployed a visualization tool called SN Hunter, where relevant information about each possible SN is displayed for the experts to choose among candidates to report to the Transient Name Server database. We have reported 3060 SN candidates to date (9.2 candidates per day on average), of which 394 have been confirmed spectroscopically. Our ability to report objects using only a single detection means that 92% of the reported SNe occurred within one day after the first detection. ALeRCE has only reported candidates not otherwise detected or selected by other groups, therefore adding new early transients to the bulk of objects available for early follow-up. Our work represents an important milestone toward rapid alert classifications with the next generation of large-etendue telescopes, such as the Vera C. Rubin Observatory's Legacy Survey of Space and Time.
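One generic way to make a score independent of image orientation is to evaluate it on the four 90-degree rotations of a stamp and average the results. The sketch below illustrates that idea only; it is not the network architecture of the paper, and `rotate90`, `rotation_averaged` and the toy score function are hypothetical names introduced for the example.

```python
def rotate90(image):
    """Rotate a 2D list of pixel values by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_averaged(score_fn, image):
    """Average score_fn over the four 90-degree rotations of the image.

    The result is unchanged if the input is itself rotated by a multiple of
    90 degrees, because the same four orientations are always visited.
    """
    total = 0.0
    current = image
    for _ in range(4):
        total += score_fn(current)
        current = rotate90(current)
    return total / 4.0


# Toy score that depends on orientation: the value of the top-left pixel.
def top_left(img):
    return img[0][0]


stamp = [[1, 2], [3, 4]]
print(rotation_averaged(top_left, stamp))
print(rotation_averaged(top_left, rotate90(stamp)))
```

Both calls print the same value even though the toy score itself changes under rotation, which is the property a rotation-aware classifier is designed to exploit.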


The Automatic Learning for the Rapid Classification of Events (ALeRCE) Alert Broker

Förster et al. 2021, AJ

We introduce the Automatic Learning for the Rapid Classification of Events (ALeRCE) broker, an astronomical alert broker designed to provide a rapid and self-consistent classification of large-etendue telescope alert streams, such as that provided by the Zwicky Transient Facility (ZTF) and, in the future, the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). ALeRCE is a Chilean-led broker run by an interdisciplinary team of astronomers and engineers, working to become intermediaries between survey and follow-up facilities. ALeRCE uses a pipeline which includes the real-time ingestion, aggregation, cross-matching, machine learning (ML) classification, and visualization of the ZTF alert stream. We use two classifiers: a stamp-based classifier, designed for rapid classification, and a light-curve-based classifier, which uses the multi-band flux evolution to achieve a more refined classification. We describe in detail our pipeline, data products, tools and services, which are made public for the community (see https://alerce.science). Since we began operating our real-time ML classification of the ZTF alert stream in early 2019, we have grown a large community of active users around the globe. We describe our results to date, including the real-time processing of 9.7×10⁷ alerts, the stamp classification of 1.9×10⁷ objects, the light curve classification of 8.5×10⁵ objects, the report of 3088 supernova candidates, and different experiments using LSST-like alert streams. Finally, we discuss the challenges ahead to go from a single stream of alerts such as ZTF to a multi-stream ecosystem dominated by LSST.


Alert Classification for the ALeRCE Broker System: The Light Curve Classifier

Sánchez-Sáez et al. 2021, AJ

We present the first version of the ALeRCE (Automatic Learning for the Rapid Classification of Events) broker light curve classifier. ALeRCE is currently processing the Zwicky Transient Facility (ZTF) alert stream, in preparation for the Vera C. Rubin Observatory. The ALeRCE light curve classifier uses variability features computed from the ZTF alert stream, and colors obtained from AllWISE and ZTF photometry. We apply a Balanced Hierarchical Random Forest algorithm with a two-level scheme, where the top level classifies each source as periodic, stochastic, or transient, and the bottom level further resolves each hierarchical class, yielding a total of 15 classes. This classifier corresponds to the first attempt to classify multiple classes of stochastic variables (including nucleus- and host-dominated active galactic nuclei, blazars, young stellar objects, and cataclysmic variables) in addition to different classes of periodic and transient sources, using real data. We created a labeled set using various public catalogs (such as the Catalina Surveys and Gaia DR2 variable stars catalogs, and the Million Quasars catalog), and we classify all objects with ≥6 g-band or ≥6 r-band detections in ZTF (868,371 sources as of 2020/06/09), providing updated classifications for sources with new alerts every day. For the top level we obtain macro-averaged precision and recall scores of 0.96 and 0.99, respectively, and for the bottom level we obtain macro-averaged precision and recall scores of 0.57 and 0.76, respectively. Updated classifications from the light curve classifier can be found at http://alerce.online
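For readers unfamiliar with the metrics quoted above, macro-averaging computes precision and recall separately for each class and then takes the unweighted mean, so rare classes count as much as common ones. The following is a minimal sketch with a hypothetical helper name and made-up toy labels; it does not reproduce the paper's evaluation.

```python
def macro_precision_recall(y_true, y_pred):
    """Return (macro precision, macro recall) over all classes present.

    Per-class precision = TP / (TP + FP), recall = TP / (TP + FN); a class
    with an empty denominator contributes 0.0 (a convention chosen here).
    """
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls = [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


# Toy example using the three top-level class names from the abstract.
y_true = ["periodic", "periodic", "stochastic", "transient"]
y_pred = ["periodic", "stochastic", "stochastic", "transient"]
precision, recall = macro_precision_recall(y_true, y_pred)
print(round(precision, 3), round(recall, 3))
```

Because every class has equal weight, a strongly imbalanced labeled set can yield macro-averaged scores that differ noticeably from overall accuracy.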