ALeRCE pipeline


The ALeRCE pipeline currently ingests the ZTF alert stream, and we plan to ingest the ATLAS and LSST streams in the future. The pipeline consists of sequential and parallel steps applied to the alerts, from the incoming ZTF stream to an output stream, as shown in this figure:

Pipeline.png

We use our Alert Processing Framework library to add steps to this pipeline. The different steps are the following:


S3 Upload

All alerts are stored in AWS Simple Storage Service (S3) for later retrieval. They can be accessed via our alert Avro API (https://alerceapi.readthedocs.io/en/latest/avro.html).


Cross-match

ZTF alerts already contain some cross-match information, namely: the three closest PanSTARRS objects, their magnitudes and a star/galaxy score; the nearest Gaia object and its associated photometry; and, if the alert is close to a known asteroid, its name and magnitude. In addition, we use the CDS API (with help from our friends in the Fink broker) to query for cross-matches in other catalogs. In particular, we query for the nearest object and its photometry in the AllWISE catalog.
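As an illustration of what a nearest-object cross-match does, the following is a minimal sketch that finds the closest catalog entry within a search radius. It is a local brute-force example, not the CDS API and not our production code: the function names, the 2 arcsecond radius, and the toy catalog rows are all assumptions made for the example.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))

def nearest_match(ra, dec, catalog, radius_arcsec=2.0):
    """Return (entry, separation_arcsec) for the closest entry within the radius, or None.

    The default radius is an assumption chosen for illustration.
    """
    best = None
    for entry in catalog:
        sep = angular_separation_deg(ra, dec, entry["ra"], entry["dec"]) * 3600.0
        if sep <= radius_arcsec and (best is None or sep < best[1]):
            best = (entry, sep)
    return best

# Hypothetical catalog rows with made-up values, for illustration only.
toy_catalog = [
    {"id": "A", "ra": 150.00000, "dec": 2.00000, "W1mag": 15.2},
    {"id": "B", "ra": 150.00030, "dec": 2.00010, "W1mag": 16.8},
    {"id": "C", "ra": 151.00000, "dec": 2.50000, "W1mag": 14.1},
]

match = nearest_match(150.00028, 2.00010, toy_catalog)
if match is not None:
    entry, sep = match
    print(f"nearest: {entry['id']} at {sep:.2f} arcsec")
```

A brute-force loop is enough for a handful of rows; a real catalog query would rely on a spatial index or on a remote service.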


Stamp Classification

All alerts associated with new objects undergo a stamp-based classification (Carrasco-Davis et al. 2020), which provides a quick classification into a 5-class taxonomy (supernova, active galactic nucleus, variable star, asteroid, and bogus). The stamp classifier consists of a rotationally invariant convolutional neural network which uses information from the image stamps and the alert’s metadata. The architecture of the classifier is shown below:

architecture_stamp.png

where a cyclic convolutional neural network is applied to the image cutouts; its output is then sent to fully connected layers, which also receive the alert metadata as input. The associated confusion matrix is the following:

confusion_stamp.png
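To give an intuition for the rotational invariance mentioned above, the following toy sketch averages a position-dependent score over the four 90-degree rotations of a stamp, which is one common way to make an output independent of the input orientation. The scoring function is a hypothetical stand-in for a convolutional branch, not the actual network, and all names and values here are assumptions for illustration.

```python
def rot90(image):
    """Rotate a square 2D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]

def toy_score(image):
    """Hypothetical stand-in for a CNN branch.

    It is a position-dependent weighted sum, so on its own it is NOT
    rotation invariant.
    """
    return sum(
        value * (i + 2 * j + 1)
        for i, row in enumerate(image)
        for j, value in enumerate(row)
    )

def cyclic_score(image):
    """Average the score over the four 90-degree rotations of the input."""
    total = 0.0
    rotated = image
    for _ in range(4):
        total += toy_score(rotated)
        rotated = rot90(rotated)
    return total / 4.0

# Made-up 3x3 "stamp" with a bright corner pixel.
stamp = [
    [9, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

print(toy_score(stamp), toy_score(rot90(stamp)))        # differ between orientations
print(cyclic_score(stamp), cyclic_score(rot90(stamp)))  # identical for both orientations
```

Because the four rotations of a rotated stamp are the same set of images, the averaged score cannot depend on the orientation in which the stamp arrives.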


Pre-processing

Alert magnitudes are obtained by measuring the flux in a difference image, which is produced by subtracting a reference image from a given observation. This means that if the object was present in the reference image, the difference magnitude must be corrected to recover the object’s true magnitude. The formulas for the correction and associated error are the following:

mcorr.png
Screenshot from 2020-06-25 10-11-08.png

Here m_corr and δm_corr are the corrected magnitude and error, m_ref and δm_ref are the reference magnitude and error, m_diff and δm_diff are the difference magnitude and error, and sgn is the sign of the alert (isdiffpos). Note that these formulas can diverge if the reference and difference magnitudes are the same and sgn is -1, but this should never happen, as no alert should be triggered in that case. Also note that the negative term between square brackets in the numerator of the error formula comes from taking into account the covariance between the reference and difference fluxes. This term should be removed if there is an extended source in the reference image. We provide the corrected and uncorrected photometry, as well as the errors of the corrected photometry with and without this term.
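The correction described above can be sketched in a few lines. This example assumes that the corrected flux is the reference flux plus sgn times the difference flux, and that the covariance term enters the error as a subtracted reference variance, which is our reading of the description in the text rather than a copy of the formula images. The function name, argument names, and the choice to return NaN for non-physical cases are assumptions for illustration.

```python
import math

def correct_magnitude(m_diff, e_diff, m_ref, e_ref, sgn):
    """Return (m_corr, e_corr, e_corr_ext) under the assumptions stated above.

    Assumed formulas (reconstructed from the text, not from the figures):
      m_corr     = -2.5 log10(10^(-0.4 m_ref) + sgn * 10^(-0.4 m_diff))
      e_corr     = sqrt(10^(-0.8 m_diff) e_diff^2 - 10^(-0.8 m_ref) e_ref^2) / flux sum
      e_corr_ext = same, with the negative (covariance) term removed
    """
    f_ref = 10.0 ** (-0.4 * m_ref)
    f_diff = 10.0 ** (-0.4 * m_diff)
    f_sci = f_ref + sgn * f_diff
    if f_sci <= 0.0:
        # The formulas diverge or are undefined here (e.g. equal magnitudes, sgn = -1).
        return (math.nan, math.nan, math.nan)

    m_corr = -2.5 * math.log10(f_sci)
    var_diff = (f_diff * e_diff) ** 2
    var_ref = (f_ref * e_ref) ** 2

    # Error without the covariance term (extended source in the reference).
    e_corr_ext = math.sqrt(var_diff) / f_sci
    # Error with the covariance term; undefined if the radicand is negative.
    radicand = var_diff - var_ref
    e_corr = math.sqrt(radicand) / f_sci if radicand >= 0.0 else math.nan
    return (m_corr, e_corr, e_corr_ext)

# Made-up example: equal reference and difference magnitudes, positive alert.
m_corr, e_corr, e_corr_ext = correct_magnitude(18.0, 0.1, 18.0, 0.05, +1)
print(f"m_corr={m_corr:.3f} e_corr={e_corr:.4f} e_corr_ext={e_corr_ext:.4f}")
```

In this example the source doubles the reference flux, so the corrected magnitude is brighter than the reference by 2.5 log10(2), about 0.75 mag.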

Note that these formulas can only be applied if the reference object’s flux is known, which is not always the case. Moreover, if the reference image changes, an object may go from being possible to correct to not being possible to correct, and vice versa.

We approach this problem by always providing both the uncorrected and corrected photometry, and by flagging data where we detect inconsistent corrections through time, e.g., if the object changes from not being possible to correct to being possible to correct. We also provide a flag which tells whether we believe the object is unresolved, so that users can decide whether to use the corrected photometry (see the discussion on the database).


Light Curve Features

After the light curve correction step, we compute basic and advanced features from either the corrected or the uncorrected light curve, depending on whether we believe the object was present and unresolved in the reference image. The basic features are simple statistics and are computed for all objects in each filter. The advanced features are a significantly extended version of the Turbo FATS library, including several new features described in Sánchez-Sáez et al. 2020. Some of the features which are particularly relevant for later classification are optical and IR colors, periodogram-related features, the parameters of an irregular autoregressive model, and the parameters of an analytic supernova model.
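As a rough idea of what per-filter basic features look like, the sketch below groups detections by band and computes a few simple statistics. The specific statistics, the dictionary keys ("band", "mag"), and the function name are assumptions chosen for illustration; they are not the actual feature list used in the pipeline.

```python
import statistics
from collections import defaultdict

def basic_features(detections):
    """Group detections by band and compute simple per-band statistics."""
    by_band = defaultdict(list)
    for det in detections:
        by_band[det["band"]].append(det["mag"])

    features = {}
    for band, mags in by_band.items():
        features[band] = {
            "n_det": len(mags),
            "mean": statistics.mean(mags),
            # The sample standard deviation needs at least two points.
            "std": statistics.stdev(mags) if len(mags) > 1 else 0.0,
            "amplitude": max(mags) - min(mags),
        }
    return features

# Made-up light curve with three g-band detections and one r-band detection.
toy_detections = [
    {"band": "g", "mag": 18.0},
    {"band": "g", "mag": 18.2},
    {"band": "g", "mag": 18.4},
    {"band": "r", "mag": 17.5},
]

toy_features = basic_features(toy_detections)
for band, feats in toy_features.items():
    print(band, feats)
```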


Light Curve Classification

After the feature computation, a light-curve-based classifier is applied to objects with at least 6 detections in the g band or at least 6 detections in the r band. This classifier is a balanced hierarchical random forest that uses four classification models and a total of 15 classes (Sánchez-Sáez et al. 2020). The first, top-level “hierarchical classifier” has three classes: periodic, stochastic, and transient. Three more classifiers are then applied, one specialized for each of these classes. The final class probabilities are obtained by multiplying the probability of each top-level class by the probabilities from the corresponding specialized classifier.

The 15 classes are: Transient: SNe Ia (SNIa), SNe Ib/c (SNIbc), SNe II (SNII), and Super Luminous SNe (SLSNe); Stochastic: Active Galactic Nuclei (AGN), Quasi Stellar Object (QSO), Blazar, Cataclysmic Variable/Novae (CV/Novae), and Young Stellar Object (YSO); and Periodic: Delta Scuti (DSCT), RR Lyrae (RRL), Cepheid (Ceph), Long Period Variable (LPV), Eclipsing Binary (EB), and other periodic objects (Periodic-Other). The taxonomy of this classifier is shown in the following figure:

taxonomy.png

The classifier’s confusion matrix is the following:

conf_matrix_multiclass_original_hierarchical_classes.png
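The way the hierarchical probabilities combine can be illustrated with a short sketch. It multiplies each top-level probability by the class probabilities of the matching specialized classifier and also shows the detection threshold for applying the classifier. The function names and all probability values are made up for the example; only the class names, the multiplication rule, and the 6-detection threshold come from the description above.

```python
def is_eligible(n_det_g, n_det_r, min_det=6):
    """An object qualifies with at least min_det detections in g or in r."""
    return n_det_g >= min_det or n_det_r >= min_det

def final_probabilities(top_probs, branch_probs):
    """Multiply each top-level probability by its branch's class probabilities."""
    final = {}
    for branch, p_branch in top_probs.items():
        for cls, p_cls in branch_probs[branch].items():
            final[cls] = p_branch * p_cls
    return final

# Made-up outputs of the top-level classifier.
top_probs = {"Transient": 0.7, "Stochastic": 0.2, "Periodic": 0.1}

# Made-up outputs of the three specialized classifiers (each sums to 1).
branch_probs = {
    "Transient": {"SNIa": 0.6, "SNIbc": 0.2, "SNII": 0.15, "SLSNe": 0.05},
    "Stochastic": {"AGN": 0.3, "QSO": 0.3, "Blazar": 0.2, "CV/Novae": 0.1, "YSO": 0.1},
    "Periodic": {"DSCT": 0.1, "RRL": 0.3, "Ceph": 0.1, "LPV": 0.2, "EB": 0.2,
                 "Periodic-Other": 0.1},
}

final = final_probabilities(top_probs, branch_probs)
best_class = max(final, key=final.get)
print(best_class, round(final[best_class], 3))
```

Since each specialized classifier outputs probabilities that sum to one, the 15 final probabilities also sum to one.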


Comparison between the Stamp and Light Curve classifiers

comparison.png

Fraction of objects predicted to belong to a given Stamp Classifier class (rows), normalized among the objects predicted to belong to a given Light Curve Classifier class (columns). We considered a sample of 186,794 unlabeled objects which were classified with the Stamp Classifier (Carrasco-Davis et al. 2020) and the Light Curve Classifier (Sánchez-Sáez et al. 2020).
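The column normalization used in this comparison can be sketched as follows: for each Light Curve Classifier class, the counts of Stamp Classifier predictions are divided by the total number of objects in that column. The function name, the class labels, and the counts below are made up for illustration and do not reproduce the figure.

```python
from collections import Counter

def column_normalized_fractions(pairs):
    """pairs: iterable of (stamp_class, lc_class) predictions, one per object.

    Returns {lc_class: {stamp_class: fraction}}, where the fractions within
    each Light Curve Classifier class (column) sum to one.
    """
    counts = Counter(pairs)
    column_totals = Counter()
    for (_, lc_class), n in counts.items():
        column_totals[lc_class] += n

    table = {}
    for (stamp_class, lc_class), n in counts.items():
        table.setdefault(lc_class, {})[stamp_class] = n / column_totals[lc_class]
    return table

# Made-up predictions for six objects.
toy_pairs = (
    [("SN", "SNIa")] * 3
    + [("AGN", "SNIa")]
    + [("VS", "RRL")] * 2
)

toy_table = column_normalized_fractions(toy_pairs)
for lc_class, column in toy_table.items():
    print(lc_class, column)
```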


Outlier detection

Finally, after the classification steps, we are testing an outlier detection step, in which we attempt to detect objects that are not represented in the previous taxonomy or that belong to unknown classes. We also try to detect objects that belong to a known class but show anomalous behavior.


Processing times

processing.png

Cumulative distribution function (CDF) of ZTF streaming times compared to the CDF of ALeRCE pipeline processing times. The ZTF streaming times correspond to the difference between the reported observation time and the alert ingestion time, obtained empirically during a typical night of operations. The ALeRCE pipeline step elapsed times are the times needed for an alert to move from ingestion to the completion of a given step, including CPU and wait times. In this figure we consider an incoming alert rate of about 25 per second (cf. the roughly 5 and 350 alerts per second that we expect on average for ZTF and LSST, respectively). The embarrassingly parallel nature of the processing steps suggests that our infrastructure should scale linearly with the number of incoming alerts to manage the LSST alert stream.


More information

You can find more information about our project in the following publications:

You can also learn about the project in the following presentation: