SDSS Spectroscopic Templates with PyDL

One of the original motivations for creating the PyDL package was to reproduce and, ultimately, improve the method for generating spectroscopic templates for the Sloan Digital Sky Survey (SDSS). The spectroscopic templates are used by the “1D” portion of the SDSS spectroscopic pipeline (Bolton et al. 2012 [1]; See also pydl.pydlspec2d). Historically, those were generated by:

  1. Selecting individual objects (galaxies, QSOs, stars, white dwarfs, etc.) with good SDSS spectra and reliable classifications.

  2. For a given set of objects, resample the spectra onto a common wavelength grid and redshift to rest frame.

  3. Perform an iterative PCA analysis on the flux versus wavelength matrix.

  4. Select the most significant eigenspectra as the templates. Typically the first four eigenspectra were chosen.

For SDSS-I/II, the input spectra were all high signal-to-noise, but as the BOSS survey in SDSS-III started to target more distant galaxies, there was interest in improving the robustness of the PCA fitting to handle low signal-to-noise inputs. In addition, full alternatives to PCA, such has Heteroscedastic Matrix Factorization (HMF) [2], [3] could be compared.

Here are the algorithmic steps taken in the code to implement the procedure described above.

  1. template_input() reads a configuration file that includes a list of individual spectra. There is one configuration file for each type of object (galaxy, QSO, etc.).

  2. readspec() reads each spectrum. Masked areas are identified by setting the per-spectrum inverse variance to zero as needed.

  3. A new wavelength grid is created. SDSS typically uses wavelength grids that are evenly spaced in \(\log_{10}\) wavelength.

  4. preprocess_spectra() calls combine1fiber() to perform the resampling and rest frame shift.

  5. pca_solve() or HMF are used to obtain the eigenspectra.

  6. The eigenspectra are written to a FITS file.

At least historically, there was no test data set on which to (unit) test these algorithms. The best one could do was to take the inputs and compare them, function-by-function, to the equivalent IDL® code, ensuring that the results were the same, to some numerical precision.