fitprofs: Fit gaussian profiles

Package: onedspec

Usage

fitprofs input

Parameters

input: List of input images to be fit. The images may be one dimensional spectra (one or more spectra per image) or long slit spectra. Other types of nonspectral images may also be used and for two dimensional images the fitting direction will be determined from either the keyword DISPAXIS in the image header or the dispaxis parameter.

lines = "": List of lines, columns, or apertures to be selected from the input image format. The default empty list, "", selects all vectors in the images. The syntax is a list of comma separated numbers or ranges, where a range is a pair of hyphen separated numbers.
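The list syntax described above can be sketched in Python as follows. This is only an illustration of the comma and hyphen syntax stated here, not the task's own parser; the function name parse_range_list and the "available" argument are hypothetical.

```python
def parse_range_list(spec, available):
    """Expand a list such as "1,3-5" into [1, 3, 4, 5].

    An empty list selects everything in `available`, mirroring the
    documented default. Only the syntax described in this help page
    (comma separated numbers and hyphen separated ranges) is handled.
    """
    spec = spec.strip()
    if not spec:
        return list(available)
    selected = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item:
            low, high = (int(v) for v in item.split("-", 1))
            selected.extend(range(low, high + 1))
        else:
            selected.append(int(item))
    return selected
```

For example, parse_range_list("1,3-5", range(1, 11)) selects vectors 1, 3, 4, and 5.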

bands = "": List of bands for 3D images. The empty list, "", selects all bands.

dispaxis = ")_.dispaxis", nsum = ")_.nsum": Parameters for defining vectors in 2D and 3D images. The dispersion axis is 1 for line vectors, 2 for column vectors, and 3 for band vectors. A DISPAXIS parameter in the image header has precedence over the dispaxis parameter. The default values defer to the package parameters of the same name.

The following are the fitting parameters.

region = "": Region of the input vectors to be fit specified as a pair of space separated numbers. The coordinates are defined in terms of the linear image header coordinate parameters. For dispersion corrected spectra this is usually wavelength in Angstroms and for other data it is usually pixels. A fitting region must be specified.

positions = "": File of initial or fixed profile positions and (optional) peaks, profile types, and widths. The format consists of lines with one or more whitespace separated fields. The fields are the position, peak relative to the continuum with negative values being absorption, profile type of gaussian, lorentzian, or voigt, and the gaussian and/or lorentzian full width at half maximum. Trailing fields may be missing and fields to be set from default parameters or the image data (the peak value) may be given as INDEF. Comments and any additional columns are ignored. The positions and widths are specified in the coordinate units of the image, usually wavelength for dispersion corrected spectra and pixels otherwise.

background = "": Background values defining the linear background. If not specified the single pixel values nearest the fitting region endpoints are used. Otherwise two whitespace separated values are expected. If a value is a number then that is the background at the lower or upper end of the fitting region (ordered in pixel space not wavelength). The special values "avg(w1,w2,z)" or "med(w1,w2,z)" (note that there can be no whitespace) may be specified, where w1 and w2 are dispersion values, and z is a multiplier. This will take the average or median of pixels within the specified range and multiply the result by the third argument. The dispersion point used for that value in computing the linear background is the average of the dispersion coordinates of the pixels used.

profile = "gaussian" (gaussian|lorentzian|voigt): Default profile type to be fit when a profile type is not specified in the positions file. The types are "gaussian", "lorentzian", or "voigt".

gfwhm = 20., lfwhm = 20.: Default gaussian and lorentzian full width at half maximum (FWHM). These values are used for the initial and/or fixed width when they are not specified in the position file.

fitbackground = yes: Fit the background? If "yes" a linear background across the fitting region will be fit simultaneously with the profiles. If "no" the background will be fixed.

fitpositions = "all": Position fitting option. This may be "fixed" to fix all positions at their initial values, "single" to fit a single shift to the positions while keeping their separations fixed, or "all" to independently fit all the positions.

fitgfwhm = "all", fitlfwhm = "all": Profile width fitting options. These may be "fixed" to fix all widths at their initial values, "single" to fit a single scale factor to the initial widths, or "all" to independently fit all the widths.

The following parameters are used for error estimates as described below in the ERROR ESTIMATES section.

nerrsample = 0: Number of samples for the error computation. A value less than 10 turns off the error computation. A value of ~10 does a rough error analysis, a value of ~50 does a reasonable error analysis, and a value >100 does a detailed error analysis. The larger this value the longer the analysis takes.

sigma0 = INDEF, invgain = INDEF

The pixel sigmas are modeled by the formula:

sigma**2 = sigma0**2 + invgain * I

where I is the pixel value and "**2" means the square of the quantity. If either parameter is specified as INDEF or with a value less than zero then no sigma estimates are made and so no error estimates for the measured parameters are made.

The following parameters determine the output of the task.

components = "": All profiles defined by the position file are simultaneously fit but only a subset of the fitted profiles may be selected for output. A profile or component is identified by the order number in the position file; i.e. the first entry in the position file is 1, the second is 2, etc. The components to be output are specified by a range list. The empty list, "", selects all profiles.

verbose = yes: Print fitting results and record of output images created on the standard output (normally the terminal). The fitting information is printed to the logfile so there is normally no need to redirect this output. The output may be turned off when the task is run as a background task.

logfile = "logfile": Logfile for fitting results. If not specified the results will not be logged.

plotfile = "plotfile": File to contain plot output. The plots show the image vector with overplots of the total fit, the individual components, and the residuals. The plotfile may be examined and manipulated later with tools such as gkimosaic.

output = "": List of output images. If not specified then no output images are created. If images are specified the list is matched with the input list.

option = "fit" (fit|difference): Image output option. The choices are "fit" to output the fitted image vector which is the sum of the fitted profiles (without a background), or "difference" to output the data with the profiles subtracted.

clobber = no, merge = no: Clobber or modify any existing output images? If clobbering is not enabled a warning is printed and any existing output images are not modified. If clobbering is enabled then either new images are created if merge is "no" or the new fits are merged with the existing images. Merging is meaningful when only a subset of the input is fit such as selected lines or apertures.

Description

Fitprofs fits one dimensional profile functions to image vectors and outputs the fitting parameters, plots, and model or residual image vectors. This is done noninteractively using a file of initial profile positions and widths. Interactive profile fitting may be done with the deblending option of splot or stsdas.fitting.ngaussfit.

The input consists of images in a variety of formats. These include all the spectral formats as well as standard images. For two dimensional images (or the first 2D plane of higher dimensional images) either the lines or columns may be fit with possible summing of adjacent lines or columns to increase the signal-to-noise. A subset of the image apertures, lines, or columns may be specified or all image vectors may be fit.

The fitting parameters consist of a fitting region, a list of initial positions, peaks, and widths, initial background endpoints, the fitting function, and the parameters to be fit or constrained. The coordinates and units used for the positions and widths are those defined by the standard linear coordinate header parameters. For dispersion corrected spectra these are generally wavelengths in Angstroms and otherwise they are generally pixels. A fitting region must be specified by a pair of numbers.

The background parameter may be left empty to select the pixel values at the endpoints of the fitting region for defining the initial linear background. Alternatively, values at the endpoints of the fitting region may be given explicitly in pixel space order (i.e. the first value is for the edge of the fitting region which has the smaller pixel coordinate). Values can also be computed from the data using the functions "avg(w1,w2)" or "med(w1,w2)" where w1 and w2 are dispersion coordinates. The pixels in the specified range are averaged or medianed and the dispersion point for the linear background is the average of the dispersion coordinates of the pixels.
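A minimal Python sketch of how one background field might be turned into a (dispersion, value) point is given below. It is not the task's implementation. The function name background_point is hypothetical, the range is assumed to be inclusive, and the multiplier z is assumed to default to 1 when omitted.

```python
import re
from statistics import mean, median

# Matches avg(w1,w2), avg(w1,w2,z), med(w1,w2), med(w1,w2,z); no whitespace.
_FUNC = re.compile(r"(avg|med)\(([^,()]+),([^,()]+)(?:,([^,()]+))?\)")

def background_point(spec, wavelengths, values, endpoint):
    """Return (dispersion, value) for one background field.

    A plain number is taken as the background at `endpoint`. An avg/med
    function uses the pixels with w1 <= wavelength <= w2 (inclusive range
    is an assumption) and places the value at their mean dispersion.
    """
    match = _FUNC.fullmatch(spec)
    if match is None:
        return endpoint, float(spec)
    func, w1, w2, z = match.groups()
    low, high = sorted((float(w1), float(w2)))
    scale = float(z) if z is not None else 1.0  # assumed default multiplier
    picked = [(w, v) for w, v in zip(wavelengths, values) if low <= w <= high]
    if not picked:
        raise ValueError("no pixels in background range")
    stat = mean if func == "avg" else median
    return mean([w for w, _ in picked]), scale * stat([v for _, v in picked])
```

Two such points, one per field of the background parameter, define the linear background across the fitting region.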

The position list file consists of one or more columns. The columns are the wavelength, the peak value (relative to the continuum with negative values being absorption), the profile type (gaussian, lorentzian, or voigt), and the gaussian and/or lorentzian FWHM. End columns may be missing or INDEF values may be specified to use the default parameter values (the profile and widths) or determine the peak from the data. Below are the file line formats followed by some examples:

wavelength
wavelength peak
wavelength peak (gaussian|lorentzian|voigt)
wavelength peak gaussian gfwhm
wavelength peak lorentzian lfwhm
wavelength peak voigt gfwhm
wavelength peak voigt gfwhm lfwhm

1234.5                  <- Wavelength only
1234.5 -100             <- Wavelength and peak
1234.5 INDEF v          <- Wavelength and profile type
1234.5 INDEF g 12       <- Wavelength and gaussian FWHM

where peak is the peak value, gfwhm is the gaussian FWHM, and lfwhm is the lorentzian FWHM. This format is the same as used by splot and also by artdata.mk1dspec (except in the latter case the peak is normalized to a continuum of 1).
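The line formats above can be illustrated with a small Python reader. This is a sketch under stated assumptions rather than the task's parser: the names parse_position_line and PROFILES are hypothetical, comments are assumed to start with "#", and profile types are assumed to be matched by unambiguous prefix (as the "g" and "v" examples suggest).

```python
PROFILES = ("gaussian", "lorentzian", "voigt")

def _number(token):
    """Convert a field to float, mapping INDEF to None."""
    return None if token.upper() == "INDEF" else float(token)

def parse_position_line(line, default_profile="gaussian"):
    """Parse one positions-file line into a dict, or None for blank lines.

    Fields left as None are to be filled from the default parameters
    or, for the peak, from the image data.
    """
    tokens = line.split("#", 1)[0].split()  # "#" comments are an assumption
    if not tokens:
        return None
    entry = {"position": float(tokens[0]), "peak": None,
             "profile": default_profile, "gfwhm": None, "lfwhm": None}
    if len(tokens) > 1:
        entry["peak"] = _number(tokens[1])
    if len(tokens) > 2 and tokens[2].upper() != "INDEF":
        matches = [p for p in PROFILES if p.startswith(tokens[2].lower())]
        if len(matches) != 1:
            raise ValueError("unknown profile type: " + tokens[2])
        entry["profile"] = matches[0]
    widths = [_number(t) for t in tokens[3:5]]
    if entry["profile"] == "lorentzian":
        # A lorentzian line carries only the lorentzian FWHM.
        entry["lfwhm"] = widths[0] if widths else None
    else:
        entry["gfwhm"] = widths[0] if widths else None
        if entry["profile"] == "voigt" and len(widths) > 1:
            entry["lfwhm"] = widths[1]
    return entry
```

With this sketch the line "1234.5 INDEF g 12" yields a gaussian entry at 1234.5 with an undefined peak and a gaussian FWHM of 12.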

The profile parameters fit are the central position, the peak amplitude, and the profile widths. The fitting may be constrained in a number of ways. The linear background may be fixed or simultaneously fit with the profiles. The profile positions may be fixed, the relative separations fixed but a single zero point shift fit, or all positions may be fit simultaneously. The profile widths may also be fixed, the relative ratios of the widths fixed while fitting a single scale factor, or all widths fit simultaneously. The profile amplitudes are always fit.
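For reference, the following Python sketch evaluates a model of the kind described: a linear background plus a sum of profiles, each given by a center, a peak, and a FWHM. It uses the standard gaussian and lorentzian forms written in terms of the FWHM; whether the task uses exactly this parameterization internally is an assumption, and the voigt profile is omitted for brevity. All function names are hypothetical.

```python
import math

def gaussian(x, center, peak, fwhm):
    """Gaussian profile that falls to peak/2 at center +/- fwhm/2."""
    return peak * math.exp(-4.0 * math.log(2.0) * ((x - center) / fwhm) ** 2)

def lorentzian(x, center, peak, fwhm):
    """Lorentzian profile that falls to peak/2 at center +/- fwhm/2."""
    return peak / (1.0 + 4.0 * ((x - center) / fwhm) ** 2)

def model(x, profiles, x1, b1, x2, b2):
    """Linear background through (x1, b1) and (x2, b2) plus all profiles.

    `profiles` is a list of (kind, center, peak, fwhm) tuples where kind
    is "gaussian" or "lorentzian". Negative peaks describe absorption.
    """
    total = b1 + (b2 - b1) * (x - x1) / (x2 - x1)
    for kind, center, peak, fwhm in profiles:
        shape = gaussian if kind == "gaussian" else lorentzian
        total += shape(x, center, peak, fwhm)
    return total
```

In these terms, fixing positions or widths removes the corresponding center or fwhm values from the set of free parameters, while the peaks always remain free.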

The fitting technique uses a nonlinear iterative Levenberg-Marquardt algorithm to reduce the Chi-square of the fit. The execution time increases rapidly with the number of profiles fit so there is an effective limit to the number of profiles that can be fit at once.

The output includes a number of formats. The fitted parameters are recorded in a logfile (if specified) and printed on the standard output (if the verbose flag is set). This output includes the date, image vector, fitting parameters used, and a table of fitted or derived quantities. The parameters include some quantities relevant to spectral lines but others apply to any image data. The quantities are the profile center, the background or continuum at the center of the profile, the integral or flux of the profile (which is negative for profiles below the background), the equivalent width, the profile peak amplitude or core value, and the profile full width at half maximum. Pure gaussian and lorentzian profiles will have one of the widths set to zero while voigt profiles will have both values.

Summary plots are recorded in a plotfile (if specified). The plots show the data with the total fit, individual profiles, and residuals overplotted. The plotfile may be examined and printed using the task gkimosaic as well as other tasks which interpret GKI metacode.

The final output consists of images in the same format as the input. The images may be of the total fit (sum of profiles without background) or of the difference (residuals) of the data minus the model.

Error estimates

Error estimates may be computed for the fitted parameters. This requires a model for the pixel sigmas. Currently this model is based on a Poisson statistics model of the data. The model parameters are a constant Gaussian sigma and an "inverse gain" as specified by the parameters sigma0 and invgain. These parameters are used to compute the pixel value sigma from the following formula:

sigma**2 = sigma0**2 + invgain * I

where I is the pixel value and "**2" means the square of the quantity.

If either the constant sigma or the inverse gain is specified as INDEF or with a value less than zero then no noise model is applied and no error estimates are computed. Also if the number of error samples is less than 10 then no error estimates are computed. Note that for processed spectra this noise model will not generally be the same as the detector readout noise and gain. These parameters would need to be estimated in some way using the statistics of the spectrum. The use of an inverse gain rather than a direct gain was chosen to allow a value of zero for this parameter. This provides a model with constant uncertainties.
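The noise model can be written as a short Python function. This is a sketch only: the function name pixel_sigma is hypothetical, INDEF is represented by None, and clamping a negative variance to zero (which can arise for negative pixel values) is an assumption about behavior this page does not describe.

```python
import math

def pixel_sigma(value, sigma0, invgain):
    """Return the pixel sigma, or None when the noise model is disabled.

    Implements sigma**2 = sigma0**2 + invgain * I. None stands in for
    INDEF; a negative parameter also disables the model.
    """
    if sigma0 is None or invgain is None or sigma0 < 0 or invgain < 0:
        return None
    variance = sigma0 ** 2 + invgain * value
    return math.sqrt(max(variance, 0.0))  # clamp is an assumption
```

With invgain set to zero every pixel gets the same sigma, sigma0, which is the constant uncertainty case mentioned above.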

The error estimates are computed by Monte-Carlo simulation. The model is fit to the data (using the noise sigmas) and this model is used to describe the noise-free spectrum. A number of simulations, given by the nerrsample parameter, are created in which random Gaussian noise is added to the noise-free spectrum based on the pixel sigmas from the noise model. The model fitting is done for each simulation and the absolute deviation of each fitted parameter from the model parameter is recorded. The error estimate for each parameter is then the absolute deviation containing 68.3% of the parameter estimates. This corresponds to one sigma if the distribution of parameter estimates is Gaussian though this method does not assume this.

The Monte-Carlo technique automatically includes all effects of parameter correlations and does not depend on any approximations. However the computation of the errors does take a significant amount of time. The amount of time and the accuracy of the error estimates depend on how many simulations are done. A small number of samples (of order 10) is fast but gives crude estimates. A large number (greater than 100) is slow but gives very good estimates. A compromise value of 50 is recommended for many applications.
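The procedure can be illustrated with a deliberately simplified Python sketch. To stay short it fits only one parameter, the peak of a single gaussian with fixed center and width, using weighted linear least squares in place of the full nonlinear fit. The function names and the exact way the 68.3% deviation is picked from the sorted list are assumptions, not the task's code.

```python
import math
import random

def gaussian(x, center, fwhm):
    """Unit-peak gaussian expressed in terms of its FWHM."""
    return math.exp(-4.0 * math.log(2.0) * ((x - center) / fwhm) ** 2)

def fit_peak(x, data, sigmas, center, fwhm):
    """Weighted least-squares peak for a gaussian of fixed center and width."""
    num = sum(d * gaussian(xi, center, fwhm) / s ** 2
              for xi, d, s in zip(x, data, sigmas))
    den = sum(gaussian(xi, center, fwhm) ** 2 / s ** 2
              for xi, s in zip(x, sigmas))
    return num / den

def monte_carlo_error(x, data, sigmas, center, fwhm, nerrsample=50, seed=1):
    """Return (best_peak, error); error is None when nerrsample < 10."""
    best = fit_peak(x, data, sigmas, center, fwhm)
    if nerrsample < 10:
        return best, None
    rng = random.Random(seed)
    # The fitted model serves as the noise-free spectrum.
    clean = [best * gaussian(xi, center, fwhm) for xi in x]
    deviations = []
    for _ in range(nerrsample):
        noisy = [c + rng.gauss(0.0, s) for c, s in zip(clean, sigmas)]
        deviations.append(abs(fit_peak(x, noisy, sigmas, center, fwhm) - best))
    deviations.sort()
    # Smallest absolute deviation that contains 68.3% of the samples.
    index = min(nerrsample - 1, math.ceil(0.683 * nerrsample) - 1)
    return best, deviations[index]

x = list(range(41))
data = [100.0 * gaussian(xi, 20.0, 8.0) for xi in x]
sigmas = [5.0] * len(x)
best, error = monte_carlo_error(x, data, sigmas, 20.0, 8.0)
```

Because every simulation repeats the full fit, the run time grows in proportion to nerrsample, which is why the sample count trades speed against the quality of the estimate.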

Examples

1. The following example creates an artificial spectrum and fits it. It requires that the artdata and proto packages be loaded.

cl> mk1dspec test slope=1 temp=0 lines=testlines nl=20
cl> mknoise test rdnoise=10 poisson=yes
cl> fields testlines fields=1,3 > fitlines
cl> fitprofs test reg="4000 8000" pos=fitlines
# Jul 27 17:49 test - Ap 1:
# Nfit=20, background=YES, positions=all, gfwhm=all, lfwhm=all
#   center      cont      flux      eqw      core   gfwhm   lfwhm
  6832.611  1363.188  -13461.8    9.875  -408.339   30.97      0.
  7963.674  1507.641  -8193.58    5.435  -395.207   19.48      0.
  5688.055   1217.01  -7075.11    5.814  -392.006   16.96      0.
    6831.3   1363.02  -7102.01     5.21  -456.463   14.62      0.
  7217.335  1412.323   -10110.    7.158  -427.797    22.2      0.
  6709.286  1347.437  -4985.06      3.7  -225.346   20.78      0.
  6434.317  1312.319  -7121.03    5.426  -342.849   19.51      0.
  6130.415  1273.506    -6164.     4.84  -224.146   25.83      0.
  4569.375  1074.138   -3904.6    3.635  -183.963   19.94      0.
  5656.645  1212.999  -8202.81    6.762  -303.617   25.38      0.
   4219.53  1029.458  -5161.64    5.014  -241.135   20.11      0.
  4551.424  1071.845  -3802.61    3.548   -139.39   25.63      0.
  4604.649  1078.643  -5539.15    5.135  -264.654   19.66      0.
  6966.557  1380.294  -11717.5    8.489  -600.581   18.33      0.
  4259.019  1034.501  -4280.38    4.138  -213.446   18.84      0.
  5952.958  1250.843  -8006.98    6.401  -318.313   23.63      0.
   4531.89  1069.351  -712.598   0.6664  -155.197   4.313      0.
  7814.418  1488.579  -2926.49    1.966  -164.891   16.67      0.
  5310.929  1168.846  -10132.2    8.669  -487.502   19.53      0.
  5022.948  1132.066   -7532.8    6.654  -325.594   21.73      0.
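The columns of this table can be cross-checked against each other. The Python sketch below uses the first row and assumes two relations that the numbers appear consistent with but that this page does not state explicitly: the flux of a pure gaussian is its core value times its FWHM times sqrt(pi / (4 ln 2)), and the equivalent width is the negative of the flux divided by the continuum.

```python
import math

# First row of the example output above.
cont, flux, eqw, core, gfwhm = 1363.188, -13461.8, 9.875, -408.339, 30.97

# Integral of a gaussian with the given peak (core) and FWHM.
gauss_flux = core * gfwhm * math.sqrt(math.pi / (4.0 * math.log(2.0)))

# Equivalent width inferred from the flux and continuum (assumed relation).
eqw_from_flux = -flux / cont

flux_relative_diff = abs(gauss_flux - flux) / abs(flux)
eqw_diff = abs(eqw_from_flux - eqw)
```

Both differences are at the level expected from the rounding of the printed values.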

2. Suppose there is no obvious continuum level near the fitting region but you want to specify a flat continuum level as the average of pixels in a specified wavelength region. The background region would be specified as

background = "avg(4250,4425.3) avg(4250,4425.3)"

Note that the value must be given twice to get a flat continuum.

Revisions

FITPROFS V2.11.3: Modified to allow a more general specification of the background.

FITPROFS V2.11: Modified to include lorentzian and voigt profiles. The parameters and positions file format have changed in this version. A new parameter controls the number of Monte-Carlo samples used in the error estimates.

FITPROFS V2.10.3: Error estimates based on a simple noise model are now computed.

FITPROFS V2.10: This task is new.

Time requirements

The following CPU times were obtained with a Sun Sparcstation I. The number of pixels in the fitting region and the number of lines fit were varied. The worst case of fitting all parameters and a background was considered as well as the constrained case of fitting line positions and a single width with fixed background.

Npixels Nprofs Fitbkg Fitpos  Fitsig   CPU(sec)
  100      5     yes    all     all       1.9
  100     10     yes    all     all       3.3
  100     15     yes    all     all       5.6
  100     20     yes    all     all       9.0
  512      5     yes    all     all       4.7
  512     10     yes    all     all      10.0
  512     15     yes    all     all      17.6
  512     20     yes    all     all      27.8
 1000      5     yes    all     all       8.0
 1000     10     yes    all     all      18.0
 1000     15     yes    all     all      31.8
 1000     20     yes    all     all      50.2
 1000     25     yes    all     all      72.8
 1000     30     yes    all     all     100.2
  512      5      no    all  single       2.8
  512     10      no    all  single       5.3
  512     15      no    all  single       8.6
  512     20      no    all  single      12.8

Crudely this implies CPU time goes as the 1.4 power of the number of profiles and the 0.75 power of the number of pixels.
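These exponents can be recovered roughly from pairs of rows in the table, as in the following Python sketch. The choice of row pairs and the helper name power_law_exponent are illustrative only; other pairs give somewhat different values.

```python
import math

def power_law_exponent(x1, t1, x2, t2):
    """Exponent p such that t scales as x**p between two measurements."""
    return math.log(t2 / t1) / math.log(x2 / x1)

# 1000 pixels, all parameters fit: 5 profiles (8.0 s) vs 30 profiles (100.2 s).
profile_exponent = power_law_exponent(5, 8.0, 30, 100.2)

# 20 profiles, all parameters fit: 100 pixels (9.0 s) vs 1000 pixels (50.2 s).
pixel_exponent = power_law_exponent(100, 9.0, 1000, 50.2)
```

Both values come out close to the quoted powers of 1.4 and 0.75.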