reidentify: Automatically identify features in spectra

Package: irs

Summary

Given a reference vector with identified features and (optionally) a coordinate function find the same features in other elements of the reference image and fit a new dispersion function or determine a zero point shift. After all vectors of the reference image are reidentified use the reference vectors to reidentify corresponding vectors in other images. This task is used for transferring dispersion solutions in arc calibration spectra and for mapping geometric and dispersion distortion in two and three dimensional images.

Usage

reidentify reference images

Parameters

reference
Image with previously identified features to be used as features reference for other images. If there are multiple apertures, lines, or columns in the image a master reference is defined by the section parameter. The other apertures in multispec images or other lines, or columns (selected by step) are reidentified as needed.
images
List of images in which the features in the reference image are to be reidentified. In two and three dimensional images the reidentifications are done by matching apertures, lines, columns, or bands with those in the reference image.
interactive = no
Examine and fit features interactively? If the task is run interactively a query (which may be turned off during execution) will be given for each vector reidentified after printing the results of the automatic fit and the user may chose to enter the interactive identify task.
section = "middle line"
If the reference image is not one dimensional or specified as a one dimensional image section then this parameter selects the master reference image vector. The master reference is used when reidentifying other vectors in the reference image or when other images contain apertures not present in the reference image. This parameter also defines the direction (columns, lines, or z) of the image vectors to be reidentified. The section parameter may be specified directly as an image section or in one of the following forms
line|column|x|y|z first|middle|last|# [first|middle|last|#]]
first|middle|last|# [first|middle|last|#] line|column|x|y|z
where each field can be one of the strings separated by | except for # which is an integer number. The field in [] is a second designator which is used with three dimensional data. See the example section for identify for examples of this syntax. Abbreviations are allowed though beware that 'l' is not a sufficient abbreviation.
newaps = yes
Reidentify new apertures in the images which are not in the reference image? If no, only apertures found in the reference image will be reidentified in the other images. If yes, the master reference spectrum is used to reidentify features in the new aperture and then the new aperture solution will be added to the reference apertures. All further identifications of the new aperture will then use this solution.
override = no
Override previous solutions? If there are previous solutions for a particular image vector being identified, because of a previous identify or reidentify, this parameter selects whether to simply skip the reidentification or do a reidentification and overwrite the solution in the database.
refit = yes
Refit the coordinate function? If yes and there is more than one feature and a coordinate function was defined in the reference image database then a new coordinate function of the same type as in the reference is fit using the new pixel positions. Otherwise only a zero point shift is determined for the revised coordinates without changing the form of the coordinate function.

The following parameters are used for selecting and reidentifying additional lines, columns, or apertures in two dimensional formats.

trace = no
There are two methods for defining additional reference lines, columns, or bands in two and three dimensional format images as selected by the step parameter. When trace is no the master reference line or column is used for each new reference vector. When this parameter is yes then as the reidentifications step across the image the last reidentified features are used as the reference. This "tracing" is useful if there is a coherent shift in the features such as with long slit spectra. However, any features lost during the tracing will be lost for all subsequent lines or columns while not using tracing always starts with the initial set of reference features.
step = "10"
The step from the reference line, column, or band used for selecting and/or reidentifying additional lines, columns, or bands in a two or three dimensional reference image. For three dimensional images there may be two numbers to allow independent steps along different axes. If the step is zero then only the reference aperture, line, column, or band is used. For multiaperture images if the step is zero then only the requested aperture is reidentified and if it is non-zero (the value does not matter) then all spectra are reidentified. For long slit or Fabry-Perot images the step is used to sample the image and the step should be large enough to map any significant changes in the feature positions.
nsum = "10"
Number of lines, columns, or bands across the designated vector axis to be summed when the image is a two or three dimensional spatial spectrum. It does not apply to multispec format spectra. If the image is three dimensional an optional second number can be specified for the higher dimensional axis (the first number applies to the lower axis number and the second to the higher axis number). If a second number is not specified the first number is used for both axes. This parameter is not used for multispec type images.
shift = "0"
Shift in user coordinates to be added to the reference features before centering. If the image is three dimensional then two numbers may be specified for the two axes. Generally no shift is used by setting the value to zero. When stepping to other lines, columns, or bands in the reference image the shift is added to the primary reference spectrum if not tracing. When tracing the shift is added to last spectrum when stepping to higher lines and subtracted when stepping to lower lines. If a value if INDEF is specified then an automatic algorithm is applied to find a shift.
nlost = 0
When reidentifying features by tracing, if the number of features not found in the new image vector exceeds this number then the reidentification record is not written to the database and the trace is terminated. A warning is printed in the log and in the verbose output.

The following parameters define the finding and recentering of features. See also center1d.

cradius = 5.
Centering radius in pixels. If a reidentified feature falls further than this distance from the previous line or column when tracing or from the reference feature position when reidentifying a new image then the feature is not reidentified.
threshold = 0.
In order for a feature center to be determined, the range of pixel intensities around the feature must exceed this threshold. This parameter is used to exclude noise peaks and terminate tracing when the signal disappears. However, failure to properly set this parameter, particularly when the data values are very small due to normalization or flux calibration, is a common error leading to failure of the task.

The following parameters select and control the automatic addition of new features during reidentification.

addfeatures = no
Add new features from a line list during each reidentification? If yes then the following parameters are used. This function can be used to compensate for lost features from the reference solution, particularly when tracing. Care should be exercised that misidentified features are not introduced.
coordlist = "linelists$idhenear.dat"
User coordinate list consisting of a list of line coordinates. Some standard line lists are available in the directory "linelists$". The standard line lists are described under the topic linelists.
match = -3.
The maximum difference for a match between the feature coordinate function value and a coordinate in the coordinate list. Positive values are in user coordinate units and negative values are in units of pixels.
maxfeatures = 50
Maximum number of the strongest features to be selected automatically from the coordinate list.
minsep = 2.
The minimum separation, in pixels, allowed between feature positions when defining a new feature.

The following parameters determine the input and output of the task.

database = "database"
Database containing the feature data for the reference image and in which the features for the reidentified images are recorded.
logfiles = "logfile"
List of files in which to keep a processing log. If a null file, "", is given then no log is kept.
plotfile = ""
Optional file to contain metacode plots of the residuals.
verbose = no
Print reidentification information on the standard output?
graphics = "stdgraph"
Graphics device. The default is the standard graphics device which is generally a graphics terminal.
cursor = ""
Cursor input file. If a cursor file is not given then the standard graphics cursor is read.

The following parameters are queried when the 'b' key is used in the interactive review.

crval, cdelt
These parameters specify an approximate coordinate value and coordinate interval per pixel when the automatic line identification algorithm ('b' key) is used. The coordinate value is for the pixel specified by the crpix parameter in the aidpars parameter set. The default value of crpix is INDEF which then refers the coordinate value to the middle of the spectrum. By default only the magnitude of the coordinate interval is used. Either value may be given as INDEF. In this case the search for a solution will be slower and more likely to fail. The values may also be given as keywords in the image header whose values are to be used.
aidpars = "" (parameter set)
This parameter points to a parameter set for the automatic line identification algorithm. See aidpars for further information.

Description

Features (spectral lines, cross-dispersion profiles, etc.) identified in a single reference vector (using the tasks identify or autoidentify) are reidentified in other reference vectors and the set of reference vectors are reidentified in other images with the same type of vectors. A vector may be a single one dimensional (1D) vector in a two or three dimensional (2D or 3D) image, the sum of neighboring vectors to form a 1D vector of higher signal, or 1D spectra in multiaperture images. The number of vectors summed in 2D and 3D images is specified by the parameter nsum. This parameter does not apply to multiaperture images.

As the previous paragraph indicates, there are two stages in this task. The first stage is to identify the same features from a single reference vector to a set of related reference vectors. This generally consists of other vectors in the same reference image such as other lines or columns in a long slit spectrum or the set of 1D aperture spectra in a multiaperture image. In these cases the vectors are identified by a line, column, band, or aperture number. The second stage is to reidentify the features from the reference vectors in the matching vectors of other images. For example the same lines in the reference image and another image or the same apertures in several multiaperture images. For multiaperture images the reference vector and target vector will have the same aperture number but may be found in different image lines. The first stage may be skipped if all the reference vectors have been identified.

If the images are 2D or 3D or multiaperture format and a step greater than zero is specified then additional vectors (lines/columns/bands) in the reference image will be reidentified from the initial master reference vector (as defined by an image section or section parameter) provided they have not been reidentified previously or the override flag is set. For multiple aperture spectral images, called multiaperture, a step size of zero means don't reidentify any other aperture and any other step size reidentifies all apertures. For two and three dimensional images, such as long slit and Fabry-Perot spectra, the step(s) should be large enough to minimize execution time and storage requirements but small enough to follow shifts in the features (see the discussion below on tracing).

The reidentification of features in other reference image vectors may be done in two ways selected by the parameter trace. If not tracing, the initial reference vector is applied to the other selected vectors. If tracing, the reidentifications are made with respect to the last set of identifications as successive steps away from the reference vector are made. The tracing method is appropriate for two and three dimensional spatial images, such as long slit and Fabry-Perot spectra, in which the positions of features traced vary smoothly. This allows following large displacements from the initial reference by using suitably small steps. It has the disadvantage that features lost during the reidentifications will not propagate (unless the addfeatures option is used). By not tracing, the original set of features is used for every other vector in the reference image.

When tracing, the parameter nlost is used to terminate the tracing whenever this number of features has been lost. This parameter, in conjunction with the other centering parameters which define when a feature is not found, may be useful for tracing features which disappear before reaching the limits of the image.

When reidentifying features in other images, the reference features are those from the same aperture, line, column, or band of the reference image. However, if the newaps parameter is set apertures in multiaperture spectra which are not in the reference image may be reidentified against the master reference aperture and added to the list of apertures to be reidentified in other images. This is useful when spectra with different aperture numbers are stored as one dimensional images.

The reidentification of features between a reference vector and a target vector is done as follows. First a mean shift between the two vectors is determined. After correcting for the shift the estimated pixel position of each reference feature in the target vector is used as the starting point for determining a feature center near this position. The centering fails the feature is dropped and a check against the nlost is made. If it succeeds it is added to the list of features found in the target spectrum. A zero point shift or new dispersion function may be determined. New features may then be added from a coordinate list. The details are given below.

There may be a large shift between the two vectors such that the same feature in the target vector is many pixels away from the pixel position in the reference spectrum. A shift must then be determined. The shift parameter may be used to specify a shift. The shift is in user coordinates and is added to the reference user coordinates before trying to center on a feature. For example if the reference spectrum has a feature at 5015A but in the new spectrum the feature is at 5025A when the reference dispersion function is applied then the shift would be +10. Thus a reference feature at 5015A would have the shift added to get 5025A, then the centering would find the feature some pixel value and that pixel value would be used with the true user coordinate of 5015A in the new dispersion solution.

When tracing a 2D/3D reference spectrum the shift is applied to the previous reidentified spectrum rather than the initial reference spectrum. The shift is added for increasing line or column values and subtracted for decreasing line or column values. This allows "tracing" when there is a rotation or tilt of the 2D or 3D spectrum. When not tracing the shift is always added to the reference spectrum features as described previously.

When reidentify other images with the reference spectrum the shift parameter is always just added to the reference dispersion solution matching the aperture, line, or column being reidentified.

If the shift parameter is given as INDEF then an automatic search algorithm is applied. There are two algorithms that may be used. If the search parameter is INDEF then a cross-correlation of the features list with the peaks found in the target spectrum is performed. This algorithm can only find small shifts since otherwise many lines may be missing off either end of the spectrum relative to the reference spectrum.

If the search parameter is non-zero then the pattern matching algorithm described in aidpars is used. The search parameter specified a search radius from the reference solution. If the value is positive the search radius is a distance in dispersion units. If the value is negative then the absolute value is used as a fraction of the dispersion range in the reference solution. For example, a value of -0.1 applied to reference dispersion solution with a range of 1000A would search for a new solution within 100A of the reference dispersion solution.

The pattern matching algorithm has to stages. First if there are more than 10 features in the reference the pattern matching tries to match the lines in the target spectrum to those features with a dispersion per pixel having the same sign and a value within 2%. If no solution is found then the linelist is used to match against the lines in the target spectrum, again with the dispersion per pixel having the same sign and a value within 5%. The first stage works when the set of features is nearly the same while the second stage works when the shifts are large enough that many features in the reference and target spectra are different.

The centering algorithm is described under the topic center1d and also in identify. If a feature positions shifts by more than the amount set by the parameter cradius from the starting position (possibly after adding a shift) or the feature strength (peak to valley) is less than the detection threshold then the new feature is discarded. The cradius parameter should be set large enough to find the correct peak in the presence of any shifts but small enough to minimize incorrect identifications. The threshold parameter is used to eliminate identifications with noise. Failure to set this parameter properly for the data (say if data values are very small due to a calibration or normalization operation) is the most common source of problems in using this task.

If a fitting function is defined for the features in the reference image, say a dispersion function in arc lamp spectra, then the function is refit at each reidentified line or column if the parameter refit is yes. If refitting is not selected then a zero point shift in the user coordinates is determined without changing the form of the fitting function. The latter may be desirable for tracking detector shifts through a sequence of observation using low quality calibration spectra. When refitting, the fitting parameters from the reference are used including iterative rejection parameters to eliminate misidentifications.

If the parameter addfeatures is set additional features may be added from a line list. If there are reference features then the new features are added AFTER the initial reidentification and function fit. If the reference consists only of a dispersion function, that is it has no features, then new features will be added followed by a function fit and then another pass of adding new features. A maximum number of added features, a matching distance in user coordinates, and a minimum separation from other features are additional parameters. This option is similar to that available in identify and is described more fully in the help for that task.

A statistics line is generated for each reidentified vector. The line contains the name of the image being reidentified (which for two dimensional images includes the image section and for multiaperture spectra includes the aperture number), the number of features found relative to the number of features in the reference, the number of features used in the function fit relative to the number found, the mean pixel, user coordinate, and fractional user coordinate shifts relative to the reference coordinates, and the RMS relative to the final coordinate system (whether refit or simply shifted) excluding any iteratively rejected features from the calculation.

If the task is run with the interactive flag the statistics line is printed to the standard output (the terminal) and a query is made whether to examine and/or refit the features. A response of yes or YES will put the user in the interactive graphical mode of identify. See the description of this task for more information. The idea is that one can monitor the statistics information, particularly the RMS if refitting, and select only those which may be questionable to examine interactively. A response of no or NO will continue on to the next reidentification. The capitalized responses turn off the query and act as permanent response for all other reidentifications.

This statistics line, including headers, is written to any specified log files. The log information includes the image being reidentified and the reference image, and the initial shift.

If an accessible file name is given for the plot file then a residual plot of the reidentified lines is recorded in this file. The plot file can be viewed with gkimosaic, stdgraph or reading the file with ".read" when in cursor mode (for example with "=gcur").

The reidentification results for this task are recorded in a database. Currently the database is a directory and entries in the database are text files with filenames formed by adding the prefix "id" to the image name without an image extension.

Examples

1. Arc lines and a dispersion solution were defined for the middle aperture in the multispec for arc spectrum a042.ms. To reidentify the other apertures in the reference image and then another arc image:

cl> reiden a042.ms a045.ms inter+ step=1 ver+
REIDENTIFY: NOAO/IRAF V2.9 valdes@puppis Fri 29-Jun-90
  Reference image = a042.ms.imh, New image = a042.ms, Refit = yes
   Image Data    Found     Fit Pix Shift  User Shift     RMS
a042.ms - Ap 24  48/48   47/48   -2.38E-4    -3.75E-6  0.699
Fit dispersion function interactively? (no|yes|NO|YES) (yes): y
a042.ms - Ap 24  48/48   47/48   -2.38E-4    -3.75E-6  0.699
a042.ms - Ap 23  48/48   47/48      0.216        1.32  0.754
Fit dispersion function interactively? (no|yes|NO|YES) (yes): n
a042.ms - Ap 22  48/48   47/48     0.0627       0.383  0.749
Fit dispersion function interactively? (no|yes|NO|YES) (yes): n
a042.ms - Ap 21  48/48   47/48      0.337        2.06  0.815
<etc>
  Reference image = a042.ms.imh, New image = a045.ms, Refit = yes
   Image Data    Found     Fit Pix Shift  User Shift     RMS
a045.ms - Ap 24  48/48   47/48   -2.38E-4    -3.75E-6  0.699
Fit dispersion function interactively? (no|yes|NO|YES) (yes): y
a045.ms - Ap 24  48/48   47/48   -2.38E-4    -3.75E-6  0.699
a045.ms - Ap 23  48/48   47/48      0.216        1.32  0.754
Fit dispersion function interactively? (no|yes|NO|YES) (yes): N
a045.ms - Ap 22  48/48   47/48     0.0627       0.383  0.749
a042.ms - Ap 21  48/48   47/48      0.337        2.06  0.815
a042.ms - Ap 20  48/48   47/48     -0.293       -1.79  0.726
a042.ms - Ap 19  48/48   48/48      0.472        2.88  0.912

This example is verbose and includes interactive review of reidentifications. The statistics lines have been shortened.

2. To trace a stellar profile and arc lines in long slit images for the purpose of making a distortion correction:

cl> reiden rog022[135,*] "" trace+
cl> reiden rog023 "" sec="mid line" trace+

Revisions

REIDENTIFY V2.11
The search parameter and new searching algorithm has been added. The task will now work with only a warning if the reference image is absent; i.e. it is possible to reidentify given only the database. The addfeatures function will now add features before a fit if there are no reference database features. Previously features could only be added after an initial fit using the reference features and, so, required the reference database to contain features for reidentification. This new feature is useful if one wants to uses a dispersion function from one type of calibration but wants to add features for a different kind of calibration.
REIDENTIFY V2.10.3
The section, nsum, step, and shift parameter syntax was extended to apply to 3D images. The previous values and defaults may still be used. For multiaperture data a step of zero selects only the reference aperture to be reidentified and any other step selects reidentifying all apertures.
REIDENTIFY V2.10
This task is a new version with many new features. The new features include an interactive options for reviewing identifications, iterative rejection of features during fitting, automatic addition of new features from a line list, and the choice of tracing or using a single master reference when reidentifying features in other vectors of a reference spectrum. Reidentifications from a reference image to another image is done by matching apertures rather than tracing. New apertures not present in the reference image may be added.

See also

autoidentify, identify, aidpars, center1d, linelists, fitcoords