identify: Identify features in spectrum for dispersion solution

Package: hydra

Summary

Features are interactively marked in one dimensional image vectors. The features may be spectral lines when the vector is a spectrum or profile positions when the vector is a spatial cut. A function may be fit to the user coordinates as a function of pixel coordinates. This is primarily used to find dispersion functions for spectra such as arc-line calibration spectra. The profile position measurements are generally used for geometric calibrations.

Usage

identify images

Parameters

images: List of images in which to identify features and fit coordinate functions.

section = "middle line"

If an image is not one dimensional or specified as a one dimensional image section then the image section given by this parameter is used. The section defines a one dimensional vector. The image is still considered to be two or three dimensional. It is possible to change the data vector within the program. The section parameter may be specified directly as an image section or in one of the following forms:

line|column|x|y|z first|middle|last|# [first|middle|last|#]
first|middle|last|# [first|middle|last|#] line|column|x|y|z

where each field can be one of the strings separated by | except for # which is an integer number. The field in [] is a second designator which is used with three dimensional data. See the example section for examples of this syntax. Abbreviations are allowed though beware that 'l' is not a sufficient abbreviation.

database = "database": Database in which the feature data and coordinate functions are recorded.

coordlist = "linelists$idhenear.dat": User coordinate list consisting of a list of line coordinates. A comment line of the form "# units <units>", where <units> is one of the understood units names, defines the units of the line list. If no units are specified then Angstroms are assumed. Some standard line lists are available in the directory "linelists$". The standard line lists are described under the topic linelists.

units = ""

The units to use if no database entry exists. The units are specified as described in

cl> help onedspec.package section=units

If no units are specified and a coordinate list is used then the units of the coordinate list are selected. If a database entry exists then the units defined there override both this parameter and the coordinate list.

nsum = "10": Number of lines, columns, or bands across the designated vector axis to be summed when the image is a two or three dimensional spatial spectrum. It does not apply to multispec format spectra. If the image is three dimensional an optional second number can be specified for the higher dimensional axis (the first number applies to the lower axis number and the second to the higher axis number). If a second number is not specified the first number is used for both axes.

match = -3.: The maximum difference for a match between the feature coordinate function value and a coordinate in the coordinate list. Positive values are in user coordinate units and negative values are in units of pixels.

maxfeatures = 50: Maximum number of the strongest features to be selected automatically from the coordinate list (function 'l') or from the image data (function 'y').

zwidth = 100.: Width of graphs, in user coordinates, when in zoom mode (function 'z').

The following parameters are used in determining feature positions.

ftype = "emission": Type of features to be identified. The possibly abbreviated choices are "emission" and "absorption".

fwidth = 4.: Full-width at the base (in pixels) of features to be identified.

cradius = 5.: The maximum distance, in pixels, allowed between a feature position and the initial estimate when defining a new feature.

threshold = 0.: In order for a feature center to be determined the range of pixel intensities around the feature must exceed this threshold.

minsep = 2.: The minimum separation, in pixels, allowed between feature positions when defining a new feature.

The following parameters are used to fit a function to the user coordinates. The icfit package is used and further descriptions about these parameters may be found under that package.

function = "spline3": The function to be fit to the user coordinates as a function of the pixel coordinate. The choices are "chebyshev", "legendre", "spline1", or "spline3".

order = 1: Order of the fitting function. The order is the number of polynomial terms or number of spline pieces.

sample = "*": Sample regions for fitting. This is in pixel coordinates and not the user coordinates.

niterate = 0: Number of rejection iterations.

low_reject = 3.0, high_reject = 3.0: Lower and upper residual rejection in terms of the RMS of the fit.

grow = 0: Distance from a rejected point in which additional points are automatically rejected regardless of their residuals.

The following parameters control the input and output.

autowrite = no: Automatically write or update the database? If "no" then when exiting the program a query is given if the feature data and fit have been modified. The query is answered with "yes" or "no" to save or not save the results. If autowrite is "yes" exiting the program automatically updates the database.

graphics = "stdgraph": Graphics device. The default is the standard graphics device which is generally a graphics terminal.

cursor = "": Cursor input file. If a cursor file is not given then the standard graphics cursor is read.

The following parameters are queried when the 'b' key is used.

crval, cdelt: These parameters specify an approximate coordinate value and coordinate interval per pixel when the automatic line identification algorithm ('b' key) is used. The coordinate value is for the pixel specified by the crpix parameter in the aidpars parameter set. The default value of crpix is INDEF which then refers the coordinate value to the middle of the spectrum. By default only the magnitude of the coordinate interval is used. Either value may be given as INDEF. In this case the search for a solution will be slower and more likely to fail. The values may also be given as keywords in the image header whose values are to be used.

aidpars = "" (parameter set): This parameter points to a parameter set for the automatic line identification algorithm. See aidpars for further information.

Cursor keys

?: Clear the screen and print a menu of options.

a: Apply the next (c)enter or (d)elete operation to (a)ll features.

b: Identify features and find a dispersion function automatically using the coordinate line list and approximate values for the dispersion.

c: (C)enter the feature nearest the cursor. Used when changing the position finding parameters or when features are defined from a previous feature list.

d: (D)elete the feature nearest the cursor. (D)elete all features when preceded by the (a)ll key. This does not affect the dispersion function.

e: Find features from a coordinate list without doing any fitting. This is like the 'l' key without any fitting.

f: (F)it a function of the pixel coordinates to the user coordinates. This enters the interactive function fitting package.

g: Fit a zero point shift to the user coordinates by minimizing the difference between the user and fitted coordinates. The coordinate function is not changed.

i: (I)nitialize (delete features and coordinate fit).

j: Go to the preceding line, column, or band in a 2D/3D or multispec image.

k: Go to the next line, column, or band in a 2D/3D or multispec image.

l: (L)ocate features in the coordinate list. A coordinate function must be defined or at least two features must have user coordinates from which a coordinate function can be determined. If there are features an initial fit is done, then features are added from the coordinate list, and then a final fit is done.

m: (M)ark a new feature using the cursor position as the initial position estimate.

n: Move the cursor or zoom window to the (n)ext feature (same as +).

o: Go to the specified line, column, or band in a 2D/3D or multispec image. For 3D images two numbers are specified.

p: (P)an to the original window after (z)ooming on a feature.

q: (Q)uit and continue with next image.

r: (R)edraw the graph.

s: (S)hift the fit coordinates relative to the pixel coordinates. The user specifies the desired fit coordinate at the position of the cursor and a zero point shift to the fit coordinates is applied. If features are defined then they are recentered and the shift is the average shift. The shift in pixels, user coordinates, and z (fractional shift) is printed.

t: Reset the current feature to the position of the cursor. The feature is not recentered. This is used to mark an arbitrary position.

u: Enter a new (u)ser coordinate for the current feature. When (m)arking a new feature the user coordinate is also requested.

v: Modify the fitting weight of the current feature. The weights are integers with the lowest weight being the default of 1.

w: (W)indow the graph. A window prompt is given and a number of windowing options may be given. For more help type '?' to the window prompt or see help under gtools.

x: Find a zero point shift for the current dispersion function. This is used by starting with the dispersion solution and features from a different spectrum. The mean shift in user coordinates, mean shift in pixels, and the fractional shift in user coordinates is printed.

y: Up to maxfeatures emission peaks are found automatically (in order of peak intensity) and, if a dispersion solution is defined, the peaks are identified from the coordinate list.

z: (Z)oom on the feature nearest the cursor. The width of the zoom window is determined by the parameter zwidth.

.: Move the cursor or zoom window to the feature nearest the cursor.

+: Move the cursor or zoom window to the (n)ext feature.

-: Move the cursor or zoom window to the previous feature.

Parameters are shown or set with the following "colon commands", which may be abbreviated. To show the value of a parameter type the parameter name alone and to set a new value follow the parameter name by the value.

:show file: Show the values of all the parameters. If a file name is given then the output is appended to that file. If no file is given then the terminal is cleared and the output is sent to the terminal.

:features file: Print the feature list and the fit rms. If a file name is given then the output is appended to that file. If no file is given then the terminal is cleared and the output is sent to the terminal.

:coordlist file: Set or show the coordinate list file.

:cradius value: Set or show the centering radius in pixels.

:threshold value: Set or show the detection threshold for centering.

:database name: Set or show the database for recording feature records.

:ftype value: Set or show the feature type (emission or absorption).

:fwidth value: Set or show the feature width in pixels.

:image imagename: Set a new image or show the current image.

:labels value: Set or show the feature label type (none, index, pixel, coord, user, or both). None produces no labeling, index labels the features sequentially in order of pixel position, pixel labels the features by their pixel coordinates, coord labels the features by their user coordinates (such as wavelength), user labels the features by the user or line list supplied string, and both labels the features by both the user coordinates and user strings.

:match value: Set or show the coordinate list matching distance.

:maxfeatures value: Set or show the maximum number of features automatically found.

:minsep value: Set or show the minimum separation allowed between features.

:read name ap: Read a record from the database. The record name defaults to the image name and, for 1D spectra, the aperture number defaults to the aperture of the current image.

:write name ap: Write a record to the database. The record name defaults to the image name and, for 1D spectra, the aperture number defaults to the aperture of the current image.

:add name ap: Add features from a database record. The record name defaults to the image name and, for 1D spectra, the aperture number defaults to the aperture of the current image. Only the features are added to any existing list of features. The dispersion function is not read.

:zwidth value: Set or show the zoom width in user units.

:/help: Print additional help for formatting graphs. See help under "gtools".

Description

Features in the input images are identified interactively and assigned user coordinates. A "coordinate function" mapping pixel coordinates to user coordinates may be determined from the identified features. A user coordinate list may be defined to automatically identify additional features. This task is used to measure positions of features, determine dispersion solutions for spectra, and to identify features in two and three dimensional images for mapping a two or three dimensional coordinate transformation. Because of this dual use the terms vector and feature are used rather than spectrum and spectral line.

Each image in the input list is considered in turn. If the image is not one dimensional or a one dimensional section of an image then the image section given by the parameter section is used. This parameter may be specified in several ways as described in the PARAMETERS and EXAMPLES sections. The image section is used to select a starting vector and image axis.

If the image is not one dimensional or in multispec format then the number of lines, columns, or bands given by the parameter nsum are summed. The one dimensional image vector is graphed. The initial feature list and coordinate function are read from the database if an entry exists. The features are marked on the graph. The image coordinates are in pixels unless a coordinate function is defined, in which case they are in user coordinate units. The pixel coordinate, coordinate function value, and user coordinate for the current feature are printed.

The graphics cursor is used to select features and perform various functions. A menu of the keystroke options and functions is printed with the key '?'. The cursor keys and their functions are defined in the CURSOR KEYS section and described further below. The standard cursor mode keys are also available to window and redraw the graph and to produce hardcopy "snaps".

There are a number of ways of defining features. They fall into two categories: interactively defining features with the cursor and using automatic algorithms.

The 'm' key is the principal interactive feature marking method. Typing 'm' near the position of a feature applies a feature centering algorithm (see center1d) and, if a center is found, the feature is entered in the feature list and marked on the spectrum. If the new position is within a distance given by the parameter minsep of a previous feature it is considered to be the same feature and replaces the old feature. Normally the position of a new feature will be exactly the same as the original feature. The coordinate list is searched for a match between the coordinate function value (when defined) and a user coordinate in the list. If a match is found it becomes the default user coordinate which the user may override. The new feature is marked on the graph and it becomes the current feature. The redefinition of a feature which is within the minimum separation may be used to set the user coordinate from the coordinate list. The 't' key allows setting the position of a feature to other than that found by the centering algorithm.
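The replacement rule for nearby features can be sketched as follows. This is a minimal Python illustration and not the task's implementation: the function name mark_feature and the plain list of pixel positions are assumptions made for the example, the centering step is assumed to have already produced the value passed as center, and treating "within minsep" as strictly less than minsep is also an assumption.

```python
def mark_feature(features, center, minsep=2.0):
    """Add a centered pixel position to a feature list.

    Any existing feature closer than minsep pixels to the new center is
    treated as the same feature and replaced by it. The returned list is
    ordered by pixel position.
    """
    # Keep only the features that are far enough from the new center.
    kept = [f for f in features if abs(f - center) >= minsep]
    kept.append(center)
    kept.sort()
    return kept
```

For example, marking at 10.5 with features at 10.0 and 20.0 replaces the first feature, while marking at 15.0 adds a third one.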

The principal automatic feature identification algorithm is executed with the 'b' key. The user is queried for an approximate coordinate value and coordinate interval per pixel. The coordinate value is for the center of the spectrum by default though this may be changed with the aidpars parameters. Only the magnitude of the coordinate interval per pixel is used by default though this also may be changed. Either value may be given as INDEF to do an unconstrained search; however, this will be much slower and more likely to fail. The algorithm searches for matches between the strong lines in the spectrum and lines in the coordinate list. The algorithm is described in the documentation for aidpars.

The 'b' key works with no predefined dispersion solution or features. If two or more features are identified, with 'm', spanning the range of the data or if a coordinate function is defined, from a previous solution, then the 'e', 'l', and 'y' keys may be used to identify additional features from a coordinate list. The 'e' key only adds features at the coordinates of the line lists if the centering algorithm finds a feature at that wavelength (as described below). The 'y' key works in reverse by finding the prominent features using a peak finding algorithm and then looking in the coordinate list for entries near the estimated position. Up to a maximum number of features (maxfeatures) will be selected. If there are more peaks only the strongest are kept. In either of these cases there is no automatic fitting and refitting of the dispersion function.
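The selection of the strongest peaks by the 'y' key can be illustrated with a short Python sketch. It is only an approximation of the idea: the name strongest_peaks is hypothetical, and a simple local-maximum test stands in for the task's peak finding algorithm, which also depends on parameters such as fwidth and threshold.

```python
def strongest_peaks(data, maxfeatures=50):
    """Return pixel indices of up to maxfeatures local maxima.

    Peaks are ranked by intensity, only the strongest maxfeatures are
    kept, and the survivors are returned in order of pixel position.
    """
    # A pixel is a peak if it is higher than its left neighbor and not
    # lower than its right neighbor (assumed, simplified criterion).
    peaks = [i for i in range(1, len(data) - 1)
             if data[i] > data[i - 1] and data[i] >= data[i + 1]]
    peaks.sort(key=lambda i: data[i], reverse=True)
    return sorted(peaks[:maxfeatures])
```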

The 'l' key combines automatic fits with locating lines from the coordinate list. If two or more features are defined an initial fit is made. Then for each coordinate value in the coordinate list the pixel coordinate is determined and a search for a feature at that point is made. If a feature is found (based on the parameters ftype, fwidth, cradius, and threshold) its user coordinate value based on the coordinate function is determined. If the coordinate function value matches the user coordinate from the coordinate list within the error limit set by the parameter match then the new feature is entered in the feature list. Up to a maximum number of features, set by the parameter maxfeatures, may be defined in this way. A new user coordinate function is fit to all the located features. Finally, the graph is redrawn in user coordinates with the additional features found from the coordinate list marked.

A minimum of two features must be defined for the 'l' key algorithm to work. However, three or more features are preferable to determine changes in the dispersion as a function of position.
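The search loop of the 'l' key, including the two meanings of the match parameter, can be sketched in Python as below. Several simplifications are assumed and none of this is the task's code: the coordinate function is taken to be linear (user = w0 + dw * pixel), the centering algorithm is passed in as a hypothetical callable center that returns a pixel position or None, a negative match is converted to user units by multiplying by the magnitude of dw, and the initial and final fits are omitted.

```python
def locate_features(coordlist, w0, dw, center, match=-3.0, maxfeatures=50):
    """Find features at the positions predicted from a coordinate list.

    Returns a list of (pixel, fit coordinate, user coordinate) tuples.
    """
    # Positive match is in user units; negative match is in pixels.
    tol = match if match > 0 else abs(match) * abs(dw)
    found = []
    for user in coordlist:
        if len(found) >= maxfeatures:
            break
        estimate = (user - w0) / dw      # predicted pixel position
        pixel = center(estimate)         # hypothetical centering step
        if pixel is None:
            continue                     # no feature near the estimate
        fit = w0 + dw * pixel            # coordinate function value
        if abs(fit - user) <= tol:
            found.append((pixel, fit, user))
    return found
```

In the task a new coordinate function would then be fit to the located features.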

The 'f' key fits a function of the pixel coordinates to the user coordinates. The type of function, order and other fitting parameters are initially set with the parameters function, order, sample, niterate, low_reject, high_reject and grow. The value of the function for a particular pixel coordinate is called the function coordinate and each feature in the feature list has a function coordinate value. The fitted function also is used to convert pixel coordinates to user coordinates in the graph. The fitting is done within the interactive curve fitting package which has its own set of interactive commands. For further information on this package see the help material under icfit.

If a zero point shift is desired without changing the coordinate function the user may specify the coordinate of a point in the spectrum with the 's' key from which a shift is determined. The 'g' key also determines a shift by minimizing the difference between the user coordinates and the fitted coordinates. This is used when a previously determined coordinate function is applied to a new spectrum having fewer or poorer lines and only a zero point shift can reasonably be determined. Note that the zero point shift is in user coordinates. This is only an approximate correction for shifts in the raw spectra since these shifts are in pixels and the coordinate function should also be appropriately shifted.
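A zero point shift of the kind found by the 'g' key can be sketched as follows. The sketch assumes that "minimizing the difference" means an unweighted least-squares fit of a constant, for which the solution is the mean of the differences; the function name is hypothetical and the task may weight the features differently.

```python
def zero_point_shift(user_coords, fit_coords):
    """Constant shift that best maps fitted to user coordinates.

    For an unweighted least-squares fit of a constant the answer is the
    mean of (user - fit) over all features.
    """
    diffs = [u - f for u, f in zip(user_coords, fit_coords)]
    if not diffs:
        raise ValueError("at least one feature is required")
    return sum(diffs) / len(diffs)
```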

Once a set of features is defined one may select features for various operations. To select a feature as the current feature the keys '.', 'n', '+', and '-' are used. The '.' selects the feature nearest the cursor, the 'n' and '+' select the next feature, and the '-' selects the previous feature relative to the current feature in the feature list as ordered by pixel coordinate. These keys are useful when redefining the user coordinate with the 'u' key, changing the fitting weight of a feature with 'v', and when examining features in zoom mode.

Features may be deleted with the key 'd'. All features are deleted when the 'a' key immediately precedes the delete key. Deleting the features does not delete the coordinate function. Features deleted in the curve fitting package also are removed from the feature list upon exiting the curve fitting package.

It is common to transfer the feature identifications and coordinate function from one image to another. When a new image without a database entry is examined, such as when going to the next image in the input list, changing image lines or columns with 'j', 'k' and 'o', or selecting a new image with the ":image" command, the current feature list and coordinate function are kept. Alternatively, a database record from a different image may be read with the ":read" command. When transferring feature identifications between images the feature coordinates will not agree exactly with the new image feature positions and several options are available to reregister the feature positions. The key 'c' centers the feature nearest the cursor using the current position as the starting point. When preceded with the 'a' key all the features are recentered (the user must refit the coordinate function if desired). As an aside, the recentering function is also useful when the parameters governing the feature centering algorithm are changed. An additional option is the ":add" command to add features from a database record. This does not overwrite previous features (or the fitting functions) as does ":read".

The (c)entering function is applicable when the shift between the current and true feature positions is small. Larger shifts may be determined automatically with the 's' or 'x' keys.

A zero point shift is specified interactively with the 's' key by using the cursor to indicate the coordinate of a point in the spectrum. If there are no features then the shift is exactly as marked by the cursor. If there are features the specified shift is applied, the features are recentered, and the mean shift for all the features is determined.

The 'x' key uses the automatic line identification algorithm (see aidpars) with the constraint that the dispersion is nearly the same and that there is primarily a shift in the coordinate zero point. If features are defined, normally by inheritance from another spectrum, then a first pass is done to identify those features in the spectrum. Since this only works when the shifts are significantly less than the dispersion range of the spectrum (i.e. a significant number of features are in common) a second pass using the full coordinate line list is performed if a shift based on the features is not found. After a shift is found any features remaining from the original list are recentered and a mean shift is computed.

In addition to the single keystroke commands there are commands initiated by the key ':' (colon commands). As with the keystroke commands there are a number of standard graphics features available beginning with ":." (type ":.help" for these commands). The identify colon commands allow the task parameter values to be listed and to be reset within the task. A parameter is listed by typing its name. The colon command ":show" lists all the parameters. A parameter value is reset by typing the parameter name followed by the new value; for example ":match 10". Other colon commands display the feature list (:features), control reading and writing records to the database (:read and :write), and set the graph display format.

The feature identification process for an image is completed by typing 'q' to quit. Attempting to quit an image without explicitly recording changes in the feature database produces a warning message unless the autowrite parameter is set. If this parameter is not set a prompt is given asking whether to save the results; otherwise the results are automatically saved. Also the reference spectrum keyword REFSPEC is added to the image header at this time. This is used by refspectra and dispcor. As an immediate exit the 'I' interrupt key may be used. This does not save the feature information and may leave the graphics in a confused state.

Database records

The database specified by the parameter database is a directory of simple text files. The text files have names beginning with 'id' followed by the entry name, usually the name of the image. The database text files consist of a number of records. A record begins with a line starting with the keyword "begin". The rest of the line is the record identifier. Records read and written by identify have "identify" as the first word of the identifier. Following this is a name which may be specified following the ":read" or ":write" commands. If no name is specified then the image name is used. For 1D spectra the database entry includes the aperture number, so to read a solution from an aperture different from the current one both the image and aperture number must be specified. For 2D/3D images the entry name has the 1D image section which is what is specified to read the entry. The lines following the record identifier contain the feature information and dispersion function coefficients.

The dispersion function is saved in the database as a series of coefficients. The section containing the coefficients starts with the keyword "coefficients" and the number of coefficients.

The first four coefficients define the type of function, the order or number of spline pieces, and the range of the independent variable (the line or column coordinate along the dispersion). The first coefficient is the function type code with values:

Code    Type
   1    Chebyshev polynomial
   2    Legendre polynomial
   3    Cubic spline
   4    Linear spline

The second coefficient is the order (actually the number of terms) of the polynomial or the number of pieces in the spline.

The next two coefficients are the range of the independent variable over which the function is defined. These values are used to normalize the input variable to the range -1 to 1 in the polynomial functions. If the independent variable is x and the normalized variable is n, then

n = (2 * x - (xmax + xmin)) / (xmax - xmin)

where xmin and xmax are the two coefficients.

The spline functions divide the range into the specified number of pieces. A spline coordinate s and the nearest integer below s, denoted as j, are defined by

s = (x - xmin) / (xmax - xmin) * npieces
j = integer part of s

where npieces is the number of pieces.

The remaining coefficients are those for the appropriate function. The number of coefficients is either the same as the function order for the polynomials, npieces+1 for the linear spline, or npieces + 3 for the cubic spline.
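The layout described above can be checked with a small decoder. The following Python sketch assumes the coefficient values have already been read from the database into a list of numbers (how they are laid out in the file is not covered here); the function name decode_coefficients and the returned tuple are choices made for the example. The names "spline3" and "spline1" are used for the cubic and linear splines as in the function parameter.

```python
FUNCTION_TYPES = {1: "chebyshev", 2: "legendre", 3: "spline3", 4: "spline1"}


def decode_coefficients(values):
    """Split a coefficient list into its header and function coefficients.

    Returns (function name, order or npieces, xmin, xmax, coefficients).
    """
    if len(values) < 4:
        raise ValueError("at least four coefficients are required")
    code, order = int(values[0]), int(values[1])
    xmin, xmax = values[2], values[3]
    if code not in FUNCTION_TYPES:
        raise ValueError("unknown function type code: %d" % code)
    # Polynomials have 'order' terms, the cubic spline npieces + 3 and
    # the linear spline npieces + 1 coefficients.
    expected = {1: order, 2: order, 3: order + 3, 4: order + 1}[code]
    coeffs = list(values[4:])
    if len(coeffs) != expected:
        raise ValueError("expected %d coefficients, found %d"
                         % (expected, len(coeffs)))
    return FUNCTION_TYPES[code], order, xmin, xmax, coeffs
```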

1. Chebyshev Polynomial

The polynomial can be expressed as the sum

y = sum from i=1 to order {c_i * z_i}

where the c_i are the coefficients and the z_i are defined iteratively as:

z_1 = 1
z_2 = n
z_i = 2 * n * z_{i-1} - z_{i-2}
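As an illustration, the Chebyshev sum and recurrence above may be evaluated as in the following Python sketch. The function name and argument order are choices made for the example; coeffs holds only the function coefficients c_1, c_2, ... (without the four leading header values).

```python
def eval_chebyshev(x, xmin, xmax, coeffs):
    """Evaluate y = sum of c_i * z_i using the Chebyshev recurrence."""
    # Normalize x to the range -1 to 1.
    n = (2.0 * x - (xmax + xmin)) / (xmax - xmin)
    y = 0.0
    z_prev, z_curr = 0.0, 0.0
    for i, c in enumerate(coeffs, start=1):
        if i == 1:
            z = 1.0                          # z_1
        elif i == 2:
            z = n                            # z_2
        else:
            z = 2.0 * n * z_curr - z_prev    # z_i from z_{i-1}, z_{i-2}
        y += c * z
        z_prev, z_curr = z_curr, z
    return y
```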

2. Legendre Polynomial

The polynomial can be expressed as the sum

y = sum from i=1 to order {c_i * z_i}

where the c_i are the coefficients and the z_i are defined iteratively as:

z_1 = 1
z_2 = n
z_i = ((2*i-3) * n * z_{i-1} - (i-2) * z_{i-2}) / (i-1)
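The Legendre case differs only in the recurrence. A Python sketch under the same assumptions (hypothetical function name, coeffs holding only c_1, c_2, ...):

```python
def eval_legendre(x, xmin, xmax, coeffs):
    """Evaluate y = sum of c_i * z_i using the Legendre recurrence."""
    # Normalize x to the range -1 to 1.
    n = (2.0 * x - (xmax + xmin)) / (xmax - xmin)
    z = []
    for i in range(1, len(coeffs) + 1):
        if i == 1:
            z.append(1.0)                    # z_1
        elif i == 2:
            z.append(n)                      # z_2
        else:
            z.append(((2 * i - 3) * n * z[-1] - (i - 2) * z[-2]) / (i - 1))
    return sum(c * zi for c, zi in zip(coeffs, z))
```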

3. Linear Spline

The linear spline is evaluated as

y = c_j * a + c_{j+1} * b

where j is as defined earlier and a and b are the fractional differences between s and the nearest integers above and below:

a = (j + 1) - s
b = s - j
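A Python sketch of the linear spline evaluation follows. Two points are assumptions of the example and not stated above: the coefficients are indexed from zero so that j = 0 selects the first pair, and j is limited to npieces - 1 so that x = xmax uses the last piece. The input x is assumed to lie between xmin and xmax.

```python
def eval_linear_spline(x, xmin, xmax, coeffs):
    """Evaluate y = c_j * a + c_{j+1} * b for a linear spline."""
    npieces = len(coeffs) - 1
    if npieces < 1:
        raise ValueError("at least two coefficients are required")
    s = (x - xmin) / (xmax - xmin) * npieces
    j = min(int(s), npieces - 1)   # keep x = xmax inside the last piece
    a = (j + 1) - s
    b = s - j
    return coeffs[j] * a + coeffs[j + 1] * b
```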

4. Cubic Spline

The cubic spline is evaluated as

y = sum from i=0 to 3 {c_{i+j} * z_i}

where j is as defined earlier. The terms z_i are computed from a and b, as defined earlier, as follows

z_0 = a**3
z_1 = 1 + 3 * a * (1 + a * b)
z_2 = 1 + 3 * b * (1 + a * b)
z_3 = b**3
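The cubic spline may be evaluated in the same way. As with the linear spline sketch, zero-based coefficient indexing and the limit on j at x = xmax are assumptions of this example, and the function name is hypothetical.

```python
def eval_cubic_spline(x, xmin, xmax, coeffs):
    """Evaluate y = sum over i=0..3 of c_{i+j} * z_i for a cubic spline."""
    npieces = len(coeffs) - 3
    if npieces < 1:
        raise ValueError("at least four coefficients are required")
    s = (x - xmin) / (xmax - xmin) * npieces
    j = min(int(s), npieces - 1)   # keep x = xmax inside the last piece
    a = (j + 1) - s
    b = s - j
    z = (a ** 3,
         1 + 3 * a * (1 + a * b),
         1 + 3 * b * (1 + a * b),
         b ** 3)
    return sum(coeffs[i + j] * z[i] for i in range(4))
```

Since a + b = 1 the four z_i always sum to 6, so a set of equal coefficients c gives the constant value 6 * c.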

Examples

1. Because this task is interactive and has many possible applications it is difficult to provide actual examples. Instead some uses of the task are described.

o: For defining distortions in the slit dimension as a function of wavelength the positions of objects are marked at some wavelength. The task reidentify is then used to trace the features to other wavelengths.

o: For determining dispersion solutions in a one dimensional spectrum an arc calibration is used. Three emission features are marked and the (l)ocate key is used to find additional features from a coordinate list of arc lines. The dispersion solution is fit interactively and badly determined or misidentified lines are deleted. The solution may be written to the database or transferred to the object spectrum by reading the object image and deleting all the features. Deleting the features does not delete the coordinate function.

o: For determining a two or three dimensional coordinate transformation a dispersion solution is determined at one slit position in a long slit arc spectrum or one spatial position in a Fabry-Perot spectrum as in the previous example. The features are then traced to other positions with the task reidentify.

2. For images which are two or three dimensional it is necessary to specify the image axis for the data vector and the number of pixels at each point across the vector direction to sum. One way to specify a vector is to use an image section to define a vector. For example, to select column 20:

cl> identify obj[20,*]

The alternative is to use the section parameter. Below are some examples of the section parameter syntax for an image "im2d" which is 100x200 and "im3d" which is 100x200x50. On the left is the section string syntax and on the right is the image section.

Section parameter |  Image section      |  Description
------------------|---------------------|---------------------
first line        |  im2d[*,1]          |  First image line
middle column     |  im2d[50,*]         |  Middle image column
last z            |  im3d[100,200,*]    |  Last image z vector
middle last y     |  im3d[50,*,50]      |  Image y vector
line 20           |  im2d[*,20]         |  Line 20
column 20         |  im2d[20,*]         |  Column 20
x 20              |  im2d[*,20]         |  Line 20
y 20              |  im2d[20,*]         |  Column 20
y 20 30           |  im3d[20,*,30]      |  Column 20
z 20 30           |  im3d[20,30,*]      |  Image z vector
x middle          |  im3d[*,100,25]     |  Middle of image
y middle          |  im3d[50,*,25]      |  Middle of image
z middle          |  im3d[50,100,*]     |  Middle of image

The most common usage should be "middle line", "middle column" or "middle z".
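The mapping in the table above can be reproduced with a short Python sketch. It is an illustration only: the function name section_string is hypothetical, abbreviations are not handled, and taking the middle of an axis of length n as (n + 1) / 2 rounded down is an assumption that agrees with the even sizes in the table.

```python
AXES = {"line": 0, "x": 0, "column": 1, "y": 1, "z": 2}


def resolve(designator, n):
    """Turn first|middle|last|# into a pixel index on an axis of length n."""
    if designator == "first":
        return 1
    if designator == "middle":
        return (n + 1) // 2        # assumed rounding for odd lengths
    if designator == "last":
        return n
    return int(designator)


def section_string(spec, shape):
    """Translate e.g. 'middle line' into '[*,100]' for shape (nx, ny[, nz])."""
    tokens = spec.split()
    axis_words = [t for t in tokens if t in AXES]
    if len(axis_words) != 1 or AXES[axis_words[0]] >= len(shape):
        raise ValueError("bad vector axis in %r" % spec)
    axis = AXES[axis_words[0]]
    designators = [t for t in tokens if t not in AXES]
    # The remaining axes, lower axis number first.
    others = [k for k in range(len(shape)) if k != axis]
    if len(designators) == 1:
        designators = designators * len(others)   # one value for all axes
    if len(designators) != len(others):
        raise ValueError("wrong number of designators in %r" % spec)
    fields = ["*"] * len(shape)
    for k, d in zip(others, designators):
        fields[k] = str(resolve(d, shape[k]))
    return "[" + ",".join(fields) + "]"
```

For example, section_string("middle last y", (100, 200, 50)) gives "[50,*,50]".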

The summing factors apply to the axes across the specified vector. For 3D images there may be one or two values. The following shows which axes are summed, the second and third columns, when the vector axis is that shown in the first column.

Vector axis       |   Sum axis in 2D    |  Sum axes in 3D
------------------|---------------------|--------------------
     1            |         2           |      2 3
     2            |         1           |      1 3
     3            |         -           |      1 2

Revisions

IDENTIFY V2.11: The dispersion units are now determined from a user parameter, the coordinate list, or the database entry. A new key, 'e', has been added to add features from a line list without doing any fits. This is like the 'l' key but without the automatic fitting before and after adding new features. A new key, 'b', has been added to apply an automatic line identification algorithm. The 'x' key has been changed to use the automatic line identification algorithm. This allows finding much larger shifts. The match parameter may now be specified either in user coordinates or in pixels. The default is now 3 pixels. The default threshold value has been changed to 0.

IDENTIFY V2.10.3: The section and nsum parameter syntax was extended to apply to 3D images. The previous values and defaults may still be used. The 'v' key was added to allow assigning weights to features.

IDENTIFY V2.10: The principal revision is to allow multiple aperture images and long slit spectra to be treated as a unit. New keystrokes allow jumping or scrolling within multiple spectra in a single image. For aperture spectra the database entries are referenced by image name and aperture number and not with image sections. Thus, IDENTIFY solutions are not tied to specific image lines in this case. There is a new autowrite parameter which may be set to eliminate the save to database query upon exiting. The new colon command "add" may be used to add features based on some other spectrum or arc type and then apply the fit to the combined set of features.