Refinement is a term frequently used in protein tertiary structure calculations, but it has not been applied to primary structure calculations. The idea is simple: create a rough model quickly, then put significant effort into fitting the data to that rough model as well as possible, rather than exploring every conceivable conformation of a molecule. This method conserves computational resources by concentrating them on a demonstrably reasonable solution to a problem.
The algorithm used by Tandem is one example of the class of algorithms that can obtain similar results. Variations of the scheme may be necessary to adequately capture all of the details of specific experimental protocols. The steps in a simple refinement-based identification/modeling process are as follows:
- Match a large set of tandem mass spectra against a list of proteins, using a small value for the number of missed cleavages (e.g., 0 or 1) and a limited selection of potential modifications, and create a set of the protein sequences that are most likely to contain valid matches to the spectra (e.g., all proteins containing peptides with an expectation value < 0.1);
- Match the tandem spectra against the new, smaller list of protein sequences, using a large value for the number of missed cleavages and a large selection of potential modifications; and
- If desired, perform the same matching procedure as step 2, using nonspecific
It may be desirable to remove mass spectra that represent valid matches from consideration between Steps 1 and 2 and between Steps 2 and 3, depending on the details of the particular implementation of the algorithm. This removal between steps was performed by the software described in the next section.
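
The control flow of the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used by Tandem: the names `Match` and `refine`, the `search` callable and the keys in the settings dictionaries are all hypothetical, and the value 5 for the number of missed cleavages in the later passes is an arbitrary stand-in for "a large value". The optional third pass is represented only by a placeholder `"nonspecific"` setting. The actual peptide matching is left entirely to whatever function is passed in as `search`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Match:
    spectrum_id: str
    protein_id: str
    expect: float  # expectation value; smaller means a better match


# Hypothetical search interface: (spectra, proteins, settings) -> matches.
SearchFn = Callable[[Dict[str, object], Dict[str, str], dict], List[Match]]


def refine(spectra: Dict[str, object],
           proteins: Dict[str, str],
           search: SearchFn,
           max_expect: float = 0.1,
           remove_matched: bool = True,
           nonspecific: bool = False) -> List[Match]:
    """Run a survey pass, then one or two passes on the shortlisted proteins."""
    # Step 1: quick survey against the full protein list.
    survey = {"missed_cleavages": 1, "modifications": "limited",
              "cleavage": "specific"}
    results = [m for m in search(spectra, proteins, survey)
               if m.expect < max_expect]

    # Keep only the proteins that produced at least one valid match.
    shortlist = {m.protein_id: proteins[m.protein_id] for m in results}

    # Step 2 (and optionally Step 3): broader search, smaller protein list.
    # The setting values below are illustrative assumptions.
    passes = [{"missed_cleavages": 5, "modifications": "extended",
               "cleavage": "specific"}]
    if nonspecific:
        passes.append({"missed_cleavages": 5, "modifications": "extended",
                       "cleavage": "nonspecific"})

    remaining = dict(spectra)
    for settings in passes:
        if remove_matched:
            # Drop spectra that already have a valid match before this pass.
            matched = {m.spectrum_id for m in results}
            remaining = {k: v for k, v in remaining.items()
                         if k not in matched}
        results.extend(m for m in search(remaining, shortlist, settings)
                       if m.expect < max_expect)
    return results
```

With `remove_matched=False`, a spectrum that was matched in an earlier pass is searched again and can appear more than once in the result, so a caller using that option would need to decide how to merge or rank repeated matches.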