FASTA-pro file format

Description

Lists of protein sequences are commonly stored in data files using the FASTA format. This simple ASCII text format looks like this:

>description line first protein
SEQUENCEPROTEINHERE
ANDHEREANDHEREAND
HEREAND
>description line second protein
SECONDPROTEINSEQUENCEHERE
...

This format has the advantage of being very easy to write and read. The description line is necessarily only one line long, which can limit its usefulness for recording detailed information.
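Reading the format takes only a few lines of code. The following is a minimal Python sketch, not X! Tandem's own parser. The function name `parse_fasta` is hypothetical, and the sketch assumes that the sequence lines belonging to one entry are simply concatenated, that blank lines can be ignored, and that any text before the first description line can be skipped.

```python
import io
from typing import Iterator, TextIO


def parse_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA text (hypothetical helper).

    The leading '>' is stripped from each description line, and the
    sequence lines that follow it are joined into a single string.
    """
    description = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ASSUMPTION: blank lines carry no information
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:]
            chunks = []
        elif description is not None:
            chunks.append(line)  # lines before the first '>' are skipped
    if description is not None:
        yield description, "".join(chunks)


example = """>description line first protein
SEQUENCEPROTEINHERE
ANDHEREANDHEREAND
HEREAND
>description line second protein
SECONDPROTEINSEQUENCEHERE
"""
records = list(parse_fasta(io.StringIO(example)))
```

Because the description is a single line, a reader only has to look at the first character of each line to know whether a new entry starts; everything else is sequence data.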

A disadvantage of the FASTA format is that it is relatively slow for a computer to read. X! Tandem can also use a simple binary version of an original FASTA file, called FASTA-pro, which is about 6 times faster to read. The following table shows the structure of a FASTA-pro file:

size (bytes)   type           description
256            bytes          header
4              unsigned int   description line length N1
N1             ASCII bytes    description line (including string-ending NULL character)
4              unsigned int   sequence length L1
L1             ASCII bytes    sequence (including string-ending NULL character)
4              unsigned int   2nd description line length N2
...            ...            repeat the length, description, length, sequence pattern
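The layout in the table can be illustrated with a small Python sketch that writes and reads FASTA-pro records. It is not taken from X! Tandem's source, and several details are assumptions because the table does not specify them: the 256-byte header is treated as opaque bytes (zero-filled when writing), each unsigned int is taken to be a 4-byte little-endian value, and each length is taken to count the terminating NULL byte, as "including string-ending NULL character" suggests. The names `write_fasta_pro` and `read_fasta_pro` are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator

HEADER_SIZE = 256           # header size from the table; contents treated as opaque
UINT = struct.Struct("<I")  # ASSUMPTION: 4-byte little-endian unsigned int


def _write_string(out: BinaryIO, text: str) -> None:
    data = text.encode("ascii") + b"\x00"  # ASCII bytes plus terminating NULL
    out.write(UINT.pack(len(data)))        # ASSUMPTION: length includes the NULL
    out.write(data)


def write_fasta_pro(out: BinaryIO, records: Iterable[tuple[str, str]]) -> None:
    """Write (description, sequence) pairs in the FASTA-pro layout (hypothetical)."""
    out.write(bytes(HEADER_SIZE))  # ASSUMPTION: zero-filled header
    for description, sequence in records:
        _write_string(out, description)  # length N, then description bytes
        _write_string(out, sequence)     # length L, then sequence bytes


def _read_exact(src: BinaryIO, n: int) -> bytes:
    data = src.read(n)
    if len(data) != n:
        raise ValueError("truncated FASTA-pro file")
    return data


def _read_string(src: BinaryIO, length: int) -> str:
    data = _read_exact(src, length)
    return data.split(b"\x00", 1)[0].decode("ascii")  # text up to the first NULL


def read_fasta_pro(src: BinaryIO) -> Iterator[tuple[str, str]]:
    """Yield (description, sequence) pairs from a FASTA-pro stream (hypothetical)."""
    _read_exact(src, HEADER_SIZE)  # skip the header
    while True:
        raw = src.read(UINT.size)
        if not raw:
            return  # clean end of file after a complete record
        if len(raw) != UINT.size:
            raise ValueError("truncated FASTA-pro file")
        (desc_len,) = UINT.unpack(raw)
        description = _read_string(src, desc_len)
        (seq_len,) = UINT.unpack(_read_exact(src, UINT.size))
        sequence = _read_string(src, seq_len)
        yield description, sequence


# Usage: round-trip two records through an in-memory buffer.
records = [
    ("description line first protein", "SEQUENCEPROTEINHEREANDHEREANDHEREANDHEREAND"),
    ("description line second protein", "SECONDPROTEINSEQUENCEHERE"),
]
buf = io.BytesIO()
write_fasta_pro(buf, records)
buf.seek(0)
round_trip = list(read_fasta_pro(buf))
```

The speed advantage of a layout like this comes from the explicit lengths: a reader can fetch each string with one read of a known size instead of scanning text line by line for the next '>' character.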


see also: taxonomy

X! TANDEM API description project