FASTA Files

Purpose

FASTA files help the user group identical, or nearly identical, genes together by providing a way to verify the grouping. Verification is done by building a neighbor-joining tree with ClustalW and checking how similar the sequences are. The files can also be loaded into other analysis programs to determine characteristics of the sequences.

FASTA content description

  • All of the sequences. Outputs every uploaded sequence to a FASTA file, independently of the groups and delete set.
  • X of the longest sequences per name. Outputs the X longest sequences for each name, where X is an integer specified by the user (between 0 and 2147483647). This option also operates independently of the groups and delete set.
  • X of the longest sequences per group. Works like the previous option, except that it takes the groups and delete set into account (by ignoring anything that is not grouped).
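The "X longest" options above can be illustrated with a minimal Python sketch. This is not CDSParser's own code: the `(name, sequence)` record layout, the `groups` dictionary, and the helper names `longest_per_key` and `to_fasta` are all assumptions made for illustration, and the sketch assumes that ungrouped names are simply skipped.

```python
from collections import defaultdict


def longest_per_key(records, x, key=lambda rec: rec[0]):
    """Return up to x of the longest (name, sequence) records for each key.

    By default the key is the sequence name. Records whose key is None
    are skipped, which models "ignore anything that is not grouped".
    """
    buckets = defaultdict(list)
    for rec in records:
        k = key(rec)
        if k is not None:
            buckets[k].append(rec)
    selected = []
    for bucket in buckets.values():
        # sorted() is stable, so equally long sequences keep their input order
        by_length = sorted(bucket, key=lambda r: len(r[1]), reverse=True)
        selected.extend(by_length[:x])
    return selected


def to_fasta(records, width=60):
    """Format (name, sequence) records as FASTA text with wrapped lines."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


# Hypothetical data: psbA has not been assigned to any group.
records = [
    ("atpA", "ATGAAA"),
    ("atpA", "ATGAAACCC"),
    ("rbcL", "ATGTCA"),
    ("atpA", "ATG"),
    ("psbA", "ATGACT"),
]
groups = {"atpA": "atp", "rbcL": "rbc"}

per_name = longest_per_key(records, 2)
per_group = longest_per_key(records, 1, key=lambda rec: groups.get(rec[0]))
fasta_text = to_fasta(per_group)
```

Here `per_name` keeps the two longest atpA sequences plus the single rbcL and psbA entries, while `per_group` drops psbA because it has no group.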

Using ClustalW with CDSParser

ClustalW is executed as a spawned process from CDSParser (CDSParser starts it and controls its input and output). The neighbor-joining trees created by ClustalW are generated using the less accurate "fast" method, because this feature is provided to help the user verify that coding sequence names are grouped correctly, not to generate a phylogeny for publication.
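Spawning ClustalW from another program might look like the following Python sketch. It is only an illustration under stated assumptions: this page does not say which executable name or options CDSParser actually passes, so the `clustalw2` binary name, the choice of `-ALIGN` with `-QUICKTREE` (ClustalW's fast guide-tree option), and the helper names `clustalw_command` and `run_clustalw` are all assumptions.

```python
import subprocess


def clustalw_command(fasta_path, executable="clustalw2"):
    """Build an argument list for a fast ClustalW run on a FASTA file.

    Assumed options (not confirmed as CDSParser's own):
      -INFILE=...  the input sequences
      -ALIGN       perform a multiple alignment
      -QUICKTREE   use the fast algorithm for the guide tree
    """
    return [executable, "-INFILE=" + fasta_path, "-ALIGN", "-QUICKTREE"]


def run_clustalw(fasta_path, executable="clustalw2"):
    """Start ClustalW as a child process and return its console output.

    Raises subprocess.CalledProcessError if ClustalW exits with an error,
    and FileNotFoundError if the executable is not installed.
    """
    result = subprocess.run(
        clustalw_command(fasta_path, executable),
        capture_output=True,  # the parent process collects stdout and stderr
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_clustalw("genes.fasta")` requires ClustalW to be installed and on the search path; the file name here is hypothetical.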