NAME
tofasta - size-sort DNA sequences
SYNOPSIS
tofasta [-shortest n] filename ...
DESCRIPTION
Tofasta is a very simple program that reads in DNA sequences
and writes them out in FASTA format sorted from longest to
shortest. The aim of the program is to format reads so they
are ready for processing by ICAass. Because ICAass defines
its cluster threshold as a percentage of global similarity,
short sequences can sometimes match an undesirably large
number of longer sequences. To help avoid that situation,
tofasta has an option (-shortest) to exclude sequences
shorter than a user-defined length.
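The filtering and ordering described above can be sketched in a few lines of Python. This is only an illustration, not tofasta's actual source: the names size_sort and format_fasta, the 60-column line width, and the handling of equal-length sequences are all assumptions, since none of them is specified here.

```python
def size_sort(records, shortest=0):
    """Drop sequences shorter than `shortest`, then order longest first.

    `records` is a list of (name, sequence) pairs. Both the function name
    and the data layout are hypothetical, chosen for illustration.
    """
    kept = [(name, seq) for name, seq in records if len(seq) >= shortest]
    # sorted() is stable, so equal-length sequences keep their input order
    # (an assumption; the real program's tie-breaking is not documented).
    return sorted(kept, key=lambda rec: len(rec[1]), reverse=True)


def format_fasta(records, width=60):
    """Render (name, sequence) pairs as FASTA text.

    The 60-column wrap is an assumed default, not a documented one.
    """
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)
```

Under these assumptions, size_sort(reads, shortest=50) would play the role of "-shortest 50": a sequence of exactly 50 bases is kept, and anything shorter is dropped.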
Sequences can be spread amongst any number of files. Various
sequence formats are supported, including GenBank, EMBL,
plain (unformatted sequence files), Staden's semi-colon and
Experiment file formats, and two NBRF/FASTA style formats:
one with the description on the same line as
'>sequence-name', and one with the description on the line
immediately following the sequence name.
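To make the difference between the two NBRF/FASTA style layouts concrete, here is a minimal Python sketch of a reader that accepts either one. It is not how tofasta itself detects or parses formats; the function name parse_fasta_like and the explicit description_on_next_line switch are hypothetical, and a real reader would need some way to tell the layouts apart automatically.

```python
def parse_fasta_like(text, description_on_next_line=False):
    """Parse FASTA-style text into (name, description, sequence) tuples.

    With description_on_next_line=False the header looks like
    '>name description'; with True, the line after '>name' is taken
    as the description. Both the name and the flag are hypothetical.
    """
    records = []
    current = None
    lines = iter(text.splitlines())
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            if description_on_next_line:
                name = header.strip()
                # Consume the following line as the description.
                description = next(lines, "").strip()
            else:
                parts = header.split(None, 1)
                name = parts[0] if parts else ""
                description = parts[1] if len(parts) > 1 else ""
            current = [name, description, []]
            records.append(current)
        elif current is not None:
            current[2].append(line)  # sequence data for the open record
    return [(n, d, "".join(chunks)) for n, d, chunks in records]
```

Both layouts yield the same record: the text ">seq1 first read" followed by sequence lines, parsed with the default setting, gives the same tuple as ">seq1", then "first read", then the sequence lines, parsed with description_on_next_line=True.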
USAGE
-shortest n
The value of n is the shortest acceptable sequence length;
shorter sequences are excluded.
tofasta filename1 filename2 filenameN
Tofasta always expects a list of space-separated filenames
that hold DNA sequence information. There is no default; at
least one filename is required.
SEE ALSO
N2tool(1), ICAass(1), ICAtool(1), ICAprint(1), ICAstats(1),
ICAmatches(1), ssort(1), just30(1)
BUGS
I hope it's too simple to be buggy! The program exits if it
cannot find any sequence in a file. It may be more useful
to just complain.