NAME

     tofasta - sort DNA sequences by size


SYNOPSIS

     tofasta [-shortest n] filename ...



DESCRIPTION

     Tofasta is a very simple program that reads in DNA sequences
     and writes them out in FASTA format, sorted from longest to
     shortest.  The aim of the program is to format reads so that
     they are ready for processing by ICAass.  Because ICAass
     defines its cluster threshold as a percentage global
     similarity, short sequences can sometimes match an undesirably
     large number of longer sequences.  To help avoid that
     situation, tofasta has an option to exclude sequences shorter
     than a user-defined length.
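
     The behaviour described above can be sketched in a few lines of
     Python.  This is only an illustration, not tofasta's own code:
     the function names, the 60-column line width and the sample
     reads are all assumptions made for the example.

```python
def size_sort(records, shortest=0):
    """Return (name, sequence) pairs sorted from longest to shortest.

    Records whose sequence is shorter than `shortest` are dropped,
    mirroring the intent of the -shortest option (hypothetical helper).
    """
    kept = [(name, seq) for name, seq in records if len(seq) >= shortest]
    # sorted() is stable, so records of equal length keep their input order.
    return sorted(kept, key=lambda rec: len(rec[1]), reverse=True)


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text.

    The line width of 60 is an assumption, not a documented tofasta value.
    """
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "".join(line + "\n" for line in lines)


# Made-up reads, purely for illustration.
reads = [("r1", "ACGT"), ("r2", "ACGTACGTAC"), ("r3", "AC"), ("r4", "ACGTAC")]
print(to_fasta(size_sort(reads, shortest=4)), end="")
```

     With a shortest length of 4, the two-base read r3 is dropped
     and the remaining reads come out in the order r2, r4, r1.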

     Sequences can be spread among any number of files.  Various
     sequence formats are supported, including GenBank, EMBL,
     plain (unformatted sequence files), Staden's semi-colon and
     Experiment file formats, and two NBRF/FASTA-style formats:
     one with the description on the same line as
     '>sequence-name', and one with the description on the line
     immediately following the sequence name.
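
     The difference between the two NBRF/FASTA-style layouts can be
     illustrated with a small Python reader.  This is a sketch under
     stated assumptions, not how tofasta detects formats: the
     function name and the description_on_next_line flag are
     hypothetical, and the caller has to say which layout is in use.

```python
def parse_fasta_style(text, description_on_next_line=False):
    """Parse FASTA-style text into (name, description, sequence) tuples.

    description_on_next_line is a hypothetical switch: when False, the
    description follows the name on the '>' line; when True, it is the
    first non-blank line after the '>' line.
    """
    records = []
    name, desc, chunks = None, "", []
    expect_desc = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, desc, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
            if description_on_next_line:
                name, desc = header, ""
                expect_desc = True
            else:
                # First word is the name; the rest is the description.
                parts = header.split(None, 1)
                name = parts[0] if parts else ""
                desc = parts[1] if len(parts) > 1 else ""
                expect_desc = False
        elif expect_desc:
            desc = line
            expect_desc = False
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, desc, "".join(chunks)))
    return records


# Made-up inputs showing the same record in each layout.
same_line = ">seq1 first read\nACGT\nAC\n"
next_line = ">seq1\nfirst read\nACGT\nAC\n"
print(parse_fasta_style(same_line))
print(parse_fasta_style(next_line, description_on_next_line=True))
```

     Both calls yield the same record, which shows that the two
     layouts carry the same information and differ only in where
     the description is written.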


USAGE

  -shortest n
     The value of n is the shortest acceptable sequence length.
     Sequences shorter than n are excluded from the output.

  tofasta filename1 filename2 filenameN
     Tofasta always expects a space-separated list of filenames
     holding DNA sequence information.  There is no default; at
     least one filename is required.


SEE ALSO

     N2tool(1), ICAass(1), ICAtool(1), ICAprint(1),  ICAstats(1),
     ICAmatches(1), ssort(1), just30(1)


BUGS

     I hope it's too simple to be buggy!  The program exits if it
     cannot find any sequence in a file.  It may be more useful
     for it to just complain instead.