NAME

     n2tool - quickly cluster similar DNA sequences using local
     similarities


SYNOPSIS

     n2tool  [-anti-sense  Yes|No|y|n ] [-index  filename ]
     [-ini  filename ]  -seq filename ...  [-threshold n ]


DESCRIPTION

     N2tool takes files of DNA sequence information and produces
     an index file which links similar sequences together in
     clusters.  It differs from ICAtool in three ways: n2tool is
     guaranteed to compare every sequence against every other,
     whereas ICAtool is not; n2tool uses a quicker pairwise
     comparison algorithm; and n2tool has no query mode, as that
     function is carried out by ICAass.
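
     As an illustration of what "every sequence against every
     other" implies for clustering, here is a minimal Python
     sketch.  It is not n2tool's implementation: the function
     names, the `similar` callback and the rule that any similar
     pair ends up in the same cluster are all assumptions made
     for the example.

```python
from itertools import combinations


def cluster(names, similar):
    """Group sequence names into clusters.

    `similar(a, b)` is a hypothetical pairwise test supplied by the
    caller; how n2tool decides similarity internally is not shown here.
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative,
        # shortening the chain as we go (path halving).
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Every sequence is compared against every other exactly once.
    for a, b in combinations(names, 2):
        if similar(a, b):
            parent[find(a)] = find(b)  # assumed: one link joins two clusters

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())
```

     With n sequences this performs n(n-1)/2 comparisons, which
     is the cost of guaranteeing that no pair is skipped.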

     Sequences can be spread amongst any number of files, and new
     files can be added at any time to increase the number of
     sequences clustered, without any penalty of recalculation.
     However, no sequence referenced by an index should ever be
     deleted.  Supported sequence formats include GenBank, EMBL,
     plain (unformatted sequence files), Staden's semi-colon and
     Experiment file formats, and two NBRF/FASTA style formats:
     one with the description on the same line as
     '>sequence-name', the other with the description on the line
     immediately following the sequence name.


USAGE

     N2tool can take its configuration parameters from the
     command line, from a user's initial configuration file, or
     from built-in defaults.  Settings override each other in
     order: defaults are set first, then the configuration file,
     and finally the command line.
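
     The override order can be sketched in Python as follows.
     The default values are the ones listed under OPTIONS; the
     function name and the use of dictionaries are assumptions
     for illustration, not n2tool's own code.

```python
# Built-in defaults as documented under OPTIONS.
DEFAULTS = {"anti-sense": "no", "index": "cluster.index", "threshold": "20"}


def resolve(ini_settings, cli_settings):
    """Layer settings: defaults, then configuration file, then command line."""
    settings = dict(DEFAULTS)        # 1. defaults are set first
    settings.update(ini_settings)    # 2. configuration file overrides defaults
    settings.update(cli_settings)    # 3. command line overrides everything
    return settings
```

     For example, if the configuration file sets threshold=25 and
     the command line gives -threshold 30, the value used is 30.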


OPTIONS

  -anti-sense Yes|No|y|n
     Determines whether sequences should also be compared in  the
     opposite sense to how they are entered. Default is no.
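
     A minimal Python sketch of what comparing in the opposite
     sense involves, assuming that "opposite sense" means the
     reverse complement of the sequence as entered.  The function
     name is hypothetical.

```python
# Map each base to its complement; 'N'/'n' stays unknown.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the sequence as read from the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]
```

     With anti-sense enabled, each pair would be compared twice
     under this assumption: once as entered, and once with one
     sequence replaced by its reverse complement.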

  -index filename
     Defines the name of the index file, existing or to be
     created. Default is "cluster.index" in the current
     directory.

  -ini filename
     Defines the name of the file which holds the user's initial
     configuration. Default is "ICAtool.ini" in the current
     directory.

  -seq filename1 filename2 filenameN
     This flag denotes the start of a list of space-separated
     filenames which hold DNA sequence information. No default;
     always required.

  -threshold n
     When creating a cluster index, this flag sets the
     subsequence similarity score at which two sequences are said
     to be similar. Default is 20 (# of matches - # of
     mismatches).
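
     The score formula can be illustrated with a short Python
     sketch.  It scores two already-aligned subsequences of equal
     length; how n2tool locates candidate subsequences is not
     covered here, and treating a score equal to the threshold as
     similar is an assumption.  Both function names are
     hypothetical.

```python
def segment_score(a, b):
    """Score two aligned, equal-length subsequences as matches - mismatches."""
    if len(a) != len(b):
        raise ValueError("subsequences must have the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    mismatches = len(a) - matches
    return matches - mismatches


def is_similar(a, b, threshold=20):
    # Assumption: a score equal to the threshold counts as similar.
    return segment_score(a, b) >= threshold
```

     Under this formula a perfect 20-base match scores exactly
     20, while a 24-base stretch containing two mismatches also
     scores 22 - 2 = 20.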


FILES

  ICAtool.ini
     If this file is present, all startup details in it will be
     read. An example would be:
     threshold=25
     anti-sense=yes
     index=cluster.23rdJuly
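
     A file in this form could be read with a few lines of
     Python.  This is a sketch that assumes one key=value pair
     per line and silently skips anything else; n2tool's actual
     parsing rules are not documented here, and the function name
     is hypothetical.

```python
def parse_ini(text):
    """Parse 'key=value' lines into a dict, ignoring blank or malformed lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # assumption: lines without '=' are ignored
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings
```

     Applied to the example above, this yields the three settings
     threshold, anti-sense and index as strings.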

  cluster.index
     If this file is present when in UPDATE mode, any extra
     sequences are added to this existing index.


SEE ALSO

     ICAtool(1),     ICAass(1),     ICAprint(1),     ICAstats(1),
     ICAmatches(1), tofasta(1), ssort(1), just30(1)


BUGS

     Does not handle base ambiguity symbols properly: use only
     'n' or 'N', which are converted to random bases.
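
     The effect of this conversion can be pictured with the
     following Python sketch.  It assumes each 'n' or 'N' is
     replaced independently by one of A, C, G or T with equal
     probability; the distribution n2tool actually uses is not
     stated, and the function name is hypothetical.

```python
import random


def replace_n(seq, rng=random):
    """Replace every 'n' or 'N' with a randomly chosen base."""
    # Assumption: uniform choice over the four bases.
    return "".join(rng.choice("ACGT") if c in "nN" else c for c in seq)
```

     Because the substituted bases are random, a stretch of 'N'
     symbols may by chance add to or subtract from a similarity
     score.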