CD-HIT (Cluster Database at High Identity with Tolerance) is a program for clustering DNA or protein sequence databases at high sequence identity.

References:

  • Clustering of highly homologous sequences to reduce the size of large protein databases, Weizhong Li, Lukasz Jaroszewski & Adam Godzik, Bioinformatics, (2001) 17:282-283.
  • Tolerating some redundancy significantly speeds up clustering of large protein databases, Weizhong Li, Lukasz Jaroszewski & Adam Godzik, Bioinformatics, (2002) 18:77-82.
  • Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences, Weizhong Li & Adam Godzik, Bioinformatics, (2006) 22:1658-1659.