chem-fingerprints (chemfp for short) is a set of formats and related tools for the storage, exchange, and search of cheminformatics fingerprint data sets.

Download chemfp-1.0.tar.gz. Note: this is a source distribution. Precompiled installers will follow as soon as I can figure out how to make them.

Cheminformatics Fingerprint
The goals of the chem-fingerprints (chemfp) project are to define and promote common file formats for storing and exchanging cheminformatics fingerprint data sets, and to develop tools which work with those formats. The FPS format is used for dense binary fingerprints of fewer than about 10,000 bits, and usually only a few hundred bits. The UseCases page describes a few examples of how people might use the FPS format.

People mostly use tools, not file formats. The chemfp distribution includes five command-line programs for working with FPS files:
Documentation
The documentation describes how to use the chemfp command-line tools to extract fingerprints from PubChem data or generate them from ChEBI data, and how to carry out threshold-based and k-nearest neighbor Tanimoto searches.

For the programmers out there, the command-line tools are built on top of the "chemfp" Python library, and portions of the library are available for public use and documented. You can calculate population counts, but it's better to use the built-in Tanimoto search routines to compute a distance matrix or implement the Butina clustering algorithm.

Status
The project started in early 2010. chemfp-1.0a1 was released at the end of May 2011 and chemfp-1.0 was released on 20 September 2011. The FPS format should be stable, the algorithms are tested, and the APIs are documented. It's ready for you to use.

There are many more things it can do, and you can help out. If you have ideas, comments, or code contributions, then join the mailing list or send email to me directly.

Advertising
I am an independent consultant who develops software for computational chemistry, with a focus on cheminformatics. Some of the other projects I helped start are VMD and Biopython. If you are interested in hiring my services or funding chemfp development, please email me.

I also teach Python training courses in cheminformatics for researchers who want to be more effective at the software side of their research. Contact me if you are interested in having me visit your site to do training.
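
For readers new to the terms used in the Documentation section, here is a minimal pure-Python sketch of what a population count and the threshold-based and k-nearest neighbor Tanimoto searches compute. It does not use the chemfp library or its API. The function names (popcount, tanimoto, threshold_search, knearest_search), the representation of fingerprints as byte strings, and the choice to score two all-zero fingerprints as 0.0 are all assumptions made for illustration, and the toy fingerprints are invented.

```python
import heapq


def popcount(fp: bytes) -> int:
    """Return the number of bits set to 1 in a fingerprint."""
    return bin(int.from_bytes(fp, "big")).count("1")


def tanimoto(a: bytes, b: bytes) -> float:
    """Tanimoto similarity: |A and B| / |A or B| over the set bits."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    x = int.from_bytes(a, "big")
    y = int.from_bytes(b, "big")
    union = bin(x | y).count("1")
    if union == 0:
        # Assumption for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return bin(x & y).count("1") / union


def threshold_search(query, targets, threshold):
    """Return (id, score) for every target scoring at least `threshold`."""
    hits = []
    for target_id, fp in targets:
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((target_id, score))
    return hits


def knearest_search(query, targets, k):
    """Return the k highest-scoring (id, score) pairs, best first."""
    scored = ((target_id, tanimoto(query, fp)) for target_id, fp in targets)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 16-bit fingerprints, invented for this example.
targets = [
    ("a", bytes.fromhex("ff00")),
    ("b", bytes.fromhex("f000")),
    ("c", bytes.fromhex("00ff")),
]
query = bytes.fromhex("ff00")

print(threshold_search(query, targets, 0.5))
print(knearest_search(query, targets, 1))
```

This linear scan is only meant to show the definitions. For real data sets the built-in search routines are the better choice, as noted above.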