News: All of the few remaining calls to scipy have been replaced with calls to numpy. Versions 0.1.8 and above do not require scipy as a dependency.

Introduction

This library provides Python functions for agglomerative clustering. Its features include
The interface is very similar to MATLAB's Statistics Toolbox API to make code easier to port from MATLAB to Python/Numpy. The core implementation of this library is in C for efficiency.

Setup and Installation

Windows

Install dependencies

Install Numpy by downloading the installer and running it. Make sure to run the installer for your version of Python (only Python versions 2.4 or 2.5 are supported). If you use hcluster for plotting dendrograms, you will need matplotlib. Again, download the matplotlib installer for your version of Python. Scipy is optional.

Note: The few remaining calls to scipy have been replaced with numpy calls. Scipy is no longer required for hcluster.

Install hcluster

Note: If you previously installed hcluster, remove it by going to Control Panel::Add/Remove Programs.

Download the installer that corresponds to your Python version. Run it.

Optional

Install the IPython and pyreadline libraries for a more user-friendly console interface to Numpy, Scipy, and Matplotlib. Ctypes is required for Python 2.4.

Pypi

hcluster is available in the pypi index.

Linux

Debian

hcluster is available as a Debian package. Type apt-get install python-hcluster to install python-hcluster and its dependencies. Thanks to Michael Hanke for packaging.

FreeBSD

hcluster is available as a FreeBSD package. Type cd /usr/ports/science/py-hcluster/ && make install clean to install python-hcluster as a port. Otherwise, type pkg_add -r py25-hcluster to add as a package. Thanks to Wen Heping for packaging.

Ubuntu

Required: Install numpy by typing the following shell command as root:

    apt-get install python-numpy

Required for building from source on Ubuntu 9.01 or higher:

    apt-get install python-dev

Optional: Install optional packages by typing the following shell commands as root:

    apt-get install python-matplotlib  # needed for dendrograms
    apt-get install ipython
    apt-get install python-scipy

Then follow the instructions for building from source on UNIX.
Fedora and Red Hat Enterprise

Required: Install numpy by typing the following shell command as root:

    yum install numpy

Optional: Install optional packages by typing the following shell commands as root:

    yum install matplotlib  # needed for dendrograms
    yum install ipython
    yum install scipy

Then follow the instructions for building from source on UNIX.

Build from source on UNIX

Download the source tarball, unpack it, and go into the source directory.

    gzip -cd hcluster-XXX.tar.gz | tar xvf -
    cd hcluster-XXX

Build the package by running the setup.py script with build as the command.

    python setup.py build

Install the package to a prefix of your choice (e.g. /afs/qp/lib/python2.X/site-packages) with install as the command.

    python setup.py install --prefix=/afs/qp

The --prefix option is optional and defaults to /usr/local on UNIX.

hcluster Functions

The hcluster Python library has an interface that is very similar to MATLAB's suite of hierarchical clustering functions found in the Statistics Toolbox. Some of the functions should be familiar to users of MATLAB (e.g. linkage, pdist, squareform, cophenet, inconsistent, and dendrogram). The fcluster and fclusterdata functions are equivalent to MATLAB's cluster and clusterdata functions. All of the functions in this library reside in the hcluster package, which must be imported prior to using its functions.

Python Help

If you are unfamiliar with Python, the Python Tutorial is a good start. If you are looking for a good reference book, I highly recommend David Beazley's Python Essential Reference. It is by far the most comprehensive book I've come across, covering most of Python's functionality with a very complete index.

A Quick Example

This script imports the pdist, linkage, and dendrogram functions.
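It then generates 10 random 100-dimensional observation vectors (with rand), computes the pairwise distances between them (with pdist), hierarchically clusters them (with linkage), and visualizes the result (with dendrogram).

    from hcluster import pdist, linkage, dendrogram
    import numpy
    from numpy.random import rand

    X = rand(10,100)    # 10 observations, 100 dimensions each
    X[0:5,:] *= 2       # scale the first five observations
    Y = pdist(X)        # pairwise distances in condensed form
    Z = linkage(Y)      # agglomerative clustering
    dendrogram(Z)       # plot the hierarchy (requires matplotlib)

Function Listing

Flat cluster formation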
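To make the idea of a flat clustering concrete, here is a minimal pure-Python sketch that cuts a hierarchy at a distance threshold. It does not call hcluster. The function name `flat_clusters`, the four-column merge table `Z` (two cluster ids, merge distance, cluster size) and the threshold rule are assumptions chosen for illustration, loosely in the spirit of what a function like fcluster does. The sketch also assumes merge distances never decrease from one row to the next.

```python
def flat_clusters(Z, n, t):
    """Label n observations, applying only the merges whose distance is <= t.

    Z is a hypothetical merge table: row i joins clusters a and b at
    distance d and creates a new cluster with id n + i.
    """
    parent = list(range(2 * n - 1))  # one slot per leaf and per merged cluster

    def find(i):
        # Follow parent links up to the current root of i.
        while parent[i] != i:
            i = parent[i]
        return i

    for row, (a, b, d, _size) in enumerate(Z):
        if d <= t:
            new_id = n + row
            parent[find(a)] = new_id
            parent[find(b)] = new_id

    # Relabel the roots as 1, 2, 3, ... in order of first appearance.
    seen = {}
    labels = []
    for i in range(n):
        root = find(i)
        if root not in seen:
            seen[root] = len(seen) + 1
        labels.append(seen[root])
    return labels


# Illustrative merge table for four observations (made-up values).
Z = [[0, 1, 1.0, 2], [2, 3, 2.0, 2], [4, 5, 4.0, 4]]
labels_mid = flat_clusters(Z, 4, 2.5)
labels_low = flat_clusters(Z, 4, 0.5)
labels_high = flat_clusters(Z, 4, 10.0)
```

With a threshold between the second and third merge distances, the four observations fall into two flat clusters. A very low threshold leaves every observation alone, and a very high one puts everything in a single cluster.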
Agglomerative cluster formation
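As a rough illustration of agglomerative clustering, the following pure-Python sketch repeatedly merges the two closest clusters using the single-linkage rule (the distance between two clusters is the smallest distance between their members). It is a naive teaching sketch, not hcluster's C implementation. The function name `single_linkage` and the row layout `[a, b, distance, size]` are assumptions modeled on the MATLAB-style merge table.

```python
import math


def single_linkage(X):
    """Naive single-linkage clustering; returns rows [a, b, distance, size]."""
    n = len(X)

    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    # Active clusters: cluster id -> list of member observation indices.
    clusters = {i: [i] for i in range(n)}
    Z = []
    next_id = n
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        # Find the pair of active clusters with the smallest member distance.
        for ai in range(len(ids)):
            for bi in range(ai + 1, len(ids)):
                a, b = ids[ai], ids[bi]
                d = min(dist(X[p], X[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = merged  # the new cluster gets the next free id
        Z.append([a, b, d, len(merged)])
        next_id += 1
    return Z


# Four one-dimensional observations (made-up values).
X = [[0.0], [1.0], [5.0], [7.0]]
Z = single_linkage(X)
```

Each row of `Z` records one merge, so n observations always produce n - 1 rows. The last row joins everything into one cluster.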
Distance matrix computation from a collection of raw observation vectors
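The sketch below shows, in pure Python, what computing a distance matrix from raw observation vectors amounts to. It stores each pairwise Euclidean distance once, in the condensed upper-triangle order (0,1), (0,2), ..., (n-2,n-1), and then expands that vector into a square symmetric matrix. The helper names `condensed_euclidean` and `to_square` are hypothetical and stand in for the roles played by pdist and squareform. This is not the library's own code.

```python
import math


def condensed_euclidean(X):
    """Pairwise Euclidean distances in condensed (upper-triangle) order."""
    n = len(X)
    out = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            out.append(math.sqrt(sum((a - b) ** 2
                                     for a, b in zip(X[i], X[j]))))
    return out


def to_square(y, n):
    """Expand a condensed vector for n observations into an n x n matrix."""
    D = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = y[k]  # the matrix is symmetric
            k += 1
    return D


# Three two-dimensional observations (made-up values).
X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
y = condensed_euclidean(X)
D = to_square(y, len(X))
```

For n observations the condensed form holds n(n-1)/2 values, which is why clustering routines often accept it in place of the full square matrix.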
Statistic computations on hierarchies
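One statistic defined on a hierarchy is the cophenetic distance between two observations: the merge distance at which they first end up in the same cluster. The pure-Python sketch below computes it from a merge table. The function name `cophenetic_distances` and the `[a, b, distance, size]` row layout are assumptions for illustration. The sketch shows the definition only and is not how the library's cophenet function is implemented.

```python
def cophenetic_distances(Z, n):
    """Map each observation pair (i, j), i < j, to its cophenetic distance."""
    members = {i: [i] for i in range(n)}  # cluster id -> member observations
    coph = {}
    for row, (a, b, d, _size) in enumerate(Z):
        # Every pair split across a and b is joined for the first time here.
        for p in members[a]:
            for q in members[b]:
                coph[(min(p, q), max(p, q))] = d
        members[n + row] = members[a] + members[b]
    return coph


# Illustrative merge table for four observations (made-up values).
Z = [[0, 1, 1.0, 2], [2, 3, 2.0, 2], [4, 5, 4.0, 4]]
coph = cophenetic_distances(Z, 4)
```

Observations merged early get a small cophenetic distance. Pairs that only meet at the final merge all share that merge's distance.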
Visualization
Tree representations of hierarchies
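A merge table can also be viewed as a binary tree. The pure-Python sketch below builds nested tuples of the form (left, right, distance), with plain integers as leaves, from a hypothetical `[a, b, distance, size]` table. The names `build_tree` and `leaves` are illustrative only and do not refer to the library's tree classes.

```python
def build_tree(Z, n):
    """Turn a merge table into nested (left, right, distance) tuples."""
    nodes = {i: i for i in range(n)}  # leaves are observation indices
    for row, (a, b, d, _size) in enumerate(Z):
        nodes[n + row] = (nodes[a], nodes[b], d)
    return nodes[n + len(Z) - 1]  # the last merge is the root


def leaves(node):
    """List the observation indices under a node, left to right."""
    if isinstance(node, int):
        return [node]
    left, right, _dist = node
    return leaves(left) + leaves(right)


# Illustrative merge table for four observations (made-up values).
Z = [[0, 1, 1.0, 2], [2, 3, 2.0, 2], [4, 5, 4.0, 4]]
tree = build_tree(Z, 4)
order = leaves(tree)
```

The left-to-right leaf order of such a tree is the order in which a dendrogram would typically lay out the observations.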
Distance functions between two vectors u and v
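For readers who want to see what a distance function between two vectors u and v computes, here is a small pure-Python sketch of four common metrics. The function names are chosen for illustration and are not taken from the library's listing. The definitions themselves are the standard ones.

```python
import math


def euclidean(u, v):
    """Straight-line distance: sqrt of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cityblock(u, v):
    """Manhattan distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    """Largest absolute difference along any single coordinate."""
    return max(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v (nonzero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


u = [1.0, 2.0, 3.0]
v = [4.0, 6.0, 3.0]
d_euclidean = euclidean(u, v)
d_cityblock = cityblock(u, v)
d_chebyshev = chebyshev(u, v)
d_cosine_orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
```

The choice of metric changes which observations count as close, and therefore which clusters get merged first.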
Predicates
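A predicate in this context is a function that returns True or False about an input, for example whether a table is a well-formed hierarchy. The pure-Python sketch below checks a hypothetical `[a, b, distance, size]` merge table for basic consistency. The name `looks_like_linkage` and the specific checks are assumptions for illustration and may differ from what the library's predicates test.

```python
def looks_like_linkage(Z, n):
    """Return True if Z is a plausible merge table for n observations."""
    if len(Z) != n - 1:
        return False  # n observations need exactly n - 1 merges
    used = set()      # cluster ids that were already merged away
    sizes = [1] * n   # size of each cluster id; leaves have size 1
    for row, entry in enumerate(Z):
        if len(entry) != 4:
            return False
        a, b, d, size = entry
        new_id = n + row
        for c in (a, b):
            # Ids must already exist and must not be merged twice.
            if not (0 <= c < new_id) or c in used:
                return False
        if a == b or d < 0:
            return False
        if size != sizes[a] + sizes[b]:
            return False
        used.update((a, b))
        sizes.append(size)
    return True


# Illustrative tables for four observations (made-up values).
Z_ok = [[0, 1, 1.0, 2], [2, 3, 2.0, 2], [4, 5, 4.0, 4]]
Z_reused = [[0, 1, 1.0, 2], [0, 2, 2.0, 2], [4, 5, 4.0, 4]]
ok = looks_like_linkage(Z_ok, 4)
reused = looks_like_linkage(Z_reused, 4)
too_short = looks_like_linkage(Z_ok[:2], 4)
```

Running a check like this before plotting or cutting a hierarchy turns a confusing downstream failure into an early, explicit one.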
Copyright (C) Damian Eads, 2007-2010. All Rights Reserved.
MATLAB is a registered trademark of the Mathworks Corporation.