|
Project Information
|
INTRODUCTIONThis is a simple class which implements bloom filter in PHP. It is probably not useful for any particular purpose out-of-the-box because it does not deal with storage issues. Feel free to hack it to suit your needs or drop a ticket if you need a specific feature. A Bloom filter is a simple space-efficient randomized data structure for representing a set in order to support membership queries. Bloom filters have a great deal of potential for distributed protocols where systems need to share information about what data they have available e.g. for web cache sharing. For a nice visual demo of how bloom filters work see http://la.ma.la/misc/js/bloomfilter/. For more information on the theory behind bloom filters read this paper: http://www.box.net/shared/zvmgyyixe3 DOWNLOADThere is not download. For now get the source from SVN. REQUIREMENTSYou need pecl/bitset http://pecl.php.net/package/Bitset. SIMPLE USE//initialize the bloom filter
$bf = new BloomFilter(3000000);
$f = fopen('/usr/share/dict/words','r');
while(!feof($f)) $bf->add(trim(fgets($f)));
//write the bit field to file for later use
file_put_contents('bloom',$bf->field);
//initialize the bloom filter from file
$bf = BloomFilter::init(file_get_contents('bloom'));
//test some membership queries
$test_words = array('this','library','does','bloom','filters','in','php');
foreach ($test_words as $word) if ($bf->has($word)) echo $word."\n";
|