wiseparser


PHP HTML parser

It is an attempt at a parser that is more reliable than the Simple HTML DOM parser and more mature when it comes to really bad HTML.

It uses a syntax close to that of Perl's HTML::TreeBuilder.

How to use:

require_once('treebuilder.php');


$mytree = new Tree();
$mytree->parse_content('<div>Hello world</div>');
// or
$mytree->parse_file('http://www.google.com');
$mytree->parse_file('myfile.htm');

// To print HTML, just do:
echo $mytree;

// For those of you who are familiar with HTML::TreeBuilder, usage is almost the same. Implemented methods:

// same as HTML::Element
Element->attr($attr, $value = null);
Element->tag($tag = null);
Element->look_down($keys);
Element->traverse($callback, $text_only=false);
Element->push_content($text_or_node, ...);
Element->unshift_content($text_or_node, ...);
Element->detach();
Element->preinsert($text_or_node, ...);
Element->postinsert($text_or_node, ...);
Element->right(); // returns right sibling if any
Element->left(); // returns left sibling if any
Element->pindex(); // returns index in parent's children array
Element->detach_content(); // detaches all child nodes and returns them
Element->delete_content(); // deletes all child nodes and returns self
Element->as_HTML();
Element->as_text();
// plus one additional method:
Element->seek_n_destroy($keys); // same as look_down()->__destruct();

// same as HTML::TreeBuilder:
Tree->parse_content($content);
Tree->parse_file($filename_or_url);
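
The methods listed above can be combined roughly as follows. This is only a sketch under assumptions that this README does not confirm: that Tree inherits the Element methods (as HTML::TreeBuilder does from HTML::Element), that $keys is an associative array of criteria with '_tag' naming the tag (as in HTML::Element), and that attr() reads when called with one argument and writes when called with two. The return shape of look_down() is not documented here, so the sketch accepts either a single node or a list.

```php
<?php
require_once('treebuilder.php'); // the file shipped with this project

$tree = new Tree();
$tree->parse_content('<div><a href="/old">Home</a><p>Hello world</p></div>');

// Assumption: Tree inherits the Element methods listed above.
echo $tree->as_text(), "\n";

// Assumption: $keys is an associative array of criteria,
// with '_tag' naming the tag as in Perl's HTML::Element.
$found = $tree->look_down(array('_tag' => 'a'));

// The return shape is an unknown here: accept one node or a list of nodes.
$links = is_array($found) ? $found : ($found ? array($found) : array());

foreach ($links as $link) {
    // Assumption: one argument reads the attribute, two arguments set it.
    $old = $link->attr('href');
    $link->attr('href', 'http://example.com' . $old);
}

// Print the modified tree as HTML.
echo $tree->as_HTML(), "\n";
```

If look_down() turns out to take its criteria in another form, only the array passed to it should need to change; check treebuilder.php for the exact format.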
