Introduction
There are four modes for handling unknown HTML tags when converting HTML to Creole:
- Raise NotImplementedError on unknown tags.
- Use <<html>> macro to mask unknown tags.
- Escape all unknown tags.
- Remove all unknown tags.
By default the last mode is used: all unknown HTML tags are removed and only their content is kept.
You can change the default behaviour by passing a callable as the unknown_emit argument to the html2creole() function or to the CreoleEmitter class.
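To illustrate how a pluggable callable can drive the four modes, here is a minimal, self-contained sketch. It does not use the creole package: Node, ToyEmitter and the four handler functions are hypothetical stand-ins, and the (emitter, node) call signature is an assumption made for this illustration. It targets Python 3 because it uses html.escape from the standard library.

```python
from html import escape  # Python 3 standard library


class Node(object):
    """Hypothetical stand-in for a parsed HTML node (not the creole class)."""

    def __init__(self, kind, children=(), text=""):
        self.kind = kind                # tag name, e.g. "unknown" or "strong"
        self.children = list(children)  # child nodes
        self.text = text                # text content, used by "text" nodes


class ToyEmitter(object):
    """Hypothetical emitter: knows only <strong>, delegates every other tag."""

    def __init__(self, unknown_emit):
        self.unknown_emit = unknown_emit

    def emit_children(self, node):
        return "".join(self.emit(child) for child in node.children)

    def emit(self, node):
        if node.kind == "text":
            return node.text
        if node.kind == "strong":
            return "**" + self.emit_children(node) + "**"
        # Unknown tag: hand the node over to the pluggable callable.
        return self.unknown_emit(self, node)


def raise_on_unknown(emitter, node):
    """Mode 1: refuse to convert unknown tags."""
    raise NotImplementedError(
        "Node from type %r is not implemented!" % node.kind
    )


def mask_with_html_macro(emitter, node):
    """Mode 2: wrap the unknown tags in the <<html>> macro."""
    start = "<<html>><%s><</html>>" % node.kind
    end = "<<html>></%s><</html>>" % node.kind
    return start + emitter.emit_children(node) + end


def escape_unknown(emitter, node):
    """Mode 3: escape the unknown tags, convert their children."""
    start = escape("<%s>" % node.kind)
    end = escape("</%s>" % node.kind)
    return start + emitter.emit_children(node) + end


def drop_unknown(emitter, node):
    """Mode 4: drop the unknown tags, keep the converted children."""
    return emitter.emit_children(node)


# Equivalent of <unknown><strong>foo</strong></unknown>
tree = Node("unknown", [Node("strong", [Node("text", text="foo")])])
```

Calling ToyEmitter(drop_unknown).emit(tree) walks the tree and leaves only the converted children. Swapping in one of the other handlers changes how the unknown wrapper is rendered without touching the emitter itself.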
Examples
Raise NotImplementedError on unknown tags.
from creole import html2creole
from creole.shared.unknown_tags import raise_unknown_node
print(html2creole(u"<unknown><strong>foo</strong></unknown>", unknown_emit=raise_unknown_node))
result:
Traceback (most recent call last):
...
NotImplementedError: Node from type 'unknown' is not implemented!
Use <<html>> macro to mask unknown tags.
from creole import html2creole
from creole.shared.unknown_tags import use_html_macro
print(html2creole(u"<unknown><strong>foo</strong></unknown>", unknown_emit=use_html_macro))
result:
<<html>><unknown><</html>>**foo**<<html>></unknown><</html>>
Escape all unknown tags.
from creole import html2creole
from creole.shared.unknown_tags import escape_unknown_nodes
print(html2creole(u"<unknown><strong>foo</strong></unknown>", unknown_emit=escape_unknown_nodes))
result:
&lt;unknown&gt;**foo**&lt;/unknown&gt;
Remove all unknown tags.
from creole import html2creole
from creole.shared.unknown_tags import transparent_unknown_nodes
print(html2creole(u"<unknown><strong>foo</strong></unknown>", unknown_emit=transparent_unknown_nodes))
result:
**foo**
Complex example
You can also pass the callable directly to CreoleEmitter:
from creole.html_parser.parser import HtmlParser
from creole.html2creole.emitter import CreoleEmitter
from creole.shared.unknown_tags import escape_unknown_nodes
h2c = HtmlParser(debug=False)
document_tree = h2c.feed(u"<unknown><strong>foo</strong></unknown>")
emitter = CreoleEmitter(document_tree, debug=False, unknown_emit=escape_unknown_nodes)
print(emitter.emit())
result:
&lt;unknown&gt;**foo**&lt;/unknown&gt;
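Because the handler is just a callable, several handlers can be combined into one. The following is a minimal sketch under two assumptions: that the callable is invoked as callable(emitter, node), and that the node exposes its tag name as node.kind. The names dispatch_unknown, FakeNode, keep_marker and drop are hypothetical and are not part of the creole package.

```python
def dispatch_unknown(handlers, fallback):
    """Build one unknown_emit-style callable from per-tag handlers.

    handlers: dict mapping a tag name to a callable(emitter, node)
    fallback: callable(emitter, node) used for every other tag
    """
    def unknown_emit(emitter, node):
        # Pick the handler registered for this tag, else the fallback.
        handler = handlers.get(node.kind, fallback)
        return handler(emitter, node)
    return unknown_emit


class FakeNode(object):
    """Stand-in node so the sketch runs without the library installed."""

    def __init__(self, kind):
        self.kind = kind


def keep_marker(emitter, node):
    # Leave a visible placeholder for the tag.
    return "[%s]" % node.kind


def drop(emitter, node):
    # Emit nothing at all for the tag.
    return ""


# <video> gets a placeholder, every other unknown tag is dropped.
combined = dispatch_unknown({"video": keep_marker}, fallback=drop)
```

If the assumed signature matches the installed version, a callable built this way could be passed as unknown_emit in the same way as the handlers shown in the examples.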