Issue 59: maximum recursion depth exceeded in tree traversal (python)
The following code raises a Python RuntimeError: maximum recursion depth
exceeded
import html5lib
p = html5lib.html5parser.HTMLParser()
d = p.parse("<small>"*1000+"</small>"*1000)
d.toxml()
Switching to the dom tree or the beautifulsoup tree yields the same result
because of how the tree traversal is implemented. Running
serializer/treewalkers on the same document hits the same problem. The root
cause is that the node __iter__() method is called recursively to traverse
the tree, so the call depth grows with the nesting depth of the document.
Not sure if this is a problem worth fixing.
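The failure mode can be reproduced without html5lib. The sketch below is only an illustration: `Node`, `build_chain` and `count_nodes` are hypothetical names, not html5lib classes or functions, and the `__iter__` shown is an assumed shape for a recursive traversal rather than the library's actual code.

```python
import sys


class Node:
    """Hypothetical stand-in for a tree node; not html5lib's actual class."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def __iter__(self):
        # Recursive pre-order traversal: each tree level adds one nested
        # generator frame, so call depth equals nesting depth.
        yield self
        for child in self.children:
            for descendant in child:
                yield descendant


def build_chain(name, depth):
    """Build a single chain of `depth` nested nodes, iteratively."""
    root = Node(name)
    current = root
    for _ in range(depth - 1):
        child = Node(name)
        current.children.append(child)
        current = child
    return root


def count_nodes(root):
    """Walk the whole tree through the recursive __iter__."""
    return sum(1 for _ in root)


# Nest slightly deeper than the interpreter's recursion limit.
depth = sys.getrecursionlimit() + 100
deep = build_chain("small", depth)

try:
    count_nodes(deep)
    failed = False
except RuntimeError:
    # Python 3 raises RecursionError, which is a RuntimeError subclass.
    failed = True
```

Raising the limit with `sys.setrecursionlimit` only moves the threshold; a sufficiently nested document still fails.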
Nov 13, 2007
We should look at changing some of our recursive algorithms to iterative ones so cases like this work.
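As a rough illustration of what such a change could look like, here is a minimal sketch of serializing a tree with an explicit stack instead of recursion. It assumes a hypothetical `Node` class with `name` and `children` attributes; it is not html5lib's SimpleTree and not a proposed patch.

```python
class Node:
    """Hypothetical minimal element node (not an html5lib class)."""

    def __init__(self, name):
        self.name = name
        self.children = []


def to_xml_iterative(root):
    """Serialize start and end tags using an explicit stack."""
    parts = []
    # Each entry is (node, closing); closing=True means "emit the end tag".
    stack = [(root, False)]
    while stack:
        node, closing = stack.pop()
        if closing:
            parts.append("</%s>" % node.name)
            continue
        parts.append("<%s>" % node.name)
        # Schedule the end tag first so it is popped after all children.
        stack.append((node, True))
        # Push children in reverse so the first child is processed first.
        for child in reversed(node.children):
            stack.append((child, False))
    return "".join(parts)


def build_chain(name, depth):
    """Build a single chain of `depth` nested nodes, iteratively."""
    root = Node(name)
    current = root
    for _ in range(depth - 1):
        child = Node(name)
        current.children.append(child)
        current = child
    return root


# 5000 levels is well past the default recursion limit of 1000.
deep_xml = to_xml_iterative(build_chain("small", 5000))
```

The stack lives on the heap as an ordinary list, so the nesting depth is bounded by available memory rather than by the interpreter's recursion limit.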
Jun 01, 2008
I'm running into this problem a lot when parsing http://www.imdb.com/ or http://www.cnn.com/ and many other sites with the beautifulsoup treebuilder.
import html5lib, urllib
u = "http://www.imdb.com"
d = urllib.urlopen(u).read()
p = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder("beautifulsoup"))
t = p.parse(d)
Child
File "build\bdist.win32\egg\html5lib\treebuilders\soup.py", line 34, in appendChild
File "build\bdist.win32\egg\html5lib\treebuilders\soup.py", line 94, in __init__
File "build\bdist.win32\egg\html5lib\treebuilders\_base.py", line 32, in __init__
RuntimeError: maximum recursion depth exceeded
Jun 01, 2008
koalacha: That looks like issue 70, which is fixed in the latest version on SVN trunk.
Jun 08, 2008
(No comment was entered for this change.)
Labels: Port-Python
Jun 08, 2008
SimpleTree's toxml implementation is pretty "dumb" and therefore recursive, as is minidom's toxml implementation (blame minidom for this, not html5lib). However, serializer/treewalkers have been non-recursive for a while now, so they don't produce the error.
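For readers wondering how a walker can avoid recursion entirely, here is a minimal sketch that emits start/end events by following first-child, next-sibling and parent links. The `Node` class, `next_sibling` and `walk` are hypothetical names invented for this illustration; this is not html5lib's treewalker code.

```python
class Node:
    """Hypothetical node with parent and child links (not an html5lib class)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def next_sibling(self):
        """Return the node after this one under the same parent, or None."""
        if self.parent is None:
            return None
        siblings = self.parent.children
        index = siblings.index(self)
        return siblings[index + 1] if index + 1 < len(siblings) else None


def walk(root):
    """Yield ("start", name) and ("end", name) events without recursion."""
    node = root
    while node is not None:
        yield ("start", node.name)
        if node.children:
            # Descend to the first child.
            node = node.children[0]
            continue
        # Leaf reached: close it, then climb until a next sibling exists.
        while node is not None:
            yield ("end", node.name)
            if node is root:
                node = None
                break
            sibling = node.next_sibling()
            if sibling is not None:
                node = sibling
                break
            node = node.parent


# Small tree: <a><b></b><c></c></a>
a = Node("a")
Node("b", a)
Node("c", a)
events = list(walk(a))

# Deep chain, far beyond the default recursion limit.
deep_root = Node("small")
current = deep_root
for _ in range(4999):
    current = Node("small", current)
deep_event_count = sum(1 for _ in walk(deep_root))
```

The walker keeps only the current node as state, so its call depth stays constant no matter how deeply the document is nested.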