Issue 59: maximum recursion depth exceeded in tree traversal (python)
Status:  New
Owner:  ----
Type-Defect
Priority-Medium
Port-Python

Reported by shawn.hs...@gmail.com, Nov 03, 2007

The following code raises a Python RuntimeError: maximum recursion depth
exceeded

import html5lib
p = html5lib.html5parser.HTMLParser()
d = p.parse("<small>"*1000+"</small>"*1000)
d.toxml()

Switching to the dom tree or the beautifulsoup tree yields the same result
because of how the tree traversal is implemented.  Running
serializer/treewalkers on the same document results in the same problem.
The root cause is that the node __iter__() method is called recursively in
order to traverse the tree.

Not sure if this is a problem worth fixing.
Comment 1 by jgraham.html, Nov 13, 2007
We should look at changing some of our recursive algorithms to iterative ones so
cases like this work.
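
As a rough illustration of the recursive-to-iterative change suggested here, the sketch below walks a tree with an explicit stack, so the nesting depth of the document no longer consumes Python call frames. The Node class, its childNodes attribute, and the helper names are hypothetical stand-ins for this example, not html5lib's actual classes.

```python
class Node:
    """Hypothetical stand-in for a tree node; not html5lib's real class."""

    def __init__(self, name):
        self.name = name
        self.childNodes = []


def iter_nodes(root):
    """Yield root and all descendants in document order, without recursion."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so the first child is popped first.
        stack.extend(reversed(node.childNodes))


def build_chain(depth):
    """Build a chain of `depth` nodes, each nested inside the previous one."""
    root = Node("n0")
    current = root
    for i in range(1, depth):
        child = Node("n%d" % i)
        current.childNodes.append(child)
        current = child
    return root


# A chain far deeper than the default recursion limit is walked without error.
deep = build_chain(5000)
count = sum(1 for _ in iter_nodes(deep))
```

The trade-off is that the stack lives on the heap as an ordinary list, so depth is bounded by memory rather than by the interpreter's recursion limit.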
Comment 3 by koalacha, Jun 01, 2008
I'm running into this problem a lot when parsing http://www.imdb.com/ or
http://www.cnn.com/ and many other sites with the beautifulsoup treebuilder.

import html5lib, urllib
u = "http://www.imdb.com"
d = urllib.urlopen(u).read()
p = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder("beautifulsoup"))
t = p.parse(d)

Child
  File "build\bdist.win32\egg\html5lib\treebuilders\soup.py", line 34, in append
Child
  File "build\bdist.win32\egg\html5lib\treebuilders\soup.py", line 94, in __init__
  File "build\bdist.win32\egg\html5lib\treebuilders\_base.py", line 32, in __init__
RuntimeError: maximum recursion depth exceeded
Comment 4 by excors, Jun 01, 2008
koalacha: That looks like issue 70, which is fixed in the latest version on SVN
trunk.
Comment 5 by jgraham.html, Jun 08, 2008
(No comment was entered for this change.)
Labels: Port-Python
Comment 6 by t.broyer, Jun 08, 2008
SimpleTree's toxml implementation is pretty "dumb", thus recursive, as is minidom's 
toxml implementation (blame minidom for this, not html5lib).

However, serializer/treewalkers have been non-recursive for a while now, and thus
don't produce the error.
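
To make the contrast concrete, here is a minimal sketch of how a toxml-style serialization could be written without recursion. It is not the code used by SimpleTree, minidom, or the html5lib serializer; the Element class and the to_markup and nested helpers are hypothetical names for this example only. Pending end tags are kept on an explicit stack alongside the elements still to be opened.

```python
class Element:
    """Hypothetical minimal element; not SimpleTree's or minidom's API."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def to_markup(root):
    """Serialize a tree to a tag string using an explicit stack."""
    out = []
    # Each entry is either an Element still to open or an end-tag string.
    stack = [root]
    while stack:
        item = stack.pop()
        if isinstance(item, str):
            out.append(item)
            continue
        out.append("<%s>" % item.name)
        # The end tag is emitted only after all children have been handled.
        stack.append("</%s>" % item.name)
        stack.extend(reversed(item.children))
    return "".join(out)


def nested(name, depth):
    """Build `depth` elements of the same name nested inside each other."""
    root = Element(name)
    current = root
    for _ in range(depth - 1):
        child = Element(name)
        current.children.append(child)
        current = child
    return root


markup = to_markup(nested("small", 5000))
```

With this shape, a document like the one in the original report serializes regardless of nesting depth, since no call frame is held per open element.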