Issue 197: html5lib can't parse multiple <script> tags next to each other in body.
Status:  Invalid
Owner:  ----
Closed:  Feb 2012


Reported by jwight, Feb 1, 2012
What steps will reproduce the problem?

Use an lxml tree builder and tree walker to load and dump HTML containing script elements in the body.

The input file looks like:

<!DOCTYPE html>
<html lang="en">
    <head><title></title></head>
    <body>
        <script src="a.js"></script>
        <script src="b.js"></script>
    </body>
</html>

What is the expected output? What do you see instead?

When serialising the stream I see:

<!DOCTYPE html><html lang=en><title></title></head>
    <body>
        <script src=a.js></script>
        <script src=b.js></script>

Note the busy </body> and </html> tags.

Please provide any additional information below.

An all-in-one script is attached.
test.py
Feb 1, 2012
#1 jwight
"Note the busy" -> "note the missing"
Feb 2, 2012
#2 t.broyer
HTML serialization omits optional tags, and the <head> start tag and </body> and </html> end tags are optional in the document you parsed (</head> and <body> are not because they're followed by space characters).
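To make the reasoning concrete, here is a rough sketch of the omission rules that matter for this document, written from my reading of the optional-tags section of the spec. The function names are made up for illustration and are not part of html5lib. Only the whitespace and comment conditions are modelled; the spec has further conditions (for example around the first child of body) that are left out.

```python
# The "space characters" of the HTML syntax: space, tab, LF, FF, CR.
SPACE_CHARACTERS = " \t\n\f\r"


def starts_with_space(markup):
    """True if the markup begins with an HTML space character."""
    return markup[:1] != "" and markup[0] in SPACE_CHARACTERS


def starts_with_comment(markup):
    """True if the markup begins with an HTML comment."""
    return markup.startswith("<!--")


def head_end_tag_optional(after_head):
    # </head> may be dropped unless a space character or comment follows the head element.
    return not (starts_with_space(after_head) or starts_with_comment(after_head))


def body_end_tag_optional(after_body):
    # </body> may be dropped unless a comment follows the body element.
    return not starts_with_comment(after_body)


def html_end_tag_optional(after_html):
    # </html> may be dropped unless a comment follows the html element.
    return not starts_with_comment(after_html)


def body_start_tag_kept_for_leading_content(inside_body):
    # <body> has to stay when the body's content starts with a space character or comment.
    # (Simplification: other cases where the spec keeps <body> are not covered here.)
    return starts_with_space(inside_body) or starts_with_comment(inside_body)
```

In the reported document the head element is followed by a newline and indentation, so `head_end_tag_optional("\n    <body>")` is false and `</head>` stays. The parser moves the whitespace after `</body>` and `</html>` into the body element, so nothing follows either element in the tree and both end tags can go.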

See http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#optional-tags
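If the end tags are wanted in the output anyway, the serializer has an option for that. Below is a minimal sketch, assuming html5lib and lxml are installed and a reasonably recent html5lib release (the API at the time of this report may have differed). `SOURCE` and `dump` are illustrative names and are not taken from the attached test.py.

```python
import html5lib
from html5lib.serializer import HTMLSerializer

# The input document from the report.
SOURCE = """<!DOCTYPE html>
<html lang="en">
    <head><title></title></head>
    <body>
        <script src="a.js"></script>
        <script src="b.js"></script>
    </body>
</html>
"""


def dump(source, omit_optional_tags):
    """Parse with the lxml tree builder, then serialize through the lxml tree walker."""
    document = html5lib.parse(source, treebuilder="lxml")
    walker = html5lib.getTreeWalker("lxml")
    # omit_optional_tags defaults to True, which is what drops </body> and </html>.
    serializer = HTMLSerializer(omit_optional_tags=omit_optional_tags)
    return serializer.render(walker(document))


if __name__ == "__main__":
    print(dump(SOURCE, omit_optional_tags=True))   # default behaviour, as in the report
    print(dump(SOURCE, omit_optional_tags=False))  # keeps the optional tags
```

Both outputs parse back to the same tree; the second one is just more verbose.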
Status: Invalid