child elements are moved out of their parents #8

Closed
GoogleCodeExporter opened this issue Apr 27, 2015 · 8 comments

Comments

@GoogleCodeExporter

> What steps will reproduce the problem?
Execute the attached testcase

> What is the expected output? What do you see instead?
When sanitizing, the sanitizer moves inner elements out of their parent under 
certain circumstances (see examples in testcase).

I don't want the sanitizer to change the markup but to remove all contents that 
are not allowed.

> What version of the product are you using? On what operating system?
r135 / linux

Original issue reported on code.google.com by matzep...@gmail.com on 1 Feb 2013 at 12:10

Attachments:

@GoogleCodeExporter

updated testcase

Original comment by matzep...@gmail.com on 1 Feb 2013 at 12:15

Attachments:

@GoogleCodeExporter

Entering into http://html5.validator.nu/ the first example
    <p>123<p>abcdefg</p>456</p>
gives
    Error: No p element in scope but a p end tag seen.
    From line 1, column 24; to line 1, column 27
    efg</p>456</p>↩

because the </p> at the end doesn't close a tag.  The second <p> closes the 
first <p> per HTML5 parsing rules.
http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inbody
says

"""
A start tag whose tag name is one of: "address", "article", "aside", 
"blockquote", "center", "details", "dialog", "dir", "div", "dl", "fieldset", 
"figcaption", "figure", "footer", "header", "hgroup", "main", "menu", "nav", 
"ol", "p", "section", "summary", "ul"
If the stack of open elements has a p element in button scope, then act as if 
an end tag with the tag name "p" had been seen.

Insert an HTML element for the token.
"""

which means that when a <p> is seen inside a <p>, an implicit </p> is seen, so
    <p>123<p>abcdefg</p>456</p>
is equivalent to
    <p>123</p><p>abcdefg</p>456

which is what the HTML sanitizer produces.
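
The implicit-close rule quoted above can be sketched in a few lines of plain Java. This is only a toy illustration, not the sanitizer's actual implementation: the class name `ImplicitParagraphClose` and the method `normalize` are hypothetical, it recognises only bare `<p>` and `</p>` tags, and it simply drops a stray `</p>` so that the result matches the output quoted above (a browser would instead insert an empty `<p></p>` at that point).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the sanitizer's API.
final class ImplicitParagraphClose {
    // Matches only bare <p> and </p> tags; attributes are not handled.
    private static final Pattern P_TAG =
        Pattern.compile("<(/?)p>", Pattern.CASE_INSENSITIVE);

    static String normalize(String html) {
        StringBuilder out = new StringBuilder();
        boolean pOpen = false;  // is a <p> currently open?
        int last = 0;           // end of the previously consumed tag
        Matcher m = P_TAG.matcher(html);
        while (m.find()) {
            out.append(html, last, m.start());  // copy text between tags
            last = m.end();
            boolean isEndTag = !m.group(1).isEmpty();
            if (isEndTag) {
                if (pOpen) {
                    out.append("</p>");
                    pOpen = false;
                }
                // Otherwise the end tag is stray; this sketch drops it.
            } else {
                if (pOpen) {
                    out.append("</p>");  // a new <p> implicitly closes the open one
                }
                out.append("<p>");
                pOpen = true;
            }
        }
        out.append(html, last, html.length());  // copy trailing text
        if (pOpen) {
            out.append("</p>");  // close a paragraph left open at end of input
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <p>123</p><p>abcdefg</p>456
        System.out.println(normalize("<p>123<p>abcdefg</p>456</p>"));
    }
}
```

The sketch tracks a single "is a p open" flag because, under the quoted rule, two <p> elements are never open at the same time.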

By understanding browser tag nesting rules, the sanitizer avoids a lot of 
ambiguity in HTML, and can produce output that will be consistently and safely 
interpreted by a variety of browsers.

----

Sanitizers.BLOCKS.sanitize("<div><meta/><p>abcdefg</p></div>")

should not produce

"<div><meta/><p>abcdefg</p></div>"

since <meta> is not a block tag, and is not even allowed in the body.

----

Marking this bug invalid.  Please reopen if you feel this was in error.

Original comment by mikesamuel@gmail.com on 2 Feb 2013 at 6:19

  • Changed state: Invalid

@GoogleCodeExporter

The paragraph handling was only added to illustrate the sanitizer's 
behaviour.

However, the meta tag is a real problem for us: Thunderbird generates markup 
like "<blockquote><meta></blockquote>" all the time, and we have to display it 
correctly for our users. This becomes hard because the sanitizer modifies the 
surrounding markup when it removes the meta tag. I just want the sanitizer to 
remove the meta tag, which is currently not possible.

Please reopen as I'm not allowed to...

Kind regards
Matthias

Original comment by matzep...@gmail.com on 4 Feb 2013 at 12:30

@GoogleCodeExporter

Reopened.

Is the problem that you're doing something like

    PolicyFactory policy = new HtmlPolicyBuilder()
      .allowCommonBlockElements()
      .allowElements("meta")
      .toFactory();
    String htmlSnippet = "<blockquote><meta></blockquote>";
    String sanitized = policy.sanitize(htmlSnippet);
    System.out.println(sanitized);

and you get

    <blockquote></blockquote></body><meta />

?

Original comment by mikesamuel@gmail.com on 5 Feb 2013 at 9:42

  • Changed state: New

@GoogleCodeExporter

Closing for lack of response.  Re the attached test case:

> assertEquals("<p>123<p>abcdefg</p>456</p>",
>     Sanitizers.BLOCKS.sanitize("<p>123<p>abcdefg</p>456</p>"));

the test golden is invalid.  <p> tags do not nest in HTML.

> assertEquals("<div><meta/><p>abcdefg</p></div>",
>     Sanitizers.BLOCKS.sanitize("<div><meta/><p>abcdefg</p></div>"));

is also invalid since <meta> tags cannot be direct children of <div> elements.
You can white-list <meta> elements if you like using a custom policy, but 
<meta> is not a block element so should not be in Sanitizers.BLOCKS.

Original comment by mikesamuel@gmail.com on 24 Jul 2013 at 4:00

  • Changed state: WontFix

@GoogleCodeExporter

Hello everyone!

We have a similar behaviour in this case:
assertEquals("<h1>TEXT</h1>", 
Sanitizers.BLOCKS.sanitize("<H1><center>TEXT</H1>"));

For this one the result is:
<h1></h1>TEXT
instead of:
<h1>TEXT</h1>

But test case:
assertEquals("<h1>TEXT</h1>", 
Sanitizers.BLOCKS.sanitize("<H1></center>TEXT</H1>"));

works as expected:
<h1>TEXT</h1>

What's wrong with the first one?

I would appreciate your feedback on this case.

Original comment by a.chichi...@semrush.com on 29 Sep 2014 at 7:08

@GoogleCodeExporter

#6, filed as 
https://code.google.com/p/owasp-java-html-sanitizer/issues/detail?id=33

Original comment by mikesamuel@gmail.com on 1 Oct 2014 at 12:46

@harischandraprasad

Version: r239.

I am facing a similar issue with <font> wrapping a <div>.
The <div>, along with its content, is moved outside the <font>.
As a result, the <font> is not applied to the content of the <div>.

Sample code snippet:

    PolicyFactory policy = new HtmlPolicyBuilder()
      .allowCommonBlockElements()
      .allowElements("font")
      .allowAttributes("face", "size").onElements("font")
      .toFactory();
    String htmlSnippet = "<font face=\"Calibri\" size=\"2\"><div>Hi Hari</div></font>";
    String sanitized = policy.sanitize(htmlSnippet);

Original:
<font face="Calibri" size="2"><div>Hi Hari</div></font>

Sanitized:
<font face="Calibri" size="2"></font><div>Hi Hari</div>

Can this issue also be covered by the issue above?
