Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML allowed when it should not be? #124

Closed
GoogleCodeExporter opened this issue Jul 19, 2015 · 31 comments
Closed

HTML allowed when it should not be? #124

GoogleCodeExporter opened this issue Jul 19, 2015 · 31 comments

Comments

@GoogleCodeExporter
Copy link

When OKAPI puts comments into the cache_logs table, it sets the "text_html" 
flag to FALSE. Unfortunatelly, OC code doesn't handle this flag properly and 
STILL assumes, that the input is in HTML (that's a serious bug, though not in 
OKAPI code).

Some developers have adapted this "bug/feature" and use HTML tags when 
submitting the "comment" field.

Something needs to be fixed here...

Original issue reported on code.google.com by rygielski on 14 Jun 2012 at 7:26

@GoogleCodeExporter
Copy link
Author

BTW, OC code seems to be running complex reformatting code before writing AND 
after reading this value. They never simply assume it to be plain-text. :-/

Original comment by rygielski on 14 Jun 2012 at 7:42

@GoogleCodeExporter
Copy link
Author

All the above was checked against OCPL code only, I don't know about OCDE 
branch yet.

Original comment by rygielski on 14 Jun 2012 at 7:43

@GoogleCodeExporter
Copy link
Author

OCPL stores HTML-lized comments in the database (it doesn't really 
matter if 'text_html' field is set to FALSE). OKAPI must save it in HTML 
either way. However, escaping plain-text doesn't work. If we put 
"<b>" in, it still gets converted to "" before display! NONE of 
this process is documented. There doesn't seem to be ANY way to force 
OCPL to treat a string as either plain-text nor HTML. It's always 
something in between! In other words, we cannot guarantee how it gets 
displayed in the end. Since text_html=0 doesn't add <br/>s on its own, 
we can only add proper <br/>s and hope it's okay. 

Original comment by rygielski on 14 Jun 2012 at 8:38

@GoogleCodeExporter
Copy link
Author

This issue was closed by revision r388.

Original comment by rygielski on 14 Jun 2012 at 8:50

  • Changed state: Fixed

@GoogleCodeExporter
Copy link
Author

Actually, it is NOT fixed. It is now "more consistently broken".

Original comment by rygielski on 14 Jun 2012 at 8:51

@GoogleCodeExporter
Copy link
Author

I have verified this with OC.de and came to the conclusion that the log storage 
format is odd, but it is not broken.

There are two types of log formats:

a) plain text logs: text_html=0, and the text is stored in a "HTML display" 
form, i.e. with HTML entities encoded and line feeds replaced by <br />. The 
application will either display it this way, or decode this to its original 
form if necesseray, e.g. when the log is to be edited or another encoding is 
needed.

b) HTML logs: text_html=1, and the text is stored as HTML code as provided by 
the user.

The current OKAPI implementation correctly creates the form (a). In the code 
comments, you write:

# If we put "<b>" in, it still gets converted to "" before display!

This is not true for OC.de - the log texts are passed to the web browser 
unencoded*, and the browser will display it as plain text "". I can hardly 
believe that OC.pl decodes this to "" before output ...

* Actually, OC.de code does two modifications when displaying text_html=0 logs: 
it replaces ASCII smileys by images, i.e. ":-)" by a <img 
src=".../smiley-smile.gif" />, and it inserts <a href...> tags for hyperlinks.

# (it doesn't really matter if 'text_html' field is set to FALSE)

text_html MUST be set to 0 for plain text logs - otherwise they would be 
presented to the user in their HTML-encoded form when re-editing the log!


Conclusion: The services/logs/submit documentation is not correct at least 
regarding OC.de and should be updated.

Enabling HTML logs by OKAPI would be nice, too. :-) Of course, implementation 
would be complex: It would need a HTML purifier, like in the OC or GC code, 
which strips "bad tags" from log texts and maybe closes open tags (the latter 
one still is missing at OC.de, but GC does ist).

Original comment by following09 on 28 Mar 2013 at 1:40

  • Changed state: Accepted
  • Added labels: Priority-Medium
  • Removed labels: Priority-High

@GoogleCodeExporter
Copy link
Author

> the log texts are passed to the web browser unencoded

sorry, this should read: "the log texts are passed to the web browser 
unDEncoded"

Original comment by following09 on 28 Mar 2013 at 2:42

@GoogleCodeExporter
Copy link
Author

Unfortunatelly, I don't have time to dig into it again. If you want, you may 
fix the docs, but I think you should assume that comment 3 is true for OCPL (I 
am not 100% sure, but usually I don't make mistakes when I check for such 
things).

Original comment by rygielski on 28 Mar 2013 at 5:33

@GoogleCodeExporter
Copy link
Author

This issue was updated by revision r604.

Original comment by rygielski on 30 Mar 2013 at 1:11

@GoogleCodeExporter
Copy link
Author

The new comment_format option is broken: HTML comments must be saved with 
text_html=1, not text_html=0!

If you set text_html=1 for comment_format=HTML, everything is fine. I have 
tested this on OC.de and verified in OC.pl source code - there is no decoding 
when displaying cache_logs.text, just smiley replacement and inserting 
hyperlinks.

Original comment by following09 on 30 Mar 2013 at 3:48

  • Added labels: Priority-High
  • Removed labels: Priority-Medium

@GoogleCodeExporter
Copy link
Author

> If you set text_html=1 for comment_format=HTML, everything is fine. 

Rethought: It is NOT fine, because HTML data must not be inserted as it is into 
the database. It must be purified first to ensure that it does not contain 
malicious tags, javascript code etc. There is no purifing afterwards before 
display!

The comment_format option in this form is not useful: with text_html=0 it is 
broken, and with text_html=1 it would be dangerous. It should be discarded and 
not be reimplemented before OKAPI includes an HTML cleaner.

Original comment by following09 on 30 Mar 2013 at 3:53

@GoogleCodeExporter
Copy link
Author

I don't follow. This parameter is ignored, and the documentation states it is 
ignored. What exactly do you want me to do?

Original comment by rygielski on 30 Mar 2013 at 4:25

@GoogleCodeExporter
Copy link
Author

This parameter encourages developers to submit HTML log comments, which does 
not work.

> What exactly do you want me to do?

Remove the comment_format parameter and document that OKAPI currently does not 
support HTML logging.

Original comment by following09 on 30 Mar 2013 at 4:33

@GoogleCodeExporter
Copy link
Author

Just to clarify:

1. I insert a new log entry: http://i.imgur.com/sGFcSqT.png
2. In the database it has text_html=0: http://i.imgur.com/Qe8FfeJ.png
3. It is still displayed in HTML: http://i.imgur.com/jGUMUiD.png

I don't see a way to fix it in OKAPI.

Original comment by rygielski on 30 Mar 2013 at 4:34

@GoogleCodeExporter
Copy link
Author

> Remove the comment_format parameter and document that OKAPI
> currently does not support HTML logging.

But it kind of does, and I can't turn it off. As I said, many developers 
started to use this "hidden" functionality already. I would like them to 
indicate that they are doing it on purpose, that's why I added this parameter.

Perhaps, it should work differently on OCDE, but I think it is the best way for 
OCPL.

Original comment by rygielski on 30 Mar 2013 at 4:37

@GoogleCodeExporter
Copy link
Author

Shit, then I must have overlooked some decoder in the OC.pl code. :-(


Original comment by following09 on 30 Mar 2013 at 4:45

Attachments:

@GoogleCodeExporter
Copy link
Author

Sry, first screenshot was wrong, I had submitted the log entry as Text and 
later faked it because I had saved it to some wrong directory and did not find 
it when uploading here. ;-) This is was I did as step 1:

Original comment by following09 on 30 Mar 2013 at 4:49

Attachments:

@GoogleCodeExporter
Copy link
Author

Here it is, in OC.pl code

viecache.php:
  $rs = sql("SELECT ... `cache_logs`.`text` `text` ... FROM `cache_logs` ...");
  $record = sql_fetch_array($rs);
  $tmplog_text = $record['text'];
  $tmplog_text = str_replace($smileytext, $smileyimage, $tmplog_text);
  $tmplog_text = help_addHyperlinkToURL($tmplog_text);
  $tmplog_text = tidy_html_description($tmplog_text);

common.inc.php:
  function tidy_html_description:
    $tidy =  tidy_parse_string(html_entity_decode($text, ENT_NOQUOTES, "UTF-8"), ...);

This is not done in OC.de code.

Original comment by following09 on 30 Mar 2013 at 5:06

@GoogleCodeExporter
Copy link
Author

This issue was updated by revision r610.

Original comment by following09 on 30 Mar 2013 at 5:32

@GoogleCodeExporter
Copy link
Author

To sum up: Currently, OKAPI submits HTML logs for OCPL, and PLAINTEXT logs for 
OCDE. The 'submit' docs are still unclear on that.

Alternative:

Could you implement HTML for OCDE? If you could, then I would also try to 
implement PLAINTEXT for OCPL. This way, comment_format would become valid.

Original comment by rygielski on 30 Mar 2013 at 7:03

@GoogleCodeExporter
Copy link
Author

I could implement it if you allow me using this code from within OKAPI:

https://github.com/OpencachingDeutschland/oc-server3/tree/master/lib/htmlpurifie
r-4.2.0

This library is stable, though we have lots issues on improving HTML 
purification in OC.de todo list, either because it throws away tags or 
attributes which need not to be thrown away, or because it does not close open 
tags. This might be solved by

- updating or customizing htmlpurifier-4.2.0, or
- looking for a replacement which does it better, or
- writing a own one,

but that is not on my todo list for 2013.

Original comment by following09 on 30 Mar 2013 at 7:29

@GoogleCodeExporter
Copy link
Author

Okay. Since this library would be needed for OCDE only, you could include it 
directly from OCDE code (without committing it to OKAPI). BTW, OCPL's purifying 
is crappy too.

Original comment by rygielski on 30 Mar 2013 at 7:42

@GoogleCodeExporter
Copy link
Author

This issue was updated by revision r637.

Original comment by following09 on 31 Mar 2013 at 2:54

@GoogleCodeExporter
Copy link
Author

Fixed in r642.

Take a look at it. I modified OCDE part slightly, but I tested on OCPL only.

Original comment by rygielski on 31 Mar 2013 at 5:25

  • Changed state: Fixed

@GoogleCodeExporter
Copy link
Author

This issue was updated by revision r644.

Original comment by following09 on 31 Mar 2013 at 5:58

@GoogleCodeExporter
Copy link
Author

Documentation should make clear what 'auto' is. On OC.de it must NOT be used on 
plaintext comments, because the HTMLPurifier might strip things from the text 
which look like (unwanted) html tags.

Original comment by following09 on 31 Mar 2013 at 6:13

  • Changed state: Started
  • Added labels: Priority-Medium
  • Removed labels: Priority-High

@GoogleCodeExporter
Copy link
Author

They are many existing apps which do not know about comment_format's existence. 
Some of them submit plaintext, some of them submit html. We need to add the 
auto option in order for these apps to work properly with OCDE (and to be 
backward compatible with OCPL too).

> On OC.de it must NOT be used on plaintext comments

It is not used for plaintext comments, just the "auto" ones.

BTW, with this configuration (auto paragraphs), HTMLPurifier works fine for 99% 
of usual plaintext entries: http://i.imgur.com/T0897Cc.png

I added a link to this issue, so that future developers may know exactly what's 
going on. Fixed in r646.

Original comment by rygielski on 31 Mar 2013 at 7:13

  • Changed state: Fixed

@GoogleCodeExporter
Copy link
Author

Thanks for the documentation update and thumbs-up. :-)

> HTMLPurifier works fine for 99% of usual plaintext entries:

Ok, but that won't help if the HTML entities are not encoded after purifing. 
Auto MUST NOT be used on plain text on any OC node except the current/old OC.nl 
installation, i.e. all current applications which rely on 'comment' field to be 
plain text format (except on OC.nl) are broken should be fixed.

Original comment by following09 on 31 Mar 2013 at 7:27

@GoogleCodeExporter
Copy link
Author

I don't understand. Give me an example - when does something break?

Original comment by rygielski on 31 Mar 2013 at 8:54

@GoogleCodeExporter
Copy link
Author

Adding nl2br() for 'auto' option (r650) resolves the main problem, i.e. 
eaten-up line breaks.

Here is an example for the remaining problem with auto + plaintext. Surely it 
won't happen often that someone enters plain text which looks like HTML tags. 
Someone hit the wrong key - 'a' is directly besides of '<':

Log text:
"We went <a 10 km but far > 5 km to find the cache."

OKAPI submit request: 
http://local.opencaching.de/oc-server/server-3.0/htdocs/okapi/services/logs/subm
it?cache_code=OC10004&logtype=Comment&comment=We+went+%3Ca+10+km+but+far+%3E5+km
+to+find+the+cache.

Result:

Original comment by following09 on 31 Mar 2013 at 10:05

Attachments:

@GoogleCodeExporter
Copy link
Author

This issue was updated by revision r651.

Original comment by following09 on 31 Mar 2013 at 10:24

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant