Issue 67: Channel detection in posts is over-eager and breaks links that contain # character
What steps will reproduce the problem?
1. Post a link like: http://en.wikipedia.org/wiki/Free_software#Definition

What is the expected output? What do you see instead?
This should be treated as a link to a particular section of a Wikipedia page. Instead, the #Definition part of the link is treated as a reference to a channel called #Definition rather than as part of the link.

You can see an example here: http://openku.appspot.com/user/openku/presence/a43026ffb79549d6b2d3b4fca5a64dc0
Mar 28, 2009
A quick and dirty fix would be to check whether the '#' comes after a '\s' or at the start of a line.
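A minimal sketch of that quick fix, assuming the channel pattern is a letter followed by letters or digits. The `/channel/` URL and the `link_channels` name are made up for illustration and are not taken from the jaikuengine source. The negative lookbehind `(?<!\S)` only lets `#` match at the start of the text or right after whitespace:

```python
import re

# Hypothetical pattern: '#' must be at the start of the text or preceded by
# whitespace. (?<!\S) means "not preceded by a non-whitespace character".
CHANNEL_RE = re.compile(r'(?<!\S)#([a-zA-Z][a-zA-Z0-9]*)')


def link_channels(text):
    """Turn #channel into a link; the /channel/ URL scheme is an assumption."""
    return CHANNEL_RE.sub(r'<a href="/channel/\1">#\1</a>', text)


url = "http://en.wikipedia.org/wiki/Free_software#Definition"
print(link_channels(url))                # the URL is left untouched
print(link_channels("join #jaiku now"))  # the channel is linked
```

One caveat: if the replacement runs on HTML rather than on the raw post, a `#` directly after a tag such as `<p>` is not preceded by whitespace and would be missed, so this only covers part of the problem.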
Mar 29, 2009
Take a look at this: http://adewale.jaiku.com/presence/5c2cf07037954357aa43c7cd5030b5a2

This shows that the real problem is that Jaiku formatting is being applied to http links. The problem with the channels and the breaking of links that have underscores in them are both instances of the same underlying bug. We should look for some way to detect that something is an http or https link and make sure that none of the Jaiku formatting rules are applied to it.
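One way this idea could look, as a rough sketch rather than a proposed patch: split the text on anything that looks like an http or https URL and run the formatting only on the pieces in between. `URL_RE`, `format_outside_urls` and `toy_format` are hypothetical names, and the URL pattern is deliberately crude:

```python
import re

# Crude assumption: a URL is "http://" or "https://" followed by
# non-whitespace characters. The capture group makes re.split keep the URLs.
URL_RE = re.compile(r'(https?://\S+)')


def format_outside_urls(text, formatter):
    """Apply formatter to everything except the URLs found by URL_RE."""
    parts = URL_RE.split(text)
    # With one capture group, the URLs sit at the odd indexes of the result.
    return ''.join(
        part if index % 2 else formatter(part)
        for index, part in enumerate(parts)
    )


def toy_format(piece):
    """Stand-in for the Jaiku formatting rules (illustration only)."""
    return re.sub(r'#(\w+)', r'[channel:\1]', piece)


print(format_outside_urls("http://example.com/#test and #test", toy_format))
```

Because `\S+` also swallows trailing punctuation such as a closing parenthesis or a full stop, a real implementation would need a more careful URL pattern.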
Mar 29, 2009
I thought I could use python-markdown2's link-patterns (https://code.google.com/p/python-markdown2/wiki/LinkPatterns) to solve this, but that suffers from the same problem as #jaikuengine:

    def _do_link_patterns(self, text):
        """Caveat emptor: there isn't much guarding against link patterns
        being formed inside other standard Markdown links, e.g. inside a
        [link def][like this].

        Dev Notes: *Could* consider prefixing regexes with a negative
        lookbehind assertion to attempt to guard against this.
        """
Mar 29, 2009
@adewale: It's not that simple. Jaikuengine applies formatting according to this scheme:

1. Markdown conversion: turns everything into HTML according to Markdown's formatting rules.
2. Autolinking: makes links out of URLs that weren't converted during Markdown conversion.
3. Actor linking: makes links out of #channels and @usernames.

* http://example.com/test_underscore_problem is supposed to be handled by step 2, but the underscores are converted to em in step 1.
* http://example.com/#test is supposed to be handled by step 2 but is messed up by step 3.
* [test](http://example.com/test_underscore_problem) works fine since conversion is handled fully in step 1.
* [test](http://example.com/#test) is supposed to be handled by step 1 but is messed up by step 3.
* [#test](http://example.com/) is supposed to be handled by step 1 but is messed up by step 3.
* [http://jaiku.com](http://test.com) is supposed to be handled by step 1 but is messed up by step 2.

etc.
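To make the step 3 failure concrete, here is a small reproduction. It is a sketch, not the actual jaikuengine code: the length bounds, the `/channel/` URL and the function name are assumptions. It runs a plain channel regex over HTML that an earlier step has already turned into a link:

```python
import re

# Assumed length bounds; the real values are not given in this thread.
MIN_LEN, MAX_LEN = 1, 24
NAIVE_CHANNEL_RE = re.compile(r'#([a-zA-Z][a-zA-Z0-9]{%d,%d})' % (MIN_LEN, MAX_LEN))


def naive_actor_linking(html):
    """Step 3 as a blind regex substitution over already-generated HTML."""
    return NAIVE_CHANNEL_RE.sub(r'<a href="/channel/\1">#\1</a>', html)


# Output of steps 1 and 2 for http://example.com/#test (autolinked).
autolinked = '<a href="http://example.com/#test">http://example.com/#test</a>'
broken = naive_actor_linking(autolinked)
print(broken)
```

The substitution fires twice, once inside the `href` attribute and once inside the link text, so the result is no longer well-formed HTML.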
Mar 29, 2009
By the way, I have a patch for the regexp that seems to work in many cases (but not all). Should I upload that to rietku while we think about how to solve this issue once and for all?
Mar 29, 2009
Please upload it. A partial solution with tests will move us closer to a full solution that still passes those same tests.
#Channel and @actor replacement is done after markdown conversion and, because of that, needs more advanced regexps than just r'#([a-zA-Z][a-zA-Z0-9]{%d,%d})'. It's possible to improve the regexps, but to avoid issues altogether it might be better to use DOM mode instead. Since this replacement is done every time a comment is displayed, going to DOM could have performance implications. What do you think: should I try to improve the regexp or try to do something that's failsafe?
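For comparison, a rough sketch of what a parser-based approach could look like. It uses the standard library's `html.parser` instead of a full DOM. The class name, the `/channel/` URL and the simplified channel pattern are all assumptions. The idea is to re-emit the HTML unchanged and only link channels in text that is not inside an `<a>` element or an attribute:

```python
import re
from html.parser import HTMLParser

# Simplified, hypothetical channel pattern (no length bounds).
CHANNEL_RE = re.compile(r'#([a-zA-Z][a-zA-Z0-9]*)')


class ChannelLinker(HTMLParser):
    """Re-emit HTML, linking #channels only in text outside <a> elements."""

    def __init__(self):
        # Keep entities such as &#x27; out of handle_data so that the '#'
        # in a character reference is never mistaken for a channel.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.anchor_depth += 1
        self.out.append(self.get_starttag_text())  # raw tag, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == 'a' and self.anchor_depth:
            self.anchor_depth -= 1
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        if self.anchor_depth == 0:
            data = CHANNEL_RE.sub(r'<a href="/channel/\1">#\1</a>', data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)


def link_channels_in_html(html):
    parser = ChannelLinker()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


print(link_channels_in_html(
    '<p>#test <a href="http://example.com/#test">http://example.com/#test</a></p>'
))
```

This sketch drops comments and doctype declarations and does nothing for @actor names. Whether the extra parsing cost is acceptable on every display would need to be measured.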