Issue 1930: 'commentlink' breaks on HTML coded characters
Status:  New
Owner: ----
Reported by omark...@gmail.com, May 27, 2013

Affected Version: 2.5.2

What steps will reproduce the problem?
1. Set the commentlink regex to: match = "(#)(\\d+)" (i.e. match #123)
2. Include some special character in the commit message (I tested with a single quote)
3. Visit the web interface of the change you just pushed

What is the expected output? What do you see instead?

I would expect the single quote to be displayed as such. Instead I'm shown &#39;, with the #39 rendered as a hyperlink.
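
The false match can be reproduced outside Gerrit. The following is a minimal sketch that uses Python's re module as a stand-in for Gerrit's regex engine; it assumes the commit message has already been HTML-escaped before the commentlink pattern runs, and the sample message is made up for illustration.

```python
import re

# Pattern from step 1, as the regex engine sees it once the config
# string's doubled backslashes are unescaped.
pattern = re.compile(r"(#)(\d+)")

# Assumption: the message is HTML-escaped before matching, so the single
# quote has already been replaced by the entity &#39;.
escaped_message = "Don&#39;t break the build, fixes #123"

# Collect every substring the commentlink pattern would turn into a link.
matches = [m.group(0) for m in pattern.finditer(escaped_message)]
print(matches)  # ['#39', '#123']
```

The first hit comes from inside the entity, which is why the quote shows up as a broken link.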

Please provide any additional information below.

I only tested with the single quote, but I assume other HTML-encoded characters could be affected as well. http://www.w3.org/MarkUp/html-spec/html-spec_13.html
May 28, 2013
#1 omark...@gmail.com
One way to get around this is to use the following regex: "#(\\d+)(?!;|\\d+)"
Explanation: # followed by digits that are not followed by ; or by more digits (otherwise &#39; would match with #3).
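
A small sketch of why the extra \d+ alternative in the lookahead matters, again using Python's re module as an assumed stand-in for Gerrit's engine. The name `without_digit_guard` and the sample text are hypothetical and only serve the comparison.

```python
import re

# Workaround from this comment: digits must not be followed by ';' or
# by further digits.
workaround = re.compile(r"#(\d+)(?!;|\d+)")

# Hypothetical variant for comparison: only the ';' check is kept.
without_digit_guard = re.compile(r"#(\d+)(?!;)")

text = "Don&#39;t break the build, fixes #123"

# With both checks, the entity is skipped entirely.
print(workaround.findall(text))           # ['123']

# Without the digit check, the engine backtracks from '39' to '3',
# sees '9' instead of ';', and reports a bogus match.
print(without_digit_guard.findall(text))  # ['3', '123']

# Side effect: a genuine reference directly followed by ';' is skipped too.
print(workaround.findall("see #123; done"))  # []
```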
Oct 16, 2013
#2 mycr...@gmail.com
Or don't match any #<number> preceded by &:

(?<!&)#\d+
Oct 16, 2013
#3 mycr...@gmail.com
Actually, in Gerrit notation: "(?<!&)#(\\d+)". Note that the lookbehind (?<!&) is not a capturing group, so the link should use $1 for the number.
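
A minimal sketch of the lookbehind variant, using Python's re module rather than Gerrit itself. The `linkify` helper and the bugs.example.com URL are hypothetical; they only imitate how a link template containing $1 would be filled in.

```python
import re

# Lookbehind variant: '#' must not be preceded by '&'. The lookbehind
# captures nothing, so the number is group 1.
pattern = re.compile(r"(?<!&)#(\d+)")

# Hypothetical link template, with $1 standing for the issue number.
link_template = "http://bugs.example.com/$1"

def linkify(text):
    """Wrap every match in an anchor tag built from link_template."""
    def to_anchor(match):
        url = link_template.replace("$1", match.group(1))
        return '<a href="%s">%s</a>' % (url, match.group(0))
    return pattern.sub(to_anchor, text)

print(linkify("Don&#39;t break the build, fixes #123"))
# Don&#39;t break the build, fixes <a href="http://bugs.example.com/123">#123</a>
```

Unlike the lookahead workaround, this pattern still links a reference that is directly followed by a semicolon, such as "#123;".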