Issue 3521: Internal server error/exception in side-by-side diff for filename encoded ISO-8859-1
Status:  New
Owner: ----
Reported by tw201...@gmail.com, Aug 14, 2015
Affected Version: 2.11.2

What steps will reproduce the problem?
1. On a Linux box, accessed via PuTTY terminal set to ISO-8859-1: set LANG=en_US.iso88591
2. Create a file named "äöü.txt" (I used vi). Content also äöü. Save.
3. ls or more show the file äöü.txt
4. git add äöü.txt
5. git commit
6. vi opens, type commit message äöü; save
7. Using git-review from Openstack: git review --draft (I only tried draft changes)
8. Change is pushed to Gerrit. (Gerrit instance runs on a Linux box with LANG=en_US.UTF-8.) Go visit the change screen for the change. Everything looks good: commit message äöü is shown, and the file list has a file äöü.txt.
9. Click on the file name äöü.txt to open the side-by-side diff. 
10. Gray screen of death: Internal server error.

What is the expected output? What do you see instead?

Expected: side-by-side diff of file äöü.txt, empty on the left, with content äöü on the right.

Actual: gray screen of death. Exception in log:

[2015-08-14 16:56:01,561] ERROR com.google.gerrit.httpd.restapi.RestApiServlet : Error in GET /changes/11/revisions/efc47d5fc96ec1ac0fde4f9cbe168f23f95e9671/files/%C3%B6%C3%A9%C3%BC.txt/diff?context=ALL&intraline
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 0
	at com.google.gerrit.prettify.common.SparseFileContent.get(SparseFileContent.java:75)
	at com.google.gerrit.server.change.GetDiff$Content.addDiff(GetDiff.java:335)
	at com.google.gerrit.server.change.GetDiff.apply(GetDiff.java:158)
	at com.google.gerrit.server.change.GetDiff.apply(GetDiff.java:72)
	at com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:324)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
	at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:279)
	at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:269)
	at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:180)
	at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
	at com.google.gerrit.httpd.GetUserFilter.doFilter(GetUserFilter.java:82)
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
	at com.google.gwtexpui.server.CacheControlFilter.doFilter(CacheControlFilter.java:73)
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
	at com.google.gerrit.httpd.RunAsFilter.doFilter(RunAsFilter.java:117)
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
	at com.google.gerrit.httpd.AllRequestFilter$FilterProxy$1.doFilter(AllRequestFilter.java:64)
	at com.google.gerrit.httpd.AllRequestFilter$FilterProxy.doFilter(AllRequestFilter.java:57)
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
	at com.google.gerrit.httpd.RequestContextFilter.doFilter(RequestContextFilter.java:75)
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
	at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
	at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
	at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
	at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
	at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
	at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:95)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
	at org.eclipse.jetty.server.Server.handle(Server.java:497)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
	at java.lang.Thread.run(Thread.java:745)

Looks like JGit fails to find the blob. Not sure if that is a Gerrit error or a JGit bug.

My guess: filename is encoded in ISO-8859-1 and JGit expects either ASCII or UTF-8 and cannot deal with ISO-8859-1. Command-line git on the Linux box where I committed can show a diff:

$ git diff HEAD~1 HEAD -- $äöü.txt
diff --git a/äöü.txt b/äöü.txt
new file mode 100644
index 0000000..3c01580
--- /dev/null
+++ b/äöü.txt
@@ -0,0 +1 @@
+äöü
$

Linux version in both cases: RHEL 7.1
Command-line git version: 1.8.3.1
Gerrit version: 2.11.2, LANG=en_US.UTF-8

BTW: while I was entering this bug report, the form on this bug tracker notified me "Issue attachment storage quota exceeded." Is some server-side cleanup in this bug tracker needed?
Aug 15, 2015
#1 tw201...@gmail.com
Same behavior also if in step 7 I don't use "git review --draft" but do a "git push origin HEAD:/refs/drafts/master" directly.

So we can remove the Openstack git-review from the equation.
Aug 15, 2015
#2 tw201...@gmail.com
Just noticed that the exception trace was from a file named öéü.txt. I tried this several times and must have copied the wrong trace. In any case, the symptom is always the same.
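
For anyone cross-checking the trace against the file names: a minimal sketch, using only the JDK, that decodes the percent-encoded path from the logged request and shows what äöü.txt would look like when percent-encoded as UTF-8. The class and method names are made up for illustration and are not part of Gerrit or JGit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class TracePathDemo {
    /** Hypothetical helper: percent-decode a path segment as UTF-8. */
    static String decodePath(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Hypothetical helper: percent-encode a file name as UTF-8. */
    static String encodePath(String name) {
        try {
            return URLEncoder.encode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Path segment copied from the logged GET request.
        String fromTrace = decodePath("%C3%B6%C3%A9%C3%BC.txt");
        // o-umlaut, e-acute, u-umlaut: the name mentioned in this comment.
        System.out.println(fromTrace.equals("\u00f6\u00e9\u00fc.txt"));
        // a-umlaut, o-umlaut, u-umlaut: the name used in the reproduction steps.
        System.out.println(encodePath("\u00e4\u00f6\u00fc.txt"));
    }
}
```

The file names are written with Unicode escapes so that the encoding of the source file itself does not affect the result.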
Aug 16, 2015
#3 tw201...@gmail.com
Perhaps relevant: this old thread from 2009 starting here: http://dev.eclipse.org/mhonarc/lists/egit-dev/msg00344.html

Looks like nothing was done; the latest PathFilter still looks the same: https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/treewalk/filter/PathFilter.java

And looking at line 99 in PathFilter, method include(), I'd guess that this is comparing the ISO-8859-1 byte sequence that TreeWalk reads from the repo with the UTF-8 byte sequence held in the PathFilter. These of course won't match, and thus the file is not included in the walk. That is a pity, because TreeWalk is otherwise careful to convert paths to strings via RawParseUtils.decode(), which does fall back to ISO-8859-1. So I think if one used string comparisons instead of byte sequence comparisons to compare the path filter against the walk's current path, it might even work.
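
To illustrate the suspected mismatch without touching JGit at all, here is a minimal JDK-only sketch. It assumes the repository stores the path as ISO-8859-1 bytes and the filter holds the same name as UTF-8 bytes. The helper decodeWithFallback() is hypothetical: it only imitates a "strict UTF-8 first, then ISO-8859-1" strategy and is not JGit's actual implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PathEncodingDemo {
    // a-umlaut, o-umlaut, u-umlaut, then ".txt"; escapes keep the source ASCII-only.
    static final String NAME = "\u00e4\u00f6\u00fc.txt";

    /** Hypothetical helper: strict UTF-8 decode, ISO-8859-1 if the bytes are malformed. */
    static String decodeWithFallback(byte[] raw) {
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Every byte sequence is valid ISO-8859-1, so this cannot fail.
            return new String(raw, StandardCharsets.ISO_8859_1);
        }
    }

    /** Byte-wise match against the UTF-8 form of the filter path. */
    static boolean matchesAsBytes(byte[] repoPath, String filterPath) {
        return Arrays.equals(repoPath, filterPath.getBytes(StandardCharsets.UTF_8));
    }

    /** String match after decoding the repository bytes. */
    static boolean matchesAsStrings(byte[] repoPath, String filterPath) {
        return decodeWithFallback(repoPath).equals(filterPath);
    }

    public static void main(String[] args) {
        byte[] latin1 = NAME.getBytes(StandardCharsets.ISO_8859_1); // 7 bytes
        byte[] utf8 = NAME.getBytes(StandardCharsets.UTF_8);        // 10 bytes
        System.out.println("latin1 bytes: " + latin1.length + ", utf8 bytes: " + utf8.length);
        System.out.println("byte match:   " + matchesAsBytes(latin1, NAME));
        System.out.println("string match: " + matchesAsStrings(latin1, NAME));
    }
}
```

With an ISO-8859-1 path the byte comparison fails while the string comparison succeeds, which is consistent with the guess above. Whether PathFilter really behaves this way would still need to be confirmed against the JGit source.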
Sep 28, 2015
Project Member #4 dougk....@gmail.com
Hmm, I wonder if this is perhaps related (in some way) to Issue 2756?  At the very least, it's another case where non-ASCII filenames are treated poorly, even if I've not been able to come up with exact replication steps (other than the commit I know which is broken).