Issue 2756: Rebase with UTF-8 characters in repository causes unnecessary renames
Status: New
Owner: ----
Project Member Reported by dougk....@gmail.com, Jul 3, 2014

Affected Version: 2.8.3, 2.9

What steps will reproduce the problem?
I have not yet been able to find a simple reproduction case for this bug. The use case that triggered it was as follows:

The repository contains, among other things, a set of files with UTF-8 characters in their names, specifically names matching "Wü_*", "Mü_*", and "Dü_*" (several files match these patterns).

Patchset 1 is created that edits unrelated files. Patchset 2 modifies the same file as Patchset 1 but does not directly conflict with it (i.e. a simple rebase works). Patchset 1a is then created, and master is set to point to Patchset 1a. When Patchset 2 is rebased onto Patchset 1a (using Gerrit's "rebase"), several files are renamed in the process: "Wü" ("W\303\274") becomes "W\357\277\275", with the same problem for the other names containing diacritics. Apart from that, the rebase works.
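For reference, "\357\277\275" is the octal form of the UTF-8 bytes EF BF BD, i.e. U+FFFD REPLACEMENT CHARACTER, which a decoder substitutes when it meets bytes that are invalid in the charset it was told to use. The Java sketch below is not Gerrit or JGit code; the class and method names are hypothetical. It shows one possible mechanism, assuming the name is encoded as ISO-8859-1 somewhere and later decoded as UTF-8. It is not a confirmed cause.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not part of Gerrit or JGit): shows one way a
// path like "Wü" can end up as "W\357\277\275" after a lossy round trip.
public class ReplacementCharDemo {

    // Render bytes roughly the way git quotes non-ASCII path bytes:
    // printable ASCII as-is, everything else as a three-digit octal escape.
    static String octal(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int u = b & 0xFF;
            if (u >= 0x20 && u < 0x7F) {
                sb.append((char) u);
            } else {
                sb.append(String.format("\\%03o", u));
            }
        }
        return sb.toString();
    }

    // Assumed failure mode: the name is encoded with ISO-8859-1 (one byte,
    // 0xFC, for the u-umlaut) and then decoded as UTF-8. 0xFC is not valid
    // UTF-8, so the decoder substitutes U+FFFD, which re-encodes as EF BF BD.
    static byte[] lossyRoundTrip(String name) {
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        String misdecoded = new String(latin1, StandardCharsets.UTF_8);
        return misdecoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "W\u00FC"; // the name "Wü" from the report
        // Correct UTF-8 bytes: prints W\303\274
        System.out.println(octal(name.getBytes(StandardCharsets.UTF_8)));
        // After the assumed lossy round trip: prints W\357\277\275
        System.out.println(octal(lossyRoundTrip(name)));
    }
}
```

The assumed Latin-1/UTF-8 mismatch produces exactly one U+FFFD per accented character, which matches the names seen here. By contrast, decoding the correct UTF-8 bytes as US-ASCII would typically produce two replacement characters per "ü", one for each of the bytes \303 and \274.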

I can't pin this down to Gerrit or JGit yet (especially since Gerrit appears to include its own rebase logic, handling the three-way merge inside MergeUtil), but somewhere along the line a tree seems to be getting mangled. Notably, the renamed files are in a completely separate tree from the files that are changed.
Comment #1 by dougk....@gmail.com (Project Member), Jul 3, 2014
This might be related to the ResolveMerger in JGit: by the time mergeTrees runs, I can already see the corrupted filenames. I'm not familiar enough with JGit to understand this class fully, so tracking it down may take some time. Still, I think it's fairly safe to say the problem is in JGit.