Issue 1827: Replication + authGroup eventually replicate
Status:  Submitted
Owner:
Closed:  Apr 2013
Reported by manuel.v...@gmail.com, Mar 13, 2013
Affected Version: 2.5.2

What steps will reproduce the problem?
1. Set up replication between 2 servers using authGroup and the group "example-replication" (an empty group, "visible to all members").
2. Create a project using the CLI create-project command (let's call it "foobar").
3. Set up permissions by cloning refs/meta/config:
[access "refs/*"]
Read = group example-replication
...
Push those changes

4. Push some content to the newly created project (foobar master).

What is the expected output? What do you see instead?

The push should be replicated, but it is not (even after a delay much longer than the replication delay, e.g. 1h).

If you play around with the Access rights page (adding some permissions, removing others), the content might eventually get replicated.


Please provide any additional information below.

If you remove "authGroup" from the replication config (and restart the server), everything works like a charm.

Mar 14, 2013
#1 manuel.v...@gmail.com
I ran the same tests with gerrit and the replication plugin built from master:
- gerrit 2.5.2-1684-g443a3a1
- replication 1.1-SNAPSHOT (API Version 2.6-SNAPSHOT)

I get the same behaviour: authGroup only works when the permission "Read on refs/* to replication group" is set manually from the Web UI.

Here is the ML thread about it: https://groups.google.com/d/msg/repo-discuss/0b6Ultrwvqg/loCKd0lyt84J


Apr 12, 2013
#2 sop@google.com
Someone just reported this to me:
--
Looks like we had to set "Make group visible to all registered users" setting and then restart the server (didn't work if we didn't restart).
--

I think the replication plugin is unable to find the authGroup without this flag set; the internal group backend might not be allowing the plugin to find the group because it thinks the plugin is an anonymous end-user trying to find it in the web UI.

The group cache probably needed to be evicted after setting the flag in order to allow the backend to see the new group. Or the backend was caching something else about the group visibility.
Apr 13, 2013
#3 manuel.v...@gmail.com
I'm not sure about the restart (in my setup the replication group is created during gerrit server init, so a couple of restarts happen before we start to replicate).

However, for this very specific bug it's a case issue with the permission. We were pushing "Read = group example-replication" (upper-case R). This was not taken into account by the replication plugin (even though the UI is OK with it and displays the access rule).

As soon as you make it lower case "read = group example-replication" (r lower case) it works like a charm.

I tried to find where the permission comparison is done, but I could not find it in the replication plugin.
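
The case mismatch described in this comment can be illustrated with a small sketch. It is not the Gerrit or replication plugin code (the comment itself notes that the comparison was not located); the names READ, parse_access_section, groups_with_read_strict and groups_with_read_normalized are hypothetical. It only shows how a case-sensitive lookup of "read" would miss a rule written as "Read", while a case-normalized lookup would find it.

```python
# Hypothetical sketch: NOT the actual Gerrit or replication plugin code.
READ = "read"  # assumed canonical (lower-case) permission name


def parse_access_section(lines):
    """Parse 'name = value' lines, keeping each name exactly as written."""
    rules = {}
    for line in lines:
        name, _, value = line.partition("=")
        rules.setdefault(name.strip(), []).append(value.strip())
    return rules


def groups_with_read_strict(rules):
    """Case-sensitive lookup: a rule written as 'Read' is not found."""
    return rules.get(READ, [])


def groups_with_read_normalized(rules):
    """Case-insensitive lookup: 'Read' and 'read' are treated alike."""
    found = []
    for name, values in rules.items():
        if name.lower() == READ:
            found.extend(values)
    return found


rules = parse_access_section(["Read = group example-replication"])
strict = groups_with_read_strict(rules)
normalized = groups_with_read_normalized(rules)
print(strict)
print(normalized)
```

With the upper-case rule from this report, the strict lookup returns no groups, which would match the observed symptom of the authGroup appearing to have no read access.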
Apr 22, 2013
#4 sop@google.com
(No comment was entered for this change.)
Labels: 2.6
Apr 22, 2013
#5 sop@google.com
(No comment was entered for this change.)
Labels: -2.6 Blocking-2.6
Apr 23, 2013
#6 sop@google.com
The authGroup bug happens because of the following:

- On startup of a server Gerrit is loading the replication plugin on a random
  thread. This thread has no current user associated with it. With no user the
  InternalGroupBackend parses the group name and verifies Anonymous User can see
  the group. Typically it cannot, so the group fails to parse, and does not load.

- When an Administrator reloads the plugin the reload happens on a thread with
  the SSH authenticated user as the current user. Users with administrateServer
  can see groups, so the plugin is now able to parse that group and replication
  works as expected.

- Touching the replication.jar to force an automatic reload of the plugin causes
  the first case (similar to server startup) where the group cannot parse and is
  discarded, again breaking replication.
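
The three cases above can be sketched as follows. This is a simplified model under stated assumptions, not the InternalGroupBackend implementation: the Group and User classes, the GROUPS table and parse_group are all hypothetical, and the visibility rule is reduced to "visible to all, or the user has administrateServer".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Group:
    name: str
    visible_to_all: bool


@dataclass(frozen=True)
class User:
    name: str
    administrate_server: bool = False


# Stand-in for a thread with no current user (assumption of this sketch).
ANONYMOUS = User("Anonymous User")

# Hypothetical group table; the authGroup is not visible to everyone.
GROUPS = {"example-replication": Group("example-replication", visible_to_all=False)}


def parse_group(name: str, current_user: Optional[User]) -> Optional[Group]:
    """Resolve a group name, returning None if the user cannot see the group."""
    user = current_user or ANONYMOUS  # no user on the thread: treat as anonymous
    group = GROUPS.get(name)
    if group is None:
        return None
    if group.visible_to_all or user.administrate_server:
        return group
    return None  # the group "fails to parse" for this user


# Server startup or automatic reload: no current user on the thread.
at_startup = parse_group("example-replication", current_user=None)
# Reload triggered by an administrator over SSH.
at_admin_reload = parse_group(
    "example-replication", User("admin", administrate_server=True)
)
print(at_startup)
print(at_admin_reload)
```

In this model the same configuration resolves differently depending only on which thread performs the load, which is why the behaviour looked intermittent.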
Status: Started
Apr 23, 2013
#7 sop@google.com
(No comment was entered for this change.)
Owner: sop@google.com
Apr 23, 2013
#8 sop@google.com
Likewise there is also a race condition here between plugins supplying group systems and the replication plugin.

With no explicit startup order managed by the server, the replication plugin may start before a plugin that provides a group system, causing some groups to be unavailable at first but to appear later when the replication plugin is reloaded. This is the same basic bug as the internal groups case I described in comment #6.
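
A minimal sketch of this ordering problem, assuming a registry that group backends join as their plugins start. The GroupBackends and ReplicationPlugin classes and the group name "ldap/example-replication" are hypothetical and only illustrate the dependence on startup order.

```python
class GroupBackends:
    """Hypothetical registry of group systems contributed by plugins."""

    def __init__(self):
        self._backends = []

    def register(self, known_groups):
        self._backends.append(set(known_groups))

    def resolve(self, name):
        """Return the name if any registered backend knows it, else None."""
        for known_groups in self._backends:
            if name in known_groups:
                return name
        return None


class ReplicationPlugin:
    """Resolves its authGroup once, at start time (assumption of this sketch)."""

    def __init__(self, backends, auth_group_name):
        self._backends = backends
        self._auth_group_name = auth_group_name
        self.auth_group = None

    def start(self):
        self.auth_group = self._backends.resolve(self._auth_group_name)


backends = GroupBackends()
plugin = ReplicationPlugin(backends, "ldap/example-replication")

plugin.start()  # replication starts first: the group system is not there yet
resolved_early = plugin.auth_group

backends.register({"ldap/example-replication"})  # group plugin starts later
plugin.start()  # reloading the replication plugin now finds the group
resolved_after_reload = plugin.auth_group

print(resolved_early)
print(resolved_after_reload)
```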
Apr 23, 2013
#9 sop@google.com
https://gerrit-review.googlesource.com/44961 and its parent
https://gerrit-review.googlesource.com/44950 should fix this.
Status: ChangeUnderReview
Apr 25, 2013
#10 sop@google.com
(No comment was entered for this change.)
Status: Submitted
Labels: -Blocking-2.6 FixedIn-2.6