ruby: Encode decode cleanup and behavior normalization #338

skippy · 2015-05-03T18:14:20Z

This PR may be a bridge too far. I am sending this PR as DO NOT MERGE; if you agree with the direction of the changs, I'll add some tests (in the old-style format) and documentation.

This pr started as just wanting to add common ruby syntactic sugar, specifically #to_json and #to_proto. Then I realized that the jruby .encode_json was broken (gist). But it also had the very nice and useful #to_h which the c-ext did not.

I tried consolidating the json encoding/decoding into pure ruby, using the stdlib JSON module, the JSON gem (using the c-extension), and Yajl. This ripped out a lot of c code, and a bit of java code, and I liked the simplification. Until I ran benchmarks. tl;dr: the native json library (haberman/upb) in this lib was consistantly 10x faster (gist). Yowzers....

I don't want to reopen the conversation in #121, and definitely not the tenor. However, I agree with the concerns that having two fully separate implementations is going to be hard to maintain. I'm finding more inconsitancies than I expected, and I'm just starting to scratch the surface. (e.g., another is that the jruby extension doesn't check STRING encoding during init. like the c-ext does). BUT it is alpha :)

If the JSON performance was closer, I would have submitted the PR with consolidated logic and have tried to make the case that moving some code into ruby is a good thing from a consistancy and maintenance perspective. However, the performance difference is so great I don't feel I should make that argument.

So here is a less-aggressive PR with fewer changes. Again, if this is moving in an acceptable direction, I'll finish it up.

skippy · 2015-05-03T18:15:04Z

ruby/ext/google/protobuf_c/message.c

+  TypedData_Get_Struct(_self, MessageHeader, &Message_type, self);
+
+  VALUE hash = rb_hash_new();
+  rb_gc_register_address(&hash);


is this needed if I use rb_hash_new()? I sort of suspect not

Oh, definitely not! rb_gc_register_address registers a static root with the garbage collector -- e.g., for a global variable. You're telling the GC that some random address on this thread's stack is a root to be kept alive. Probably not what you want :-)

ha! yes; sorry, I should have dived into the docs a bit more on that one. thanks for the explanation.

skippy · 2015-05-04T03:10:13Z

your call, but:

having a pure ruby method of #to_h that is used by both implementations is 'better' (hard time defining that... more maintainable? I'm not so sure... clearer? yes. dry-er? yes...but does that really matter?
but raw-c is 9x faster (gist)

but damn, if the java and jruby versions were a bit closer in performance, then the argument for moving most of the logic to ruby with a little java in the performant spots, and then a bit more of the C code could be made.

haberman · 2015-05-04T03:25:25Z

Until I ran benchmarks. tl;dr: the native json library (haberman/upb) in this lib was consistantly 10x faster (gist). Yowzers....

I would also mention that the upb JSON encoder is not very optimized (it uses snprintf() to format numbers, for example). Once this is optimized, the performance difference will probably be even greater than 10x.

By the way, if you are discovering inconsistencies between implementations, please add what you discovered to the unit tests. Not only does this make everything more robust, it makes refactorings like this much more tractable and lower-risk.

skippy · 2015-05-04T03:49:48Z

ruby/ext/google/protobuf_c/message.c

+       upb_msg_field_next(&it)) {
+    const upb_fielddef* field = upb_msg_iter_field(&it);
+    VALUE msg_value = layout_get(self->descriptor->layout, Message_data(self), field);
+    VALUE msg_key   = ID2SYM(upb_fielddef_name(field));


NOTE this makes the hash return symbols for keys. I think this is the right call for 2 reasons:

hell of a lot faster

primary reason: the message initializer only accepts symbolized keys

skippy · 2015-05-04T04:19:05Z

@haberman I am adding tests; on every one that I find and fix. Yet I'm eager to dive into tests a bit more as there are a bunch of edge cases that aren't being tested. I hope to start get to that later this week.

nice work on haberman/upb btw.

cfallin · 2015-05-04T04:26:24Z

Hi @skippy,

Thanks for all the work here! I'll give a closer look tomorrow. But one high-level request, especially since the flurry of pull-requests and comments is becoming somewhat hectic: could you put together a design/summary document of sorts listing (i) the discrepancies you're finding between the MRI Ruby extension and JRuby extension, and (ii) the specific semantic and API changes you're proposing? It would be easier to read that first and then review the code as appropriate. A shared Google doc is fine, or a gist in Markdown, or whatnot.

It's been a while since I reviewed the JRuby extension and wrote the MRI one so the summary is useful for me personally, at least :-)

Thanks!

haberman · 2015-05-04T04:32:10Z

@skippy Thanks for adding the tests and thanks for your nice words about upb!

skippy · 2015-05-04T21:49:19Z

@cfallin of course; thank you for asking. An initial draft is here: https://gist.github.com/skippy/00c23e4e7bdac5124fe4.

No rush on my end!

cfallin · 2015-05-05T21:30:04Z

Thanks for that -- I'll try to take a look in the next few days!

* make consistent between mri and jruby * create a #to_h and have it use symbols for keys * add #to_json and #to_proto helpers on the Google::Protobuf message classes

skippy · 2015-05-13T17:57:12Z

ruby/lib/google/protobuf.rb

+    end
+
+  end
+end


these methods existed in both jruby and mri, but I think it is cleaner to just define them once

skippy · 2015-05-13T17:58:09Z

@cfallin what ya think?

cfallin · 2015-05-14T17:55:02Z

The unification effort in general is really helpful here, thanks!

The only potential concern I have is in adding #to_proto and #to_json methods to message objects. It's definitely more convenient than having to invoke MyClass.encode(msg), but I recall a fairly spirited discussion around this when we designed the API and there were arguments to keep the message objects' method list "clean", i.e., only have overrides to Object methods. I realize the cat is partially out of the bag already with #to_h, and from my perspective today at least, the convenience methods do seem nice (no need to write out the message class's name explicitly; also allows for generic methods that take a message of any type).

@haberman, do you have any opinion here?

(To be clear, this is my only real potential concern, and I'm happy to take the rest of the changes!)

skippy · 2015-05-14T18:36:39Z

@cfallin very fair. #to_h is a common extension and one worth keeping, esp. after Rails, which I think popularized that helper (though I'm not totally clear on the etymology). #to_json has become more popular as a helper, usually used in the context of creating a data representation (such as a hash) and then, at the last minute, serializing it. It is common pattern in the ruby on rails and api realms. #to_proto is pure syntax sugar, and one that I liked from a few other ruby proto libraries.

As an FYI, and why this pattern has emerged, is that frameworks like Rails, and other ruby gems, have a common short-hand where you can use syntax such as render json: msg, and it will call msg.to_json. And that pattern holds for other helpers such as render proto: msg => msg.to_proto.

The ruby community loves its syntax sugar, that is for sure.

Except for #to_h, which makes retrieving keys and values from the object very convenient, I won't push hard (or perhaps I should say, push my luck) for the other two. :) Let me know what you all decide, and I'll go in and make the changes.

thanks!

haberman · 2015-05-15T17:42:17Z

The unfortunate thing about adding methods on the message objects is that is mixes the namespace of field names (from the .proto file) with the namespace of helper/convenience methods. msg.to_json is a helper method but msg.to_jsonx is a message field. This is really unclear to a reader. If a message field is named to_json, the user will need to use msg["to_json"] instead. If we add any convenience methods in the future, we could break existing code for any users who used that method name as a protobuf field name.

All that said, it's becoming apparent that the convenience and convention of helper methods has such strong precedent in Ruby culture that we'd be remiss not to support it. It's clear that we should support #to_proto and #to_json. Others we should evaluate on a case-by-case basis.

cfallin · 2015-05-15T17:52:38Z

OK, I'm in agreement too. Since we're still in alpha, I don't feel too bad about encroaching-by-eminent-domain into field namespace and grabbing a few method names. I agree that we should be careful about this in the future so that we don't break users (especially if we ever want to use a short/common/ambiguous name as a method).

The change itself looks good, so LGTM, and I'll merge!

ruby: Encode decode cleanup and behavior normalization

skippy · 2015-05-15T22:14:11Z

@cfallin I found a bug mapping #to_h through the Map type. I'm tracking down right now. I'll beef up the test cases in that area as well. It will take me, possibly, EOW to get that patch in.

but shit. sorry about that.

cfallin · 2015-05-15T22:17:57Z

@skippy OK, no worries, thanks for letting us know! If you'd like or if you need to pass it off, I'm happy to track it down given a repro, otherwise we're happy to take your fix.

googlebot added the cla: yes label May 3, 2015

skippy reviewed May 3, 2015
View reviewed changes

skippy changed the title ~~Encode decode cleanup and behavior normalization~~ ruby: Encode decode cleanup and behavior normalization May 3, 2015

skippy reviewed May 4, 2015
View reviewed changes

skippy force-pushed the encode-decode-helpers branch from 240ab51 to ff0c997 Compare May 13, 2015 17:03

adding and simplifying encoders/decoders

d1b52a0

* make consistent between mri and jruby * create a #to_h and have it use symbols for keys * add #to_json and #to_proto helpers on the Google::Protobuf message classes

skippy force-pushed the encode-decode-helpers branch from ff0c997 to d1b52a0 Compare May 13, 2015 17:04

skippy reviewed May 13, 2015
View reviewed changes

cfallin added a commit that referenced this pull request May 15, 2015

Merge pull request #338 from skippy/encode-decode-helpers

a526605

ruby: Encode decode cleanup and behavior normalization

cfallin merged commit a526605 into protocolbuffers:master May 15, 2015

skippy mentioned this pull request May 16, 2015

ruby: Map to_h fixes #396

Closed

blowmage mentioned this pull request May 22, 2019

[Ruby] Remove to_hash methods #6166

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ruby: Encode decode cleanup and behavior normalization #338

ruby: Encode decode cleanup and behavior normalization #338

skippy commented May 3, 2015

skippy May 3, 2015

cfallin May 8, 2015

skippy May 13, 2015

skippy commented May 4, 2015

haberman commented May 4, 2015

skippy May 4, 2015

skippy commented May 4, 2015

cfallin commented May 4, 2015

haberman commented May 4, 2015

skippy commented May 4, 2015

cfallin commented May 5, 2015

skippy May 13, 2015

skippy commented May 13, 2015

cfallin commented May 14, 2015

skippy commented May 14, 2015

haberman commented May 15, 2015

cfallin commented May 15, 2015

skippy commented May 15, 2015

cfallin commented May 15, 2015

+                  end
+                end
+              end

ruby: Encode decode cleanup and behavior normalization #338

ruby: Encode decode cleanup and behavior normalization #338

Conversation

skippy commented May 3, 2015

skippy May 3, 2015

Choose a reason for hiding this comment

cfallin May 8, 2015

Choose a reason for hiding this comment

skippy May 13, 2015

Choose a reason for hiding this comment

skippy commented May 4, 2015

haberman commented May 4, 2015

skippy May 4, 2015

Choose a reason for hiding this comment

skippy commented May 4, 2015

cfallin commented May 4, 2015

haberman commented May 4, 2015

skippy commented May 4, 2015

cfallin commented May 5, 2015

skippy May 13, 2015

Choose a reason for hiding this comment

skippy commented May 13, 2015

cfallin commented May 14, 2015

skippy commented May 14, 2015

haberman commented May 15, 2015

cfallin commented May 15, 2015

skippy commented May 15, 2015

cfallin commented May 15, 2015