New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ruby: Encode decode cleanup and behavior normalization #338
Conversation
TypedData_Get_Struct(_self, MessageHeader, &Message_type, self); | ||
|
||
VALUE hash = rb_hash_new(); | ||
rb_gc_register_address(&hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this needed if I use rb_hash_new()
? I sort of suspect not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, definitely not! rb_gc_register_address
registers a static root with the garbage collector -- e.g., for a global variable. You're telling the GC that some random address on this thread's stack is a root to be kept alive. Probably not what you want :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha! yes; sorry, I should have dived into the docs a bit more on that one. thanks for the explanation.
your call, but:
but damn, if the java and jruby versions were a bit closer in performance, then the argument for moving most of the logic to ruby with a little java in the performant spots, and then a bit more of the C code could be made. |
I would also mention that the upb JSON encoder is not very optimized (it uses By the way, if you are discovering inconsistencies between implementations, please add what you discovered to the unit tests. Not only does this make everything more robust, it makes refactorings like this much more tractable and lower-risk. |
upb_msg_field_next(&it)) { | ||
const upb_fielddef* field = upb_msg_iter_field(&it); | ||
VALUE msg_value = layout_get(self->descriptor->layout, Message_data(self), field); | ||
VALUE msg_key = ID2SYM(upb_fielddef_name(field)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE this makes the hash return symbols for keys. I think this is the right call for 2 reasons:
- hell of a lot faster
- primary reason: the message initializer only accepts symbolized keys
@haberman I am adding tests; on every one that I find and fix. Yet I'm eager to dive into tests a bit more as there are a bunch of edge cases that aren't being tested. I hope to start get to that later this week. nice work on haberman/upb btw. |
Hi @skippy, Thanks for all the work here! I'll give a closer look tomorrow. But one high-level request, especially since the flurry of pull-requests and comments is becoming somewhat hectic: could you put together a design/summary document of sorts listing (i) the discrepancies you're finding between the MRI Ruby extension and JRuby extension, and (ii) the specific semantic and API changes you're proposing? It would be easier to read that first and then review the code as appropriate. A shared Google doc is fine, or a gist in Markdown, or whatnot. It's been a while since I reviewed the JRuby extension and wrote the MRI one so the summary is useful for me personally, at least :-) Thanks! |
@skippy Thanks for adding the tests and thanks for your nice words about upb! |
@cfallin of course; thank you for asking. An initial draft is here: https://gist.github.com/skippy/00c23e4e7bdac5124fe4. No rush on my end! |
Thanks for that -- I'll try to take a look in the next few days! |
240ab51
to
ff0c997
Compare
* make consistent between mri and jruby * create a #to_h and have it use symbols for keys * add #to_json and #to_proto helpers on the Google::Protobuf message classes
ff0c997
to
d1b52a0
Compare
end | ||
|
||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these methods existed in both jruby and mri, but I think it is cleaner to just define them once
@cfallin what ya think? |
The unification effort in general is really helpful here, thanks! The only potential concern I have is in adding @haberman, do you have any opinion here? (To be clear, this is my only real potential concern, and I'm happy to take the rest of the changes!) |
@cfallin very fair. As an FYI, and why this pattern has emerged, is that frameworks like Rails, and other ruby gems, have a common short-hand where you can use syntax such as The ruby community loves its syntax sugar, that is for sure. Except for thanks! |
The unfortunate thing about adding methods on the message objects is that is mixes the namespace of field names (from the .proto file) with the namespace of helper/convenience methods. All that said, it's becoming apparent that the convenience and convention of helper methods has such strong precedent in Ruby culture that we'd be remiss not to support it. It's clear that we should support |
OK, I'm in agreement too. Since we're still in alpha, I don't feel too bad about encroaching-by-eminent-domain into field namespace and grabbing a few method names. I agree that we should be careful about this in the future so that we don't break users (especially if we ever want to use a short/common/ambiguous name as a method). The change itself looks good, so LGTM, and I'll merge! |
ruby: Encode decode cleanup and behavior normalization
@cfallin I found a bug mapping #to_h through the Map type. I'm tracking down right now. I'll beef up the test cases in that area as well. It will take me, possibly, EOW to get that patch in. but shit. sorry about that. |
@skippy OK, no worries, thanks for letting us know! If you'd like or if you need to pass it off, I'm happy to track it down given a repro, otherwise we're happy to take your fix. |
Hey @cfallin
This PR may be a bridge too far. I am sending this PR as DO NOT MERGE; if you agree with the direction of the changs, I'll add some tests (in the old-style format) and documentation.
This pr started as just wanting to add common ruby syntactic sugar, specifically
#to_json
and#to_proto
. Then I realized that the jruby.encode_json
was broken (gist). But it also had the very nice and useful#to_h
which the c-ext did not.I tried consolidating the json encoding/decoding into pure ruby, using the stdlib JSON module, the JSON gem (using the c-extension), and Yajl. This ripped out a lot of c code, and a bit of java code, and I liked the simplification. Until I ran benchmarks. tl;dr: the native json library (haberman/upb) in this lib was consistantly 10x faster (gist). Yowzers....
I don't want to reopen the conversation in #121, and definitely not the tenor. However, I agree with the concerns that having two fully separate implementations is going to be hard to maintain. I'm finding more inconsitancies than I expected, and I'm just starting to scratch the surface. (e.g., another is that the jruby extension doesn't check STRING encoding during init. like the c-ext does). BUT it is alpha :)
If the JSON performance was closer, I would have submitted the PR with consolidated logic and have tried to make the case that moving some code into ruby is a good thing from a consistancy and maintenance perspective. However, the performance difference is so great I don't feel I should make that argument.
So here is a less-aggressive PR with fewer changes. Again, if this is moving in an acceptable direction, I'll finish it up.