What steps will reproduce the problem?
1. db = Sequel.mysql(:database => "foo", :encoding => "latin1")
2. Make a query with db
What is the expected output? What do you see instead?
I expect results encoded in latin1, but I get them encoded in utf8. If I manually run db.run('set names latin1'), I get them in latin1.
What version of the product are you using? On what operating system?
I'm using Sequel 3.11.0 on GNU/Linux with MySQL 5.1.
Comment #1
Posted on May 19, 2010 by Happy Bear
This is probably a bug in the mysql gem, since Sequel uses the :encoding or :charset options to set the encoding:
    if encoding = opts[:encoding] || opts[:charset]
      # set charset before the connect. using an option instead of "SET (NAMES|CHARACTER_SET_*)" works across reconnects
      conn.options(Mysql::SET_CHARSET_NAME, encoding)
    end
Please contact the mysql gem author. If it turns out that Sequel is just using the mysql gem wrong, please open a new ticket.
Comment #2
Posted on May 19, 2010 by Grumpy Camel
The problem seems to come from Sequel (or my code). I do not have the problem when I use the MySQL gem directly in irb.
Comment #3
Posted on May 19, 2010 by Happy Bear
Are you doing the same thing Sequel is doing (calling conn.options(Mysql::SET_CHARSET_NAME, "latin1"))? From your message, it doesn't appear so.
Doing a little more research, it appears that MySQL has two different ways to set the character set: http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
I'm not a MySQL expert, so there may be a reason that SET NAMES is better than SET CHARACTER SET. However, that's not a change I feel comfortable making without some input from the community. If you'd like this feature changed, please post on the Sequel Google Group. Assuming the rest of the community is in agreement, I don't have a problem switching the feature to use SET NAMES instead.
Comment #4
Posted on May 19, 2010 by Grumpy Camel
I can reproduce that with the mysql gem alone, using this sequence:
    conn = Mysql.init
    conn.options(Mysql::READ_DEFAULT_GROUP, "client") # <- without this option, it works fine
    conn.options(Mysql::SET_CHARSET_NAME, "latin1")
    conn.character_set_name # => "latin1"
    conn.real_connect(...)
    conn.character_set_name # => "utf8"
Comment #5
Posted on May 19, 2010 by Happy Bear
Hmm, I guess there are two possibilities. First, try the patch at http://pastie.org/967774.txt. That sets the character set after connecting. If that works, maybe we should just do that, as it doesn't negatively affect existing behavior.
Alternatively, if that doesn't work, and READ_DEFAULT_GROUP is screwing things up, try the patch at http://pastie.org/967778.txt. Unfortunately, if that works, I'll still have to see what the community thinks about it, since it breaks backwards compatibility for people relying on the client default group being read by default.
Please let me know how it works out.
Comment #6
Posted on May 19, 2010 by Grumpy Camel
For information, I've opened a bug upstream: http://rubyforge.org/tracker/index.php?func=detail&aid=28220&group_id=4550&atid=17562 and http://github.com/luislavena/mysql-gem/issues/issue/5
Comment #7
Posted on May 19, 2010 by Grumpy Camel
The first patch works :-)
And, FWIW, so does the other one.
Comment #8
Posted on May 19, 2010 by Happy Bear
The weird part is, character_set_name is always returning 'latin1' for me, even if I set it to utf8 using SET CHARACTER SET or SET NAMES and turn off the READ_DEFAULT_GROUP.
Anyway, since there were no negative effects in my testing, I committed this: http://github.com/jeremyevans/sequel/commit/0adf8f14371fccff213cc9cd347611e642e32e44
Comment #9
Posted on May 19, 2010 by Grumpy Camel
Thanks
Comment #10
Posted on Jun 4, 2010 by Quick Dog
Since you started setting the encoding after connecting (gem 3.12.0), it no longer works for me: I get '???' instead of utf8 Cyrillic characters. When I revert this patch and apply the second solution (http://pastie.org/967778.txt), it works fine again. Gem 3.10.0 also works fine for me.
Steps to reproduce:
    ~ $ irb
    require 'sequel'
    => true
    Sequel.version
    => "3.12.0"
    DB = Sequel.connect('mysql://root@localhost/neomebel_development?encoding=utf8')
    => #
    DB[:products].each{|r| puts r[:description]}
    ????????? ????? ?????? ??????????? ????????? ??????? ? ??????. ???????? ??????? ???????????? ? ????????????, ? ??????????? ? ??? ??????????? ??????????. ????????? ??????????? ??????????? ??? ???????? ?????.
It is a serious bug. Please fix it ASAP. I have spent a whole day researching why "heroku db:push" goes wrong... (the heroku gem uses taps, and taps uses sequel)
Comment #11
Posted on Jun 4, 2010 by Happy Bear
I am sorry that it isn't working for you. I chose the first patch because it appeared to fix the issue without negative side effects. It looks like that patch fixed some cases and broke others.
Unfortunately, I don't know a good solution to the problem. Reverting the applied patch and using the patch at http://pastie.org/967778.txt has the potential to break code for other people who rely on Sequel reading the client default group by default.
I'll post a message on the Sequel Google Group asking for the community's input before I make a decision.
Comment #12
Posted on Jun 5, 2010 by Happy Bear
This patch broke for me as well. IIRC, the charset has to be set before the connect or it does not work correctly and breaks across reconnects (hence my original comment in the code).
Comment #13
Posted on Jun 5, 2010 by Happy Bear
The connect options are only used during the real_connect call, so setting them after connecting does nothing.
When setting the 'client' read group, the real_connect call also reads the my.cnf for additional options. I imagine what happened in bruno's case is that his my.cnf has a charset=utf8, which was overriding the latin1 he had set manually.
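To make the ordering concrete, here is a toy Ruby model of the behaviour described above. It does not use the mysql gem at all: ToyClient, its methods, and the hash keys are made up for illustration, and the assumption that option-file values win over explicitly set ones is taken from this thread, not from the libmysqlclient source.

```ruby
# Toy model only: mimics the option handling described in this thread.
class ToyClient
  attr_reader :charset

  # file_options stands in for the [client] group of my.cnf (hypothetical).
  def initialize(file_options)
    @file_options = file_options
    @explicit = {}
    @read_default_group = false
    @charset = nil
  end

  # Options are only recorded here; nothing is applied until connect.
  def set_option(name, value = true)
    if name == :read_default_group
      @read_default_group = true
    else
      @explicit[name] = value
    end
    self
  end

  # At connect time the option file is read last, so its values win.
  def connect
    effective = @explicit.dup
    effective.merge!(@file_options) if @read_default_group
    @charset = effective[:charset]
    self
  end
end

# With the default group enabled, the file's utf8 replaces the explicit latin1.
with_group = ToyClient.new({ charset: "utf8" })
with_group.set_option(:read_default_group)
with_group.set_option(:charset, "latin1")
with_group.connect
puts with_group.charset

# Without the default group, the explicit latin1 is kept.
without_group = ToyClient.new({ charset: "utf8" })
without_group.set_option(:charset, "latin1")
without_group.connect
puts without_group.charset
```

In this model the first client ends up with utf8 and the second with latin1, which matches the sequence reported in Comment #4.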
Comment #14
Posted on Jun 5, 2010 by Happy Bear
Would it be better to send a "set names" SQL query after connecting? http://pastie.org/993156.txt
Comment #15
Posted on Jun 5, 2010 by Happy Bear
No, SET NAMES causes even more issues, as noted in my original comment/patch.
Since mysql connections can time out and libmysqlclient automatically reconnects, the SET NAMES will only apply to the very first connection, and in the event of an automatic reconnect everything will mysteriously break.
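A toy Ruby model of that failure mode, for anyone following along. It assumes, as described above, that a charset given as a connect option is re-applied on every connect, while a charset set by a SET NAMES query belongs to the session and is gone after a reconnect. ToyConnection and its methods are hypothetical and do not talk to MySQL.

```ruby
# Toy model only: a connection whose session state is rebuilt on every connect.
class ToyConnection
  attr_reader :session_charset

  # default_charset stands in for whatever the server or config would choose.
  # connect_option stands in for a charset passed as a connect-time option.
  def initialize(default_charset, connect_option: nil)
    @default_charset = default_charset
    @connect_option = connect_option
    connect
  end

  # Every (re)connect starts a fresh session from the connect-time settings.
  def connect
    @session_charset = @connect_option || @default_charset
  end
  alias reconnect connect

  # Stands in for running "SET NAMES <charset>" on the current session.
  def set_names(charset)
    @session_charset = charset
  end
end

# Charset set by query: lost after the reconnect.
via_query = ToyConnection.new("latin1")
via_query.set_names("utf8")
via_query.reconnect
puts via_query.session_charset

# Charset set by connect option: still in effect after the reconnect.
via_option = ToyConnection.new("latin1", connect_option: "utf8")
via_option.reconnect
puts via_option.session_charset
```

The first connection falls back to latin1 after reconnecting, while the second keeps utf8.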
Comment #16
Posted on Jun 5, 2010 by Happy Bear
Sequel turned off automatic reconnection many versions ago. Does that change things?
Comment #17
Posted on Jun 6, 2010 by Happy Bear
Where's the code that turns off mysql's auto-reconnection?
I'm not sure how the original patch was working. AFAIK, setting options after the connect does not do anything.
Comment #18
Posted on Jun 6, 2010 by Happy Bear
Sequel stopped turning reconnection on in http://github.com/jeremyevans/sequel/commit/edbfbb78018f62a14400f562024ef03c389a2cd4
According to http://www.tmtm.org/en/mysql/ruby/, it defaults to not reconnecting.
Comment #19
Posted on Jun 7, 2010 by Happy Bear
Since I'm not sure if the ruby-mysql driver handles the SET_CHARSET_NAME option specially, and it appears that READ_DEFAULT_GROUP can override the SET_CHARSET_NAME option, how about an approach that sets the encoding using the SET_CHARSET_NAME option before connecting, and then also issues a SET NAMES query after connecting: http://pastie.org/994988.txt
Really, this is a bug in ruby-mysql. No sane interface is going to have options from a configuration file override options given explicitly.
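A minimal sketch of that two-step approach, for readers who cannot reach the pastie. This is not the linked patch and not Sequel's actual code: connect_with_encoding and RecordingConn are hypothetical names, and the stand-in connection exists only so the sketch runs without a MySQL server. In real use, conn would be the object returned by Mysql.init and charset_option would be Mysql::SET_CHARSET_NAME.

```ruby
# Sketch only; not the patch from this thread and not Sequel's actual code.
# conn is any object that responds to #options, #real_connect and #query.
def connect_with_encoding(conn, charset_option, encoding, *connect_args)
  # Only allow plain charset names, since the value is put into SQL below.
  unless encoding.to_s =~ /\A\w+\z/
    raise ArgumentError, "unexpected encoding name: #{encoding.inspect}"
  end
  # Step 1: set the charset as a connect option, before connecting.
  conn.options(charset_option, encoding)
  conn.real_connect(*connect_args)
  # Step 2: issue SET NAMES afterwards, in case an option file overrode step 1.
  # Note: this query does not survive an automatic reconnect.
  conn.query("SET NAMES #{encoding}")
  conn
end

# Stand-in connection that records calls (hypothetical, for illustration).
class RecordingConn
  attr_reader :calls

  def initialize
    @calls = []
  end

  def options(option, value)
    @calls << [:options, option, value]
    self
  end

  def real_connect(*args)
    @calls << [:real_connect, *args]
    self
  end

  def query(sql)
    @calls << [:query, sql]
    nil
  end
end

conn = connect_with_encoding(RecordingConn.new, :set_charset_name, "latin1", "localhost", "user")
conn.calls.each { |call| p call }
```

The recorded calls show the option being set first, then the connect, then the SET NAMES query, so the explicit encoding is applied both before and after any option file is read.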
Comment #20
Posted on Jun 7, 2010 by Happy Bear
I think that patch is reasonable. I'd love to be a little more explicit though, and add a comment to the SET NAMES stating that it doesn't work across reconnects, and put in a reconnect=false for good measure.
I agree this is a bug upstream, but it's probably a bug in libmysqlclient itself, since ruby-mysql is just a simple wrapper.
Comment #21
Posted on Jun 7, 2010 by Happy Bear
Fixed: http://github.com/jeremyevans/sequel/commit/a9115795dbae1c7d518ebd4341e6d674b97a8780
Status: Fixed