protoc doesn't support files starting with Byte order marks. #592

Closed
jtattermusch opened this issue Jul 13, 2015 · 7 comments · Fixed by #601

@jtattermusch
Contributor

Trying to compile a .proto file that starts with a Unicode byte order mark (the U+FEFF character) results in an error.

This is extremely painful in Visual Studio, which saves all files with a byte order mark, so basically every .proto file that you write in the Visual Studio editor will be considered broken by protoc.

The error message is also not very helpful:
health.proto:1:1: Interpreting non ascii codepoint 239.
[libprotobuf WARNING google/protobuf/compiler/parser.cc:491] No syntax specified
for the proto file. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to
specify a syntax version. (Defaulted to proto2 syntax.)
health.proto:1:1: Expected top-level statement (e.g. "message").
health.proto:1:2: Interpreting non ascii codepoint 187.
health.proto:1:3: Interpreting non ascii codepoint 191.
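
For reference, the three codepoints in the output above (239, 187, 191) are the bytes 0xEF 0xBB 0xBF, which is how U+FEFF is encoded in UTF-8. Until protoc handles this itself, one possible workaround is to strip the BOM from the file before compiling it. The following is only a rough Python sketch of that idea; `strip_utf8_bom` and `rewrite_without_bom` are hypothetical helper names, not part of protobuf.

```python
import codecs

# U+FEFF encoded as UTF-8 is the three bytes 0xEF 0xBB 0xBF,
# which protoc reports as codepoints 239, 187 and 191.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8
assert list(codecs.BOM_UTF8) == [239, 187, 191]


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def rewrite_without_bom(path: str) -> bool:
    """Rewrite the file in place if it starts with a UTF-8 BOM.

    Returns True if the file was changed, False otherwise.
    """
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

Calling `rewrite_without_bom("health.proto")` before running protoc should avoid the errors above, although the editor may add the BOM back the next time the file is saved.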

@jtattermusch
Contributor Author

@anandolee, I was hoping you could take a look at this. If you are too busy, feel free to assign me and I will try to find some time to address this.

@jskeet FYI (I think we've mentioned this problem once somewhere).

@anandolee
Contributor

This is not specific to C#. Should parser.cc be changed to support a BOM?
@pherl do you think we should support .proto files starting with a BOM?

@chai2010

I think a .proto file is a UTF-8 encoded file without a BOM.
protoc is not an editor, so it doesn't need to support an invalid .proto file format.

@jtattermusch
Contributor Author

As @anandolee suggested, this is basically a question of which encodings should be considered valid for a .proto file. If we don't support the 0xFEFF mark at the beginning of a .proto file, we are making life harder for Visual Studio users.

@xfxyjwf
Contributor

xfxyjwf commented Jul 14, 2015

Protobuf only supports UTF-8 encoded files, but I can see it's a pain to deal with the BOM added automatically by Visual Studio. I would vote for updating our parser to ignore the BOM.

@anandolee
Contributor

I will update our parser to ignore the BOM.
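
To illustrate what "ignore the BOM" could mean on the parser side, here is a rough Python sketch: the input is checked once, at offset 0, and the three BOM bytes are skipped before any tokenizing starts. This is not the actual change to parser.cc or the tokenizer; `start_offset` and `first_statement` are hypothetical names used only for this illustration.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def start_offset(data: bytes) -> int:
    """Offset of the first byte a tokenizer should look at.

    Only a BOM at the very beginning of the input is skipped.
    """
    return len(UTF8_BOM) if data.startswith(UTF8_BOM) else 0


def first_statement(data: bytes) -> str:
    """Return the first non-blank line after an optional leading BOM."""
    text = data[start_offset(data):].decode("utf-8")
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped
    return ""
```

With this approach a file saved with a BOM and the same file saved without one yield the same first statement, while files that never had a BOM are read from offset 0 exactly as before.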

@anandolee removed the c# label Jul 14, 2015

@jtattermusch
Contributor Author

Thanks!
