snakeyaml - issue #136

ScannerException when loading stream with tab character


Posted on Dec 13, 2011 by Swift Bird

What steps will reproduce the problem? 1. new Yaml().load("--- 36L\tDIESEL\n");

What is the expected output? What do you see instead? "36L\tDIESEL"

What version of SnakeYAML are you using? On what Java version? 1.9

Please provide any additional information below. (Often a failing test is the best way to describe the problem.)

A verbatim tab character is allowed in a YAML string so the example given above should parse OK.

Comment #1

Posted on Dec 13, 2011 by Helpful Monkey

It would be interesting to see the exception without having to compile anything. Since you already have it, why not post it here?

Comment #2

Posted on Dec 14, 2011 by Massive Rhino

Comment deleted

Comment #3

Posted on Dec 14, 2011 by Swift Bird

Thank you for your reply.

I tried the following example:

import org.yaml.snakeyaml.Yaml;

public class Issue136 {
    public static void main(String[] args) {
        new Yaml().load("--- 36L\tDIESEL\n");
    }
}

and I get the following exception:

Exception in thread "main" while scanning for the next token
found character '\t' that cannot start any token
 in "", line 1, column 8:
    --- 36L DIESEL
           ^

at org.yaml.snakeyaml.scanner.ScannerImpl.fetchMoreTokens(ScannerImpl.java:358)
at org.yaml.snakeyaml.scanner.ScannerImpl.peekToken(ScannerImpl.java:202)
at org.yaml.snakeyaml.parser.ParserImpl$ParseDocumentEnd.produce(ParserImpl.java:265)
at org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:161)
at org.yaml.snakeyaml.parser.ParserImpl.getEvent(ParserImpl.java:171)
at org.yaml.snakeyaml.composer.Composer.composeDocument(Composer.java:125)
at org.yaml.snakeyaml.composer.Composer.getSingleNode(Composer.java:106)
at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:121)
at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:296)
at org.yaml.snakeyaml.Yaml.load(Yaml.java:266)
at Issue136.main(Issue136.java:5)

Comment #5

Posted on Dec 14, 2011 by Massive Rhino

The corresponding issue for PyYAML has been created: http://pyyaml.org/ticket/219

Comment #6

Posted on Dec 14, 2011 by Massive Rhino

I found the answer. This comment explains it (http://code.google.com/p/snakeyaml/source/browse/src/main/java/org/yaml/snakeyaml/scanner/ScannerImpl.java#1630): "The specification is really confusing about tabs in plain scalars. We just forbid them completely. Do not use tabs in YAML!"

I would recommend asking the question on the general YAML mailing list (see http://www.yaml.org/) to get a proper explanation.

Comment #7

Posted on Dec 15, 2011 by Swift Bird

I will ask on the mailing list.

The example is output from Ruby 1.8, so it must be pretty common.

Comment #8

Posted on Dec 18, 2011 by Massive Rhino

I think I managed to fix the problem. Please have a look at this clone: http://code.google.com/r/py4fun-tabinscalar/source/checkout

As you can see, the test accepts tabs inside a plain scalar: http://code.google.com/r/py4fun-tabinscalar/source/browse/src/test/java/org/yaml/snakeyaml/issues/issue136/TabInScalarTest.java

If we accept this fix, SnakeYAML will behave differently than PyYAML. I will try to create the same patch for PyYAML and discuss it on the YAML core mailing list.

Comment #9

Posted on Dec 18, 2011 by Swift Bird

I posted the question on the mailing list, and got an answer that a tab in a plain scalar should be allowed:

http://sourceforge.net/mailarchive/forum.php?thread_name=419E75E7-E159-4998-995D-9EEC8D075F94%40datek.no&forum_name=yaml-core

I will try the clone and report back to JRuby, which uses SnakeYAML in its Ruby 1.9 implementation.

Comment #10

Posted on Dec 18, 2011 by Swift Bird

Alas, I have no hg client available to me. Can you make a JAR available?

Comment #11

Posted on Dec 19, 2011 by Massive Rhino

Here it is: http://code.google.com/p/snakeyaml/downloads/list

I am following your question on the YAML core mailing list, but I do not see any answer. Did you get the answer sent to your private account?

As far as I know, Ruby 1.8 uses its own YAML parser, which caused a number of problems. Ruby 1.9 switched to Psych, which uses the same core engine as PyYAML and SnakeYAML. Ruby 1.9 and JRuby should behave the same. But with this fix, SnakeYAML (and JRuby, once it switches to this version) will accept the tabs while Ruby will not. That will cause confusion, which is why it is important to have a common approach with PyYAML and libyaml (used by Psych).

Comment #12

Posted on Dec 19, 2011 by Swift Bird

Thanks for the JAR.

Yes, the reply was sent to me privately. I have forwarded it to the mailing list now.

I agree that interoperability is very important. How should we bring this change to Ruby?

Comment #13

Posted on Dec 19, 2011 by Massive Rhino

I have provided the solution with tests to PyYAML (ticket 219 - http://pyyaml.org/ticket/219). In order to use it in Ruby 1.9, the following must be done:
1) we agree on this approach with the PyYAML developers
2) PyYAML and libyaml are fixed
3) Psych must take the latest version of libyaml (with the fix)
4) Ruby must take the latest Psych version

It looks like a long path...

Comment #14

Posted on Dec 21, 2011 by Massive Rhino

Comment deleted

Comment #15

Posted on Dec 21, 2011 by Massive Rhino

Please be aware that you can use tabs in double- or single-quoted scalars. This is easy and safe because it works the same way in all the parsers. I have added a test to show it: http://code.google.com/p/snakeyaml/source/browse/src/test/java/org/yaml/snakeyaml/issues/issue136/TabInScalarTest.java

(The file snakeyaml-1.10-SNAPSHOT.jar will be removed from the 'Download' area to avoid confusion)
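
For whoever does control the producing side, the quoting advice above could look roughly like the sketch below. It is not taken from SnakeYAML or from the linked test; the class YamlDoubleQuote and its quote method are made-up names, and only a handful of common escapes are handled (backslash, double quote, tab, newline, carriage return). Other control characters would need more work.

```java
// Hypothetical helper, not part of SnakeYAML: emits a value as a
// double-quoted YAML scalar so that a tab never ends up in a plain scalar.
final class YamlDoubleQuote {
    private YamlDoubleQuote() {}

    /** Wraps the value in double quotes and escapes a few common characters. */
    static String quote(String value) {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break; // backslash
                case '"':  out.append("\\\""); break; // double quote
                case '\t': out.append("\\t");  break; // tab, the case from this issue
                case '\n': out.append("\\n");  break; // line feed
                case '\r': out.append("\\r");  break; // carriage return
                default:   out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // Builds the document from the report with the scalar quoted.
        System.out.println("--- " + quote("36L\tDIESEL"));
    }
}
```

With the scalar written this way, the tab travels as the two-character escape \t inside the quotes, which the parsers discussed here should all decode back to a tab character.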

Comment #16

Posted on Dec 26, 2011 by Swift Bird

My problem is that I do not control the encoding of the stream, only the decoding, so the tabs are already present.

py4fun, if we get verification that the YAML is legal, can you release a 1.10 version with this fix? The JRuby team would like to use a release version rather than a snapshot.

Comment #17

Posted on Dec 27, 2011 by Massive Rhino

1) Does this mean that the 1.10-SNAPSHOT works as you expect? (Can I remove it?)
2) We do not mind including this change (fix?) in 1.10, but I would first like to hear the explanation from Kirill (PyYAML). Please be aware that the very same YAML document will then behave differently in Ruby and JRuby. According to our release cycle, version 1.10 will be released in February.

Comment #18

Posted on Dec 27, 2011 by Swift Bird

Yes, the snapshot works as expected.

Excellent that you can include the fix. We all await Kirill's verdict :)

I am not sure when the next JRuby release is, but we will want to have the new release version included. If JRuby 1.7.0 is released before February, I guess we will include a snapshot first, and then include the release version of SnakeYAML in a later patch-level release.

Comment #19

Posted on Jan 12, 2012 by Massive Rhino

Fixed. Try the latest snapshot.

The fix will be delivered in version 1.10

Comment #20

Posted on Nov 5, 2012 by Happy Elephant

Comment deleted

Comment #21

Posted on Nov 6, 2012 by Happy Elephant

This issue still shows up in version 1.11.

Any ideas why this is still happening?

Comment #22

Posted on Nov 6, 2012 by Grumpy Horse

Comment deleted

Comment #23

Posted on Nov 6, 2012 by Massive Rhino

Can you please provide more information? What is happening? How can we reproduce it? Can you run all the tests? If you mean this:

found character '\t' that cannot start any token in 'reader', line 3, column 1: CREATE TABLE account (

then it is a totally different issue. Please read: http://yaml.org/spec/1.1/

Tabs may appear inside comments and quoted or block scalar content. Tabs must not appear elsewhere, such as in indentation and separation spaces.

This is exactly what the error message says. Tabs cannot be used as indentation.
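
For anyone who wants to check a document for this before handing it to the parser, a rough pre-check might look like the following. This is only a sketch: TabIndentCheck and linesWithTabIndent are invented names, not SnakeYAML API, and the check is a plain line-by-line heuristic. It does not understand YAML structure, so it can also flag a line inside a block scalar whose content happens to start with a tab.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-check, not part of SnakeYAML: reports lines whose
// leading whitespace contains a tab character.
final class TabIndentCheck {
    private TabIndentCheck() {}

    /** Returns the 1-based numbers of lines that have a tab in their indentation. */
    static List<Integer> linesWithTabIndent(String yaml) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = yaml.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            for (int j = 0; j < line.length(); j++) {
                char c = line.charAt(j);
                if (c == '\t') {       // tab before any content: report the line
                    hits.add(i + 1);
                    break;
                }
                if (c != ' ') {        // first non-space character ends the indentation
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Line 2 is indented with a tab; the tab on line 3 is inside the value.
        String doc = "a:\n\tb: 1\nc: x\ty\n";
        System.out.println(linesWithTabIndent(doc));
    }
}
```

A tab that appears after the first non-space character of a line is deliberately ignored here, since that is the case covered by the original report rather than an indentation problem.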

Comment #24

Posted on Nov 6, 2012 by Happy Elephant

Thanks for the response. Your link points to the yaml specification. Of course, I will not be reading the bazillion pages to try to understand why tabs are not allowed as indentation. Logically, I do not see a plausible reason why it should not be allowed especially since many people use gui editors to do their work. But thank you nonetheless for the answer.

Comment #25

Posted on Nov 8, 2012 by Massive Rhino

To avoid misunderstanding in the future, the error message has been improved. See: http://code.google.com/p/snakeyaml/wiki/changes

Implemented here: http://code.google.com/p/snakeyaml/source/detail?r=af8d7ccf66e5fa047be44c6899ff66979b94251d

It will be delivered in version 1.12

Status: Fixed

Labels:
Type-Defect Priority-Medium