Dart should allow invalid scalar values for string literals #13535

Closed
floitschG opened this issue Sep 24, 2013 · 7 comments
Labels
area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@floitschG
Contributor

The spec currently (as of 2013-09-24) states:
===
\u{HEX_DIGIT_SEQUENCE} is the unicode scalar value represented by the HEX_DIGIT_SEQUENCE. It is a compile-time error if the value of the HEX_DIGIT_SEQUENCE is not a valid unicode scalar value.
===

We should remove this limitation since users can create these strings dynamically anyway. In most cases it just makes testing more difficult.

@gbracha
Contributor

gbracha commented Aug 27, 2014

One can argue that this should be a static warning rather than a compilation error. And one can argue the other way too. The question is whether treating it as an error is more useful to users. I think this case is, so I'm inclined to leave things as they are, but could be convinced otherwise. I'm also not sure what the implications are for the VM.


Set owner to @gbracha.
Added Accepted label.

@floitschG
Contributor Author

Just to clarify: I still want a check that the value is in the Unicode range 0-0x10FFFF, but I would remove the check that excludes the surrogate range (0xD800-0xDFFF) inside it.
For example: "\u{DC80}".
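To make the two checks concrete: a Unicode code point is any value in 0..0x10FFFF, and a Unicode scalar value is a code point outside the surrogate range 0xD800..0xDFFF. The sketch below is only an illustration; `isCodePoint` and `isScalarValue` are hypothetical helpers, not part of the spec or of `dart:core`.

```dart
/// Hypothetical helper: true if [value] is a Unicode code point,
/// i.e. in the range 0..0x10FFFF (the check that would remain).
bool isCodePoint(int value) => value >= 0 && value <= 0x10FFFF;

/// Hypothetical helper: true if [value] is a Unicode scalar value,
/// i.e. a code point outside the surrogate range 0xD800..0xDFFF
/// (the stricter check the quoted spec text requires for \u{...}).
bool isScalarValue(int value) =>
    isCodePoint(value) && (value < 0xD800 || value > 0xDFFF);
```

Under these definitions 0xDC80 passes `isCodePoint` but fails `isScalarValue`, so "\u{DC80}" would be accepted under the proposal and rejected under the current wording, while 0x110000 fails both checks.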

I think "\u" (and, as a consequence, "\x") is used either to express a specific rune (for example the G-clef, \u{1D11E}) or, more frequently, to encode an existing string. In the latter case the developer may want to encode invalid sequences, and I think we should allow it.

Making it a static warning is ok for me.

@lrhn
Copy link
Member

lrhn commented Aug 28, 2014

I agree that we should allow surrogate values as \u-escapes.

It made sense to disallow them back when Dart strings were UTF-32 encoded, but now that we are using JavaScript strings, which are just sequences of UTF-16 code units, there is no need for the restriction, and it is more of an obstruction than a help.

It is quite possible to have strings containing unpaired surrogates; you just have to create them in a roundabout way.

```dart
var string = "ab\uDC12ef"; // fails to compile
var string2 = new String.fromCharCodes([0x61, 0x62, 0xdc12, 0x65, 0x66]); // works
var string3 = "ab${new String.fromCharCode(0xdc12)}ef"; // works
var string4 = "ab" + "\u{10012}"[1] + "ef"; // works
```

These workarounds have one thing in common: the strings are not compile time constants.
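The last workaround relies on how a supplementary code point is stored in UTF-16: U+10012 is encoded as the surrogate pair 0xD800 0xDC12, so index 1 is exactly the lone trail surrogate. The following sketch computes the pair by hand and checks that the three runtime workarounds produce identical code units; `surrogatePair` and `workarounds` are hypothetical names used only for illustration.

```dart
/// Hypothetical helper: splits a supplementary code point
/// (0x10000..0x10FFFF) into its UTF-16 lead and trail surrogates.
List<int> surrogatePair(int codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw RangeError.range(codePoint, 0x10000, 0x10FFFF, 'codePoint');
  }
  final offset = codePoint - 0x10000; // a 20-bit value
  final lead = 0xD800 + (offset >> 10); // top 10 bits
  final trail = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [lead, trail];
}

/// The three runtime workarounds from the comment above.
/// Each builds "ab", a lone 0xDC12, then "ef"; none is a compile-time constant.
List<String> workarounds() => [
      String.fromCharCodes([0x61, 0x62, 0xDC12, 0x65, 0x66]),
      "ab${String.fromCharCode(0xDC12)}ef",
      "ab" + "\u{10012}"[1] + "ef",
    ];
```

For example, `surrogatePair(0x10012)` returns `[0xD800, 0xDC12]`, and every string from `workarounds()` has the code units 0x61, 0x62, 0xDC12, 0x65, 0x66.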

Some users want to store bit patterns as strings (it's the most efficient representation we have for compile-time constant blobs), and it's frustrating when some bit patterns are disallowed by the syntax, even if they are fine as values.

(A better alternative for that problem would be typed-data literals, e.g. [1,2,3]u16 as a Uint16List literal, but we don't have that.)
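Storing bit patterns in strings works because a Dart string value is just a sequence of 16-bit code units, so any list of 16-bit words can round-trip through a string at runtime, including words in the surrogate range. A minimal sketch using `dart:typed_data`; `wordsToString` and `stringToWords` are hypothetical helper names.

```dart
import 'dart:typed_data';

/// Hypothetical helper: packs 16-bit words into a string, one code unit
/// per word. Surrogate values such as 0xDC12 are kept as-is.
String wordsToString(Uint16List words) => String.fromCharCodes(words);

/// Hypothetical helper: recovers the 16-bit words from such a string.
Uint16List stringToWords(String blob) => Uint16List.fromList(blob.codeUnits);
```

The round trip itself is fine at runtime; the limitation described in this comment is that the same blob cannot be written as a constant string literal if any word falls in 0xD800..0xDFFF.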

@gbracha
Contributor

gbracha commented Aug 28, 2014

Thanks for the clarifications. That makes a lot more sense. I'll check with the VM team and TC52.

@DartBot

DartBot commented Aug 28, 2014

This comment was originally written by las...@gmail.com


In this case, the change would simply be replacing "Unicode scalar value" with "code point".

http://www.unicode.org/glossary/#unicode_scalar_value

@floitschG
Contributor Author

In Dart terms, we use the term "Rune" for code points.
If possible I would mention both, but if you only want to use one term, "Unicode code point" is what should be in the spec.
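The "Rune" terminology is visible in `dart:core` through `String.runes`: a well-formed surrogate pair is combined into a single rune, while an unpaired surrogate is yielded as its own code-unit value. That is why "code point" rather than "scalar value" describes what a Dart string can hold. A short sketch; `runesOf`, `gClef` and `loneSurrogate` are hypothetical names for illustration.

```dart
/// Hypothetical helper: the runes (code points) of [s], in order.
/// A well-formed surrogate pair yields one rune; an unpaired
/// surrogate yields its own code-unit value.
List<int> runesOf(String s) => s.runes.toList();

// The G-clef from earlier in the thread: one rune, two UTF-16 code units.
final gClef = "\u{1D11E}";

// A string with a lone trail surrogate, built at runtime.
final loneSurrogate = String.fromCharCodes([0x61, 0xDC80]);
```

Here `gClef.length` is 2 but `runesOf(gClef)` is `[0x1D11E]`, and `runesOf(loneSurrogate)` is `[0x61, 0xDC80]`: the surrogate is reported as a rune even though it is not a scalar value.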

@floitschG floitschG added Type-Defect area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). labels Aug 28, 2014
@kevmoo kevmoo added type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) and removed priority-unassigned labels Feb 29, 2016
@floitschG
Contributor Author

Dupe of #26620.


5 participants