Add syntax for character code constants. #886

Open
lrhn opened this issue Aug 8, 2012 · 19 comments
Labels
small-feature A small feature which is relatively cheap to implement.

Comments

@lrhn
Member

lrhn commented Aug 8, 2012

Currently, the simplest way to get the character code of the character '9' is either "9".charCodeAt(0), which isn't even constant, or writing the number constant, e.g., 0x39. Or define a whole slew of constants, as in the dart2js characters.dart library: <http://code.google.com/p/dart/source/browse/branches/bleeding_edge/dart/lib/compiler/implementation/util/characters.dart>

It would improve readability and usability a lot if there was a simple way to specify "the character code of the character _", like C and Java's '9', Ruby's ?9, Scheme's #\9 or SML's #"9".

I propose the SML syntax since it doesn't collide with any other Dart syntax, and it could allow arbitrary Dart single-character string literals, piggy-backing on known string syntax, so it's possible to write, e.g., #"\n" instead of 0x0A.
It should only work as a literal, no #stringVar or multi-character strings, or other funky stuff.

I don't want a character type, just a way to create integer constants that is readable, simple, and compile-time constant.
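For readers comparing the options listed above, here is a minimal sketch of the three existing approaches in current Dart. It assumes the present-day name codeUnitAt (the 2012-era charCodeAt was later renamed); the names nineAtRuntime, nineLiteral, and isDigit are illustrative only, and $0/$9 merely imitate the naming style of characters.dart.

```dart
// Option 1: look the code unit up at run time. Not a compile-time constant.
final int nineAtRuntime = '9'.codeUnitAt(0);

// Option 2: write the number directly. Constant, but not very readable.
const int nineLiteral = 0x39;

// Option 3: named constants, in the style of a characters library.
const int $0 = 0x30;
const int $9 = 0x39;

/// Illustrative helper: true if [codeUnit] is an ASCII digit.
bool isDigit(int codeUnit) => codeUnit >= $0 && codeUnit <= $9;

void main() {
  print(nineAtRuntime == nineLiteral); // true
  print(isDigit('7'.codeUnitAt(0))); // true
  print(isDigit('x'.codeUnitAt(0))); // false
}
```

Only options 2 and 3 can be used where a constant expression is required, such as a switch case.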

@gbracha

gbracha commented Aug 8, 2012

Given that these are defined once and for all in the library and you can in fact write $F and get a readable, simple, concise constant, I'm not sure why we want a construct (consuming a valuable token that might be used for some more important syntactic construct).


Added this to the Later milestone.
Added Accepted label.

@lrhn
Member Author

lrhn commented Aug 8, 2012

I have a library now that handles a certain subset of characters. If I need other characters, or if other users need character codes, they will need to create their own library. It doesn't handle non-alphanumeric characters as well (compare $UNDERSCORE with #"_").

I think it makes sense for the language to support this feature instead of leaving it to libraries that are, by nature, either incomplete or very large.

@DartBot

DartBot commented Sep 5, 2012

This comment was originally written by @simonpai


Just adding my 2 cents. Perhaps Dart can supply a library for char constants? In that case:

  1. User can optionally include them
  2. User can prefix the constants by importing with prefix, which avoids name collision

IMHO, this feature is like the String joining utility that Java lacks. Its absence doesn't really block anything, but everyone will end up reinventing the same wheel in every Dart project.

@efortuna

We have this available in the dart:html library. http://api.dartlang.org/docs/bleeding_edge/dart_html/KeyCode.html

Can we make it available for both platforms instead?

@lrhn
Member Author

lrhn commented Mar 20, 2013

The key code library is a good example of creating just the thing you need for one purpose. It only has one A code. I can't see from the API whether that is a lower- or upper-case A, but a character code library would need both, and would not have a "windows key" entry.

In other words, I don't see the general applicability of the library.

@efortuna

Right. I guess I missed that when I read this bug the first time. Yes, for the KeyCode library, by design, we are providing constants for the numbers associated with a particular key on a keyboard (so there is only one code for "a", lower and upper case, because it is the same key on the keyboard), not the ASCII (or other) char code for the letters.

@DartBot

DartBot commented Mar 20, 2013

This comment was originally written by greg...@gmail.com


@Emily - I mixed up the key handlers and ascii in my email. Sorry for the confusion.

One advantage of having literals is working with Unicode characters. The $F constant example above obviously only works for characters which are also valid in Dart identifiers.

For example - without literals:

const int _A_MACRON = 256; //'Ā'.codeUnits.first;

switch(c) {
   case _A_MACRON: print(c); break;
}

With literals - you can just write:

switch(c) {
   case c'Ā': print(c); break;
}
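The "without literals" variant above can be run today. Below is a minimal sketch of it, where aMacron and describe are illustrative names that do not come from the thread. The c'Ā' form is proposed syntax only and does not compile in Dart.

```dart
// Named constant for U+0100, LATIN CAPITAL LETTER A WITH MACRON.
const int aMacron = 0x100; // same value as 'Ā'.codeUnits.first

/// Illustrative helper: classifies a code point with a switch on constants.
String describe(int c) {
  switch (c) {
    case aMacron:
      return 'A with macron';
    default:
      return 'other';
  }
}

void main() {
  // Walk the code points of a short string and classify each one.
  for (final c in 'Āb'.runes) {
    print(describe(c));
  }
}
```

The constant needs both a name and a comment to be readable, which is the overhead a literal syntax would remove.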

@DartBot

DartBot commented Mar 20, 2013

This comment was originally written by greg...@gmail.com


Oops - I see that's a dup of LRN's comment above.

@lrhn
Member Author

lrhn commented Aug 23, 2013

Issue dart-lang/sdk#2093 has been merged into this issue.

@lrhn
Member Author

lrhn commented Apr 22, 2014

Issue dart-lang/sdk#18322 has been merged into this issue.

@kasperl

kasperl commented Jul 10, 2014

Removed this from the Later milestone.
Added Oldschool-Milestone-Later label.

@kasperl

kasperl commented Aug 4, 2014

Removed Oldschool-Milestone-Later label.

@jamesderlin

This is an ugly idea, but for completeness' sake at the very least:

Since "9".charCodeAt(0) can't be constant, have we considered inverting it by adding a const constructor to int? e.g.:

const int.charCodeOf(String character)

It would be annoyingly verbose, but it wouldn't require any syntax changes. (It also might be on par with the existing int.fromEnvironment constructor.)

@lrhn lrhn transferred this issue from dart-lang/sdk Mar 18, 2020
@lrhn
Member Author

lrhn commented Mar 18, 2020

I have never given up on this as a language feature. It's just not a particularly high priority since the charcode package handles most of the use-cases adequately.

Using a const constructor is ingenious, but ugly. (There is precedent in the fromEnvironment constructors, but they are also ugly.)

That said, focusing on Unicode code points is not necessarily the correct level of abstraction. It's better than code units, but it still isn't complete grapheme clusters. That means that in many cases, you should not be looking at individual code points at all, and the places where it's correct, it's also very likely to be ASCII only. There is a reason that package:charcode has generally been adequate.

@lrhn lrhn added the small-feature A small feature which is relatively cheap to implement. label Jul 8, 2020
@Cat-sushi

Cat-sushi commented Jan 31, 2021

I'm from Japan, a non-English-speaking country, and I would like to have g'𠮷' as a grapheme cluster constant.
That would imply that package:characters should become part of dart:core.

To be more precise, '𠮷' is represented by a single code point, U+20BB7, but by two UTF-16 code units (a surrogate pair), 0xD842 0xDFB7.
I mean, each character constant should be a code point literal at minimum, but a grapheme cluster would be better.
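The distinction between code units and code points can be checked with dart:core alone. The sketch below uses two illustrative helpers, hexCodeUnits and hexCodePoints, which are not part of any library. Grapheme clusters are not shown because they need package:characters.

```dart
/// Illustrative helper: the UTF-16 code units of [s], as hex strings.
List<String> hexCodeUnits(String s) =>
    s.codeUnits.map((u) => u.toRadixString(16)).toList();

/// Illustrative helper: the Unicode code points of [s], as hex strings.
List<String> hexCodePoints(String s) =>
    s.runes.map((r) => r.toRadixString(16)).toList();

void main() {
  const s = '𠮷';
  print(s.length); // 2, because length counts UTF-16 code units
  print(hexCodeUnits(s)); // [d842, dfb7]
  print(hexCodePoints(s)); // [20bb7]
}
```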

@lrhn
Member Author

lrhn commented Feb 1, 2021

This proposal is for code point (integer) constants written symbolically, so c"𠮷" would work for that. That's specified as evaluating to the code point of the single-code-point string.

Grapheme clusters are sequences of code points, which means that there is no representational distinction between a grapheme cluster and a String, which is also a sequence of code points represented as a sequence of UTF-16 code units.
That makes g"𠮷" just a shorthand for "𠮷".characters, not a numeric constant.

(Having a shorthand for .characters is an interesting idea. I think it's aiming too low. I'd be interested in allowing arbitrary user-defined string prefixes, so you can define your own xml'<foo bar="baz">qux</foo>' and have that call the special (or not) xml function. It gets even more interesting if the prefix understands interpolations, so re'foo${bar}+' would call re with both the strings "foo" and "+" and the value of bar, and then it can do its own interpolation, like JavaScript's template literals.)
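To make the prefix idea concrete, here is a hand-written sketch of what a hypothetical re'foo${bar}+' might desugar to. Nothing like this exists in Dart; the signature of re, and the decision to escape interpolated values with RegExp.escape, are assumptions made purely for illustration.

```dart
/// Hypothetical tag function for a string prefix such as re'foo${bar}+'.
/// [strings] holds the literal parts and [values] the interpolated values,
/// so strings.length is always values.length + 1.
RegExp re(List<String> strings, List<Object?> values) {
  final buffer = StringBuffer(strings.first);
  for (var i = 0; i < values.length; i++) {
    // Assumed design choice: interpolated values match literally.
    buffer.write(RegExp.escape('${values[i]}'));
    buffer.write(strings[i + 1]);
  }
  return RegExp(buffer.toString());
}

void main() {
  final bar = 'a.b';
  // Hand-written equivalent of the hypothetical re'foo${bar}+'.
  final pattern = re(['foo', '+'], [bar]);
  print(pattern.pattern); // fooa\.b+
  print(pattern.hasMatch('fooa.bbb')); // true
  print(pattern.hasMatch('fooaxb')); // false
}
```

Because the function receives the literal parts and the values separately, it can treat them differently, which ordinary string interpolation cannot do.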

@Cat-sushi

Cat-sushi commented Feb 1, 2021

This proposal is for code point (integer) constants written symbolically, so c"𠮷" would work for that.

Good enough.

@eseidel

eseidel commented Mar 19, 2022

Ran into this while writing an HTML tokenizer in Dart (https://github.com/RubberDuckEng/html6). The tokenizer really wants to work on code points (int / Runes), but also wants to compare those against ASCII literals (e.g. '<'). I could use package:codepoints, but that seems like a bunch of namespace pollution for a few characters. 🤷 Not a big deal, just sharing the anecdote (feel free to hide the comment if not useful).

@lrhn
Member Author

lrhn commented Mar 19, 2022

If you use package:charcode version ^1.3.0, it can generate declarations for the constants you need.
It only needs to be a dev-dependency, and you can even remove it again after generating the file.

(I still want character constants, but until I get them, this covers most of my needs.)
