Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String has no operator< etc #15693

Closed
trinarytree opened this issue Dec 18, 2013 · 13 comments
Closed

String has no operator< etc #15693

trinarytree opened this issue Dec 18, 2013 · 13 comments
Labels
area-library closed-not-planned Closed as we don't intend to take action on the reported issue type-enhancement A request for a change that isn't a bug

Comments

@trinarytree
Copy link

to reproduce:
'foo' > 'bar' yields
"Unhandled exception: Class 'String' has no instance method '>'."
that sucks. so we need 'foo'.compareTo('bar') > 0, which looks horrible.

javascript and python let us compare strings with <, >, <=, etc.
dart lets us compare numbers (1 < 2). why not strings?
Comparables should get <, >, <=, etc defined for free.

here is one possible implementation: if a class implements Comparable
and a relational operator is not explicitly overridden, let it just
delegate to compareTo. who would complain about such a feature?

Dart VM version: 1.0.2.1_r30821 (Tue Dec 3 12:51:29 2013) on "linux_x64"

@floitschG
Copy link
Contributor

Added Area-Library, Triaged labels.

@lrhn
Copy link
Member

lrhn commented Dec 18, 2013

cc @sgjesse.
Removed Type-Defect label.
Added Type-Enhancement label.

@floitschG
Copy link
Contributor

We haven't yet found enough use-cases where string-comparisons with '<' are useful and work internationally. Most of the time it is a sign that the code is doing something wrong.

We provide a compareTo operator, since some data-structures need to sort their elements. These data-structures usually don't care for the actual order, as long as the elements have a specific deterministic order. The String.compareTo works for this use-case.

Some clases, like DateTime, provide compareTo, but cannot provide the operators <, >, <=, >=, that are consistent with compareTo. This is, because compareTo only looks at the absolute time-value (is the dateTime before/after the other dateTime), but "==" looks at the attached timezone too. '<=' would need to be consistent with compareTo (for the "<" part), and at the same time with "==", which is unfortunately not possible.

@lrhn
Copy link
Member

lrhn commented Dec 18, 2013

There is the use case caused by Dart not having character code literals: Programmers are likely to want to compare single characters, so they write something like:
  var firstChar = string[0];
  if ('0' <= firstChar && firstChar <= '9') { // something on digits }
I.e., using simple compare operations makes good sense on single-character strings, because there is no good alternative.

@trinarytree
Copy link
Author

thanks for the explanations, florian. i had a rather long discussion with justin f.
and think i see a way to proceed. the only real problem is a lack of docs.

the dart contracts seem to be
(1) identical defines the finest possible equivalence relation, reference equality (this one is documented, so no problems there).
(2) == is required to define an equivalence relation at least fine enough to detect differences
in the publicly visible behavior of the objects, not including aliasing behaviors
(e.g. if a and b are 2 mutable objects and !identical(a, b), then we may require
a.x == b.x for all x, but we don't require that the assignment a.x = y causes b.x == y).
  note that it might not be possible to completely formalize this part of the contract.
e.g. suppose the field a.x is supposed to be a reference to a itself and a.validate checks whether identical(this, this.x).
should a == b return true when identical(a.x, b.x) or when identical(a.x, a) and identical(b.x, b)?
you could argue that a == b should be true when identical(a.x, b.x) because these are externally visible fields,
but then a.validate and b.validate would disagree.
if we want a.validate and b.validate to agree, then we must accept that a.x and b.x will disagree.
so there is no way to define == as "all of their behaviors are the same", even when a and b are immutable.
so we should be more lax and let the programmer decide what counts as "same behavior" for that class.
(3) A.compareTo is required to define an ordering on the instances of A, and the equivalence relation
that this induces must be at least as coarse as that induced by ==, i.e. a == b implies a.compareTo(b); the converse need not hold.

(2) seems to be the thing causing the desire to make DateTime's == disagree with that induced by compareTo.
since time zone is "externally visible" from a certain point of view,
(2) implies that 2 DateTimes with different time zones should be != even if they refer to the same moment.
but no ordering can be completely consistent with that.
strings may have a similar problem if, for example, we want 2 different unicode code points to be equal according to compareTo,
but unequal according to ==.

as far as i can tell, java does not require (2):
http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#equals(java.lang.Object)
and, being somewhere between difficult and impossible to formalize anyway,
one could argue that (2) should not be part of the general contract, that we are free to define == however we want
so long as it defines an ordering - reflexive, symmetric, and transitive when restricted to objects of the given type.
(how it needs to behave with instances of subclasses is a can of worms.
it is also interesting to note that the java docs "strongly recommend" that the natural ordering be consistent with equals

  • dart's DateTime.compareTo is not consistent with ==.)

so the right argument for String and DateTime not having < isn't "we can't", but "we don't want to".
i.e. we could make == for String or DateTime induce the same equivalence relation that compareTo does,
but for each, we actually want this other equivalence relation around that is strictly finer than the equivalence
relation induced by compareTo. we could define < for DateTime in a way that is compatible with compareTo in the sense
that a.compareTo(b) < 0 implies a < b, but we don't want to because of the fear that ((a < b does not imply a.compareTo(b))
may be confusing).

ok, that was a lot of babbling. here's my proposal.
add docs stating the general contracts, especially in Comparable:
(a) == can define any equivalence relation it wants.
(b) compareTo can define any ordering it wants.
(c) it's recommended that compareTo be at least as coarse as ==, i.e. a == b implies a.compareTo(b) == 0.
(d) if <, <=, >, >= are defined, it's recommended that a.compareTo(b) < 0 implies a < b.
(if (c) and (d) happen then there is a nice homomorphism mapping == equivalence classes onto compareTo equivalence classes)
(e) warning: if compareTo induces an equivalence relation strictly coarser than == (i.e. a.compareTo(b) == 0 does not imply a == b),
then it may be best just not to define <, <=, >=, > since it cannot agree with compareTo, which may be unexpected.
e.g. String and DateTime each define a compareTo which induces an equivalence relation that is coarser than ==,
which is why they don't define <, <=, >=, >.

on the other hand, if you dart guys are willing to live with a homomorphism that is not an isomorphism,
i think people could get plenty of utility out of it. e.g. this maximum finding algorithm
var max = list.first;
list.forEach((x) {
  if (x > max) {
    max = x;
  }
});
would work just fine on Strings or DateTimes provided (c) and (d) hold.
a sort algorithm that used such a < instead of compareTo would also work just fine.
this would be a bug though:
validate(startTime, endTime) => startTime <= endTime;

anyway, let us know if you guys can choose a contract and update the docs. it would be helpful. thanks!

@floitschG
Copy link
Contributor

floitschG commented Dec 19, 2013

We don't require (2) either (although it generally is the expected behavior).
There are already exceptions (like doubles with -0.0 == 0.0 and NaN != NaN), but I think this is mostly the case.

I agree with a-e.

Doubles are an exception, but that's because IEEE forces it on us.

Note that Strings do not fall into the same category as DateTime.

Strings could have a-e work, but the main-reason we don't add it, is because comparing strings without locale frequently is an error.
There are cases where this is not true (like "0" < someString && someString <= "9"), and I think that we should make these cases easier to deal with.

Maybe you have a really good use-case for adding the comparison operators, and we will have to revisit our decision.

TL;DR: we (and in particular I) don't want to add comparison operators to Strings, because they give the impression that their comparison operator sorts them alphabetically.
To be honest I fought against adding the compareTo on the String, too.

@trinarytree
Copy link
Author

florian, that's fine to not change any implementation. but can we get docs on Comparable explaining the intended/recommended contracts so that all this reasoning is more public? e.g. before this thread, it wasn't obvious to me that for some classes, compareTo and == intentionally disagreed, nor that it should explain the lack of implementing <, <=, etc.

@lrhn
Copy link
Member

lrhn commented Jan 2, 2014

I have updated the documentation of Comparable (on bleeding edge).
I hope this helps.

@trinarytree
Copy link
Author

thanks a bunch.

i think there are a few typos:
(1) "The double class has the comparison operators that are compatible with equality."
do you mean "The double class's comparison operators are not compatible with equality."?
i mention this because you earlier say that double's compareTo and == don't agree:
"See double where the compareTo method is more precise than equality"
(2) "where this fail" -> "where this fails"
(3) "The comparison operations is intended" -> "The comparison operations are intended"
(4) "When possible a the order of a" -> "When possible, the order of a"
(5) "If equality and compareTo agrees" -> "If equality and compareTo agree"
(6) "If equality and compareTo disagrees" -> "If equality and compareTo disagree"
(7) "and the ordering represents a less-than/greater-than ordering".
not sure what this part means. did you mean, as opposed to a partial order?
if so, that's usually called a total order, not a less-than/greater-than order.
but i thought that's actually the general contract: <, <=, ==, >=, >, if defined, are always
supposed to represent a total order. compareTo is also always supposed to be a total order.
so if my guesses above were right, that conditional clause can just be removed, since it's always true.

@lrhn
Copy link
Member

lrhn commented Jan 3, 2014

Thanks for the proof-reading :)

I agree that the description can probably be reduced significantly. This gist is supposed to be:

 The < operator family should be compatible with the == operator, and they should be used only for less-than/greater-than relations. (This is unrelated to Comparable, but there is no better place to write it).
 The compareTo function is intended for sorting and ordering in general, and must be total.
 It may be the case that compareTo and operator== don't agree. If operator< is defined, it should match operator==, and will likely disagree with compareTo too.

By a "less-than/greater-than" ordering, I mean an ordering that you would naturally use the words "less than" about. Numbers are the natural example. Temperatures would be another example (they are just a single number too, just with a meaning).
For DateTime, the ordering is "before/after", not "less than". For String, the ordering is ... well, there is no natural ordering, but let's say "lexically before/after, on the code units of the string". One string is not "less than" another, it's just "ordered before".

Wrt. total/partial ordering;

The compareTo method must be total (it's only allowed to return negative, zero or positive, there is no fourth option). If it isn't total, sorting and SplayTreeMap will break. Well, technically, all functions are total on their domain, so you just have to make sure that the compareTo method doesn't get values outside of the domain where it works - i.e., it must always be used in a setting where it is total.

The < operator should preferably be total too, but if it isn't (with some good reason), probably nothing breaks. It's just a case of a<=b and b<=a both being false (NaN is the classical example). It's still annoying since if a<b, a==b and a>b are all false, then the equality (b<=a) == !(a<b) doesn't hold.
You could meaningfully use < for a partial ordering on a custom lattice class, if you are careful.

@trinarytree
Copy link
Author

thanks for the explanation. still, you may want to back off from the "less-than/greater-than" concept.
e.g. suppose the docs say "If you'd say 'less than' in english, then consider implementing <."
what if the reader speaks russian? what if, for some concept A, in english you'd say "less than",
but in some other natural language you wouldn't literally translate it as "less than",
but you'd use some other phrase, like "before".
also, i would say "less than" in english when speaking of strings or dates because
i'm accustomed to saying "less than" for absolutely any ordering when i want to focus on the ordering properties.
i might also say "comes before", "is before", "is smaller than", refer to the "minimum",
the "smallest", the "smaller of the 2", or other comparison-like phrases. e.g. i might say
"consider the lexicographical ordering on strings. suppose string a is less than b..."
or without any prefacing sentence at all,
"remove the smallest (or least) date from the database."
yes, i could also say "earliest", but if i used one of the other phrases,
i don't think most people would correct me or be confused.
i feel like the contract shouldn't focus on how people say things but just on the axioms you expect.

how about this?
"compareTo must define a total order, and List.sort depends on this.
compareTo and the comparison operators (<, <=, ==, >=, >)
need not have any relationship to each other and the comparison operators need
not even define a partial order, however the following are recommended:

(1) If any of <, <=, ==, >=, > are defined, they should be consistent with each other.
(2) compareTo should be consistent with equality: a.compareTo(b) == 0 iff a == b.
(3) If one cannot satisfy (2), compareTo should be at least as coarse as equality
(a == b implies a.compareTo(b) == 0), and <, <=, >=, > should not be implemented.

Examples
DateTime's compareTo is coarser than equality, so it does not implement <, <=, >=, >.
double's compareTo is finer than equality (when neither operand is NaN),
but IEEE 754 forces it to implement <, <=, >=, > anyway (in violation of (3)),
and to implement == not as an equivalence relation (NaN != NaN)."

@DartBot
Copy link

DartBot commented Jan 13, 2014

This comment was originally written by @tatumizer


We discussed it at length in some earlier threads, and I still think current way of defining == for DateTime is the worst out of all possibilities. Among other things, it leads to Map based on DateTime as key to be inconsistent with TreeMap (the latter requires compareTo to be in sync with ==).
Is it too late to admit mistake and fix it?

@kevmoo kevmoo added type-enhancement A request for a change that isn't a bug and removed priority-unassigned labels Feb 29, 2016
@lrhn lrhn added the closed-not-planned Closed as we don't intend to take action on the reported issue label Mar 23, 2016
@lrhn
Copy link
Member

lrhn commented Mar 23, 2016

We don't expect to add < to String. The recommendation is to use compareTo or other specialized string comparison functions (like case-insensitive compare).

(And I still think we should have character code literals).

@lrhn lrhn closed this as completed Mar 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-library closed-not-planned Closed as we don't intend to take action on the reported issue type-enhancement A request for a change that isn't a bug
Projects
None yet
Development

No branches or pull requests

5 participants