Comments
ed...@gmail.com <ed...@gmail.com> #2
[Comment deleted]
sp...@gtempaccount.com <sp...@gtempaccount.com> #3
I'd rather see this added as an index feature, so the uniqueness could be spread
across multiple fields (and an index would almost certainly be needed to support this
efficiently, anyway).
ed...@gmail.com <ed...@gmail.com> #4
I totally agree that this should be implemented as an index. It could (and I think
should) still be specified in the schema/property constructor and the index
automatically generated.
ed...@gmail.com <ed...@gmail.com> #5
It would also be great to have a way, in the schema declarations, to specify
composite unique indexes/properties, especially for reference properties.
sa...@gmail.com <sa...@gmail.com> #6
Yes, this would be super useful.
be...@gmail.com <be...@gmail.com> #7
Yes, this is a very common use case. +1
ma...@gmail.com <ma...@gmail.com> #8
I need this feature.
tw...@gmail.com <tw...@gmail.com> #9
Can't this be done with a carefully constructed key name?
ed...@gmail.com <ed...@gmail.com> #10
No. It can't be done with a carefully constructed key name... because the key name is
immutable once it is set at creation. For example, say you have a User model with an
email address in it. You want the email address to be unique across all User
entities. If you use the email address as key name you are out of luck when the user
changes his/her email.
be...@gmail.com <be...@gmail.com> #11
No, I don't think so: keys are used to reference objects, so if you change the key you have to update all referencing properties.
Uniqueness is not the same use case as a primary key.
ry...@google.com <ry...@google.com> #12
hi all! we've discussed this a bit internally, and we agree, it would be a useful
feature. it's definitely one of the most widely used constraints in SQL schemas.
unfortunately, it would be pretty difficult to implement with the current datastore
architecture, due in large part to our index consistency model, described in
http://code.google.com/appengine/articles/transaction_isolation.html. we might have
to chalk this up as another place the datastore doesn't provide traditional
relational database features.
at first, the key_name approach seemed tempting to me too. unfortunately, as edoardo
and bernddorn mention, it effectively makes those properties read only. having said
that, i suspect that "unique" properties like these actually are often read only, so
that might not be quite as big a drawback.
out of curiosity, would this feature be useful to anyone if it was limited to entity
groups? ie, you could specify that property values must be unique within each
individual entity group, but they could be repeated across entity groups?
ed...@gmail.com <ed...@gmail.com> #13
ryanb: even properties that are thought to be immutable/read-only (unique identifiers, say ISSN/ISBN etc.) may actually change in the future (and they have). That is why, in general, it is a very bad idea to use properties as key_names (that's the whole thinking behind using a simple integer as the primary key in relational database design). Moreover, there are properties (say email addresses) that you want to be unique and that are quite likely to change.
The correct way of implementing this would be to have the unique constraint apply within all entities of a specific KIND, not just an entity group as you suggested.
Another way to go about it would be to be able to query the datastore within a transaction.
be...@gmail.com <be...@gmail.com> #14
For me this would be totally OK; AFAIK an entity group is needed anyway to check uniqueness in a transaction-safe manner, isn't it?
Even though in my own applications I do not check uniqueness in a transaction-safe way, I thought that if this happened on the datastore end it could be implemented at the storage/index level, which would make it much less error-prone than doing it at the application level via properties.
ye...@gmail.com <ye...@gmail.com> #15
A transaction isn't needed to guarantee a property is unique. Do a put followed by a query on that property; if the count is greater than 1, delete self. If you worry about exceptions, you can mark a property of the new object to indicate it's new and flip it once it is verified unique. If there's contention on the put, both objects should delete themselves; there shouldn't be two objects both marked old with the same property value.
Of course, solving this at the index level is much more efficient.
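A minimal sketch of this put-then-verify idea, assuming a hypothetical User model with an email property (the verifying query runs outside any transaction and indexes are only eventually consistent, so this is not fully race-free, as later comments point out):

    from google.appengine.ext import db

    class User(db.Model):
        email = db.StringProperty(required=True)

    def put_if_unique(user):
        user.put()  # write first...
        # ...then verify; a limit of 2 is enough to detect a duplicate
        if User.all().filter('email =', user.email).count(2) > 1:
            user.delete()  # lost the race: remove our own copy
            raise ValueError('email %r is already taken' % user.email)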
th...@gmail.com <th...@gmail.com> #16
Question: are entity IDs guaranteed to be unique?
If so, would it be possible to create a property that does something like:
entity.index = eventual_id
instead of:
entity.put()
entity.index = entity.key().id()
[Deleted User] <[Deleted User]> #17
[Comment deleted]
ry...@google.com <ry...@google.com> #18
generally, no, IDs are not unique. however, they will be unique for entities with the
same kind and the same parent, or root entities of the same kind.
in general, the full path is the only thing that's always guaranteed to be unique.
see http://code.google.com/appengine/docs/datastore/keysandentitygroups.html#Paths_and_Key_Uniqueness.
wk...@gmail.com <wk...@gmail.com> #19
I think this "compromise" is pretty much useless. The whole point of having unique entities is to have slightly more complex consistency guarantees than what you currently provide with key_name. Is this feature really impossible to implement in a scalable way, or is this a problem specific to Bigtable or the datastore?
What about allowing the key_name to be changed safely? Could you somehow allow two entity groups to be locked at the same time? If a deadlock occurs you could just revert one of the transactions and let the other continue to run. This might need some more thought, though. :)
bo...@gmail.com <bo...@gmail.com> #20
I ran into this, and between a thread in the groups and this ticket, it's been helpful. I did come up with a solution for my specific use case. I'm using a floating-point number as a score and need it to be unique because I am paging off the value. I also needed it not to be immutable, because scores can change. What I did was override the put method with a check to confirm the value is unique and, if not, adjust it and try again.
    def put(object):
        valid = False
        while not valid:
            cquery = db.GqlQuery('SELECT * FROM Story WHERE score = :1', object.score)
            results = cquery.fetch(1)
            if len(results) > 0:
                object.score += .01  # nudge the score and re-check
            else:
                valid = True
        super(Story, Story).put(object)
One thing that might be useful from the App Engine standpoint would be an attribute similar to a property's validator, like unique_validator=None, which could then be a hook for developers to supply their own functions. Or, for a more generic mechanism, it could be something like the above, except that it would raise an exception on finding an existing value; it would then be up to the developer to catch the exception and adjust the value before reattempting the put. Then you could just have a unique=True/False attribute.
ka...@gmail.com <ka...@gmail.com> #21
In JPA (Java) there is an annotation, @UniqueConstraint [1], that can be put on the class-level table annotation to identify which properties should be unique (either alone or alongside others). This is usually used in addition to property-level annotations, which have a unique=true/false attribute.
Before calling put(), the information from the annotation (or a simple property _unique_properties) is used to issue some queries, and, if no constraints are violated, the entity is stored. If any constraint is violated, an appropriate error can be raised.
I have implemented a very basic version of my suggestion, which is enough for what I need:
## start
class DbEntity(db.Model):
    _unique_properties = None
    timestamp = db.DateTimeProperty(auto_now=True)

    def __init__(self, parent=None, key_name=None, _app=None,
                 unique_properties=None, **kwds):
        super(DbEntity, self).__init__(parent, key_name, _app, **kwds)
        if unique_properties:
            logging.debug("Unique properties: %s" % unique_properties)
            self._unique_properties = unique_properties

    def put(self):
        if self._unique_properties:
            logging.debug('checking unique properties for: %s ...' % self.__class__)
            for unique in self._unique_properties:
                gqlString = 'WHERE %s = :1' % unique
                param = getattr(self, unique)  # avoids the original eval()
                logging.info('GQL: self.gql("%s", %s)' % (gqlString, param))
                query = self.gql(gqlString, param)
                otherObjects = query.fetch(10)
                if len(otherObjects) > 0:
                    logging.error("Objects that violate the constraints: %s" % otherObjects)
                    raise db.BadPropertyError(
                        "Other instances of %s exist with %s = %s" %
                        (self.__class__, unique, param))
        return super(DbEntity, self).put()

class Team(DbEntity):
    name = db.StringProperty(required=True)

    def __init__(self, parent=None, key_name=None, _app=None, **kwds):
        # set the unique property names
        super(Team, self).__init__(unique_properties=['name'], **kwds)
## end
[1] http://java.sun.com/javaee/5/docs/api/javax/persistence/UniqueConstraint.html
t....@gmail.com <t....@gmail.com> #22
re: Comment 23
This code contains a race condition and will not ensure uniqueness (it should, most
of the time, but there's no uniqueness guarantee).
ja...@gmail.com <ja...@gmail.com> #23
Sorry for my bad English, but why do I get 2 inserts when I expect 1?
ro...@gmail.com <ro...@gmail.com> #24
This is a total kludge, but what if you did memcache.add("dblock", "dblock", 10), then ran a query to see whether any existing objects in the datastore have your unique fields before doing the db.put(), and then memcache.delete("dblock")? If the memcache.add() returns False, just wait a random number of milliseconds and try again.
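A rough sketch of this memcache-lock kludge (the helper name, model, and property are hypothetical; memcache entries can be evicted at any time, so this shrinks the race window rather than closing it):

    import random
    import time

    from google.appengine.api import memcache

    def put_with_lock(entity, prop, value, retries=10):
        for _ in range(retries):
            if memcache.add("dblock", "dblock", 10):  # False if the lock is held
                try:
                    existing = entity.__class__.all(keys_only=True) \
                                     .filter('%s =' % prop, value).get()
                    if existing is not None:
                        raise ValueError('%s %r is already taken' % (prop, value))
                    entity.put()
                    return
                finally:
                    memcache.delete("dblock")
            time.sleep(random.random() * 0.05)  # back off a few random milliseconds
        raise RuntimeError('could not acquire memcache lock')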
em...@gtempaccount.com <em...@gtempaccount.com> #25
Yes, unique constraints are critical to certain use cases. My problem is that I allow the datastore to create unique, immutable numeric IDs for the Key, which are used for entity relationships, but I need a mutable secondary key for human-readable keys that may change. I ran into a case where a request was repeated quickly (within 50 ms), so two JVMs handled the copies simultaneously. Within a transaction, the code queried on the secondary key to see if it already existed. Both handlers got null, since the entity did not yet exist. They then created it, and the datastore assigned two unique numeric primary keys to the new entities. This left me with two entities with duplicate secondary keys.
I understand there is a technique that involves creating an auxiliary entity to be used as a mutex for the secondary key. In JDO/JPA that is a bunch of ugly, obfuscating code that requires extra Java classes, etc. JDO/JPA pretty much assume the semantics of a full SQL datastore, so this kind of kludge really shows that JDO/JPA is not a great solution for the Google datastore. The technique is much less offensive if you are using the low-level API. But all my code now uses JDO, so is it possible to implement the uniqueness annotation?
From my point of view, it would be OK to require an index and/or assume that writes to such entities would take longer. I understand that stage 1 (the commit phase) of a write does not update indexes, so that (as it now stands) the entity is effectively committed (logged and visible to get()) prior to any indexes being updated. Indexes are updated in stage 2, which can occur after the commit call returns. However, uniqueness constraints are important enough that it would be OK with me if my commit waited for stage 2 to finish before returning. This would allow the indexing phase to enforce the uniqueness constraint. It would also be OK for the non-unique entity committed in stage 1 to be visible to get() until stage 2 rolls it back.
At the very least, having a uniqueness constraint that only allows ONE of the entities into the index might be good enough. In other words, if I query by secondary key, I will get a single entity and it will always be the same one.
da...@gmail.com <da...@gmail.com> #26
An obvious case where you want uniqueness is with user logins. My app uses email addresses for the login. We set the key_name of our account model and use get_or_insert() to effect uniqueness.
The downfall is when the user wants to change their email address. We have to clone not just their account entity, but all the entities underneath it (of which there are potentially many thousands.) Of course, we want the user to be able to use their account during this time, so there's a lot of logic that pains me to have to keep around just to deal with the case of accounts in motion...
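A minimal sketch of this key_name approach, assuming a hypothetical Account model and signup helper; making the email the key lets the datastore itself enforce uniqueness, at the cost of the immutability problem described above:

    from google.appengine.ext import db

    class Account(db.Model):
        created = db.DateTimeProperty(auto_now_add=True)

    def signup(email):
        def txn():
            if Account.get_by_key_name(email) is not None:
                raise ValueError('email already registered')
            account = Account(key_name=email)  # the email IS the key
            account.put()
            return account
        return db.run_in_transaction(txn)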
ho...@gmail.com <ho...@gmail.com> #27
Is there a suitable workaround for Java to ensure uniqueness of a property in the data model?
ja...@gmail.com <ja...@gmail.com> #28
We've used this technique on a number of our applications: http://squeeville.com/2009/01/30/add-a-unique-constraint-to-google-app-engine/
It's simple, but it has some possibility of leaving unique values unable to be used, though it will guarantee that a unique value is actually unique. That is, because there is no transaction spanning the Unique model and the model holding the unique values, a failure can leave a unique value reserved but unused. It fails on the side of guaranteeing uniqueness.
The example is Python, but it is simple enough that there will be a direct Java analogy.
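The linked technique is roughly the following shape (the kind names, key_name format, and helper are assumptions, not the article's exact code):

    from google.appengine.ext import db

    class Unique(db.Model):
        pass  # the existence of a key reserves a value

    class User(db.Model):
        email = db.StringProperty()

    def reserve(kind, prop, value):
        key_name = '%s.%s:%s' % (kind, prop, value)  # e.g. 'User.email:a@b.com'
        def txn():
            if Unique.get_by_key_name(key_name) is not None:
                return False
            Unique(key_name=key_name).put()
            return True
        return db.run_in_transaction(txn)

    # Reserve the value first, then save the real entity. As noted above,
    # a failure between the two steps can strand the reserved value.
    if reserve('User', 'email', 'a@b.com'):
        User(email='a@b.com').put()
    else:
        raise ValueError('email already in use')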
ab...@gmail.com <ab...@gmail.com> #29
You can also combine the technique in #30 with cross-group transactions; this will protect you against losing unique values.
You create an entity group for each unique field, and then do a cross-group transaction that checks/updates that entity group along with the entity group of your model.
This limits you to 4 unique fields per model, and it is probably terrible for performance, since you're effectively serializing writes across each field. But if you really, really need uniqueness... (see the sketch below)
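A sketch of this combination, reusing the hypothetical Unique marker kind from above; the 5-entity-group limit on XG transactions is what leaves room for 4 markers plus the model itself:

    from google.appengine.ext import db

    class Unique(db.Model):
        pass  # one root entity (group) per reserved value

    class User(db.Model):
        email = db.StringProperty()

    @db.transactional(xg=True)  # XG transactions span at most 5 entity groups
    def create_user(email):
        marker_name = 'User.email:%s' % email
        if Unique.get_by_key_name(marker_name) is not None:
            raise ValueError('email already in use')
        Unique(key_name=marker_name).put()  # group 1: the marker
        user = User(email=email)
        user.put()                          # group 2: the model itself
        return user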
sp...@gmail.com <sp...@gmail.com> #30
After investigating this problem a bit, I came up with two possible working solutions that could be implemented at the application level:
    # Solution 1: ancestor query inside a transaction
    @ndb.transactional
    def put(self):
        from google.appengine.ext.db import STRONG_CONSISTENCY
        key = User.gql("WHERE ANCESTOR IS :1 AND email = :2",
                       self.key.parent(), self.url).get(
                           keys_only=True, read_policy=STRONG_CONSISTENCY)
        assert key is None or key == self.key
        ndb.Model.put(self)

    # Solution 2: memcache as a mutex around the same check
    def put(self):
        from google.appengine.api import memcache
        memcache_key = "UserEmail" + repr(self.url)
        assert memcache.incr(memcache_key) is None
        try:
            from google.appengine.ext.db import STRONG_CONSISTENCY
            key = User.gql("WHERE ANCESTOR IS :1 AND email = :2",
                           self.key.parent(), self.url).get(
                               keys_only=True, read_policy=STRONG_CONSISTENCY)
            assert key is None or key == self.key
            ndb.Model.put(self)
        finally:
            memcache.delete(memcache_key)
Of course I prefer the first one, and according to the documentation both should work.
Yet some people insist on using memcache to ensure there are no race conditions.
The only reasoning I see is the query, but even so, within the same entity group transactions are serializable according to the documentation, so it should work just fine. What am I missing?
lu...@potatolondon.com <lu...@potatolondon.com> #31
Just wanted to bump this issue a little. In Djangae (https://github.com/potatolondon/djangae/) we've implemented a kind of unique-constraint support by creating marker instances for each unique combination. The key of the marker is constructed from the unique combination (enforcing uniqueness), and markers are acquired (in independent transactions) before a Put() and released after a Delete().
Markers also have a creation time and an associated instance. If a marker for a unique combination already exists, we check whether its associated instance still exists (by doing an ancestor, keys_only query on its key) before raising an exception.
When creating new instances (where the key ID hasn't yet been generated), we create a marker without an associated instance key, but we ignore such markers for unique-constraint checking after a few seconds.
Obviously, this is a bit of a kludge, but it seems to work, and we've tried the approach on some pretty high-traffic sites. It is costly, though, both in performance and quota, but sometimes you really need to enforce uniqueness.
It would be much better if a similar approach could be implemented in the datastore itself. This is still a very sought-after feature.
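A loose sketch of the scheme described above, not Djangae's actual code (the kind name, key_name format, and grace period are all assumptions):

    import datetime
    from google.appengine.ext import db

    GRACE = datetime.timedelta(seconds=5)

    class UniqueMarker(db.Model):
        # key_name encodes the unique combination, e.g. 'User|email:a@b.com'
        instance = db.StringProperty()  # encoded key of the owning entity, if known
        created = db.DateTimeProperty(auto_now_add=True)

    def acquire_marker(key_name, instance_key=None):
        marker = UniqueMarker.get_by_key_name(key_name)
        if marker is not None:
            if marker.instance:
                # does the owning entity still exist?
                if db.get(db.Key(marker.instance)) is not None:
                    raise ValueError('unique constraint violated')
            elif datetime.datetime.now() - marker.created < GRACE:
                # a concurrent creation may still be in flight
                raise ValueError('unique constraint violated')
        UniqueMarker(key_name=key_name,
                     instance=str(instance_key) if instance_key else None).put()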
[Deleted User] <[Deleted User]> #32
Bump.
I believe this is quite an important feature to implement.
I have run into similar problems without it.
fo...@deeptent.com <fo...@deeptent.com> #33
Upvote + 1
ja...@aro.digital <ja...@aro.digital> #34
Are we seriously still waiting for this? +1. I'll be using Cloud SQL for now, thanks.
ot...@gmail.com <ot...@gmail.com> #35
Maybe you should solve this before solving quantum supremacy, Google. Come on!
ju...@gmail.com <ju...@gmail.com> #36
We need this!
mi...@gmail.com <mi...@gmail.com> #37
@Google, come on! This has been a known issue since 2008. Get this solved and ensure that Firebase/Firestore is competitive with similar database products.
sr...@gmail.com <sr...@gmail.com> #38
Much-awaited feature.
mi...@notflip.be <mi...@notflip.be> #39
Is there any good method available to get this working?
sa...@gmail.com <sa...@gmail.com> #40
wow. I commented on this issue over a decade ago. I don't think there's a good fix, but now you can use Google Cloud SQL instead, and set unique=true in your table schema...
ga...@gmail.com <ga...@gmail.com> #41
Are we getting this? Ever?
Description
Support a uniqueness constraint on datastore properties, so that a property value can be required to be unique across all model entities.