Comments
ed...@gmail.com <ed...@gmail.com> #2
[Comment deleted]
sp...@gtempaccount.com <sp...@gtempaccount.com> #3
I'd rather see this added as an index feature, so the uniqueness could be spread
across multiple fields (and an index would almost certainly be needed to support this
efficiently, anyway).
ed...@gmail.com <ed...@gmail.com> #4
I totally agree that this should be implemented as an index. It could (and I think
should) still be specified in the schema/property constructor and the index
automatically generated.
ed...@gmail.com <ed...@gmail.com> #5
It would also be great to have a way, in the schema declarations, to specify
composite unique indexes/properties, especially for reference properties.
sa...@gmail.com <sa...@gmail.com> #6
Yes, this would be super useful.
be...@gmail.com <be...@gmail.com> #7
Yes, this is a very common use case. +1
ma...@gmail.com <ma...@gmail.com> #8
I need this feature.
tw...@gmail.com <tw...@gmail.com> #9
Can't this be done with a carefully constructed key name?
ed...@gmail.com <ed...@gmail.com> #10
No. It can't be done with a carefully constructed key name... because the key name is
immutable once it is set at creation. For example, say you have a User model with an
email address in it. You want the email address to be unique across all User
entities. If you use the email address as key name you are out of luck when the user
changes his/her email.
be...@gmail.com <be...@gmail.com> #11
No, I don't think so: keys are used to reference objects, so if you change the key you have to update all referencing properties.
Uniqueness is not the same use case as a primary key.
ry...@google.com <ry...@google.com> #12
hi all! we've discussed this a bit internally, and we agree, it would be a useful
feature. it's definitely one of the most widely used constraints in SQL schemas.
unfortunately, it would be pretty difficult to implement with the current datastore
architecture, due in large part to our index consistency model, described in
http://code.google.com/appengine/articles/transaction_isolation.html. we might have
to chalk this up as another place the datastore doesn't provide traditional
relational database features.
at first, the key_name approach seemed tempting to me too. unfortunately, as edoardo
and bernddorn mention, it effectively makes those properties read only. having said
that, i suspect that "unique" properties like these actually are often read only, so
that might not be quite as big a drawback.
out of curiosity, would this feature be useful to anyone if it was limited to entity
groups? ie, you could specify that property values must be unique within each
individual entity group, but they could be repeated across entity groups?
ed...@gmail.com <ed...@gmail.com> #13
ryanb: even properties that are thought to be immutable/read-only (unique identifiers, say ISSN/ISBN etc.) may actually change in the future (and they have). That is why, in general, it is a very bad idea to use properties as key_names (that's the whole thinking behind using a simple integer as the primary key in relational database design). Moreover, there are properties (say email addresses) that you want to be unique and that are quite likely to change.
The correct way of implementing this would be to have the unique constraint apply within all entities of a specific KIND, not just an entity group as you suggested.
Another way to go about it would be to be able to query the datastore within a transaction.
be...@gmail.com <be...@gmail.com> #14
For me this would be totally OK; AFAIK an entity group is needed anyway to check uniqueness in a transaction-safe manner, isn't it?
Even though in my own applications I do not check uniqueness in a transaction-safe way, I thought that if this happened on the datastore end it could be implemented at the storage/index level, which would make it much less error-prone than doing it at the application level via properties.
ye...@gmail.com <ye...@gmail.com> #15
A transaction isn't needed to guarantee a property is unique. Do a put followed by a query on that property; if the count is greater than 1, delete self. If you worry about exceptions, you can mark a property of the new object to indicate it's new and flip it once it is verified unique. If there's contention on the put, both objects should delete themselves; there shouldn't be two objects both marked old with the same property value.
Of course, solving this at the index level is much more efficient.
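A minimal sketch of this put-then-verify idea, assuming a hypothetical User model with an email property (the verifying query runs outside any transaction and indexes are only eventually consistent, so this is not fully race-free, as later comments point out):

    from google.appengine.ext import db

    class User(db.Model):
        email = db.StringProperty(required=True)

    def put_if_unique(user):
        user.put()  # write first...
        # ...then verify; a limit of 2 is enough to detect a duplicate
        if User.all().filter('email =', user.email).count(2) > 1:
            user.delete()  # lost the race: remove our own copy
            raise ValueError('email %r is already taken' % user.email)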
th...@gmail.com <th...@gmail.com> #16
Question: are entity IDs guaranteed to be unique?
If so, would it be possible to create a property that does something like:
entity.index = eventual_id
instead of:
entity.put()
entity.index = entity.key().id()
[Deleted User] <[Deleted User]> #17
[Comment deleted]
ry...@google.com <ry...@google.com> #18
generally, no, IDs are not unique. however, they will be unique for entities with the
same kind and the same parent, or root entities of the same kind.
in general, the full path is the only thing that's always guaranteed to be unique.
see http://code.google.com/appengine/docs/datastore/keysandentitygroups.html#Paths_and_Key_Uniqueness.
wk...@gmail.com <wk...@gmail.com> #19
I think this "compromise" is pretty much useless. The whole point of having unique entities is to have slightly more complex consistency guarantees than what you currently provide with key_name. Is this feature really impossible to implement in a scalable way, or is this a problem specific to Bigtable or the datastore?
What about allowing the key_name to be changed safely? Could you somehow allow two entity groups to be locked at the same time? If a deadlock occurs you could just revert one of the transactions and let the other continue to run. This might need some more thought, though. :)
bo...@gmail.com <bo...@gmail.com> #20
I ran into this, and between a thread in the groups and this ticket, it's been helpful. I did come up with a solution for my specific use case. I'm using a floating-point number as a score and need it to be unique because I am paging off the value. I also needed it not to be immutable, because scores can change. What I did was override the put method with a check to confirm the value is unique and, if not, adjust it and try again.
    def put(object):
        valid = False
        while not valid:
            cquery = db.GqlQuery('SELECT * FROM Story WHERE score = :1', object.score)
            results = cquery.fetch(1)
            if len(results) > 0:
                object.score += .01  # nudge the score and re-check
            else:
                valid = True
        super(Story, Story).put(object)
One thing that might be useful from the App Engine standpoint would be an attribute similar to a property's validator, like unique_validator=None, which could then be a hook for developers to supply their own functions. Or, for a more generic mechanism, it could be something like the above, except that it would raise an exception on finding an existing value; it would then be up to the developer to catch the exception and adjust the value before reattempting the put. Then you could just have a unique=True/False attribute.
ka...@gmail.com <ka...@gmail.com> #21
In JPA (Java) there is an annotation, @UniqueConstraint [1], that can be put on the class-level table annotation to identify which properties should be unique (either alone or alongside others). This is usually used in addition to property-level annotations, which have a unique=true/false attribute.
Before calling put(), the information from the annotation (or a simple property _unique_properties) is used to issue some queries, and, if no constraints are violated, the entity is stored. If any constraint is violated, an appropriate error can be raised.
I have implemented a very basic version of my suggestion, which is enough for what I need:
## start
class DbEntity(db.Model):
    _unique_properties = None
    timestamp = db.DateTimeProperty(auto_now=True)

    def __init__(self, parent=None, key_name=None, _app=None,
                 unique_properties=None, **kwds):
        super(DbEntity, self).__init__(parent, key_name, _app, **kwds)
        if unique_properties:
            logging.debug("Unique properties: %s" % unique_properties)
            self._unique_properties = unique_properties

    def put(self):
        if self._unique_properties:
            logging.debug('checking unique properties for: %s ...' % self.__class__)
            for unique in self._unique_properties:
                gqlString = 'WHERE %s = :1' % unique
                param = getattr(self, unique)  # avoids the original eval()
                logging.info('GQL: self.gql("%s", %s)' % (gqlString, param))
                query = self.gql(gqlString, param)
                otherObjects = query.fetch(10)
                if len(otherObjects) > 0:
                    logging.error("Objects that violate the constraints: %s" % otherObjects)
                    raise db.BadPropertyError(
                        "Other instances of %s exist with %s = %s" %
                        (self.__class__, unique, param))
        return super(DbEntity, self).put()

class Team(DbEntity):
    name = db.StringProperty(required=True)

    def __init__(self, parent=None, key_name=None, _app=None, **kwds):
        # set the unique property names
        super(Team, self).__init__(unique_properties=['name'], **kwds)
## end
[1] http://java.sun.com/javaee/5/docs/api/javax/persistence/UniqueConstraint.html
t....@gmail.com <t....@gmail.com> #22
re: Comment 23
This code contains a race condition and will not ensure uniqueness (it should, most
of the time, but there's no uniqueness guarantee).
ja...@gmail.com <ja...@gmail.com> #23
Sorry for my bad English, but why do I get 2 inserts when I expect 1?
ro...@gmail.com <ro...@gmail.com> #24
This is a total kludge, but what if you did memcache.add("dblock", "dblock", 10), then ran a query to see whether any existing objects in the datastore have your unique fields before doing the db.put(), and then memcache.delete("dblock")? If the memcache.add() returns False, just wait a random number of milliseconds and try again.
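A rough sketch of this memcache-lock kludge (the helper name, model, and property are hypothetical; memcache entries can be evicted at any time, so this shrinks the race window rather than closing it):

    import random
    import time

    from google.appengine.api import memcache

    def put_with_lock(entity, prop, value, retries=10):
        for _ in range(retries):
            if memcache.add("dblock", "dblock", 10):  # False if the lock is held
                try:
                    existing = entity.__class__.all(keys_only=True) \
                                     .filter('%s =' % prop, value).get()
                    if existing is not None:
                        raise ValueError('%s %r is already taken' % (prop, value))
                    entity.put()
                    return
                finally:
                    memcache.delete("dblock")
            time.sleep(random.random() * 0.05)  # back off a few random milliseconds
        raise RuntimeError('could not acquire memcache lock')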
em...@gtempaccount.com <em...@gtempaccount.com> #25
Yes, unique constraints are critical to certain use cases. My problem is that I allow the datastore to create unique, immutable numeric IDs for the Key, which are used for entity relationships, but I need a mutable secondary key for human-readable keys that may change. I ran into a case where a request was repeated quickly (within 50 ms), so two JVMs handled the copies simultaneously. Within a transaction, the code queried on the secondary key to see if it already existed. Both handlers got null, since the entity did not yet exist. They then created it, and the datastore assigned two unique numeric primary keys to the new entities. This left me with two entities with duplicate secondary keys.
I understand there is a technique that involves creating an auxiliary entity to be used as a mutex for the secondary key. In JDO/JPA that is a bunch of ugly, obfuscating code that requires extra Java classes, etc. JDO/JPA pretty much assume the semantics of a full SQL datastore, so this kind of kludge really shows that JDO/JPA is not a great solution for the Google datastore. The technique is much less offensive if you are using the low-level API. But all my code now uses JDO, so is it possible to implement the uniqueness annotation?
From my point of view, it would be OK to require an index and/or assume that writes to such entities would take longer. I understand that stage 1 (the commit phase) of a write does not update indexes, so that (as it now stands) the entity is effectively committed (logged and visible to get()) prior to any indexes being updated. Indexes are updated in stage 2, which can occur after the commit call returns. However, uniqueness constraints are important enough that it would be OK with me if my commit waited for stage 2 to finish before returning. This would allow the indexing phase to enforce the uniqueness constraint. It would also be OK for the non-unique entity committed in stage 1 to be visible to get() until stage 2 rolls it back.
At the very least, having a uniqueness constraint that only allows ONE of the entities into the index might be good enough. In other words, if I query by secondary key, I will get a single entity and it will always be the same one.
da...@gmail.com <da...@gmail.com> #26
An obvious case where you want uniqueness is with user logins. My app uses email addresses for the login. We set the key_name of our account model and use get_or_insert() to effect uniqueness.
The downfall is when the user wants to change their email address. We have to clone not just their account entity, but all the entities underneath it (of which there are potentially many thousands.) Of course, we want the user to be able to use their account during this time, so there's a lot of logic that pains me to have to keep around just to deal with the case of accounts in motion...
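A minimal sketch of this key_name approach, assuming a hypothetical Account model and signup helper; making the email the key lets the datastore itself enforce uniqueness, at the cost of the immutability problem described above:

    from google.appengine.ext import db

    class Account(db.Model):
        created = db.DateTimeProperty(auto_now_add=True)

    def signup(email):
        def txn():
            if Account.get_by_key_name(email) is not None:
                raise ValueError('email already registered')
            account = Account(key_name=email)  # the email IS the key
            account.put()
            return account
        return db.run_in_transaction(txn)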
ho...@gmail.com <ho...@gmail.com> #27
Is there a suitable workaround for Java to ensure uniqueness of a property in the data model?
ja...@gmail.com <ja...@gmail.com> #28
We've used this technique on a number of our applications: http://squeeville.com/2009/01/30/add-a-unique-constraint-to-google-app-engine/
It's simple, but it has some possibility of leaving unique values unable to be used, though it will guarantee that a unique value is actually unique. That is, because there is no transaction spanning the Unique model and the model holding the unique values, a failure can leave a unique value reserved but unused. It fails on the side of guaranteeing uniqueness.
The example is Python, but it is simple enough that there will be a direct Java analogy.
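The linked technique is roughly the following shape (the kind names, key_name format, and helper are assumptions, not the article's exact code):

    from google.appengine.ext import db

    class Unique(db.Model):
        pass  # the existence of a key reserves a value

    class User(db.Model):
        email = db.StringProperty()

    def reserve(kind, prop, value):
        key_name = '%s.%s:%s' % (kind, prop, value)  # e.g. 'User.email:a@b.com'
        def txn():
            if Unique.get_by_key_name(key_name) is not None:
                return False
            Unique(key_name=key_name).put()
            return True
        return db.run_in_transaction(txn)

    # Reserve the value first, then save the real entity. As noted above,
    # a failure between the two steps can strand the reserved value.
    if reserve('User', 'email', 'a@b.com'):
        User(email='a@b.com').put()
    else:
        raise ValueError('email already in use')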
ab...@gmail.com <ab...@gmail.com> #29
You can also combine the technique in #30 with cross-group transactions; this will protect you against losing unique values.
You create an entity group for each unique field, and then do a cross-group transaction that checks/updates that entity group along with the entity group of your model.
This limits you to 4 unique fields per model, and it is probably terrible for performance, since you're effectively serializing writes across each field. But if you really, really need uniqueness... (see the sketch below)
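A sketch of this combination, reusing the hypothetical Unique marker kind from above; the 5-entity-group limit on XG transactions is what leaves room for 4 markers plus the model itself:

    from google.appengine.ext import db

    class Unique(db.Model):
        pass  # one root entity (group) per reserved value

    class User(db.Model):
        email = db.StringProperty()

    @db.transactional(xg=True)  # XG transactions span at most 5 entity groups
    def create_user(email):
        marker_name = 'User.email:%s' % email
        if Unique.get_by_key_name(marker_name) is not None:
            raise ValueError('email already in use')
        Unique(key_name=marker_name).put()  # group 1: the marker
        user = User(email=email)
        user.put()                          # group 2: the model itself
        return user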
sp...@gmail.com <sp...@gmail.com> #30
After investigating this problem a bit, I came up with two possible working solutions that could be implemented at the application level:
    # Solution 1: ancestor query inside a transaction
    @ndb.transactional
    def put(self):
        from google.appengine.ext.db import STRONG_CONSISTENCY
        key = User.gql("WHERE ANCESTOR IS :1 AND email = :2",
                       self.key.parent(), self.url).get(
                           keys_only=True, read_policy=STRONG_CONSISTENCY)
        assert key is None or key == self.key
        ndb.Model.put(self)

    # Solution 2: memcache as a mutex around the same check
    def put(self):
        from google.appengine.api import memcache
        memcache_key = "UserEmail" + repr(self.url)
        assert memcache.incr(memcache_key) is None
        try:
            from google.appengine.ext.db import STRONG_CONSISTENCY
            key = User.gql("WHERE ANCESTOR IS :1 AND email = :2",
                           self.key.parent(), self.url).get(
                               keys_only=True, read_policy=STRONG_CONSISTENCY)
            assert key is None or key == self.key
            ndb.Model.put(self)
        finally:
            memcache.delete(memcache_key)
Of course I prefer the first one, and according to the documentation both should work.
Yet some people insist on using memcache to ensure there are no race conditions.
The only reasoning I see is the query, but even so, within the same entity group transactions are serializable according to the documentation, so it should work just fine. What am I missing?
lu...@potatolondon.com <lu...@potatolondon.com> #31
Just wanted to bump this issue a little. In Djangae (https://github.com/potatolondon/djangae/) we've implemented a kind of unique-constraint support by creating marker instances for each unique combination. The key of the marker is constructed from the unique combination (enforcing uniqueness), and markers are acquired (in independent transactions) before a Put() and released after a Delete().
Markers also have a creation time and an associated instance. If a marker for a unique combination already exists, we check whether its associated instance still exists (by doing an ancestor, keys_only query on its key) before raising an exception.
When creating new instances (where the key ID hasn't yet been generated), we create a marker without an associated instance key, but we ignore such markers for unique-constraint checking after a few seconds.
Obviously, this is a bit of a kludge, but it seems to work, and we've tried the approach on some pretty high-traffic sites. It is costly, though, both in performance and quota, but sometimes you really need to enforce uniqueness.
It would be much better if a similar approach could be implemented in the datastore itself. This is still a very sought-after feature.
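A loose sketch of the scheme described above, not Djangae's actual code (the kind name, key_name format, and grace period are all assumptions):

    import datetime
    from google.appengine.ext import db

    GRACE = datetime.timedelta(seconds=5)

    class UniqueMarker(db.Model):
        # key_name encodes the unique combination, e.g. 'User|email:a@b.com'
        instance = db.StringProperty()  # encoded key of the owning entity, if known
        created = db.DateTimeProperty(auto_now_add=True)

    def acquire_marker(key_name, instance_key=None):
        marker = UniqueMarker.get_by_key_name(key_name)
        if marker is not None:
            if marker.instance:
                # does the owning entity still exist?
                if db.get(db.Key(marker.instance)) is not None:
                    raise ValueError('unique constraint violated')
            elif datetime.datetime.now() - marker.created < GRACE:
                # a concurrent creation may still be in flight
                raise ValueError('unique constraint violated')
        UniqueMarker(key_name=key_name,
                     instance=str(instance_key) if instance_key else None).put()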
[Deleted User] <[Deleted User]> #32
Bump.
I believe this is quite an important feature to implement.
I have run into similar problems without it.
fo...@deeptent.com <fo...@deeptent.com> #33
Upvote + 1
ja...@aro.digital <ja...@aro.digital> #34
Are we seriously still waiting for this? +1. I'll be using Cloud SQL for now, thanks.
ot...@gmail.com <ot...@gmail.com> #35
Maybe you should solve this before solving quantum supremacy, Google. Come on!
ju...@gmail.com <ju...@gmail.com> #36
We need this!
mi...@gmail.com <mi...@gmail.com> #37
@Google, come on! This has been a known issue since 2008. Get this solved and ensure that Firebase/Firestore is competitive with similar database products.
sr...@gmail.com <sr...@gmail.com> #38
Much-awaited feature.
mi...@notflip.be <mi...@notflip.be> #39
Is there any good method available to get this working?
sa...@gmail.com <sa...@gmail.com> #40
wow. I commented on this issue over a decade ago. I don't think there's a good fix, but now you can use Google Cloud SQL instead, and set unique=true in your table schema...
ga...@gmail.com <ga...@gmail.com> #41
Are we getting this? Ever?
Description
Support a uniqueness constraint on datastore properties, so that a property value can be required to be unique across all model entities.