#18557 closed Uncategorized (duplicate)
get_or_create() causes a race condition with MySQL
Reported by: | Cal Leeming | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.4 |
Severity: | Normal | Keywords: | |
Cc: | cal@…, django@… | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hi,
When using MySQL, get_or_create() has a race condition under high load scenarios.
Up until now, the only fix was to use READ COMMITED transaction isolation, but this can break legacy apps.
Instead - we have been using the following fix in production, and it works great under high load or multi threaded / multi node job queues.
# This needs uploading tomorrow class ExtendedManager( Manager ): @transaction.commit_on_success def get_or_create(self, *args, **kwargs): transaction.commit() created = None try: return super(ExtendedManager, self).get_or_create(*args, **kwargs) except IntegrityError, e: transaction.commit() print "RACE 3: %s, %s" % ( str(e) , kwargs) # Ensure the error code matches 1062 (duplicate entry) if not e.args[0] == 1062: raise e _res = self.all().filter(**kwargs) if not _res: raise Exception, "Object busy or not yet ready: %s" % ( e ) if len(_res) > 1: raise Exception, "get_or_create(): duplicate object found, this should never happen" return _res[0], False
This has been discussed here:
http://stackoverflow.com/questions/2235318/how-do-i-deal-with-this-race-condition-in-django
And here:
https://groups.google.com/forum/?fromgroups#!topic/django-developers/VNpt-sxSmho
Is there any chance this patch would make it into the core?
Cal
Change History (7)
comment:1 by , 12 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:2 by , 12 years ago
Resolution: | wontfix |
---|---|
Status: | closed → reopened |
@akaariai You raise a very good point about it breaking transaction control, and enabling such a flag would depend on the developer ensuring they knew exactly what they were doing, and when it is safe to use it.
I'm thinking something along these lines:
MyModel.objects.get_or_create(a=1, b=2, force_commit=True)
Along with a documentation update that says:
Certain databases (such as MySQL) don't gracefully handle get_or_create() when multiple threads are being used to write to the same table. If throughput is high enough, then there is a small race condition where the MySQL index says the unique index exists, but any attempt to fetch that key will result in failure. The only way around this is to commit the transaction you are in, which then allows you to fetch the row. However, if your get_or_create() is in a transaction block with manual commits, then any queries before the get_or_create() call will also be committed. If you plan on using this feature, you must ensure that the get_or_create() call is within a safe context where it is okay for the previous queries to be committed.
Also, to touch on your question about what kind of usage pattern leads to this race condition, it's fairly easy to trigger. You just need two or more threads attempting to perform get_or_create() on the same table, within a close space of each other. A typical scenario could be a queued import job which has to do a get_or_create() on a popular item such as IP address. If both scripts encounter the same IP at the same time, it will cause the race condition to happen. The reason this affected us so badly, is because the majority of our work involves importing and mangling large data sets - where as a low traffic site would almost never see this happen.
If the above suggested patch description sounds good, please let me know and I'll get a patch prepared.
comment:3 by , 12 years ago
Cc: | added |
---|---|
Component: | Core (URLs) → Database layer (models, ORM) |
comment:4 by , 12 years ago
Resolution: | → wontfix |
---|---|
Status: | reopened → closed |
Replying to foxwhisper:
Is there any chance this patch would make it into the core?
To be honest, as is, there is no chance for this code to make it into Django:
- it isn't a patch, just a chunk of code,
- it's specific to MySQL but it appears that it should go into django.db.models,
- there's no explanation of why you're using this technique and why it works, and it isn't strikingly obvious either,
- it messes transaction control,
- no tests, no docs,
- the code itself very far from Django's coding standards (
print
,raise Exception
, etc.)
Regarding transaction control, I see you've put the responsibility on the developer. It's hand-vawing, and I don't think we can put the issue aside like this. Transaction control is one of the more complicated parts of Django and I don't expect more than 0.1% of the framework's users to understand it. (I don't understand it completely myself.)
I acknowledge that the problem exists, but this specific piece of code doesn't look like an appropriate solution, at least not in its current state.
The way to move forward on a ticket that received a wontfix by a core developer is to bring up the issue on django-developers. Thanks!
comment:5 by , 12 years ago
Cc: | added |
---|
comment:6 by , 12 years ago
Mailing-list discussion: https://groups.google.com/d/topic/django-developers/wvpysb-fXtc/discussion
You can't unconditionally do commit() in get_or_create. That breaks transaction control. So, my opinion is that no chance to get into core as is.
It seems this issue seems to raise its head from time to time. I haven't hit this issue ever (maybe because I don't use get_or_create() that much, and I don't use MySQL). I am interested to hear what kind of usage pattern leads into concurrency problems in get_or_create()?
If the .commit() way of get_or_create is really wanted, then it needs to be a separate method or at least a flag to get_or_create() is needed. I won't object to such a patch, but neither will I pursue its commit.
I will mark this as wontfix - please reopen if you are going with the flag/different method. The wontfix is specifically to the suggested approach. Also, #13906 is related, though it is about the default transaction isolation level.