#33516 closed Bug (invalid)
QuerySet update() under postgres with select_for_update(skip_locked=True) results in neither FOR UPDATE nor SKIP LOCKED
| Reported by: | Grant Gainey | Owned by: | nobody |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 3.2 |
| Severity: | Normal | Keywords: | |
| Cc: | | Triage Stage: | Unreviewed |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Versions:
```
$ cat /etc/fedora-release
Fedora release 34 (Thirty Four)
$ pip freeze | grep -i django
Django==3.2.12
django-crispy-forms==1.13.0
django-currentuser==0.5.3
django-extensions==3.1.5
django-filter==21.1
django-guardian==2.4.0
django-guid==3.2.1
django-import-export==2.7.1
django-lifecycle==0.9.3
django-readonly-field==1.1.1
django-storages==1.12.3
djangorestframework==3.13.1
djangorestframework-queryfields==1.0.0
$ psql --version
psql (PostgreSQL) 13.4
```
Problem Statement
pulpcore is a Django app that does A Lot Of Concurrent Stuff. One of those Things is "update the last-touched timestamp of large batches of entities at the end of a workflow". We use `queryset.update(timestamp_of_interest=now())` for this.
Batch updates don't guarantee row-update order in PostgreSQL. In heavy-concurrency contexts, when multiple threads update overlapping content, this can result in deadlocks as Postgres acquires row locks in arbitrary orders.
To get around this in raw Postgres, one uses:

```sql
UPDATE ... WHERE id IN (SELECT id FROM table WHERE expression ORDER BY id FOR UPDATE)
```

The SELECT ... ORDER BY ... FOR UPDATE enforces row-lock ordering prior to the UPDATE, everyone is happily synchronized, and all the dining philosophers get to eat.
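The ordered-locking idea is easy to see outside the database. Here is a minimal pure-Python sketch (a threading analogy, not Django or Postgres code; `row_a`, `row_b`, and `update_rows` are invented names for illustration): two threads name the same two "rows" in opposite orders, but because each sorts the locks into a single global order before acquiring them, they cannot deadlock.

```python
import threading

# Two "rows" modelled as locks. Acquiring them in one consistent global
# order from every thread is what SELECT ... ORDER BY id FOR UPDATE
# achieves at the database level.
row_a = threading.Lock()
row_b = threading.Lock()
results = []

def update_rows(locks):
    # Sort by identity so every thread locks in the same order,
    # regardless of the order the caller listed the rows in.
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    try:
        results.append("updated")
    finally:
        for lock in reversed(ordered):
            lock.release()

# The two threads name the rows in opposite orders, which would risk
# deadlock without the sort.
t1 = threading.Thread(target=update_rows, args=([row_a, row_b],))
t2 = threading.Thread(target=update_rows, args=([row_b, row_a],))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # ['updated', 'updated']
```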
If (as in our specific use-case) you don't actually care who updates the row, just that it is being updated "now()-ish", you can regain some concurrency by using Postgres' `SKIP LOCKED` functionality.
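The trade-off SKIP LOCKED makes can be sketched with a non-blocking lock acquire in plain Python (an analogy only, not a Postgres or Django API; `touch_row` and `row_lock` are invented names): instead of waiting on a contended row, the worker records it as skipped and moves on.

```python
import threading

# A non-blocking acquire is the rough equivalent of SKIP LOCKED: if
# another worker already holds the row, skip it rather than wait.
row_lock = threading.Lock()
outcomes = []

def touch_row():
    if row_lock.acquire(blocking=False):
        try:
            outcomes.append("updated")
        finally:
            row_lock.release()
    else:
        outcomes.append("skipped")   # someone else holds the row: move on

row_lock.acquire()   # simulate another transaction holding the row
touch_row()          # skipped instead of blocked
row_lock.release()
touch_row()          # the row is free now, so it gets updated
print(outcomes)  # ['skipped', 'updated']
```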
So - we attempted to use the following to accomplish that in our update workflow:

```python
TestModel.objects.filter(pk__in=content_q).order_by("pk").select_for_update(
    skip_locked=True
).update(timestamp_of_interest=now())
```
By the time we get to Postgres, this does not emit FOR UPDATE or SKIP LOCKED, and therefore doesn't solve our deadlock problem. Cue sad-trombone.
Specifically, from the driver-example below, this code:
```python
def update_timestamp(index):
    content_q = (
        TestModel.objects.filter()
        .order_by("pk")
        .values_list("pk", flat=True)
    )
    TestModel.objects.filter(pk__in=content_q).order_by("pk").select_for_update(
        skip_locked=True
    ).update(timestamp_of_interest=now())
```
generates this SQL:
```
2022-02-15 19:12:47.276 UTC [337361] LOG:  statement: UPDATE "core_testmodel" SET "timestamp_of_interest" = '2022-02-15T19:12:47.187475+00:00'::timestamptz WHERE "core_testmodel"."pulp_id" IN (SELECT U0."pulp_id" FROM "core_testmodel" U0)
```
Running the driver-example will generate deadlocks; the more TestModel entries exist, the more often you hit one. With 5K entries, roughly 5-10 of 100 test runs hit deadlocks; with 20K entries, it was more like 60-70% of the 100 runs.
Workaround
Right now, we implemented a workaround in raw SQL. You can see it here:
We would much, much rather not have to use raw SQL.
Full Example:
- Here's a really simple model, stored as `pulpcore/apps/models/test.py`:

```python
import uuid

from django.db import models
from django_lifecycle import LifecycleModel


class TestModel(LifecycleModel):
    pulp_id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    timestamp_of_interest = models.DateTimeField(auto_now=True)
```
- Here's a migration that creates that model:
```python
# Generated by Django 3.2.12 on 2022-02-15 18:53
from django.db import migrations, models
import django_lifecycle.mixins
import uuid


class Migration(migrations.Migration):

    dependencies = []

    operations = [
        migrations.CreateModel(
            name='TestModel',
            fields=[
                ('pulp_id', models.UUIDField(default=uuid.uuid4, editable=False, primary_key=True, serialize=False)),
                ('timestamp_of_interest', models.DateTimeField(auto_now=True)),
            ],
            options={
                'abstract': False,
            },
            bases=(django_lifecycle.mixins.LifecycleModelMixin, models.Model),
        ),
    ]
```
- Here's some python that adds 5000 rows to the resulting table (there is probably a better/faster way to do this, but this is obvious):
```python
from pulpcore.app.models import TestModel

for tm in range(5000):
    TestModel.objects.create()
```
- Turning on statement logging in Postgres and restarting:
```
$ sudo vim /var/lib/pgsql/data/postgresql.conf
...
log_statement = 'all'    # none, ddl, mod, all
$ sudo systemctl restart postgresql
```
- And here, at last, is driver-code that will generate deadlocks and let us inspect the emitted SQL:
```python
from threading import Thread

from pulpcore.app.models import TestModel
from django.utils.timezone import now

durations = []


def update_timestamp(index):
    content_q = (
        TestModel.objects.filter()
        .order_by("pk")
        .values_list("pk", flat=True)
    )
    TestModel.objects.filter(pk__in=content_q).order_by("pk").select_for_update(
        skip_locked=True
    ).update(timestamp_of_interest=now())
    print(">>>done")


for r in range(10):
    threads = []
    for i in range(10):
        threads.append(Thread(target=update_timestamp, args=(i,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```
Change History (2)
comment:1 by , 3 years ago
| Resolution: | → invalid |
|---|---|
| Status: | new → closed |
comment:2 by , 3 years ago
Ah-ha! Perfect, that explains everything!
Two comments/suggestions:
- Could `select_for_update()` throw an exception when it's not going to be emitted? It would have helped me find the problem much sooner if the method had yelled at me for using it incorrectly.
- Similarly, when called correctly, `select_for_update()` throws an exception if you're not in a transaction. When called incorrectly, it doesn't. It would be great if the behavior were consistent.
For posterity's sake, below is my test script using the suggested changes, which works on a 20K-row table with no deadlocks. Thanks for the pointer!
```python
import statistics
import time
from threading import Thread

from pulpcore.app.models import Content
from django.db import transaction
from django.utils.timezone import now

durations = []


def update_timestamp(index):
    print(">>>in update_timestamp index {}".format(index))
    start = time.time()
    with transaction.atomic():
        content_q = (
            Content.objects.filter()
            .order_by("pk")
            .select_for_update(skip_locked=True)
            .values_list("pk", flat=True)
        )
        Content.objects.filter(pk__in=content_q).update(timestamp_of_interest=now())
    end = time.time()
    durations.append(end - start)
    print(">>>done")


for r in range(10):
    threads = []
    for i in range(10):
        threads.append(Thread(target=update_timestamp, args=(i,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()

print("Avg time    : {}".format(sum(durations) / len(durations)))
print("Median time : {}".format(statistics.median(durations)))
print("StdDev      : {}".format(statistics.stdev(durations)))
```
And the emitted SQL:
```
2022-02-16 13:46:02.517 UTC [368804] LOG:  statement: UPDATE "core_content" SET "timestamp_of_interest" = '2022-02-16T13:46:02.516956+00:00'::timestamptz WHERE "core_content"."pulp_id" IN (SELECT U0."pulp_id" FROM "core_content" U0 ORDER BY U0."pulp_id" ASC FOR UPDATE SKIP LOCKED)
```
In the latest version of Django the ordering should not be cleared when `select_for_update()` is used on the subquery provided to an `__in` lookup. There were some recent changes in this area in 4.0 and I haven't confirmed whether this is an issue on 3.2, but in all cases `order_by().select_for_update()` has to be called on the subquery, not the outer queryset, for this to work, as the resulting SQL hints. That's likely why your solution isn't working here.