﻿id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
33516	QuerySet update() under postgres with select_for_update(skip_locked=True) results in neither FOR UPDATE nor SKIP LOCKED	Grant Gainey	nobody	"=== Versions:
{{{
$ cat /etc/fedora-release 
Fedora release 34 (Thirty Four)

$ pip freeze | grep -i django
Django==3.2.12
django-crispy-forms==1.13.0
django-currentuser==0.5.3
django-extensions==3.1.5
django-filter==21.1
django-guardian==2.4.0
django-guid==3.2.1
django-import-export==2.7.1
django-lifecycle==0.9.3
django-readonly-field==1.1.1
django-storages==1.12.3
djangorestframework==3.13.1
djangorestframework-queryfields==1.0.0

$ psql --version
psql (PostgreSQL) 13.4
}}}

=== Problem Statement
pulpcore is a Django app that does A Lot Of Concurrent Stuff. One of those Things includes ""update the last-touched timestamp of large batches of entities at the end of a workflow"". We use `queryset.update(timestamp_of_interest=now())` for this.

Batch updates don't guarantee row-update-order in Postgresql. In heavy-concurrency contexts, when updating overlapping content in multiple threads, this can result in deadlocks as postgres acquires row-locks in arbitrary orders.

To get around this, in raw postgres, one uses `UPDATE...WHERE id IN (SELECT  id FROM table WHERE expression ORDER BY id FOR UPDATE)`. The SELECT-FOR-UPDATE-ORDER-BY enforces row-lock-ordering prior to the UPDATE, and everyone is happily synchronized and all the dining philosophers get to eat.

If (as in our specific use-case) you don't actually care **who** updates the row, just that it is being updated ""now()-ish"", you can regain some concurrency by using Postgres' `SKIP LOCKED` functionality.

So - we attempted to use the following to accomplish that in our update workflow:
{{{
TestModel.objects.filter(pk__in=content_q).order_by(""pk"").select_for_update(
        skip_locked=True
    ).update(timestamp_of_interest=now())
}}}

By the time we get to Postgres, this **does not emit FOR UPDATE or SKIP LOCKED**, and therefore doesn't solve our deadlock problem. Cue sad-trombone.

Specifically, from the driver-example below, this code:
{{{
def update_timestamp(index):
    content_q = (
        TestModel.objects.filter()
            .order_by(""pk"")
            .values_list(""pk"", flat=True)
    )
    TestModel.objects.filter(pk__in=content_q).order_by(""pk"").select_for_update(
        skip_locked=True
    ).update(timestamp_of_interest=now())
}}}

generates this SQL:
{{{
2022-02-15 19:12:47.276 UTC [337361] LOG:  statement: UPDATE ""core_testmodel"" SET ""timestamp_of_interest"" = '2022-02-15T19:12:47.187475+00:00'::timestamptz WHERE ""core_testmodel"".""pulp_id"" IN (SELECT U0.""pulp_id"" FROM ""core_testmodel"" U0)
}}}

Running the driver-example will generate deadlocks; the more TestModel entries exist, the more often you hit a deadlock. With 5K entries, test runs hit deadlocks ~5-10 times. With 20K entries, it was more like ~60-70% of the 100 runs.

=== Workaround
Right now, we implemented a workaround in raw SQL. You can see it here: 

https://github.com/pulp/pulpcore/pull/2159/files#diff-81f6a78175bb93934b6beff952646d3ca1ef3731f1ff14492d4ec77bfc3fdf82R132-R141

We would **much much rather** not have to use raw-sql.

=== Full Example:

* Here's a really simple model, stored as `pulpcore/apps/models/test.py`
{{{
import uuid
from django.db import models
from django_lifecycle import LifecycleModel


class TestModel(LifecycleModel):
    pulp_id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    timestamp_of_interest = models.DateTimeField(auto_now=True)
}}}

* Here's a migration that creates that model:
{{{
# Generated by Django 3.2.12 on 2022-02-15 18:53

from django.db import migrations, models
import django_lifecycle.mixins
import uuid


class Migration(migrations.Migration):

    dependencies = [
    ]

    operations = [
        migrations.CreateModel(
            name='TestModel',
            fields=[
                ('pulp_id', models.UUIDField(default=uuid.uuid4, editable=False, primary_key=True, serialize=False)),
                ('timestamp_of_interest', models.DateTimeField(auto_now=True)),
            ],
            options={
                'abstract': False,
            },
            bases=(django_lifecycle.mixins.LifecycleModelMixin, models.Model),
        ),
    ]
}}}

* Here's some python that adds 5000 rows to the resulting table (there is probably a better/faster way to do this, but this is obvious):
{{{
from pulpcore.app.models import TestModel

for tm in range(5000):
    TestModel.objects.create()
}}}

* Turning on statement-loggin in Postgres and restarting:
{{{
$ sudo vim /var/lib/pgsql/data/postgresql.conf
...
log_statement = 'all'                   # none, ddl, mod, all
$ sudo systemctl restart postgresql
}}}

* And here, at last, is driver-code that will generate deadlocks and let us inspect the emitted SQL:
{{{
from threading import Thread
from pulpcore.app.models import TestModel
from django.utils.timezone import now


durations = []

def update_timestamp(index):
    content_q = (
        TestModel.objects.filter()
            .order_by(""pk"")
            .values_list(""pk"", flat=True)
    )
    TestModel.objects.filter(pk__in=content_q).order_by(""pk"").select_for_update(
        skip_locked=True
    ).update(timestamp_of_interest=now())
    print("">>>done"")

for r in range(10):
    threads = []
    for i in range(10):
        threads.append(Thread(target=update_timestamp, args=(i,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
}}}

"	Bug	closed	Database layer (models, ORM)	3.2	Normal	invalid			Unreviewed	0	0	0	0	0	0
