Opened 8 years ago

Closed 8 years ago

#27193 closed New feature (fixed)

ORDER BY clause not included in subqueries using select_for_update()

Reported by: sqwishy Owned by: nobody
Component: Database layer (models, ORM) Version: 1.9
Severity: Normal Keywords:
Cc: Shai Berger Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This was tested using Django 1.9.5. I believe the code below fails to generate the correct SQL. I explicitly set an ordering in the subquery but it is not expressed in the generated SQL. This alters the behaviour of the SELECT statement when the FOR UPDATE clause is used. My use case is to lock rows in a particular order with the goal of preventing deadlocks.

Example code:

with transaction.atomic():
    print(User.objects.filter(id__in=User.objects.order_by('id').select_for_update()))

PostgreSQL logs:

2016-09-08 02:59:43 ACWST [30398-20] testsystem@testsystem LOG:  statement: BEGIN
2016-09-08 02:59:43 ACWST [30398-21] testsystem@testsystem LOG:  statement: SELECT "testapp_user"."id", "testapp_user"."password", ... FROM "testapp_user" WHERE "testapp_user"."id" IN (SELECT "testapp_user"."id" FROM "testapp_user" FOR UPDATE) LIMIT 21
2016-09-08 02:59:43 ACWST [30398-22] testsystem@testsystem LOG:  statement: COMMIT

It looks like the order_by is being cleared on the queryset object here https://github.com/django/django/blob/master/django/db/models/sql/compiler.py#L476

I can verify this as it looks like ORDER BY does show up when I include distinct() in the subquery - which isn't a suitable workaround as DISTINCT is not compatible with FOR UPDATE - and by slicing the subquery as to set an offset or a limit.

For now, an acceptable workaround seems to make the SELECT FOR UPDATE in a separate query instead of a subquery. That at least seems to accomplish the goal I am trying to achieve.

Change History (6)

comment:1 by Shai Berger, 8 years ago

Resolution: needsinfo
Status: newclosed

Hi,

You said:

My use case is to lock rows in a particular order with the goal of preventing deadlocks.

Can you please point to some documentation, or example code with execution, that shows this idea may work? As far as I know, the locking is atomic -- it is done for all rows matching the criteria together, and thus the order-by clause on the inner select is indeed meaningless and can be dropped.

If and when you show this, please re-open the ticket.

comment:2 by sqwishy, 8 years ago

Resolution: needsinfo
Status: closednew

Replying to shaib:

As far as I know, the locking is atomic -- it is done for all rows matching the criteria together, and thus the order-by clause on the inner select is indeed meaningless and can be dropped.

How would you get deadlocks with updates if that were the case?

I can't find where the documentation explicitly states that locking is done row-by-row. Depending on how you read it, the information at the following link may suggest that rows are locked while a query is running rather than all at once atomically. See particularly the second paragraph under "13.2.1. Read Committed Isolation Level". https://www.postgresql.org/docs/current/static/transaction-iso.html#XACT-READ-COMMITTED

This is a bit of code where adding the order by seems to prevent deadlocks.

from django.db import transaction
from django.contrib.auth.models import Group
from django.db.models import F

def setup():
    for i in range(20):
        Group.objects.get_or_create(name='test_27193_%i' % i)

def test():
    for _ in range(20):
        with transaction.atomic():
            Group.objects.filter(
                id__in=Group.objects.filter(name__startswith='test_27193').select_for_update()
            ).update(name=F('name'))
            #Group.objects.filter(
            #    id__in=Group.objects.filter(name__startswith='test_27193').select_for_update().order_by('id')[1:]
            #).update(name=F('name'))

setup()
test()

I have that saved to a file aptly named script.py and run it 20 times at once (I'm sure there is a less ugly way to do this):

seq 20 | parallel --ungroup -j 20 "echo {}; ./manage.py shell --plain < script.py"

Doing this with the script as shown, exceptions are raised in python about deadlocks and PostgreSQL logs about it. If you comment out that first query and uncomment the second one everything runs fine. The slice there is just to convince django not to reset the order-by value in the queryset.

Last edited 8 years ago by sqwishy (previous) (diff)

comment:3 by Shai Berger, 8 years ago

Well, I don't quite see it in the doc you linked, but it's mentioned in this thread and your code example is convincing. I stand corrected.

comment:4 by Shai Berger, 8 years ago

Cc: Shai Berger added
Component: UncategorizedDatabase layer (models, ORM)
Triage Stage: UnreviewedAccepted
Type: UncategorizedNew feature

That said, 1.9 and even 1.10 are past the phase where this can be put in them. Accepting tentatively, assuming it can be reproduced in master (I don't have time to check right now)

comment:5 by Tim Graham, 8 years ago

Has patch: set

comment:6 by Tim Graham <timograham@…>, 8 years ago

Resolution: fixed
Status: newclosed

In 8ac115c7:

Fixed #27193 -- Preserved ordering in select_for_update subqueries.

Note: See TracTickets for help on using tickets.
Back to Top