Opened 3 years ago

Closed 2 years ago

Last modified 18 months ago

#26522 closed Bug (fixed)

Non-deterministic crash in django.db.models.sql.Query.combine()

Reported by: Ole Laursen Owned by: nobody
Component: Database layer (models, ORM) Version: 1.9
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hi,

There's a logical error somewhere in Query.combine or friends.

I have a pretty complicated query with many-to-many self-joins and ORs, and after the upgrade to Django 1.9 and Python 3 combine() sometimes, but not always, crashes with the assertion error

assert set(change_map.keys()).intersection(set(change_map.values())) == set()

Inspecting the change_map, it does indeed contain a circular reference (BTW shouldn't you use "not" to test for an empty set rather than comparing with set()?).

At first, I didn't understand how it could crash sometimes and sometimes not - we're talking about the same query being executed through a view. But when I examine change_map, its content does indeed change from reload to reload - I suppose this may have something to do with the order of dicts/sets the combine() algorithm is using.

Here comes some code, I cut out a bunch of unused stuff, but otherwise it's the same:

class Invoice(models.Model):
    customer = models.ForeignKey(Customer)

    date_created = models.DateField(default=datetime.date.today, db_index=True)
    date_sent = models.DateField(null=True, blank=True)
    date_due = models.DateField(null=True, blank=True)
    date_paid = models.DateField(null=True, blank=True)
    date_credited = models.DateField(null=True, blank=True)
    date_collect = models.DateField(null=True, blank=True)

    invoice_type = models.CharField(default="invoice", max_length=32)

    reminders = models.ManyToManyField("Invoice", related_name="reminded_set", blank=True)
    reminder_counter = models.IntegerField(null=True, blank=True)

That's the model, and in the view:

    import datetime
    from django.db.models import Q

    date = datetime.datetime.now()

    invoices = Invoice.objects.filter(
        Q(date_created__lte=date),
        Q(date_paid__gt=date) | Q(date_paid=None),
        Q(date_credited__gt=date) | Q(date_credited=None),
        customer=1,
    )

    filtered_invoices = Invoice.objects.none()

    not_due = Q(date_due__gte=date) | Q(date_due=None)
    not_reminded_yet = ~Q(reminders__date_created__lte=date)
    not_collected = Q(date_collect__gt=date) | Q(date_collect=None)

    filtered_invoices |= invoices.filter(not_due, not_collected, date_sent__lte=date, invoice_type="invoice")

    filtered_invoices |= invoices.filter(not_collected, not_reminded_yet, date_sent__lte=date, date_due__lt=date, invoice_type="invoice")

    for r in [1, 2, 3]:
        qs = invoices.filter(not_collected, reminders__date_created__lte=date, reminders__reminder_counter=r, invoice_type="invoice")
        for i in range(r + 1, 3 + 1):
            qs = qs.filter(~Q(reminders__reminder_counter=i) | Q(reminders__reminder_counter=i, reminders__date_created__gt=date))
        filtered_invoices |= qs

I realize it's pretty complicated but I'm not sure what's essential and what's not. I hope someone knowledgeable of how combine() works can figure out why ordering in some cases matters.

Attachments (1)

combinebug.tar.gz (2.8 KB) - added by Ole Laursen 3 years ago.
Test project for reproducing the problem

Download all attachments as: .zip

Change History (16)

comment:1 Changed 3 years ago by Ole Laursen

Oh, I forgot to mention - it crashes on the very last line filtered_invoices |= qs in the above with r = 2. So after having constructed a pretty complicated thing.

And it's these lines in combine() that makes it crash:

    # Now relabel a copy of the rhs where-clause and add it to the current
    # one.
    w = rhs.where.clone()
    w.relabel_aliases(change_map)

change_map contains

   {'T4': 'T6', 'T5': 'T7', 'T6': 'T8', 'myapp_invoice_reminders': 'T5'}

so T5 is both in the keys and in the values which is not allowed by the assertion in change_aliases.

Last edited 3 years ago by Ole Laursen (previous) (diff)

comment:2 Changed 3 years ago by Tim Graham

Could you bisect the regression? If you can provide the test app that you use for that process, it'll be helpful.

Changed 3 years ago by Ole Laursen

Attachment: combinebug.tar.gz added

Test project for reproducing the problem

comment:3 Changed 3 years ago by Ole Laursen

I've attached a test project. Run it with

  cd combinebug
  DJANGO_SETTINGS_MODULE=combinebug.settings PYTHONPATH=../Django-1.7.5:. python combinebug/triggerbug.py

If it crashes, congrats, you've reproduced it. If not, try running it multiple times.

I tested it with Python 2.7 and Django 1.7.5 and could reproduce the crash and Python 3 and Django 1.9.4 and could also reproduce it.

1.6.2 seems to be working - that was the version I upgraded the whole project from and it's been stable.

comment:4 Changed 3 years ago by Tim Graham

Summary: Bug in django.db.models.sql.Query.combineNon-determinstic crash in django.db.models.sql.Query.combine()
Triage Stage: UnreviewedAccepted

comment:5 Changed 3 years ago by Tim Graham

#26959 looks like a duplicate and includes a more minimal model to reproduce.

comment:6 Changed 2 years ago by Bo Marchman

Has patch: set

PR at https://github.com/django/django/pull/8139

I ran into this bug yesterday and spent some time trying to figure out what's going on. It seems like the circular reference arises when alias_map is iterated in Query.join in order to find the next alias to reuse:

reuse = [a for a, j in self.alias_map.items()
               if (reuse is None or a in reuse) and j == join]
 if reuse:
     self.ref_alias(reuse[0])
     return reuse[0]

Because alias_map is a dict, the alias to reuse is chosen non-deterministically. Using an OrderedDict for alias_map solves this problem.

It can also by solved by changing the reuse code to

reuse = [a for a, j in self.alias_map.items()
               if (reuse is None or a in reuse) and j == join]
if reuse:
    if join.table_alias in reuse:
        to_use = join.table_alias
    else:
        to_use = reuse[0]

    self.ref_alias(to_use)
    return to_use

But the OrderedDict seems cleaner to me.

I don't understand much of the join internals here so I'd love to hear if someone has a better solution.

comment:7 Changed 2 years ago by Tim Graham

Triage Stage: AcceptedReady for checkin

comment:8 Changed 2 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: newclosed

In 9bbb6e2d:

Fixed #26522 -- Fixed a nondeterministic AssertionError in QuerySet combining.

Thanks Andrew Brown for the test case.

comment:9 Changed 21 months ago by Gwildor Sok

I just ran into this problem as well after updating a project from 1.8 to 1.11.2. Unfortunately, implementing the fix by changing alias_map to be an OrderedDict did not help. I managed to work around it by just rewriting the query, (because it was doing some dumb things which were obfuscated by chaining manager methods, as you can see below), but I thought I would share the smallest test case I could reproduce it with, just in case someone else runs into it as well.

My models:

class Assessment(models.Model):
    pass

class AnswerGroup(models.Model):
    assessment = models.ForeignKey(Assessment)

    author = models.ForeignKey(
        User, related_name='created_answergroups')
    about = models.ForeignKey(
        User, blank=True, null=True, related_name='subject_answergroups)

And my test code where I get the same AssertionError:

u = User.objects.get(pk=1)

assessments = Assessment.objects.all()

groups = AnswerGroup.objects.filter(assessment__in=assessments)

groups_1 = groups.filter(author=u, about=u)
groups_2 = groups.filter(author=u).exclude(about=None).exclude(about=u)

list(groups_1 | groups_2)

Note that when I remove the assessment__in filter, the code works fine. Also note that change_map is something like {u'auth_user': 'T4', 'T4': u'auth_user'} both with and without the filter, and thus both when an error occurs and when it does not.

I've worked around it by just using one filter and ditching the combine by just using filter(author=u).exclude(about=None), but of course not everyone has the option to rewrite their query that easily.

comment:10 Changed 20 months ago by Tim Graham

Summary: Non-determinstic crash in django.db.models.sql.Query.combine()Non-deterministic crash in django.db.models.sql.Query.combine()

comment:11 Changed 20 months ago by Tim Graham <timograham@…>

In f7f5edd:

Removed obsolete Query.tables attribute.

Obsolete since Query.alias_map became an OrderedDict (refs #26522).

comment:12 Changed 18 months ago by Mikalai Radchuk

It's a very tricky issue. I think, it can damage data in some cases.

In my case Django ORM didn't fail with assertion error: it generated wrong SQL query so I got wrong data from my DB. On development environment I didn't get this issue because it appears randomly. I'm a lucky person - in my case I query for some data for UI, but if you perform some calculations using data from a queryset that silently returns wrong data - you are in trouble. It's quite possible that 4/5 of your processes work properly, but one gives wrong results.

Here is simplified version of my issue.

Models:

from django.db import models


class Category(models.Model):
    title = models.CharField(max_length=255, unique=True)

    class Meta:
        verbose_name_plural = 'categories'

    def __str__(self):
        return self.title


class Document(models.Model):
    title = models.CharField(blank=True, max_length=255)
    categories = models.ManyToManyField('Category', blank=True, related_name='+')

    def __str__(self):
        return self.title

Build a query set that could return wrong data:

from app_with_models.models import Document, Category

queryset = Document.objects.all()

# Exclude documents without taxonomy (Category) tags
queryset = queryset.filter(categories__in=Category.objects.all())

# Apply predefined taxonomy filters
category_predefined = [1]  # Should be in fixtures
queryset = queryset.filter(categories__in=category_predefined)

queryset = queryset.distinct()

used_category_ids = queryset.values_list('categories__pk', flat=True)
print(used_category_ids.query)

This code should generate the following SQL query (formatted):

SELECT DISTINCT
  "app_with_models_document_categories"."category_id"
FROM "app_with_models_document"
  LEFT OUTER JOIN "app_with_models_document_categories" ON ("app_with_models_document"."id" = "app_with_models_document_categories"."document_id")
  INNER JOIN "app_with_models_document_categories" T4 ON ("app_with_models_document"."id" = T4."document_id")
WHERE (
  "app_with_models_document_categories"."category_id" IN (SELECT U0."id" AS Col1 FROM "app_with_models_category" U0)
  AND T4."category_id" IN (1)
);

But in some cases it generates this:

SELECT DISTINCT
  T4."category_id"
FROM "app_with_models_document"
  LEFT OUTER JOIN "app_with_models_document_categories" ON ("app_with_models_document"."id" = "app_with_models_document_categories"."document_id")
  INNER JOIN "app_with_models_document_categories" T4 ON ("app_with_models_document"."id" = T4."document_id")
WHERE (
  "app_with_models_document_categories"."category_id" IN (SELECT U0."id" AS Col1 FROM "app_with_models_category" U0)
  AND T4."category_id" IN (1)
);

This query generates absolutely wrong results. In my example it will always return 1 as category_id.

Some more details:

  • Can be reproduced on Django==1.10.7, Django==1.11.4 and, most likely, on order versions too. Fixed in 9bbb6e2d2536c4ac20dc13a94c1f80494e51f8d9
  • Can be reproduced on Python 3.5, but not on Python 3.6 (at least, I didn't manage to do that). I guess it's because of new dict implementation.
  • I've tested with Postgres 9.6.3 and psycopg2==2.6.1 and psycopg2==2.7.3.1, but it looks like the issue is not DB engine/adapter specific.

Hope it helps.

Can it be backported into supported (or at least into LTS)? I would really appreciate it. Me an my colleague have spent 4 days debugging our code an I'm afraid that it could cause more serious issues for other people.

comment:13 Changed 18 months ago by Tim Graham

I guess it's okay to backport to 1.11. There's some indication based on the comments that this may be a regression. PR for the stable/1.11.x branch.

comment:14 Changed 18 months ago by Tim Graham <timograham@…>

In 8d66bff:

[1.11.x] Fixed #26522 -- Fixed a nondeterministic AssertionError in QuerySet combining.

Thanks Andrew Brown for the test case.

Backport of 9bbb6e2d2536c4ac20dc13a94c1f80494e51f8d9 from master

comment:15 Changed 18 months ago by Tim Graham <timograham@…>

In 04050bff:

Refs #26522 -- Forwardported 1.11.5 release note.

Note: See TracTickets for help on using tickets.
Back to Top