Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#9006 closed (wontfix)

QuerySet indexing by __getitem__ gets wrong answer in edge cases

Reported by: smoluf Owned by: mtredinnick
Component: Core (Other) Version: 1.0
Severity: Keywords: queryset index race
Cc: andrew.tennikoff@…, carljm Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

I've just spent a few hours in pdb tracking this down. I've got about as far as I'm able into the Django libraries looking for the culprit, but I've run out of ideas due to my limited knowledge of Django's inner workings. I need someone with a bit more experience to look at this one.

I have a Membership model which relates Users to Groups.

# models.py
class Membership (models.Model):
    user = models.ForeignKey(User)
    group = models.ForeignKey(Group)
    school_year = models.ForeignKey(SchoolYear, related_name='memberships')

This query turns up the following results:

>>> Membership.objects.filter(user__username="jammons")
[<Membership: Office of Technology Services (Jeff Ammons, 2008 - 2009)>, <Membership: UPS Fencing Club (Jeff Ammons, 2008 - 2009)>, <Membership: Adelphian Concert Choir (Jeff Ammons, 2008 - 2009)>]

These results are expected and good. However, indexing does the Wrong Thing. For some reason, __getitem__ returns the same model for [0] and [2]:

>>> qs[0]
<Membership: Adelphian Concert Choir (Jeff Ammons, 2008 - 2009)>
>>> qs[1]
<Membership: UPS Fencing Club (Jeff Ammons, 2008 - 2009)>
>>> qs[2]
<Membership: Adelphian Concert Choir (Jeff Ammons, 2008 - 2009)>
>>> 

The result is that the real qs[0] is unreachable, and furthermore, inline formset validation is therefore broken when this occurs, because the wrong self.instance is set on the first form, causing it to .exclude() the wrong instance from its unique_check and resulting in the form believing that it is a duplicate.

Engaging pdb and then entering "continue" (running the query in the debugger without actually inspecting the trace) returns the same answer.

>>> def test():
...     print Membership.objects.filter(user__username="jammons")[0]
...
>>> import pdb
>>> pdb.runcall(test)
> <console>(2)test()
(Pdb) continue
Adelphian Concert Choir (Jeff Ammons, 2008 - 2009)

HOWEVER: Running pdb and stepping through the code produces the right answer:

>>> pdb.runcall(test)
> <console>(2)test()
(Pdb) step

...lots of stepping through the various nested calls...

(Pdb) step
Office of Technology Services (Jeff Ammons, 2008 - 2009)

Upon inspecting django.db.connection.queries, what seems to be happening is that the first two methods hit the _result_cache to do the indexing, while the third runs new queries instead.

Theory 1: This is some sort of bizarre race condition where the _result_cache is being deleted, but somehow lingers on long enough that execution speed is a factor in determining whether the indexing can access it or not.

Theory 2: By trial and error, it seemed that stepping through _clone() (django/db/models/query.py, line 218) in pdb gave the right answer, while allowing the debugger to run it at full speed gave the wrong. Perhaps there's something in __clone() that depends on execution speed. This is not 100%, though, as at least on one occasion it produced the wrong answer.

Theory 3: There is some issue with the laziness of QuerySets and database query time.

Someone else is probably better qualified than I am to speculate on the potential causes. I'm happy to provide more detailed information if you need it. Good luck!

Attachments (1)

queryset_getitem_ordering_fix.diff (1009 bytes) - added by john 7 years ago.
Patch to apply a default ordering to querysets that lack them before slicing or indexing them.

Download all attachments as: .zip

Change History (12)

comment:1 Changed 7 years ago by anonymous

  • Component changed from Uncategorized to Core framework
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 7 years ago by anonymous

I encountered the similar problem with my web application. The cause of error is the unpredictable order of the returned rows.
During __getitem__

calling if the indexed item is absent in the cache, the queryset object is cloning and building new sql expression with new OFFSET value and LIMIT 1. The builded expression doesn't include ORDER BY part, so in general SQL backend can return any row.

The best solution that I found is to add

class Meta: 
  ordering = ['id']

to the model definition.
Maybe framework have to do it by default for models without specified ordering?

comment:3 Changed 7 years ago by drozzy

I have the same problem.
When the getitem is called with index 0 on a queryset like:
[<A: blah, bluh, booh>, <B: blah, bleh, booh>]

it actually returns <B> always, which is index of 1.

This only happens when saving objects through inlinemodel formsets. Ordering id as pointed out above fixes the problem.
Strangely enough this does not happen to the model which only has 3 attributes. The model that breaks in my case has 6 attributes.

comment:4 Changed 7 years ago by mtredinnick

  • Owner changed from nobody to mtredinnick
  • Status changed from new to assigned

Changed 7 years ago by john

Patch to apply a default ordering to querysets that lack them before slicing or indexing them.

comment:5 Changed 7 years ago by john

This may just be due to the fact that LIMIT queries without ORDER BY aren't guaranteed to return consistent result sets. The problem may be specific to PostgreSQL (see http://www.postgresql.org/docs/8.3/interactive/queries-limit.html); I haven't had time to research the other supported databases.

This patch doesn't break anything, according to runtests, and it solved the problem I was having in the admin interface (see #9076).

comment:6 Changed 7 years ago by anonymous

  • Cc andrew.tennikoff@… added

comment:7 Changed 7 years ago by carljm

  • Cc carljm added

comment:8 Changed 7 years ago by jacob

  • Resolution set to invalid
  • Status changed from assigned to closed

Yeah, the problem here is that you don't have an ORDER BY SQL clause (i.e. order_by() on the queryset or ordering on the model's Meta). Without an ordering the database is free to return results in whatever order it chooses. If you need consistent ordering, you need to specify it.

comment:9 Changed 7 years ago by smoluf

  • Resolution invalid deleted
  • Status changed from closed to reopened

Hold on a minute. This is more than just a user error problem; this has consequences for users as well as the built-in Django forms module.

For users:
As documented, there is no reason to expect that indexing into a QuerySet should be an undefined behavior. At the absolute minimum this should be documented, but I will argue that a QuerySet should have *some* ordering, even if it is arbitrary. I should not, for example, have duplicate or missing values if I manually retrieve the model object at every index of a QuerySet (although I can expect the ordering to be arbitrary).

For Django:
My argument above is reinforced by the fact that Django itself makes this assumption. Inline formsets use indexing in their validation code, and this undefined behavior means that their mechanism is broken (this is how I found out about the bug in the first place). This is simply not acceptable - either QuerySets need to have a definite ordering, or Django inline formsets need to force an ordering on the QuerySets they index into while validating.

comment:10 follow-up: Changed 7 years ago by jacob

  • Resolution set to wontfix
  • Status changed from reopened to closed

First, please don't reopen tickets closed by a committer. The correct way to revisit issues is to take it up on django-dev.

As for your specific points: you're asking here that we shield users from the reality of using a relational database. It's basic RDMBS behavior that if you don't provide an ordering, the order is undefined. Preventing valid behavior from the underlying database isn't the Django way at all.

[As for the point about inline formsets: that's a bug in formsets; please open a new one about that so we don't loose track.]

comment:11 in reply to: ↑ 10 Changed 7 years ago by kmtracey

Replying to jacob:

[As for the point about inline formsets: that's a bug in formsets; please open a new one about that so we don't loose track.]

#10163 and #9758 are already open covering the admin/formset issues.

Note: See TracTickets for help on using tickets.
Back to Top