Opened 16 years ago

Closed 16 years ago

Last modified 13 years ago

#9041 closed (invalid)

invalid count result when paired with select_related

Reported by: Mathieu Bouchard <mbouchar@…> Owned by: nobody
Component: Database layer (models, ORM) Version: 1.0
Severity: Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When using a select_related, I sometimes get invalid results if the select_related method reject some data because they are not valid.
If, for example, I have a model that requires a reference to another object and that reference is, for some reason, None, these entries
will be rejected by the select_related.

The problem is that the Paginator object uses the count method to tell the user how many items are found, but when we try to use these items, we only get an empty page.

>>> a = Session.objects.filter(Key__value = "www").select_related()
>>> a.count()
29L
>>> len(a)
3
>>> from django.core.paginator import Paginator                              
>>> from metrix2.manager.models import Session
>>> q = Session.objects.filter(Key__string = "2PSpuaRIZGNj").select_related()
>>> p = Paginator(q, 15)
>>> p.count
29L
>>> p.num_pages
2
>>> p.page(1).object_list
[<Session: xxx>, <Session: yyy>, <Session: zzz>]
>>> p.page(2).object_list
[]

Change History (7)

comment:1 by Julien Phalip, 16 years ago

Component: UncategorizedDatabase layer (models, ORM)

comment:2 by Jacob, 16 years ago

milestone: 1.1
Triage Stage: UnreviewedAccepted

comment:3 by Malcolm Tredinnick, 16 years ago

This is difficult to diagnose, since the problem is highly dependent on the original models. In particular, which relations are nullable (have null=True). Can you provide a description of the models you're using. I don't need all the gory details, but at least the outline of Session, Key and how they're linked together.

comment:4 by Malcolm Tredinnick, 16 years ago

I also need to know the other models that Session links to: in other words, what else should be being pulled in by the select_related() call.

comment:5 by mbouchar, 16 years ago

As I said in my first message, this happens only when the data in the db is not valid for django (when integrating legacy data, for example, or when the django model has changed). For example, if we have the following model

from django.db import models

class ForeignModel(models.Model):
    pass

class Key(models.Model):
    value = models.CharField(blank = False, max_length = 12)
    foreignmodel = models.ForeignKey(ForeignModel, blank = True, null = True)

class Session(models.Model):
    key = models.ForeignKey(Key)

If we add some data (and some where the foreignmodel value is null), and we then change the foreignmodel field to require a value, this triggers the bug:

foreignmodel = models.ForeignKey(ForeignModel)

Of course, the data is then invalid and the database does not reflect the django model. The QuerySet count() method is using "select count(field)" for obvious speed reasons, but do not seem to perform the select_related.

For example, the current implementation of:

>>> Session.objects.filter(key__value = "1").select_related().count()
15

Seems to be translated to the following SQL code:

> SELECT COUNT(S.id) FROM example_session S, example_key K WHERE S.key_id = K.id AND K.value = "1";
15

It should probably be modified to the following SQL code:

> SELECT COUNT(S.id) FROM example_session S, example_key K, example_foreignmodel F WHERE S.key_id = K.id AND K.foreignmodel_id = F.id AND K.value = "1";
7

This would return the same data as the following django code:

>>> len(Session.objects.filter(key__value = "1").select_related())
7

comment:6 by Malcolm Tredinnick, 16 years ago

Resolution: invalid
Status: newclosed

This example seems to be based on some misunderstanding, possibly in how select_related() works (since you're telling Django to use it in an inappropriate situation).

Pulling in the extra table in your "preferred" SQL should have precisely zero effect for the count query you asked for. None of the fields in that table are being used, so Django (correctly) ignores the select_related() piece, which is just an optimisation for when we are pulling back data from that table. You are trying to use select_related() as some way to filter out rows by only counting rows that have a matching row in the corresponding related table (via the inner join on the tables).

This isn't a question of Django needing to know about all the columns in the related table or not. It simply doesn't care, since it makes no difference to the final result (counting the id values in the "S") table. You aren't asking for a count where values in the related table exist, you are asking simply for a count (you can do the former query, but you have to explicitly say so).

You have told Django that every instance of Key has a corresponding entry in ForeignKey and, yet, it appears that that isn't actually correct, which is why the count and the select_related() are returning different results. You can omit columns in the database declaration, but you can't lie about the type of join required to pull in related columns.

Django only knows that you said an inner join would suffice for selecting all Key entries when connecting to ForeignKey, since every row has a corresponding value in the other table. This turns out not to be correct, which is why you're seeing what you're seeing.

Change your model definition to include null=True on the related field and the results will make sense.

There's no Django bug here.

comment:7 by Jacob, 13 years ago

milestone: 1.1

Milestone 1.1 deleted

Note: See TracTickets for help on using tickets.
Back to Top