Opened 16 years ago

Closed 16 years ago

Last modified 16 years ago

#6701 closed (wontfix)

in qs-rf when many to many field referenced in order_by(), count() doesn't match length of queryset

Reported by: matt@… Owned by: nobody
Component: Uncategorized Version: queryset-refactor
Severity: Keywords: qs-rf
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

If you include a many-to-many field which has multiple relationships for some objects you're querying in a call to order_by() then the .count() on the resulting queryset will not match the len() on the queryset if .count() is called before anything that loads the data (as if count() is called after the data is loaded then it returns the length of the data rather than doing the problematic query). I'm guessing Query.get_count() tries to run a simplified query.

Change History (5)

comment:1 by Malcolm Tredinnick, 16 years ago

Resolution: wontfix
Status: newclosed

Ordering by a many-to-many field doesn't make any sense, since an ordering on the individual rows in the m2m table doesn't map to a natural ordering comparing sets of m2m results on the parent model.

get_count() strips any ordering, since it doesn't need it to count the items. I don't think this is worth changing because of the above paragraph.

comment:2 by matt@…, 16 years ago

My post on django-users http://groups.google.com/group/django-users/browse_thread/thread/6acd66bcd9d21de2 perhaps helps show what I'm trying to do. It is useful to order on an m2m IF in the result set you can see what m2m related items correspond to the entries in the result set.

As for not fixing it - if I'd suggest perhaps a note on the documentation then so anyone who, in the future, uses the fact that you CAN specify m2ms in the order_by doesn't get baffled by the count() being wrong sometimes and right sometimes (depending on whether the result cache is populated yet or not). Although I guess the alternative if you believe it doesn't make sense (I still disagree on that one - see my use case in the thread) is to just remove m2m from order_by and document it.

To give an example of the count inconsistency, set up a query_set with an m2m on the order by and multiples and do the following:
first_count = test_set.count()
second_count = len(test_set)
third_count = test_set.count()

And you could get, e.g., first_count being 5, second_count being 6 and then third_count being 6 because of "if self._result_cache is not None: return len(self._result_cache)"

comment:3 by Malcolm Tredinnick, 16 years ago

It doesn't make sense mathematically. This isn't a personal preference thing. It's the fact that you cannot order an SQL query based on multiple results. You're trying to turn an ordering on individual items into an ordering on sets of those items and there's no canonical way to do this. The fact that the results you get are somewhat aligned with what you are hoping for is accidental and not at all guaranteed in general.

At some point, Django will raise an error when you try to specify such a field in an ordering clause; it just doesn't do so yet.

comment:4 by anonymous, 16 years ago

If I have object A related to object B via an m2m and I want to return a results set of A sorted on an attribute of B then I'd expect to get repeats of objects of type A where its related to multiples of object B, e.g. what would be going on behind the scenes would be:

SELECT contract.id, contract.name, supplier.name FROM contract, contract_supplier, supplier WHERE contract.id=contract_supplier.contract AND contract_supplier.supplier=supplier.id ORDER BY supplier.name

Results:
1 | Contract B | Supplier 1
2 | Contract A | Supplier 2
1 | Contract B | Supplier 3

So I'd get two copies of the Contract B object. The suppliers field value on each of those copies would be exactly the same.

And indeed this is what happens at present in qs-rf if I specify order_by on an attribute related via an m2m. It does not make sense UNLESS the results that query_set gives you are tagged with the order_by value that lead to the repetition. I'm not suggesting that the suppliers attribute on the contract object in the results from query_set be any different just because you're sorting on an attribute of the supplier objects - which I think is what you mean by "doesn't make sense mathematically"? In my use case I'd need a way of getting at the order_by column value so I can display to the user what supplier is relevant to the position of that contract result in the set (which is what my django users post was about).

Stepping back from what an order_by of "suppliername" means, I personally would find it useful (as I've said) to be able to have a way to order a result set by an m2m field and get at value of that m2m field that corresponds to a given entry in the result set - it may be that the "django way" of doing that is to use .extras(), which is fair enough.

My apologies if this is the wrong place to discuss this!

comment:5 by Malcolm Tredinnick, 16 years ago

This is absolutely the wrong place to discuss this.

Note: See TracTickets for help on using tickets.
Back to Top