#35751 closed Bug (invalid)
Ordering a model via a m2m field creates unintended side effect for ForeignKeys
Reported by: | Dennis Scheiba | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.1 |
Severity: | Normal | Keywords: | ORM ordering |
Cc: | Dennis Scheiba | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Python 3.10 @ Django 5.1.1
Using a many to many relation for ordering (which is something you shouldn't do?) will affect the traversing when the object in question is accessed via a foreign key - the necessary left join for the ordering "spills" into the ORM results, yielding n (number of foreign key references) times m (number of many-to-many relations within the object) objects instead of just the actual n objects referencing the foreign key.
If you comment out the ordering the ORM behaves as expected.
Th ORM results should not "multiply" due to an ordering configuration.
How to reproduce
Given a toy models.py
on a new project which looks like
from django.db import models class Order(models.Model): pass class BookingTime(models.Model): date = models.DateTimeField(auto_now=True) class OrderItem(models.Model): order = models.ForeignKey( Order, on_delete=models.CASCADE, related_name="order_items", ) booking_times = models.ManyToManyField( "BookingTime", related_name="order_items", ) class Meta: ordering = [ # this is the problem! 'booking_times__date', ]
Then on a shell do
In [1]: from foo.models import * In [2]: booking_times = [BookingTime() for _ in range(4)] In [3]: [b.save() for b in booking_times] Out[3]: [None, None, None, None] In [4]: order = Order() In [5]: order.save() In [6]: order_item = OrderItem(order=order) In [7]: order_item.save() In [8]: order_item.booking_times.add(*booking_times) In [9]: order.order_items.count() Out[9]: 1 In [10]: order.order_items.all() Out[10]: <QuerySet [<OrderItem: OrderItem object (1)>, <OrderItem: OrderItem object (1)>, <OrderItem: OrderItem object (1)>, <OrderItem: OrderItem object (1)>]> In [11]: print(order.order_items.all().query) SELECT "foo_orderitem"."id", "foo_orderitem"."order_id" FROM "foo_orderitem" LEFT OUTER JOIN "foo_orderitem_booking_times" ON ("foo_orderitem"."id" = "foo_orderitem_booking_times"."orderitem_id") LEFT OUTER JOIN "foo_bookingtime" ON ("foo_orderitem_booking_times"."bookingtime_id" = "foo_bookingtime"."id") WHERE "foo_orderitem"."order_id" = 2 ORDER BY "foo_bookingtime"."date" ASC In [12]: order.order_items.all().explain() Out[12]: '5 0 0 SEARCH foo_orderitem USING COVERING INDEX foo_orderitem_order_id_e19c2dbd (order_id=?)\n11 0 0 SEARCH foo_orderitem_booking_times USING COVERING INDEX foo_orderitem_booking_times_orderitem_id_bookingtime_id_49667055_uniq (orderitem_id=?) LEFT-JOIN\n17 0 0 SEARCH foo_bookingtime USING INTEGER PRIMARY KEY (rowid=?) LEFT-JOIN\n35 0 0 USE TEMP B-TREE FOR ORDER BY'
This also happens in a template, e.g.
{% for item in order.order_items.all %} {{ item }} {% endfor %}
also yields 4 times the same item instead of just once (due to the left join?).
Change History (6)
comment:1 by , 8 weeks ago
Description: | modified (diff) |
---|
comment:2 by , 8 weeks ago
Description: | modified (diff) |
---|
comment:3 by , 8 weeks ago
Description: | modified (diff) |
---|
comment:4 by , 8 weeks ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:5 by , 8 weeks ago
I don't have enough expertise in the construction of an ORM to fully understand all implications, but as you mentioned, an ordering by a m2m relation is not explicit enough to be determined, and actually the LOC in question was an overlook from my side. I did not expect that the ordering of an item can be the cause for the multiplication of items and I instead digged through the Manager implementation to find the cause.
Maybe a warning could have been helpful here as it fundamentally changes the way the ORM behaves? I'd think that the edge cases where this actual behavior is wanted can be done in a way which is more canonical within the ORM.
In the end, a warning during boot procedure would have been nice for me and would have saved me some headache - maybe even this issue is already helpful enough for people.
comment:6 by , 8 weeks ago
I personally think that order_by(multivalued__value)
is a foot-gun due to its ambiguity and duplicate row spanning behaviour and should be deprecated but it poses some challenges as it is a behavior that was supported and even documented for over a decade so it's safe to assume that some users might be legitimately depend on it for some nice query needs (e.g. adding distinct()
afterwards to deal with the duplicate row issue).
If we were to raise warnings or consider deprecating this feature we should have an upgrade path that covers these use cases. In your particular case
OrderItem.objects.order_by("booking_times__date").distinct()
which results in
SELECT DISTINCT orderitem.* FROM orderitem LEFT JOIN orderitem_booking_times ON (orderitem_booking_times.orderitem_id = orderitem.id) LEFT JOIN bookingtime ON (orderitem_booking_times.bookingtime_id = bookingtime.id) ORDER BY bookingtime.date
can be replaced with
OrderItem.objects.order_by(Min("booking_times__date"))
which results in
SELECT orderitem.* FROM orderitem LEFT JOIN orderitem_booking_times ON (orderitem_booking_times.orderitem_id = orderitem.id) LEFT JOIN bookingtime ON (orderitem_booking_times.bookingtime_id = bookingtime.id) GROUP BY orderitem.id ORDER BY MIN(bookingtime.date)
Since order_by
is resolved late in the query compilation process it should be doable to present a deprecation warning that suggests the usage of Min
and Max
instead.
This behaviour is described in the documentation under the note that covers It is permissible to specify a multi-valued field to order the results by.
Could you describe what you expecting to happen when specifying
order_by("booking_times__date")
if not for each item to be returned multiple times?What are you expecting
booking_times__date
to even mean? Are you looking fororder_by(Min("booking_times__date"))
instead so the items are ordered by their first booking date, or maybeMax("booking_times__date")
so its the item with the booking that ends first, or maybe you interested in ordering by the shortest booking duration soMax("booking_times__date") - Min("booking_times__date")
?See #20842 which introduced this documentation addition and ticket:20842#comment:3 which explains why
order_by("booking_times__date")
by itself doesn't make much sense and should likely be deprecated.