Opened 7 years ago
Last modified 7 months ago
#28519 assigned New feature
Add filter(), exclude(), and other base QuerySet methods to combined QuerySets (union(), etc.)
Reported by: | Stanislav Karpov | Owned by: | Damir Nafikov |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | union, intersection, difference |
Cc: | Florian Apolloner, harrim4n, Dave Halter, Jaap Roes, Skrattoune, Tom Carrick, Ülgen Sarıkavak | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Hello.
I found a discussion about the filtering of combined QuerySet
(union, intersection, difference): https://github.com/django/django/pull/7727#issuecomment-269283216
As I understood it, as a result of the discussion, it was suggested to filter the queries before union:
If I understood your code correctly (I still did not test it), you are still generating select * from (x union y) -- while this is more powerful it is also unnecessary in most cases. I'd also argue that it is less efficient, cause filtering on that result can (usually and probably should) be pushed down to the underlying queries (when possible) to limit the result set before joining them.
https://github.com/django/django/pull/7727#issuecomment-269461236
But such a decision blocks the possibility of re-use of QuerySet
, which is very convenient when working with GenericForeignKey
.
A similar example of the structure of a real project (only a structure, not an entities).
import typing from django.db import models from django.contrib.contenttypes.fields import GenericForeignKey, GenericRelation class Manager(models.Model): first_name = models.CharField(max_length=255) last_name = models.CharField(max_length=255) class Organization(models.Model): # pickup title = models.CharField(max_length=255) foundation_date = models.DateField() manager = models.ForeignKey(Manager, on_delete=models.CASCADE) class Warehouse(models.Model): # pickup organization = models.ForeignKey(Organization, on_delete=models.CASCADE) address = models.TextField() class City(models.Model): # bus title = models.CharField(max_length=255) code = models.CharField(max_length=10) manager = models.ForeignKey(Manager, on_delete=models.CASCADE) class Depot(models.Model): # bus title = models.CharField(max_length=255) city = models.ForeignKey(City, on_delete=models.CASCADE) class Garage(models.Model): # wagon address = models.TextField() manager = models.ForeignKey(Manager, on_delete=models.CASCADE) class AirbagManager(models.Manager): def filter_by_manager__union_and_inner_join( self, manager: typing.Union[Manager, int], ) -> models.QuerySet: # NOT WORKING (only proposal) manager_id = self._clean_manager_id(manager) return ( self # If I'm not mistaken, now we can't direct call `union` from `Manager`, # but this variant increases the readability .union( # INNER JOIN, INNER JOIN, INNER JOIN self.filter(pickup__warehouse__organization__manager_id=manager_id), # INNER JOIN, INNER JOIN, INNER JOIN self.filter(bus__depot__city__manager_id=manager_id), # INNER JOIN, INNER JOIN self.filter(wagon__garage__manager_id=manager_id), ) ) def filter_by_manager__left_join( self, manager: typing.Union[Manager, int], ) -> models.QuerySet: manager_id = self._clean_manager_id(manager) return self.filter( # LEFT JOIN, LEFT JOIN, LEFT JOIN models.Q(pickup__warehouse__organization__manager_id=manager_id) # LEFT JOIN, LEFT JOIN, LEFT JOIN | models.Q(bus__depot__city__manager_id=manager_id) # LEFT JOIN, LEFT JOIN | models.Q(wagon__garage__manager_id=manager_id), ) def _clean_manager_id(self, manager: typing.Union[Manager, int]) -> int: if isinstance(manager, Manager): return manager.id elif isinstance(manager, int): return manager else: raise ValueError class Airbag(models.Model): serial_number = models.CharField(max_length=255) state = models.IntegerField() vehicle_content_type = models.ForeignKey( 'contenttypes.ContentType', on_delete=models.CASCADE, ) vehicle_id = models.IntegerField(db_index=True) vehicle = GenericForeignKey('vehicle_content_type', 'vehicle_id') objects = AirbagManager() class BaseVehicle(models.Model): production_date = models.DateField() airbags = GenericRelation( Airbag, object_id_field='vehicle_id', content_type_field='vehicle_content_type', related_query_name='%(class)s', ) class Meta: abstract = True class Pickup(BaseVehicle): carrying_kg = models.FloatField() warehouse = models.ForeignKey(Warehouse, on_delete=models.CASCADE) class Bus(BaseVehicle): floors_number = models.IntegerField() depot = models.ForeignKey(Depot, on_delete=models.CASCADE) class Wagon(BaseVehicle): garage = models.ForeignKey(Garage, on_delete=models.CASCADE) Airbag.objects.filter_by_manager__union_and_inner_join(15).filter(state__gt=2) # Expected SQL """ SELECT * FROM ( ( SELECT "airbag"."id", "airbag"."serial_number", "airbag"."state", "airbag"."vehicle_content_type_id", "airbag"."vehicle_id" FROM "airbag" INNER JOIN "pickup" ON ("airbag"."vehicle_id" = "pickup"."id" AND ("airbag"."vehicle_content_type_id" = 46)) INNER JOIN "warehouse" ON ("pickup"."warehouse_id" = "warehouse"."id") INNER JOIN "organization" ON ("warehouse"."organization_id" = "organization"."id") WHERE "organization"."manager_id" = 15 ) UNION ( SELECT "airbag"."id", "airbag"."serial_number", "airbag"."state", "airbag"."vehicle_content_type_id", "airbag"."vehicle_id" FROM "airbag" INNER JOIN "bus" ON ("airbag"."vehicle_id" = "bus"."id" AND ("airbag"."vehicle_content_type_id" = 49)) INNER JOIN "depot" ON ("bus"."depot_id" = "depot"."id") INNER JOIN "city" ON ("depot"."city_id" = "city"."id") WHERE "city"."manager_id" = 15 ) UNION ( SELECT "airbag"."id", "airbag"."serial_number", "airbag"."state", "airbag"."vehicle_content_type_id", "airbag"."vehicle_id" FROM "airbag" INNER JOIN "wagon" ON ("airbag"."vehicle_id" = "wagon"."id" AND ("airbag"."vehicle_content_type_id" = 43)) INNER JOIN "garage" ON ("wagon"."garage_id" = "garage"."id") WHERE "garage"."manager_id" = 15 ) ) AS U WHERE U.state > 2; """ # VS Airbag.objects.filter_by_manager__left_join(15).filter(state__gt=2) # Real SQL """ SELECT "airbag"."id", "airbag"."serial_number", "airbag"."state", "airbag"."vehicle_content_type_id", "airbag"."vehicle_id" FROM "airbag" LEFT OUTER JOIN "pickup" ON ("airbag"."vehicle_id" = "pickup"."id" AND ("airbag"."vehicle_content_type_id" = 46)) LEFT OUTER JOIN "warehouse" ON ("pickup"."warehouse_id" = "warehouse"."id") LEFT OUTER JOIN "organization" ON ("warehouse"."organization_id" = "organization"."id") LEFT OUTER JOIN "bus" ON ("airbag"."vehicle_id" = "bus"."id" AND ("airbag"."vehicle_content_type_id" = 49)) LEFT OUTER JOIN "depot" ON ("bus"."depot_id" = "depot"."id") LEFT OUTER JOIN "city" ON ("depot"."city_id" = "city"."id") LEFT OUTER JOIN "wagon" ON ("airbag"."vehicle_id" = "wagon"."id" AND ("airbag"."vehicle_content_type_id" = 43)) LEFT OUTER JOIN "garage" ON ("wagon"."garage_id" = "garage"."id") WHERE ( ( "organization"."manager_id" = 15 OR "city"."manager_id" = 15 OR "garage"."manager_id" = 15 ) AND "airbag"."state" > 2 ); """
Select from a test database containing 30 million airbags with using UNION and INNER JOIN takes 1.1 seconds, and using LEFT JOIN - 10.0 seconds (as the database grows, the access time will increase). PostgreSQL 9.6 (default Docker image).
Change History (15)
comment:1 by , 7 years ago
Description: | modified (diff) |
---|
comment:2 by , 7 years ago
Cc: | added |
---|---|
Summary: | Add `filter`, `exclude` and other base `QuerySet` methods to combined `QuerySet` → Add filter(), exclude(), and other base QuerySet methods to combined QuerySets (union(), etc.) |
comment:3 by , 7 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Version: | 1.11 → master |
I'd like to see it, but it will be a much much more invasive change and is not something I will have the time nor motivation for in the near future.
comment:4 by , 6 years ago
Until (if) this is implemented, I suggest adding a check when .get(), .filter() etc. are called on such a QS. Currently the error messages (if there are any) do not explain the issue, the only way to find out about this is in the documentation of the QS API, section union().
Example:
qs1 = User.objects.all() qs2 = User.objects.filter(username="nonexistant") qs3 = qs1.difference(qs2) print(qs1) print(qs2) print(qs3) print(qs1.get(id=1)) print(qs3.get(id=1))
This fails with (could be any number instead of 4, of course)
MultipleObjectsReturned: get() returned more than one User -- it returned 4!
Even worse, if the difference of qs1 and qs2 only contains one element, this element is returned when calling qs3.get(id=1) without any error message, even though the objects ID is not 1.
comment:5 by , 6 years ago
Cc: | added |
---|
comment:6 by , 4 years ago
Cc: | added |
---|
comment:7 by , 4 years ago
I closed #32032 as a duplicate -- the case of trying to use aggregate()
after union()
. It should be enough to take into account combined queries in Query.rewrite_cols()
.
comment:8 by , 4 years ago
Cc: | added |
---|
comment:9 by , 4 years ago
Cc: | added |
---|
comment:10 by , 3 years ago
I closed #30209 as a duplicate -- the case of trying to use annotate()
after union()
.
comment:11 by , 12 months ago
Cc: | added |
---|
comment:12 by , 8 months ago
Cc: | added |
---|
comment:13 by , 8 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Florian, what do you think?