Opened 3 months ago

Last modified 7 weeks ago

#28519 new New feature

Add filter(), exclude(), and other base QuerySet methods to combined QuerySets (union(), etc.)

Reported by: Stanislav Karpov Owned by:
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: union, intersection, difference
Cc: Florian Apolloner Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Stanislav Karpov)

Hello.

I found a discussion about the filtering of combined QuerySet (union, intersection, difference): https://github.com/django/django/pull/7727#issuecomment-269283216
As I understood it, as a result of the discussion, it was suggested to filter the queries before union:

If I understood your code correctly (I still did not test it), you are still generating select * from (x union y) -- while this is more powerful it is also unnecessary in most cases. I'd also argue that it is less efficient, cause filtering on that result can (usually and probably should) be pushed down to the underlying queries (when possible) to limit the result set before joining them.

https://github.com/django/django/pull/7727#issuecomment-269461236

But such a decision blocks the possibility of re-use of QuerySet, which is very convenient when working with GenericForeignKey.

A similar example of the structure of a real project (only a structure, not an entities).

import typing

from django.db import models
from django.contrib.contenttypes.fields import GenericForeignKey, GenericRelation


class Manager(models.Model):
    first_name = models.CharField(max_length=255)
    last_name = models.CharField(max_length=255)


class Organization(models.Model):
    # pickup
    title = models.CharField(max_length=255)
    foundation_date = models.DateField()
    manager = models.ForeignKey(Manager, on_delete=models.CASCADE)


class Warehouse(models.Model):
    # pickup
    organization = models.ForeignKey(Organization, on_delete=models.CASCADE)
    address = models.TextField()


class City(models.Model):
    # bus
    title = models.CharField(max_length=255)
    code = models.CharField(max_length=10)
    manager = models.ForeignKey(Manager, on_delete=models.CASCADE)


class Depot(models.Model):
    # bus
    title = models.CharField(max_length=255)
    city = models.ForeignKey(City, on_delete=models.CASCADE)


class Garage(models.Model):
    # wagon
    address = models.TextField()
    manager = models.ForeignKey(Manager, on_delete=models.CASCADE)


class AirbagManager(models.Manager):
    def filter_by_manager__union_and_inner_join(
        self, manager: typing.Union[Manager, int],
    ) -> models.QuerySet:
        # NOT WORKING (only proposal)

        manager_id = self._clean_manager_id(manager)
        return (
            self
            # If I'm not mistaken, now we can't direct call `union` from `Manager`,
            # but this variant increases the readability
            .union(
                # INNER JOIN, INNER JOIN, INNER JOIN
                self.filter(pickup__warehouse__organization__manager_id=manager_id),
                # INNER JOIN, INNER JOIN, INNER JOIN
                self.filter(bus__depot__city__manager_id=manager_id),
                # INNER JOIN, INNER JOIN
                self.filter(wagon__garage__manager_id=manager_id),
            )
        )

    def filter_by_manager__left_join(
        self, manager: typing.Union[Manager, int],
    ) -> models.QuerySet:
        manager_id = self._clean_manager_id(manager)
        return self.filter(
            # LEFT JOIN, LEFT JOIN, LEFT JOIN
            models.Q(pickup__warehouse__organization__manager_id=manager_id)
            # LEFT JOIN, LEFT JOIN, LEFT JOIN
            | models.Q(bus__depot__city__manager_id=manager_id)
            # LEFT JOIN, LEFT JOIN
            | models.Q(wagon__garage__manager_id=manager_id),
        )

    def _clean_manager_id(self, manager: typing.Union[Manager, int]) -> int:
        if isinstance(manager, Manager):
            return manager.id
        elif isinstance(manager, int):
            return manager
        else:
            raise ValueError


class Airbag(models.Model):
    serial_number = models.CharField(max_length=255)
    state = models.IntegerField()

    vehicle_content_type = models.ForeignKey(
        'contenttypes.ContentType',
        on_delete=models.CASCADE,
    )
    vehicle_id = models.IntegerField(db_index=True)
    vehicle = GenericForeignKey('vehicle_content_type', 'vehicle_id')

    objects = AirbagManager()


class BaseVehicle(models.Model):
    production_date = models.DateField()
    airbags = GenericRelation(
        Airbag,
        object_id_field='vehicle_id',
        content_type_field='vehicle_content_type',
        related_query_name='%(class)s',
    )

    class Meta:
        abstract = True


class Pickup(BaseVehicle):
    carrying_kg = models.FloatField()
    warehouse = models.ForeignKey(Warehouse, on_delete=models.CASCADE)


class Bus(BaseVehicle):
    floors_number = models.IntegerField()
    depot = models.ForeignKey(Depot, on_delete=models.CASCADE)


class Wagon(BaseVehicle):
    garage = models.ForeignKey(Garage, on_delete=models.CASCADE)


Airbag.objects.filter_by_manager__union_and_inner_join(15).filter(state__gt=2)
# Expected SQL
"""
SELECT *
FROM (
  (
    SELECT
      "airbag"."id",
      "airbag"."serial_number",
      "airbag"."state",
      "airbag"."vehicle_content_type_id",
      "airbag"."vehicle_id"
    FROM "airbag"
      INNER JOIN "pickup" ON ("airbag"."vehicle_id" = "pickup"."id" AND ("airbag"."vehicle_content_type_id" = 46))
      INNER JOIN "warehouse" ON ("pickup"."warehouse_id" = "warehouse"."id")
      INNER JOIN "organization" ON ("warehouse"."organization_id" = "organization"."id")
    WHERE "organization"."manager_id" = 15
  )

  UNION

  (
    SELECT
      "airbag"."id",
      "airbag"."serial_number",
      "airbag"."state",
      "airbag"."vehicle_content_type_id",
      "airbag"."vehicle_id"
    FROM "airbag"
      INNER JOIN "bus" ON ("airbag"."vehicle_id" = "bus"."id" AND ("airbag"."vehicle_content_type_id" = 49))
      INNER JOIN "depot" ON ("bus"."depot_id" = "depot"."id")
      INNER JOIN "city" ON ("depot"."city_id" = "city"."id")
    WHERE "city"."manager_id" = 15
  )

  UNION
  (
    SELECT
      "airbag"."id",
      "airbag"."serial_number",
      "airbag"."state",
      "airbag"."vehicle_content_type_id",
      "airbag"."vehicle_id"
    FROM "airbag"
      INNER JOIN "wagon" ON ("airbag"."vehicle_id" = "wagon"."id" AND ("airbag"."vehicle_content_type_id" = 43))
      INNER JOIN "garage" ON ("wagon"."garage_id" = "garage"."id")
    WHERE "garage"."manager_id" = 15
  )
) AS U
WHERE U.state > 2;
"""

# VS

Airbag.objects.filter_by_manager__left_join(15).filter(state__gt=2)
# Real SQL
"""
SELECT
  "airbag"."id",
  "airbag"."serial_number",
  "airbag"."state",
  "airbag"."vehicle_content_type_id",
  "airbag"."vehicle_id"
FROM "airbag"
  LEFT OUTER JOIN "pickup" ON ("airbag"."vehicle_id" = "pickup"."id" AND ("airbag"."vehicle_content_type_id" = 46))
  LEFT OUTER JOIN "warehouse" ON ("pickup"."warehouse_id" = "warehouse"."id")
  LEFT OUTER JOIN "organization" ON ("warehouse"."organization_id" = "organization"."id")

  LEFT OUTER JOIN "bus" ON ("airbag"."vehicle_id" = "bus"."id" AND ("airbag"."vehicle_content_type_id" = 49))
  LEFT OUTER JOIN "depot" ON ("bus"."depot_id" = "depot"."id")
  LEFT OUTER JOIN "city" ON ("depot"."city_id" = "city"."id")

  LEFT OUTER JOIN "wagon" ON ("airbag"."vehicle_id" = "wagon"."id" AND ("airbag"."vehicle_content_type_id" = 43))
  LEFT OUTER JOIN "garage" ON ("wagon"."garage_id" = "garage"."id")
WHERE (
  (
      "organization"."manager_id" = 15
      OR "city"."manager_id" = 15
      OR "garage"."manager_id" = 15
  )
  AND "airbag"."state" > 2
);
"""

Select from a test database containing 30 million airbags with using UNION and INNER JOIN takes 1.1 seconds, and using LEFT JOIN - 10.0 seconds (as the database grows, the access time will increase). PostgreSQL 9.6 (default Docker image).

Change History (5)

comment:1 Changed 3 months ago by Stanislav Karpov

Description: modified (diff)

comment:2 Changed 3 months ago by Tim Graham

Cc: Florian Apolloner added
Summary: Add `filter`, `exclude` and other base `QuerySet` methods to combined `QuerySet`Add filter(), exclude(), and other base QuerySet methods to combined QuerySets (union(), etc.)

Florian, what do you think?

comment:3 Changed 3 months ago by Florian Apolloner

Triage Stage: UnreviewedAccepted
Version: 1.11master

I'd like to see it, but it will be a much much more invasive change and is not something I will have the time nor motivation for in the near future.

comment:4 Changed 7 weeks ago by messfish

Owner: changed from nobody to messfish
Status: newassigned

comment:5 Changed 7 weeks ago by messfish

Owner: messfish deleted
Status: assignednew
Note: See TracTickets for help on using tickets.
Back to Top