Opened 2 years ago

Closed 2 years ago

#33751 closed Uncategorized (duplicate)

Multiple Conditions on LEFT JOIN - FilteredRelations not working?

Reported by: Thorben Luepkes Owned by: nobody
Component: Utilities Version: 4.0
Severity: Normal Keywords: orm, django, join
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Thorben Luepkes)

I think I am running into a similar problem here, where I would like to append multiple conditions onto my join:
I will try to be precise with this as much as possible.
Imagine these two models. whose relation was set up years ago:

class Event(models.Model):
    instance_created_date = models.DateTimeField(auto_now_add=True)
    car = models.ForeignKey(Car, on_delete=models.CASCADE, related_name="car_events")
    ...
    a lot of normal text fields here, but they dont matter for this problem.

and

class Car(models.Model):
    a lot of text fields here, but they dont matter for this problem.
    hide_from_company_search = models.BooleanField(default=False)
    images = models.ManyToManyField(Image, through=CarImage)

Lets say I want to query the amount of events for a given car:

def get_car_events_qs() -> QuerySet:
    six_days_ago = (timezone.now().replace(hour=0, minute=0, second=0, microsecond=0) - timedelta(days=6))
    cars = Car.objects.prefetch_related(
        'car_events',
    ).filter(
        some_conditions_on_fields=False,
    ).annotate(
        num_car_events=Count(
            'car_events',
            filter=Q(car_events__instance_created_date__gt=six_days_ago), distinct=True)
    )

    return cars

The really tricky part for this is the performance of the query: Cars has 450.000 entries, and Events has 156.850.048. All fields that I am using to query are indexed. The query takes around 4 minutes to complete, depending on the db load. It took 18 minutes before adding the indicies.

This above ORM query will result in the following sql:

SELECT
  "core_car"."id",
  		COUNT("analytics_carevent"."id") FILTER (WHERE ("analytics_carevent"."event" = 'view'
			AND "analytics_carevent"."instance_created_date" >= '2022-05-10T07:45:16.672279+00:00'::timestamptz
			AND "analytics_carevent"."instance_created_date" < '2022-05-11T07:45:16.672284+00:00'::timestamptz)) AS "num_cars_view",
	LEFT OUTER JOIN "analytics_carevent" ON ("core_car"."id" = "analytics_carevent"."car_id")
WHERE 
 ... some conditions that dont matter
GROUP BY
	"core_car"."id"

I somehow suspect this FILTER to be a problem.
I tried with

.annotate(num_car_events=Count('car_events'))

and moving the car_events__instance_created_date__gt=six_days_ago into the filter:

.filter(some_conditions_on_fields=False, car_events__instance_created_date__gt=six_days_ago)

But of course this would filter out Cars with no Events, which is not what we want - but it is super fast!
I fiddled a bit with it in raw sql and came to his nice working example, that I now would like to write into ORM, since we dont really want to use rawsql. This query takes 2.2s, which is in our acceptable boundary, but faaaaar less than the 18minutes.

SELECT
	"core_car"."id",
	COUNT(DISTINCT "analytics_carevent"."id") AS "num_cars_view",
FROM
	"core_car"
	LEFT JOIN "analytics_carevent" ON ("core_car"."id" = "analytics_carevent"."car_id" AND "analytics_carevent"."event" = 'view' AND "analytics_carevent"."instance_created_date" > '2022-05-14T00:00:00+02:00'::timestamptz
		AND "analytics_carevent"."instance_created_date" <= '2022-05-15T00:00:00+02:00'::timestamptz)
	
WHERE (some conditions that dont matter)
GROUP BY "core_car"."id";

My question now is:
How can I make the above query into the ORM?
I need to put the "filter" or conditions onto the left join. If I just use filter() it will just put it into the where clause, which is wrong.
I tried:

 two_days_ago = (timezone.now().replace(hour=0, minute=0, second=0, microsecond=0) - timedelta(days=2))
    cars = Car.objects.prefetch_related(
        'car_events',
    ).filter(some_filters,)

and

cars = cars.annotate(events=FilteredRelation('car_events')).filter(car_events__car_id__in=cars.values_list("id", flat=True), car_events__instance_created_date__gt=six_days_ago)

But I dont think this is quite correct. I also need the count of the annotation.
I also tried with

cars = cars.annotate(events=FilteredRelation('car_events', condition=Q(car_events__car_id__in=ads.values_list("id", flat=True)))).filter(events__instance_created_date__gt=six_days_ago)

But this results in an inner join, which i dont think is wanted here

Using Django 4 and latest python release as of this writing. :)

Thanks a lot!

TLDR: Putting a filter or conditions on LEFT JOIN in django, instead of queryset.filter()

Change History (2)

comment:1 by Thorben Luepkes, 2 years ago

Description: modified (diff)

comment:2 by Carlton Gibson, 2 years ago

Resolution: duplicate
Status: newclosed

Cross-posted from https://code.djangoproject.com/ticket/29262#comment:19 (Please don't do that.)

Note: See TracTickets for help on using tickets.
Back to Top