Opened 2 years ago
Last modified 2 years ago
#33751 closed Uncategorized
Multiple Conditions on LEFT JOIN - FilteredRelations not working? — at Version 1
Reported by: | Thorben Luepkes | Owned by: | nobody |
---|---|---|---|
Component: | Utilities | Version: | 4.0 |
Severity: | Normal | Keywords: | orm, django, join |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I think I am running into a similar problem here, where I would like to append multiple conditions
onto my join:
I will try to be precise with this as much as possible.
Imagine these two models. whose relation was set up years ago:
class Event(models.Model): instance_created_date = models.DateTimeField(auto_now_add=True) car = models.ForeignKey(Car, on_delete=models.CASCADE, related_name="car_events") ... a lot of normal text fields here, but they dont matter for this problem.
and
class Car(models.Model): a lot of text fields here, but they dont matter for this problem. hide_from_company_search = models.BooleanField(default=False) images = models.ManyToManyField(Image, through=CarImage)
Lets say I want to query the amount of events for a given car:
def get_car_events_qs() -> QuerySet: six_days_ago = (timezone.now().replace(hour=0, minute=0, second=0, microsecond=0) - timedelta(days=6)) cars = Car.objects.prefetch_related( 'car_events', ).filter( some_conditions_on_fields=False, ).annotate( num_car_events=Count( 'car_events', filter=Q(car_events__instance_created_date__gt=six_days_ago), distinct=True) ) return cars
The really tricky part for this is the performance of the query: Cars
has 450.000
entries, and Events
has 156.850.048
. All fields that I am using to query are indexed. The query takes around 4 minutes to complete, depending on the db load. It took 18 minutes before adding the indicies.
This above ORM query will result in the following sql:
SELECT "core_car"."id", COUNT("analytics_carevent"."id") FILTER (WHERE ("analytics_carevent"."event" = 'view' AND "analytics_carevent"."instance_created_date" >= '2022-05-10T07:45:16.672279+00:00'::timestamptz AND "analytics_carevent"."instance_created_date" < '2022-05-11T07:45:16.672284+00:00'::timestamptz)) AS "num_cars_view", LEFT OUTER JOIN "analytics_carevent" ON ("core_car"."id" = "analytics_carevent"."car_id") WHERE ... some conditions that dont matter GROUP BY "core_car"."id"
I somehow suspect this FILTER
to be a problem.
I tried with
.annotate(num_car_events=Count('car_events'))
and moving the car_events__instance_created_date__gt=six_days_ago
into the filter
:
.filter(some_conditions_on_fields=False, car_events__instance_created_date__gt=six_days_ago)
But of course this would filter out Cars with no Events, which is not what we want - but it is super fast!
I fiddled a bit with it in raw sql and came to his nice working example, that I now would like to write into ORM, since we dont really want to use rawsql. This query takes 2.2s
, which is in our acceptable boundary, but faaaaar less than the 18minutes.
SELECT "core_car"."id", COUNT(DISTINCT "analytics_carevent"."id") AS "num_cars_view", FROM "core_car" LEFT JOIN "analytics_carevent" ON ("core_car"."id" = "analytics_carevent"."car_id" AND "analytics_carevent"."event" = 'view' AND "analytics_carevent"."instance_created_date" > '2022-05-14T00:00:00+02:00'::timestamptz AND "analytics_carevent"."instance_created_date" <= '2022-05-15T00:00:00+02:00'::timestamptz) WHERE (some conditions that dont matter) GROUP BY "core_car"."id";
My question now is:
How can I make the above query into the ORM?
I need to put the "filter" or conditions onto the left join
. If I just use filter()
it will just put it into the where
clause, which is wrong.
I tried:
two_days_ago = (timezone.now().replace(hour=0, minute=0, second=0, microsecond=0) - timedelta(days=2)) cars = Car.objects.prefetch_related( 'car_events', ).filter(some_filters,)
and
cars = cars.annotate(events=FilteredRelation('car_events')).filter(car_events__car_id__in=cars.values_list("id", flat=True), car_events__instance_created_date__gt=six_days_ago)
But I dont think this is quite correct. I also need the count of the annotation.
I also tried with
cars = cars.annotate(events=FilteredRelation('car_events', condition=Q(car_events__car_id__in=ads.values_list("id", flat=True)))).filter(events__instance_created_date__gt=six_days_ago)
But this results in an inner join, which i dont think is wanted here
Using Django 4 and latest python release as of this writing. :)
Thanks a lot!
TLDR: Putting a filter or conditions on LEFT JOIN
in django, instead of queryset.filter()