Opened 3 years ago
Closed 3 years ago
#33751 closed Uncategorized (duplicate)
Multiple Conditions on LEFT JOIN - FilteredRelations not working?
| Reported by: | Thorben Luepkes | Owned by: | nobody |
|---|---|---|---|
| Component: | Utilities | Version: | 4.0 |
| Severity: | Normal | Keywords: | orm, django, join |
| Cc: | Triage Stage: | Unreviewed | |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I think I am running into a similar problem here, where I would like to append multiple conditions onto my join:
I will try to be precise with this as much as possible.
Imagine these two models. whose relation was set up years ago:
class Event(models.Model):
instance_created_date = models.DateTimeField(auto_now_add=True)
car = models.ForeignKey(Car, on_delete=models.CASCADE, related_name="car_events")
...
a lot of normal text fields here, but they dont matter for this problem.
and
class Car(models.Model):
a lot of text fields here, but they dont matter for this problem.
hide_from_company_search = models.BooleanField(default=False)
images = models.ManyToManyField(Image, through=CarImage)
Lets say I want to query the amount of events for a given car:
def get_car_events_qs() -> QuerySet:
six_days_ago = (timezone.now().replace(hour=0, minute=0, second=0, microsecond=0) - timedelta(days=6))
cars = Car.objects.prefetch_related(
'car_events',
).filter(
some_conditions_on_fields=False,
).annotate(
num_car_events=Count(
'car_events',
filter=Q(car_events__instance_created_date__gt=six_days_ago), distinct=True)
)
return cars
The really tricky part for this is the performance of the query: Cars has 450.000 entries, and Events has 156.850.048. All fields that I am using to query are indexed. The query takes around 4 minutes to complete, depending on the db load. It took 18 minutes before adding the indicies.
This above ORM query will result in the following sql:
SELECT
"core_car"."id",
COUNT("analytics_carevent"."id") FILTER (WHERE ("analytics_carevent"."event" = 'view'
AND "analytics_carevent"."instance_created_date" >= '2022-05-10T07:45:16.672279+00:00'::timestamptz
AND "analytics_carevent"."instance_created_date" < '2022-05-11T07:45:16.672284+00:00'::timestamptz)) AS "num_cars_view",
LEFT OUTER JOIN "analytics_carevent" ON ("core_car"."id" = "analytics_carevent"."car_id")
WHERE
... some conditions that dont matter
GROUP BY
"core_car"."id"
I somehow suspect this FILTER to be a problem.
I tried with
.annotate(num_car_events=Count('car_events'))
and moving the car_events__instance_created_date__gt=six_days_ago into the filter:
.filter(some_conditions_on_fields=False, car_events__instance_created_date__gt=six_days_ago)
But of course this would filter out Cars with no Events, which is not what we want - but it is super fast!
I fiddled a bit with it in raw sql and came to his nice working example, that I now would like to write into ORM, since we dont really want to use rawsql. This query takes 2.2s, which is in our acceptable boundary, but faaaaar less than the 18minutes.
SELECT
"core_car"."id",
COUNT(DISTINCT "analytics_carevent"."id") AS "num_cars_view",
FROM
"core_car"
LEFT JOIN "analytics_carevent" ON ("core_car"."id" = "analytics_carevent"."car_id" AND "analytics_carevent"."event" = 'view' AND "analytics_carevent"."instance_created_date" > '2022-05-14T00:00:00+02:00'::timestamptz
AND "analytics_carevent"."instance_created_date" <= '2022-05-15T00:00:00+02:00'::timestamptz)
WHERE (some conditions that dont matter)
GROUP BY "core_car"."id";
My question now is:
How can I make the above query into the ORM?
I need to put the "filter" or conditions onto the left join. If I just use filter() it will just put it into the where clause, which is wrong.
I tried:
two_days_ago = (timezone.now().replace(hour=0, minute=0, second=0, microsecond=0) - timedelta(days=2))
cars = Car.objects.prefetch_related(
'car_events',
).filter(some_filters,)
and
cars = cars.annotate(events=FilteredRelation('car_events')).filter(car_events__car_id__in=cars.values_list("id", flat=True), car_events__instance_created_date__gt=six_days_ago)
But I dont think this is quite correct. I also need the count of the annotation.
I also tried with
cars = cars.annotate(events=FilteredRelation('car_events', condition=Q(car_events__car_id__in=ads.values_list("id", flat=True)))).filter(events__instance_created_date__gt=six_days_ago)
But this results in an inner join, which i dont think is wanted here
Using Django 4 and latest python release as of this writing. :)
Thanks a lot!
TLDR: Putting a filter or conditions on LEFT JOIN in django, instead of queryset.filter()
Change History (2)
comment:1 by , 3 years ago
| Description: | modified (diff) |
|---|
comment:2 by , 3 years ago
| Resolution: | → duplicate |
|---|---|
| Status: | new → closed |
Cross-posted from https://code.djangoproject.com/ticket/29262#comment:19 (Please don't do that.)