Opened 9 months ago
Last modified 9 months ago
#36157 closed Bug
Unusual behaviour when pre-fetching with only applied on the related fields — at Initial Version
| Reported by: | Tim McCurrach | Owned by: | |
|---|---|---|---|
| Component: | Uncategorized | Version: | 5.1 |
| Severity: | Normal | Keywords: | |
| Cc: | Tim McCurrach | Triage Stage: | Unreviewed |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
When prefetching related models. If you apply .only() to the related queryset, django performs additional lookups for related IDs that have been left out of the only.
It is probably easiest to explain the issue with an example. Suppose you have these models:
class Blog(models.Model):
name = models.CharField(max_length=100)
class Post(models.Model):
name = models.CharField(max_length=100)
blog = models.ForeignKey(Blog, on_delete=models.CASCADE, related_name="posts")
...lots of other big fields
And you create a few items for each:
blog = Blog.objects.create(name="Django Tricks") blog2 = Blog.objects.create(name="React Tricks") Post.objects.create(name="prefetching", blog=blog) Post.objects.create(name="models", blog=blog) Post.objects.create(name="templates", blog=blog) Post.objects.create(name="hooks", blog=blog2) Post.objects.create(name="components", blog=blog2)
If I wish to pre-fetch the posts for some blogs, but only want the names of each post, rather than the content of each post I can do the following:
Blog.objects.prefetch_related(Prefetch("posts", queryset=Post.objects.only("name")))
I would expect this to result in just 2 database queries. One to fetch the data for the Blog instances, and another to fetch the the data for the related Posts. Instead, there is an n+1 issue where there are 5 extra follow up requests for each of the related Post instances. This is the SQL that is generated:
SELECT "app_blog"."id", "app_blog"."name" FROM "app_blog" LIMIT 21 SELECT "app_post"."id", "app_post"."name" FROM "app_post" WHERE "app_post"."blog_id" IN (1, 2) SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE "app_post"."id" = 1 LIMIT 21 SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE "app_post"."id" = 2 LIMIT 21 SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE "app_post"."id" = 3 LIMIT 21 SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE "app_post"."id" = 4 LIMIT 21 SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE "app_post"."id" = 5 LIMIT 21
I can understand it might be a good idea to have the related-id's for the blog on hand should you need them later. But I also think, that by using .only() you are explicitly telling django - I don't need these. This is a real problem for larger data-sets, where you end up with thousands of extra round-trips to the database.