Opened 3 weeks ago

Closed 3 weeks ago

Last modified 2 weeks ago

#37027 closed Bug (wontfix)

refresh_from_db() with from_queryset + prefetch does not persist result in instance

Reported by: Hugo Maingonnat Owned by:
Component: Database layer (models, ORM) Version: 5.2
Severity: Normal Keywords: refresh_from_db, prefetch
Cc: MANAS MADESHIYA Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

foo = Foo.objects.create(**data)
# The prefetch query is executed, but does not seem to be persisted into the instance
prefetch = Prefetch('bar_set', queryset=Bar.objects.all())
foo.refresh_from_db(from_queryset=Foo.objects.prefetch_related(
   prefetch
))
# This is empty
print('CACHE:', foo._prefetched_objects_cache)
# Lazy extra query is executed because prefetch was not persisted: this should not happen
list(foo.bar_set.all())

This also does not work if for example to_attr='bar_list' is set in the prefetch, then foo.bar_list is not set.

Note that using select_related() is working.

Change History (4)

comment:1 by Simon Charette, 3 weeks ago

There's is little value in using prefetch_related when dealing with a single object as this method is meant to be used to prevent N+1 query issues but in this case N=1 so the number of queries will be the same. In other words, performing foo.bar_set.all() will issue the same query as what a prefetching would have done.

When we added refresh_from_db(from_queryset) support (#28344) we knew it would open the door to a large surface area between Model and QuerySet APIs. Given the original intent was to support for_update and friends I'm not convinced we should add support for prefetch_related and certainly not for Prefetch(to_attr) as it doesn't follow the refresh_from_db(fields) convention.

Support for select_related came for free as it's a normal attribute lookup on the retrieved model instance and there are real benefits to using it (reducing the number of queries). In the case of prefetch_related it requires extra retrieval and invalidation logic of many-to-many relationships and results in the same number of queries. I believe accepting this work would open the door to supporting the full Model X QuerySet intersection which would include cases like annotate members being ignored as well.

Last edited 3 weeks ago by Simon Charette (previous) (diff)

comment:2 by MANAS MADESHIYA, 3 weeks ago

Cc: MANAS MADESHIYA added

comment:3 by Jacob Walls, 3 weeks ago

Resolution: wontfix
Status: newclosed

Simon's doubts are convincing. If helpful, I'll add that I had a need for this in a project (I had a tree of objects shaped via custom Prefetch(to_attr=...) that I needed to be able to refresh), and I found an involved workaround. Thanks to your ticket, I had the idea to retry after #28455 (Django 6.1), and there's now a much simpler workaround.

Just prevent the from_queryset from being cloned, and then the prefetch result is still accessible to you, and in your override of Model.refresh_from_db you can put it where you expect it.

In [1]: admin = User.objects.get(username='admin')

In [2]: reset_queries()

In [3]: from_queryset = User.objects.prefetch_related("groups")

In [4]: with from_queryset._avoid_cloning():
   ...:     admin.refresh_from_db(from_queryset=from_queryset)
   ...: 

In [5]: from_queryset[0]._prefetched_objects_cache
Out[5]: {'groups': <QuerySet [<Group: Graph Editor>, <Group: Resource Editor>, <Group: RDM Administrator>, <Group: Application Administrator>, <Group: System Administrator>, <Group: Mobile Project Administrator>, <Group: Crowdsource Editor>, <Group: Guest>, <Group: Resource Reviewer>, <Group: Resource Exporter>]>}

In [6]: connection.queries
Out[6]: 
[{'sql': 'SELECT "auth_user"."id", "auth_user"."password", "auth_user"."last_login", "auth_user"."is_superuser", "auth_user"."username", "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", "auth_user"."is_staff", "auth_user"."is_active", "auth_user"."date_joined" FROM "auth_user" WHERE "auth_user"."id" = 1 LIMIT 21',
  'time': '0.005'},
 {'sql': 'SELECT ("auth_user_groups"."user_id") AS "_prefetch_related_val_user_id", "auth_group"."id", "auth_group"."name" FROM "auth_group" INNER JOIN "auth_user_groups" ON ("auth_group"."id" = "auth_user_groups"."group_id") WHERE "auth_user_groups"."user_id" IN (1)',
  'time': '0.004'}]

comment:4 by Hugo Maingonnat, 2 weeks ago

Thank you both for your answers ! If I may give a bit more context. I understand this is only for single instances, but end users are always happy if CRUD endpoints are faster (especially if called multiple times, with multiple nested objects).

  • Using foo.bar_set.all() will not necessarily fill up nested objects (and to_attr), and each level will trigger lazy queries.
  • My instance has initially been created with both select_related and prefetch_related via a queryset in a drf view. From a user point of view, I can do like below but the API feels a bit inconsistent. Especially users would not expect the current silent behavior if they try to set a prefetch in the from_queryset and see that it is not working.
    foo.refresh_from_db(from_queryset=Foo.objects.select_related('baz'))
    prefetch = Prefetch('bar_set', queryset=Bar.objects.all())
    prefetch_related_objects([foo], prefetch)
    
  • Thank you @Jacob for the workaround. I am not entirely sure to understand why using _avoid_cloning() make it works. I feel if it works with _avoid_cloning(), it could just work with the clone API ?

Another suggestion would be to have something simpler refresh_from_db(select_related=True, prefetch_related=True) (or another method name like deep_refresh_from_db()), that just refresh all relations that the initial object had. But I am not sure if it still has a reference to the initial queryset, to autonomously re-execute the query.

Version 1, edited 2 weeks ago by Hugo Maingonnat (previous) (next) (diff)
Note: See TracTickets for help on using tickets.
Back to Top