Opened 4 months ago

Last modified 4 months ago

#28939 new Cleanup/optimization

Document which ORM methods provide an instance hint to database routers

Reported by: Nick Pope Owned by: nobody
Component: Documentation Version: master
Severity: Normal Keywords: prefetch, prefetch_related, using, connection
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Nick Pope)

Update: I should have mentioned that I was using a custom database router. I now know this issue occurred due to a bug in that router.


I've run into an case with prefetching where the connection used for the prefetch queries is not the one I would expect:

# When using the default database connection:

In [1]: from django.db import connections

In [2]: connections['default'].queries
Out[2]: []

In [3]: connections['readonly'].queries
Out[3]: []

In [4]: User.objects.prefetch_related('operators').first()
Out[4]: <User: example>

In [5]: connections['default'].queries
Out[5]:
[{'sql': 'SELECT ... FROM "auth_user" ORDER BY "auth_user"."id" ASC LIMIT 1', 'time': '0.205'},
 {'sql': 'SELECT ... AS "_prefetch_related_val_user_id", ... FROM "operator" INNER JOIN "operator_users" ON ("operator"."id" = "operator_users"."operator_id") WHERE "operator_users"."user_id" IN (1) ORDER BY "operator"."code" ASC', 'time': '0.011'}]

In [6]: connections['readonly'].queries
Out[6]: []

# When specifying the database connection to use:

In [1]: from django.db import connections

In [2]: connections['default'].queries
Out[2]: []

In [3]: connections['readonly'].queries
Out[3]: []

In [4]: User.objects.using('readonly').prefetch_related('operators').first()
Out[4]: <User: example>

In [5]: connections['default'].queries
Out[5]:
[{'sql': 'SELECT ... AS "_prefetch_related_val_user_id", ... FROM "operator" INNER JOIN "operator_users" ON ("operator"."id" = "operator_users"."operator_id") WHERE "operator_users"."user_id" IN (1) ORDER BY "operator"."code" ASC', 'time': '0.010'}]

In [6]: connections['readonly'].queries
Out[6]:
[{'sql': 'SELECT ... FROM "auth_user" ORDER BY "auth_user"."id" ASC LIMIT 1', 'time': '0.002'}]

In the second case I would have expected all queries to be executed on the readonly connection by default.

This can be achieved by using Prefetch('operators', queryset=Operator.objects.using('readonly')), but it means that plain strings cannot be used and often a lot of nested Prefetch() objects can be required.

A solution to this could be to do the following:

  1. Use the connection from the original QuerySet that called .prefetch_related() by default.
  2. Possibly add QuerySet.prefetch_related(using=...) to allow overriding for all prefetch queries. (Doesn't fit nicely with the API.)
  3. Possibly add Prefetch(using=...) to allow overriding for a specific branch of prefetching. (This would propagate down.)

Change History (3)

comment:1 Changed 4 months ago by Simon Charette

Triage Stage: UnreviewedAccepted

I think 1. would make the most sense here unfortunately I'm afraid this might break backward compatibility with regards to database router.

For example, given the following scenario

class Foo(models.Model):
    pass

class Bar(models.Model):
    foo = models.ForeignKey(Foo, related_name='bars')

class Router(object):
    def db_for_read(self, model):
        if model is Bar:
            return 'other'
        return 'default'

It would change the behaviour of Foo.objects.using('default').prefetch_related('bars') because using() has usually precedence over the routers suggestion. We could special case prefetches to only default to the base queryset's using if no router provide a db_for_read suggestion but that would be counter intuitive IMO and add even more complexity to the read database alias selection logic.

So, I think that 1. is favourable but would require a deprecation cycle for the case where both a base query alias is forced though using and that db_for_read suggests a database. In the mean time I think that 3. is easier to implement and could be useful even in a future where 1. is fixed.

Last edited 4 months ago by Simon Charette (previous) (diff)

comment:2 Changed 4 months ago by Nick Pope

Component: Database layer (models, ORM)Documentation
Description: modified (diff)
Type: BugCleanup/optimization

Hi Simon,

Sorry - false alarm.

I did some digging and found tests.prefetch_related.tests.MultiDbTests which seems to show my scenario working correctly.

I've subsequently realised that I wasn't handling instance._state.db correctly in db_for_read() in my router.

It may be worth improving the documentation on database routers and/or prefetch_related() warning of this pitfall.

I have left this ticket open, but feel free to close it if you don't think the documentation needs improving.

Here is (an example of) my fixed database router:

class Router(object):

    def __init__(self):
        self.mapping = {'special': 'other'}

    def db_for_read(self, model, **hints):
        db = self.mapping.get(model._meta.app_label, 'default')
        instance = hints.get('instance')
        return instance._state.db or db if instance else db

    def db_for_write(self, model, **hints):
        return self.mapping.get(model._meta.app_label, 'default')

    def allow_relation(self, obj1, obj2, **hints):
        db1 = self.mapping.get(obj1._meta.app_label, 'default')
        db2 = self.mapping.get(obj2._meta.app_label, 'default')
        return db1 == db2

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        return db == self.mapping.get(app_label, 'default')

Beforehand, db_for_read() was the same as db_for_write().

Essentially there are two databases (default and other) which also both have read-only connections to a replica (default-ro and other-ro).
This is an application-routing router - everything goes to default unless it is from the special application which goes to other.
Typically we opt-in to using the read-only connections via QuerySet.using() and this is where things then broke due to not handling instance._state.db.

comment:3 Changed 4 months ago by Simon Charette

Summary: QuerySet used by prefetch_related() does not use expected connection.Document which ORM methods provide an instance hint to database routers

Hey Nick!

It may be worth improving the documentation on database routers and/or prefetch_related() warning of this pitfall.

I'm not sure how we could improve the current routers documentation given we link to hints in both db_for_read and db_for_write section but I guess the prefetch_related one and all the other database methods that provide an instance hint to routers could be improved.

I think there would also be benefit in eventually enhancing the routing context with additional hints such as operation, field, related_instance, cardinality and such. I've had instances in the past where being able to differentiate between heavy (e.g. prefetch_related()) and lightweight (e.g. refresh_from_db(), get()) reads would have been quite useful.

Note: See TracTickets for help on using tickets.
Back to Top