Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#30079 closed Cleanup/optimization (wontfix)

Prefetch cache should be aware of database source and .using() should not always trigger a new query

Reported by: Mike Lissner Owned by: nobody
Component: Documentation Version: dev
Severity: Normal Keywords: prefetch, multidatabase
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I've been poking around the darker edges of multidatabase prefetching and I discovered a strange pattern. When you have cached prefetch values, they're not aware of which database they came from:

# This runs zero queries
In [62]: pizzas = Pizza.objects.filter(pk=17).prefetch_related('toppings').using('replica')

# This runs two, as expected
In [63]: pizza0 = pizzas[0]

# This runs zero, even though it normally would come from "default" and the cache came from "replica"
In [64]: pizza0.toppings.all()
Out[64]: [<Topping 1: Cheese>, <Topping 2: Bacon>]

# This runs one query, even though this data should already be populated.
In [65]: pizza0.toppings.all().using('replica')
Out[65]: [<Topping 1: Cheese>, <Topping 2: Bacon>]

I was pretty surprised and confused that adding the using() method on the last line above busted the cache even when the database selected was the same as the one used to pre-populate the cache.

I was also surprised (though less so) at the opposite, that *not* including the non-default DB (on line 64) *was* able to hit the cache (which was populated by "replica") instead of hitting the default database (as the query would normally do).

On IRC chatting about this, the question was: "Well, should filter hit the DB or use the cache? What about exclude and other methods? Where do you draw the line?" I think the simple line to draw is whether something changes the SQL query. using() methods don't change SQL, so they should work properly on a cached result. filter() and exclude() do change the SQL, so they *shouldn't* use the cache.

Attachments (1)

ticket_30079.patch (1.0 KB ) - added by Carlton Gibson 5 years ago.
Adjustment of admonition wording.

Download all attachments as: .zip

Change History (5)

comment:1 by Carlton Gibson, 5 years ago

Type: BugCleanup/optimization
Version: 2.1master

Hi Mike. Thanks for the report.

On the basis of the difference between ln64 and ln65 I'm inclined to accept this. pizza0 came from replica and the ln64 call just goes with that. (That seems right/expected to me...) I see your point that adding the using() shouldn't change that.

At first glance the "if you didn't add a filter()/exclude()" idea seems OK too.

Two questions though:

  1. What's the use-case for re-adding the using() call in ln65? I can't quite see why you'd do that...? (Another way of looking at using() might be "go back to this DB"...)
  2. Have you looked at all at what a patch might look like? (If it's simple enough, that helps...)

Thanks.

Last edited 5 years ago by Carlton Gibson (previous) (diff)

in reply to:  1 comment:2 by Mike Lissner, 5 years ago

Replying to Carlton Gibson:

On the basis of the difference between ln64 and ln65 I'm inclined to accept this. pizza0 came from replica and the ln64 call just goes with that. (That seems right/expected to me...) I see your point that adding the using() shouldn't change that.

Good! I made myself clear. Always a good thing.

At first glance the "if you didn't add a filter()/exclude()" idea seems OK too.

That was my analysis too: "This seems to be a reasonable line in the sand."

Two questions though:

  1. What's the use-case for re-adding the using() call in ln65? I can't quite see why you'd do that...? (Another way of looking at using() might be "go back to this DB"...)

I can't make a really clear argument for my use case, but basically it was, "I'm doing weird things with prefetching and I want to be sure Django pulls this data properly from the right DB/cache." In other words, it was me trying to be explicit about where the caches were *supposed* come from, and then getting bit by having the caches busted. From my reading, ln64 is totally unclear which DB it would use since it would normally use the "default" DB, but had a cached value, so which wins? My solution was to be more explicit to make sure it used the cache, but that backfired (luckily, I was watching the query logs).

  1. Have you looked at all at what a patch might look like? (If it's simple enough, that helps...)

No, not at all, unfortunately. I've looked into the Django code from time to time, but mostly I stay on my side of the API, lest I get in trouble. Especially when it comes to the ORM bits.

by Carlton Gibson, 5 years ago

Attachment: ticket_30079.patch added

Adjustment of admonition wording.

comment:3 by Carlton Gibson, 5 years ago

Component: Database layer (models, ORM)Documentation
Has patch: set
Resolution: wontfix
Status: newclosed

OK, at best, this is a cleanup on the documentation. The admonition note in the `prefech_related` docs already covers this:

Remember that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query.

using() is a subsequent chained method. It returns a new QuerySet, and the results and prefetch caches are not transferred.

Arguably, using(), whilst returning a new QuerySet, doesn't actually imply a different query.

As such I drafted the attached patch.

... as always with QuerySets, any subsequent chained methods which return a new QuerySet, normally implying a different database query ...

In my opinion, the patch unnecessarily complicates the meaning. We know from our use of QuerySets that all the Methods that return new QuerySets behave this way, even if we might forget it occasionally. Thus wontfix.

Thanks for the report Mike!

comment:4 by Mike Lissner, 5 years ago

Whew, that's pretty subtle in both the old and the new versions of the documentation. I have to confess that I don't think of chaining as the trigger for a new queryset. I think about the query as a new trigger for a queryset. I guess I never read this part of the documentation or didn't take it to heart.

I don't know, I don't think we can document our way out of this behavior, but I have no idea how difficult this is to fix technically.

Note: See TracTickets for help on using tickets.
Back to Top