Code

Opened 4 years ago

Closed 4 years ago

Last modified 22 months ago

#14195 closed Uncategorized (duplicate)

ContentType object fails to use cache, beats DB to death

Reported by: mark0978 Owned by: nobody
Component: Uncategorized Version: 1.2
Severity: Normal Keywords:
Cc: Kronuz Triage Stage: Someday/Maybe
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

using the content_type.model member causes a DB SELECT instead of using the cache of ContentTypes

This is the partial class definition

class Activity(models.Model):
    action_time = models.DateTimeField(_('action time'), auto_now=True)
    user = models.ForeignKey(User, related_name='activities')
    content_type = models.ForeignKey(ContentType, blank=True, null=True)
    ...

    def short_template(self):
        "Returns the name of the short template to display this item"
        return "activity/short/" + self.content_type.model + ".html"

this is the select statement

    activities = Activity.objects.filter(user=user).select_related('user')[:25]

    for act in activities:
        print act.short_template()

The above code will hit the DB once for every Activity object where you do self.content.model, in spite of the comments in the code about a cache, and the presence of the cache in the code. Seems like the content_type should be built from the cache when the object is fetched, or at least fetched from the cache instead of the database.

Attachments (0)

Change History (10)

comment:1 Changed 4 years ago by lukeplant

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

I think this is a case of just adding 'use_for_related_fields' to the manager (if not, I suspect we would have to close WONTFIX or INVALID, because it would be too hacky to fix, and the comments don't make the guarantee that the cache will be used in a related context). However, I'd like tests, but they wouldn't fit nicely into the current 'django/contrib/contenttypes/tests.py' because of the need for a related model (like your example) to test them, and because we want to avoid more doctests.

So, if you would like to contribute, I think the best path is:

  1. convert django/contrib/contenttypes/tests.py into unit tests (Paul McMillan's SOC project is not touching anything under contrib, AFAIK, so you won't be duplicating work).
  2. move the tests to a new contenttypes directory in tests/regressiontests. You'll also need a models.py for your tests to run.
  3. put your example model into tests/regressiontests/models.py
  4. add a new test for this bug. You will probably need to use settings.DEBUG = True and use len(django.db.connection.queries) to prove that no additional queries are being done.

If you could put 1 + 2 into one patch, and then 3 + 4 into another patch, that would be fantastic. But if you've got no interest on working on this, let us know.

comment:2 Changed 4 years ago by mark0978

I'll test that fix in my app first, and if it does the trick, I'll take this on, if not, I may need some help to find a non-hacky fix. I think ContentType is too useful to be broken as badly as this is.

comment:3 Changed 4 years ago by lukeplant

I wouldn't say that ContentType is really broken without this - you can still use select_related to cut down the number of DB queries. But it would be nice to make use of the cache we already have for it.

comment:4 follow-up: Changed 4 years ago by mark0978

Well, setting use_for_related_fields=True didn't work, however the reason it didn't work is a little murky and I wonder if it might be related to the multi-db stuff that was put in recently.

            if getattr(rel_mgr, 'use_for_related_fields', False):
                rel_obj = rel_mgr.using(db).get(**params)

In this code, the rel_mgr.using(db) returns a QuerySet object, so that get() on the rel_mgr seems to get bypassed and you end up in the QuerySet's get, most of the way to the DB.

I can fix this by changing it to:

            if getattr(rel_mgr, 'use_for_related_fields', False):
                params['using'] = db
                rel_obj = rel_mgr.get(**params)

which allows me to handle caching in ContentTypeManager(models.Manager).get

I'm wondering how many other places .using(db) has been used instead of passing in the argument as using=db

Any ideas on how to test this? Grab the connection.queries and check the length before and after?

comment:5 in reply to: ↑ 4 Changed 4 years ago by lukeplant

The use of the using method seems correct, and putting it into the params is definitely incorrect - those params are meant to be the fields on the model that you want to match.

Looking at it again, it seems that the cache on ContentType is just not designed for your use case. It doesn't even override the get method so that it uses the cache. The cache is designed for the places that call ContentType.objects.get_for_* explicitly. Even if it does, you run into the fact that the use of using means you end up in QuerySet.get.

There might be a way to get it to work - by overriding the 'Manager.get_query_set' method - but I imagine it could get tricky, and we might decide it's too invasive anyway. Adding 'content_type' to your select_related call seems like a much easier fix for your use case.

comment:6 follow-up: Changed 4 years ago by mark0978

I fail to see how putting using into the kwargs is wrong. QuerySet has using as a named parameter, so it being in kwargs means it is extracted from there before the rest of the kwargs are processed by the routine.

The code prior to the multi-db changes called the manager get method, the new code sidesteps the manager and does a get through the QuerySet which sort of defeats the purpose of the manager, since it won't be involved in the get anyway.

Pre-Multi-db (commit 11938/11951)

            if getattr(rel_mgr, 'use_for_related_fields', False):
                rel_obj = rel_mgr.get(**params)
            else:
                rel_obj = QuerySet(self.field.rel.to).get(**params)
            setattr(instance, cache_name, rel_obj)
            return rel_obj

vs

Post Multi-DB (commit 11952)

            using = instance._state.db or DEFAULT_DB_ALIAS
            if getattr(rel_mgr, 'use_for_related_fields', False):
                rel_obj = rel_mgr.using(using).get(**params)
            else:
                rel_obj = QuerySet(self.field.rel.to).using(using).get(**params)
            setattr(instance, cache_name, rel_obj)
            return rel_obj

The problem with overriding get_query_set is that it needs to return a QuerySet() so that other things can be added to the end of it. If you can't replace get(), I'm not sure how you manage to cache within the Manager class. If you are going to use the QuerySet, why even bother to have the use_for_related_fields flag? Now, in both cases you get a QuerySet, not the manager itself.

comment:7 in reply to: ↑ 6 Changed 4 years ago by lrekucki

Replying to mark0978:

I fail to see how putting using into the kwargs is wrong. QuerySet has using as a named parameter, so it being in kwargs means it is extracted from there before the rest of the kwargs are processed by the routine.

It's QuerySet.__init__() that has using as an argument, not QuerySet.get() which is called here. One solution that comes to mind, would be overriding ContentTypeManager so it returns a subclass of QuerySet. A simple get() is easy to handle, but (for example) any filters() with reference to related objects can't really use the cache.

comment:8 follow-up: Changed 4 years ago by mark0978

Ok, my mistake on the using part, I did get the QuerySet.init() confused with the get.

I still believe that if use_for_related_fields is set, it should use the get on the Manager, but I'll drop it for now.

If you were to use a subclass of QuerySet and override get_query_set() to perform the query caching, how would you hook up the subclass to the Manager class cache so you don't end up with 2 caches?

I am assuming the Manager cache is in the class object, not in the instance of the Manager class.

I can solve my current problem with select_related, but that ends up creating N ContentType objects, where the cache based method just reuses existing instances.

comment:9 in reply to: ↑ 8 Changed 4 years ago by lukeplant

  • Resolution set to duplicate
  • Status changed from new to closed
  • Triage Stage changed from Accepted to Someday/Maybe

Replying to mark0978:

I still believe that if use_for_related_fields is set, it should use the get on the Manager, but I'll drop it for now.

That's not the way that managers work. For example, if you do SomeModel.some_manager.filter(**kwargs).filter(**kwargs2), the second filter is always the filter on a QuerySet object. Methods like filter and get are meant to be proxies to the QuerySet methods.

If you were to use a subclass of QuerySet and override get_query_set() to perform the query caching, how would you hook up the subclass to the Manager class cache so you don't end up with 2 caches?

I think we are basically in the realm of a special case of ticket #17 now. Since 1) the cache on ContentTypeManager is not meant for this use case 2) there is no easy way to make it work for this use case 3) select_related fixes all but the duplicated instance problem, which is covered by #17, I'm closing as duplicate.

comment:10 Changed 22 months ago by Kronuz

  • Cc Kronuz added
  • Easy pickings unset
  • Severity set to Normal
  • Type set to Uncategorized
  • UI/UX unset

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.