#14195 closed Uncategorized (duplicate)
ContentType object fails to use cache, beats DB to death
Reported by: | Mark Jones | Owned by: | nobody |
---|---|---|---|
Component: | Uncategorized | Version: | 1.2 |
Severity: | Normal | Keywords: | |
Cc: | German M. Bravo | Triage Stage: | Someday/Maybe |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
using the content_type.model member causes a DB SELECT instead of using the cache of ContentTypes
This is the partial class definition
class Activity(models.Model): action_time = models.DateTimeField(_('action time'), auto_now=True) user = models.ForeignKey(User, related_name='activities') content_type = models.ForeignKey(ContentType, blank=True, null=True) ... def short_template(self): "Returns the name of the short template to display this item" return "activity/short/" + self.content_type.model + ".html"
this is the select statement
activities = Activity.objects.filter(user=user).select_related('user')[:25] for act in activities: print act.short_template()
The above code will hit the DB once for every Activity object where you do self.content.model, in spite of the comments in the code about a cache, and the presence of the cache in the code. Seems like the content_type should be built from the cache when the object is fetched, or at least fetched from the cache instead of the database.
Change History (10)
comment:1 by , 14 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 14 years ago
I'll test that fix in my app first, and if it does the trick, I'll take this on, if not, I may need some help to find a non-hacky fix. I think ContentType is too useful to be broken as badly as this is.
comment:3 by , 14 years ago
I wouldn't say that ContentType is really broken without this - you can still use select_related
to cut down the number of DB queries. But it would be nice to make use of the cache we already have for it.
follow-up: 5 comment:4 by , 14 years ago
Well, setting use_for_related_fields=True didn't work, however the reason it didn't work is a little murky and I wonder if it might be related to the multi-db stuff that was put in recently.
if getattr(rel_mgr, 'use_for_related_fields', False): rel_obj = rel_mgr.using(db).get(**params)
In this code, the rel_mgr.using(db) returns a QuerySet object, so that get() on the rel_mgr seems to get bypassed and you end up in the QuerySet's get, most of the way to the DB.
I can fix this by changing it to:
if getattr(rel_mgr, 'use_for_related_fields', False): params['using'] = db rel_obj = rel_mgr.get(**params)
which allows me to handle caching in ContentTypeManager(models.Manager).get
I'm wondering how many other places .using(db) has been used instead of passing in the argument as using=db
Any ideas on how to test this? Grab the connection.queries and check the length before and after?
comment:5 by , 14 years ago
The use of the using
method seems correct, and putting it into the params is definitely incorrect - those params are meant to be the fields on the model that you want to match.
Looking at it again, it seems that the cache on ContentType is just not designed for your use case. It doesn't even override the get
method so that it uses the cache. The cache is designed for the places that call ContentType.objects.get_for_*
explicitly. Even if it does, you run into the fact that the use of using
means you end up in QuerySet.get
.
There might be a way to get it to work - by overriding the 'Manager.get_query_set' method - but I imagine it could get tricky, and we might decide it's too invasive anyway. Adding 'content_type'
to your select_related
call seems like a much easier fix for your use case.
follow-up: 7 comment:6 by , 14 years ago
I fail to see how putting using into the kwargs is wrong. QuerySet has using as a named parameter, so it being in kwargs means it is extracted from there before the rest of the kwargs are processed by the routine.
The code prior to the multi-db changes called the manager get method, the new code sidesteps the manager and does a get through the QuerySet which sort of defeats the purpose of the manager, since it won't be involved in the get anyway.
Pre-Multi-db (commit 11938/11951)
if getattr(rel_mgr, 'use_for_related_fields', False): rel_obj = rel_mgr.get(**params) else: rel_obj = QuerySet(self.field.rel.to).get(**params) setattr(instance, cache_name, rel_obj) return rel_obj
vs
Post Multi-DB (commit 11952)
using = instance._state.db or DEFAULT_DB_ALIAS if getattr(rel_mgr, 'use_for_related_fields', False): rel_obj = rel_mgr.using(using).get(**params) else: rel_obj = QuerySet(self.field.rel.to).using(using).get(**params) setattr(instance, cache_name, rel_obj) return rel_obj
The problem with overriding get_query_set is that it needs to return a QuerySet() so that other things can be added to the end of it. If you can't replace get(), I'm not sure how you manage to cache within the Manager class. If you are going to use the QuerySet, why even bother to have the use_for_related_fields flag? Now, in both cases you get a QuerySet, not the manager itself.
comment:7 by , 14 years ago
Replying to mark0978:
I fail to see how putting using into the kwargs is wrong. QuerySet has using as a named parameter, so it being in kwargs means it is extracted from there before the rest of the kwargs are processed by the routine.
It's QuerySet.__init__()
that has using
as an argument, not QuerySet.get()
which is called here. One solution that comes to mind, would be overriding ContentTypeManager so it returns a subclass of QuerySet. A simple get()
is easy to handle, but (for example) any filters()
with reference to related objects can't really use the cache.
follow-up: 9 comment:8 by , 14 years ago
Ok, my mistake on the using part, I did get the QuerySet.init() confused with the get.
I still believe that if use_for_related_fields is set, it should use the get on the Manager, but I'll drop it for now.
If you were to use a subclass of QuerySet and override get_query_set() to perform the query caching, how would you hook up the subclass to the Manager class cache so you don't end up with 2 caches?
I am assuming the Manager cache is in the class object, not in the instance of the Manager class.
I can solve my current problem with select_related, but that ends up creating N ContentType objects, where the cache based method just reuses existing instances.
comment:9 by , 14 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
Triage Stage: | Accepted → Someday/Maybe |
Replying to mark0978:
I still believe that if use_for_related_fields is set, it should use the get on the Manager, but I'll drop it for now.
That's not the way that managers work. For example, if you do SomeModel.some_manager.filter(**kwargs).filter(**kwargs2)
, the second filter is always the filter on a QuerySet object. Methods like filter
and get
are meant to be proxies to the QuerySet methods.
If you were to use a subclass of QuerySet and override get_query_set() to perform the query caching, how would you hook up the subclass to the Manager class cache so you don't end up with 2 caches?
I think we are basically in the realm of a special case of ticket #17 now. Since 1) the cache on ContentTypeManager is not meant for this use case 2) there is no easy way to make it work for this use case 3) select_related
fixes all but the duplicated instance problem, which is covered by #17, I'm closing as duplicate.
comment:10 by , 13 years ago
Cc: | added |
---|---|
Easy pickings: | unset |
Severity: | → Normal |
Type: | → Uncategorized |
UI/UX: | unset |
I think this is a case of just adding 'use_for_related_fields' to the manager (if not, I suspect we would have to close WONTFIX or INVALID, because it would be too hacky to fix, and the comments don't make the guarantee that the cache will be used in a related context). However, I'd like tests, but they wouldn't fit nicely into the current 'django/contrib/contenttypes/tests.py' because of the need for a related model (like your example) to test them, and because we want to avoid more doctests.
So, if you would like to contribute, I think the best path is:
django/contrib/contenttypes/tests.py
into unit tests (Paul McMillan's SOC project is not touching anything under contrib, AFAIK, so you won't be duplicating work).contenttypes
directory intests/regressiontests
. You'll also need amodels.py
for your tests to run.tests/regressiontests/models.py
settings.DEBUG = True
and uselen(django.db.connection.queries)
to prove that no additional queries are being done.If you could put 1 + 2 into one patch, and then 3 + 4 into another patch, that would be fantastic. But if you've got no interest on working on this, let us know.