Opened 16 years ago
Closed 13 years ago
#7338 closed New feature (wontfix)
Method .cache(timeout) in QuerySet
Reported by: | Marinho Brandão | Owned by: | Marinho Brandão |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | semente@…, ross@…, msaelices@…, ramusus@… | Triage Stage: | Design decision needed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I made these changes in the QuerySet and sql.Query classes in the database component of the newforms-admin branch.
I added a new method called .cache(timeout), used the same way as filter(), exclude(), select_related(), etc. When used, it makes the sql.Query class store the SQL query result in the cache.
This improved my systems noticeably, because with it I avoid running several queries with the same SQL statement against the database at the same time. As we know, memory is faster than the database/filesystem, so if you use the memcached or locmem backends you can probably speed up your system with this method.
As my english is the best, I published a snippet that should make clear what I meant for anyone who didn't follow the above: http://www.djangosnippets.org/snippets/777/
Attachments (3)
Change History (25)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
Cc: | added |
---|
comment:3 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Fixed the patch with a hash-based cache key. This is because the cache doesn't accept keys of 250+ characters.
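(For context, the hashing described here can be as simple as the sketch below; the helper name is invented for illustration and is not taken from the attached diff.)

    import hashlib

    def query_cache_key(sql, params):
        # memcached rejects keys longer than 250 characters, so hash the
        # full SQL statement plus its parameters into a fixed-length digest.
        raw = '%s|%r' % (sql, params)
        return 'queryset:%s' % hashlib.md5(raw.encode('utf-8')).hexdigest()

A key built this way stays well under the 250-character limit no matter how long the SQL statement gets.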
comment:4 by , 16 years ago
Version: | newforms-admin → SVN |
---|
comment:5 by , 16 years ago
Since I lost the detailed text that I wrote in the Django snippet mentioned above (it has since been deleted), I am writing some examples of the method's use below.
Overriding get_query_set
This is an example of making every query cacheable.
    class CategoryManager(models.Manager):
        def get_query_set(self):
            q = super(CategoryManager, self).get_query_set()
            return q.cache(300)
Single use
This is an example of caching just a single query.
    lista = Video.objects.all().cache(300)
comment:6 by , 16 years ago
milestone: | → post-1.0 |
---|---|
Triage Stage: | Unreviewed → Accepted |
You should:
- Remove the "#-----"
- Add test cases for this new behaviour (see the sketch after this list), for example:
  - run a cached query,
  - modify some objects,
  - rerun the cached query and see that the result has not changed,
  - rerun it without .cache() and see that the result has changed.
- Patch the documentation to describe this method.
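(A doctest-style sketch of what those cases could look like; the Article model and its fields are invented purely for illustration.)

    >>> from articles.models import Article
    >>> Article.objects.filter(published=True).cache(60)   # first run hits the database
    [<Article: First>]
    >>> a = Article.objects.create(title='Second', published=True)
    >>> Article.objects.filter(published=True).cache(60)   # served from the cache, so still stale
    [<Article: First>]
    >>> Article.objects.filter(published=True)             # without .cache(), the new row shows up
    [<Article: First>, <Article: Second>]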
Anyway I think that should go in post-1.0 ;)
And I'm not sure whether running a query without the cache should invalidate cached results for that same query; that would mean checking whether a cache entry exists for every query, which could waste resources, so better to forget it.
comment:7 by , 16 years ago
@telenieko
Thanks for the advice :) I know that I need to improve the patch, but I'm waiting for the 1.0-final merge (with newforms-admin) before doing so :)
comment:9 by , 16 years ago
Cc: | added |
---|
comment:10 by , 16 years ago
Two other approaches to do that:
but I still prefer the approach in this ticket; in my opinion it is simpler and more powerful :)
comment:11 by , 16 years ago
milestone: | post-1.0 |
---|---|
Triage Stage: | Accepted → Design decision needed |
I don't think this has a place in core at the moment. There are a lot of trade-offs you have to know about when doing caching like this (e.g. it assumes the data isn't changing rapidly enough for the caching to get unacceptably stale). It's already pretty easy to cache queries when you want them to be cached (and caching querysets with a key so they can be looked up later is probably better in most cases).
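(To illustrate the "already pretty easy" point: caching an evaluated queryset under an explicit key with the low-level cache API takes only a few lines. The model, function, and key names below are arbitrary examples, not code from this ticket.)

    from django.core.cache import cache

    from myapp.models import Category  # any model you like

    def get_published_categories():
        key = 'published-categories'
        result = cache.get(key)
        if result is None:
            # Evaluate the queryset once and store the resulting list,
            # not the lazy QuerySet object itself.
            result = list(Category.objects.filter(published=True))
            cache.set(key, result, 300)
        return result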
My feeling is that this is probably better done for now as a third-party thing that is invoked via a manager that provides a custom QuerySet class. Making it work smoothly with other custom Queryset / Query classes might require a little work and that's something you can bring up on django-dev if it turns out to be hard (it would be nice to make it possible). But tying the caching stuff into the SQL producing stuff so tightly doesn't sit comfortably with me.
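(A bare-bones sketch of the third-party shape suggested here: a custom QuerySet exposed through a manager, written against the old get_query_set() API used elsewhere on this ticket. Names and details are illustrative, not an existing app.)

    import hashlib

    from django.core.cache import cache
    from django.db import models
    from django.db.models.query import QuerySet

    class CachingQuerySet(QuerySet):
        def cached(self, timeout=300):
            # Key off the generated SQL so identical queries share one entry,
            # and hash it to respect memcached's key-length limit.
            key = 'qs:%s' % hashlib.md5(str(self.query).encode('utf-8')).hexdigest()
            result = cache.get(key)
            if result is None:
                result = list(self)  # evaluate the queryset once
                cache.set(key, result, timeout)
            return result

    class CachingManager(models.Manager):
        def get_query_set(self):
            return CachingQuerySet(self.model)

Note that cached() returns a plain list rather than a QuerySet, which is one of the trade-offs such an app would have to document.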
Leaving open for the time being in the hopes that another core developer will comment one way or the other, but I'm giving it a -1 at the moment.
comment:12 by , 16 years ago
Cc: | added |
---|
by , 16 years ago
Attachment: | method_cache_r9787.diff added |
---|
Patch that works against Django revision r9787
follow-up: 14 comment:13 by , 16 years ago
I think Malcolm is right. You cannot rely on a cache whose invalidation you cannot control. But I think this patch could still be a good basis for a default cache system in the Django ORM.
I will describe what I think is the main problem with this patch: invalidation.
It would be great if, for example, when a FooModel instance is saved, you could flush all possible FooModel querysets from the cache. Then it would be fantastic if you could do something like:
    FooModel.objects.delete()
But with this patch that is not possible, because it uses a hashed cache key, and with memcached you cannot flush a group of keys (without hacking things). Therefore, if you have several views that use FooModel querysets, like this:
    # view 1
    FooModel.objects.all()

    # view 2
    FooModel.objects.filter(published=True)
Each of those querysets has a different SQL statement, and that means a different hash and cache key. I think a valid approach to getting a Manager.invalidate method could be something like:
- For every cache() call, besides generating a hashed cache key, also add that key to a registry cached in memcached (with a larger timeout) that keeps track of all cached keys for this manager.
- When invalidate() gets called, it iterates over the cached registry and flushes all of those querysets.
What is your opinion about this?
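(Since memcached has no wildcard delete, a registry along those lines might look like the sketch below; all names are hypothetical, and the get/modify/set on the registry is not atomic.)

    from django.core.cache import cache

    REGISTRY_TIMEOUT = 86400  # keep the registry longer than any cached queryset

    def register_key(model, key):
        # Remember every cache key created for this model's manager so
        # invalidate() can delete them all later.
        registry_key = 'qs-registry:%s' % model.__name__.lower()
        keys = cache.get(registry_key) or set()
        keys.add(key)
        cache.set(registry_key, keys, REGISTRY_TIMEOUT)

    def invalidate(model):
        # Flush every queryset registered for this model, then the registry itself.
        registry_key = 'qs-registry:%s' % model.__name__.lower()
        for key in cache.get(registry_key) or set():
            cache.delete(key)
        cache.delete(registry_key)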
comment:14 by , 16 years ago
by , 16 years ago
Attachment: | method_cache_with_invalidation_r9787.diff added |
---|
Caching with manager invalidation
comment:15 by , 16 years ago
My english is also NOT the best, so I prefer to upload a patch with everything I want to say implemented :)
This is an example test:
    >>> from django.db import connection
    >>> from beach.models import Beach
    >>> Beach.objects.all().cache(timeout=3600)
    [<Beach: Aguadulce>, ...]
    >>> len(connection.queries)
    6
    >>> Beach.objects.all().cache(timeout=3600)
    [<Beach: Aguadulce>, ...]
    >>> len(connection.queries)
    6
    >>> Beach.objects.filter(status='published').cache(timeout=3600)
    [<Beach: Aguadulce>, ...]
    >>> len(connection.queries)
    7
    >>> Beach.objects.filter(status='published').cache(timeout=3600)
    [<Beach: Aguadulce>, ...]
    >>> len(connection.queries)
    7
    >>> Beach.objects.flush_cache()  # invalidate all queries from this model's managers
    >>> Beach.objects.all().cache(timeout=3600)
    [<Beach: Aguadulce>, ...]
    >>> len(connection.queries)
    8
    >>> Beach.objects.filter(status='published').cache(timeout=3600)
    [<Beach: Aguadulce>, ...]
    >>> len(connection.queries)
    9
comment:16 by , 16 years ago
Malcolm, I see your point of view.
Maybe there is a better way to cache ORM objects; since I started using Django I have been thankful to the core developers because 95% of the time you have made amazingly good choices, and that is why Django is as good as it is :)
But I agree with msaelices. I'd like to have a default, robust way to cache query results and share them among all online users, and so far I haven't seen a better solution using a third-party app or Manager inheritance, because there are times when the Django ORM ignores _default_manager and uses its own plain Manager class. A default caching mechanism would resolve this.
And I like msaelices's idea for invalidation.
But since this is a design decision, I will just follow your thoughts and wait for your decision :)
comment:18 by , 16 years ago
All the latest reasons I've seen to include this are because "it's a compromise because of <some bigger problem>". So fix the real issue, rather than trying to put in a compromise. Sure, it's harder. But it's much more worthwhile in the long run and helps us work out whether it really is a bigger problem, or not really a problem and what the possible alternatives are.
For example, maybe an identity manager, a la #17 (or at least some variation on #17, since that ticket wanders all over the place) is the solution. Maybe the sort of thing we want to do for Django 1.2 or 1.3, allowing an alternative default QuerySet class to be specified, is the right solution. Maybe it's something else.
Right now, though, I'm unconvinced that we're forced to compromise yet.
comment:19 by , 15 years ago
Cc: | added |
---|
comment:20 by , 15 years ago
I agree that invalidation is the problem. And as others have pointed out, there are a lot of things you need to know about the application to decide on the best invalidation approach. I do think that if the invalidation problem is solved, it's definitely a good idea to add this feature. Here are some requirements for proper invalidation, off the top of my head:
- A never-invalidate option: have a separate process refresh the cache on a schedule (if something has been updated). Performance is the most important feature here.
- Invalidate on any delete/update/insert (I think this should be the default). In this scenario data consistency is the most important feature (see the sketch at the end of this comment).
- It's OK to have stale data; just allow it to expire and reload on use. Watch out for the dog-pile effect.
- Allow for flushing.
- Allow automatic fallback from memcache to a local file cache if the result set is too large (I currently use a hack in the cache system to handle this). "Too large" might be somewhere under the 1 MB memcache limit.
- A more advanced feature that lets only certain specific records trigger invalidation. For example, if you query 5 models, only 1 of them should trigger an immediate invalidation; the rest should just let something like option 1 run.
I'd love to hear what requirements others think this feature should have. I've spent considerable time optimizing applications and integrating caching and having something that easily supports this out of the box for me would be great!
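(As a rough illustration of the "invalidate on any delete/update/insert" requirement, a signal-based hook could look like the sketch below; invalidate() is the hypothetical registry helper sketched earlier in this ticket, not part of Django.)

    from django.db.models.signals import post_save, post_delete

    from myapp.caching import invalidate  # hypothetical helper from the earlier sketch

    def _flush_model_cache(sender, **kwargs):
        # Any write to the model throws away every cached queryset for it.
        invalidate(sender)

    def enable_write_invalidation(model):
        post_save.connect(_flush_model_cache, sender=model)
        post_delete.connect(_flush_model_cache, sender=model)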
comment:21 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → New feature |
comment:22 by , 13 years ago
Easy pickings: | unset |
---|---|
Resolution: | → wontfix |
Status: | assigned → closed |
UI/UX: | unset |
Wontfixing this: there are tons of project-level decisions that Django can't enforce, about how to do invalidation, how to interact with non-Django DB access, and other concerns. Plus it's not really appropriate for the ORM to know about the cache, and this can easily live outside of Django (and there are several solutions that do). Basically what Malcolm said in his first comment.
I wrote this part wrong: "As my english is the best"; I forgot the NOT in the sentence... hehe ;)