Opened 17 years ago

Closed 13 years ago

#7338 closed New feature (wontfix)

Method .cache(timeout) in QuerySet

Reported by: Marinho Brandão Owned by: Marinho Brandão
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: semente@…, ross@…, msaelices@…, ramusus@… Triage Stage: Design decision needed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I made these changes in the QuerySet and sql.Query classes in the database component of newforms-admin branch.

I added a new method called .cache(timeout) to use like we use filter(), exclude(), select_related(), etc. that if used, enable the sql.Query class to stores the SQL query result in the cache.

This improved my systems very well, because using it, I avoid to make several queries with the same SQL sentence in the same time from the database. As we know, memory is faster than database/filesystem, so, if you use memcached or locmem backends, is probably that you can can enhance your system using this method.

As my english is the best, I published a snippet that can clear for who not understand correctly what I meant: http://www.djangosnippets.org/snippets/777/

Attachments (3)

method_cache.diff (7.7 KB ) - added by Marinho Brandão 16 years ago.
Updated patch with cache method
method_cache_r9787.diff (3.2 KB ) - added by Manuel Saelices 16 years ago.
Patch that work with django r9787 revision
method_cache_with_invalidation_r9787.diff (5.2 KB ) - added by Manuel Saelices 16 years ago.
Caching with manager invalidation

Download all attachments as: .zip

Change History (25)

comment:1 by Marinho Brandão, 17 years ago

I wrote wrong this part: "As my english is the best", I forgot the NOT part of the sentence... hehe ;)

comment:2 by Guilherme M. Gondim <semente@…>, 17 years ago

Cc: semente@… added

comment:3 by Marinho Brandão, 17 years ago

Owner: changed from nobody to Marinho Brandão
Status: newassigned

Fixed patch wish key based on hash. This is because cache doesn't accept keys with 250+ characters.

comment:4 by Brian Rosner, 17 years ago

Version: newforms-adminSVN

comment:5 by Marinho Brandão, 17 years ago

Since I lost the detailed text that I wrote in the Django Snippet told above (that has been deleted), I am writing below exemples of use of the method.

Overriding get_query_set

This is an example for use cacheable request everytime.

class CategoryManager(models.Manager):
    def get_query_set(self):
        q = super(CategoryManager, self).get_query_set()

        return q.cache(300)

Single use

This is an example for use cacheable request just for that line.

lista = Video.all().cache(300)

comment:6 by Marc Fargas, 17 years ago

milestone: post-1.0
Triage Stage: UnreviewedAccepted

You should:

  • Remove the "#-----"
  • Add test cases to test this new behaviour, like:
    • Run a query cached,
    • Modify objects,
    • Rerun the query cached, see it's not changed
    • Rerun without .cache(), see it's changed
  • Patch the documentation to report about this method.

Anyway I think that should go in post-1.0 ;)

And I'm not sure if running a query without cache should invalidate caches of that same query, but that could waste resources checking if cache exists for every query, so better forget.

comment:7 by Marinho Brandão, 16 years ago

@telenieko

Thanks for advices :) I known that I need to improve the patch, but I'm waiting for the merge of 1.0-final (with newforms-admin) to do this :)

by Marinho Brandão, 16 years ago

Attachment: method_cache.diff added

Updated patch with cache method

comment:8 by Marinho Brandão, 16 years ago

Tests and documentation done. Also the proxy method in Manager class

comment:9 by Ross Poulton, 16 years ago

Cc: ross@… added

comment:10 by Marinho Brandão, 16 years ago

Two other approaches to do that:

but I still prefer the approach of this ticket, in my opinion is more simple and powerfull :)

comment:11 by Malcolm Tredinnick, 16 years ago

milestone: post-1.0
Triage Stage: AcceptedDesign decision needed

I don't think this has a place in core at the moment. There are a lot of trade-offs you have to know about when doing caching like this (e.g. it assumes the data isn't changing rapidly enough for the caching to get unacceptably stale). It's already pretty easy to cache queries when you want them to be cached (and caching querysets with a key so they can be looked up later is probably better in most cases).

My feeling is that this is probably better done for now as a third-party thing that is invoked via a manager that provides a custom QuerySet class. Making it work smoothly with other custom Queryset / Query classes might require a little work and that's something you can bring up on django-dev if it turns out to be hard (it would be nice to make it possible). But tying the caching stuff into the SQL producing stuff so tightly doesn't sit comfortably with me.

Leaving open for the time being in the hopes that another core developer will comment one way or the other, but I'm giving it a -1 at the moment.

comment:12 by anonymous, 16 years ago

Cc: msaelices@… added

by Manuel Saelices, 16 years ago

Attachment: method_cache_r9787.diff added

Patch that work with django r9787 revision

comment:13 by Manuel Saelices, 16 years ago

I think Malcolmn is right. You cannot rely on a cache that you cannot control for invalidation. But I think this patch maybe can be a good idea to get a default cache system in Django ORM.

I will introduce what I think is the main problem with this patch: invalidation.

It would be great that, for example, in a FooModel saving, you could flush from cache all possible querysets from FooModel. Then it would be fantastic if you can do something like:

FooModel.objects.delete()

But with this patch this cannot be possible, because it uses hash cache key, and for memcached you cannot flush a bunch of keys (without hacking things). Therefore, if you have several views that uses FooModel querysets, like that:

# view 1
FooModel.objects.all()
# view 2
FooModel.objects.filter(published=True)

Every previous queryset has a diferent SQL sentence, and this means a different hash and cache key. I think a valid approach to get Manager.invalidate method could be something like:

  1. For every cache() call, besides generate a hashed cache key, you can add key value into a register cached in memcached also (but with a larger timeout) that maintains all cached keys from this manager.
  2. When invalidate() get called, this method will iterate into its cached register and will flush all querysets.

¿what is your opinion about this?

in reply to:  13 comment:14 by Manuel Saelices, 16 years ago

Replying to msaelices:

FooModel.objects.delete()

Sorry, i means FooModel.objects.invalidate()

by Manuel Saelices, 16 years ago

Caching with manager invalidation

comment:15 by Manuel Saelices, 16 years ago

My english, also is NOT best, so I prefer upload a patch with all I want to say implemented :)

This is a example test:

>>> from django.db import connection
>>> from beach.models import Beach
>>> Beach.objects.all().cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
6
>>> Beach.objects.all().cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
6
>>> Beach.objects.filter(status='published').cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
7
>>> Beach.objects.filter(status='published').cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
7
>>> Beach.objects.flush_cache() # invalidate all queries from this model managers
>>> Beach.objects.all().cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
8
>>> Beach.objects.filter(status='published').cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
9

comment:16 by Marinho Brandão, 16 years ago

Malcolm, I see your point of view.

Maybe there is a better way to make a cache for ORM objects, and since I started to use Django, I'm thankful to core developers because in 95% of times you made amazing good choices, and this is why Django is amazing as it is :)

But I agree to msaelices. I'd like to have a default and strong way to cache query results, and share this among all users online, and I didn't see until now a better solution using third part or Manager inheritance, because there are some times when Django ORM ignores _default_manager and use its own Memory class. Then, a default cache way would resolve this.

And I like msaelices's idea for invalidation.

But since is a design decision, I have just followed your thoughts and waited for your decisions :)

comment:17 by Marinho Brandão, 16 years ago

sorry, when I said "Memory class" I was meaning "Manager class"

comment:18 by Malcolm Tredinnick, 16 years ago

All the latest reasons I've seen to include this are because "it's a compromise because of <some bigger problem>". So fix the real issue, rather than trying to put in a compromise. Sure, it's harder. But it's much more worthwhile in the long run and helps us work out whether it really is a bigger problem, or not really a problem and what the possible alternatives are.

For example, maybe an identity manager, a la #17 (or, at least some variation on #17, since that ticket wanders all over the place) is the solution. Maybe the sort of thing we want to do for Django 1.2 or 1.3 allowing specifying an alternative default QuerySet class is the right solution. Maybe it's something else.

Right now, though, I'm unconvinced that we're forced to compromise yet.

comment:19 by ramusus, 15 years ago

Cc: ramusus@… added

comment:20 by ed@…, 15 years ago

I agree with the invalidation problem. And as other have pointed out, there are lots of things you need to know about the application to know which is the best approach to invalidate. I do think that if the invalidation problem is solved, it's definitely a good idea to add this feature. Here are some requirements off the top of my head for proper invalidation requirements.

  1. Never invalidate option... have separate process refresh the cache on schedule (if something has been updated). Performance is most important feature here.
  2. Invalidate on any delete/update/insert (I think this should be the default). In this scenario data consistency is the most important feature.
  3. It's OK to have stale data, just allow it to expire and reload on use. Watch out for dog pile.
  4. Allow for flushing.
  5. Allow for automatic upgrade from memcache to local file cache if result set it too large (I currently use a hack in the cache system to handle this). Too large might be under the 1mb memcache limit.
  6. More advanced feature that allows for certain specific records to trigger invalidation. For example, if you query 5 models, only 1 of them should be an instant invalidate. The rest should just allow something like option 1 to run.

I'd love to hear what requirements others think this feature should have. I've spent considerable time optimizing applications and integrating caching and having something that easily supports this out of the box for me would be great!

comment:21 by Luke Plant, 14 years ago

Severity: Normal
Type: New feature

comment:22 by Alex Gaynor, 13 years ago

Easy pickings: unset
Resolution: wontfix
Status: assignedclosed
UI/UX: unset

wontfixing this: there are tons of project level decisions that Django can't enforce, about how to do invlidation, how to interact with non-Django DB access, and other concerns. Plus it's not really appropriate for the ORM to know about the cache, and it can live easily outside of Django (and there are several solutions that do this). Basically what Malcolm said in his first comment.

Note: See TracTickets for help on using tickets.
Back to Top