Django

Code

Ticket #7338 (assigned)

Opened 1 year ago

Last modified 4 months ago

Method .cache(timeout) in QuerySet

Reported by: marinho Assigned to: marinho (accepted)
Milestone: Component: Database layer (models, ORM)
Version: SVN Keywords:
Cc: semente@taurinus.org, ross@rossp.org, msaelices@yaco.es Triage Stage: Design decision needed
Has patch: 1 Needs documentation: 0
Needs tests: 0 Patch needs improvement: 0

Description

I made these changes in the QuerySet? and sql.Query classes in the database component of newforms-admin branch.

I added a new method called .cache(timeout) to use like we use filter(), exclude(), select_related(), etc. that if used, enable the sql.Query class to stores the SQL query result in the cache.

This improved my systems very well, because using it, I avoid to make several queries with the same SQL sentence in the same time from the database. As we know, memory is faster than database/filesystem, so, if you use memcached or locmem backends, is probably that you can can enhance your system using this method.

As my english is the best, I published a snippet that can clear for who not understand correctly what I meant: http://www.djangosnippets.org/snippets/777/

Attachments

method_cache.diff (7.7 kB) - added by marinho on 07/25/08 07:59:44.
Updated patch with cache method
method_cache_r9787.diff (3.2 kB) - added by msaelices on 01/23/09 21:43:34.
Patch that work with django r9787 revision
method_cache_with_invalidation_r9787.diff (5.2 kB) - added by msaelices on 01/24/09 08:10:35.
Caching with manager invalidation

Change History

05/30/08 15:58:57 changed by marinho

  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

I wrote wrong this part: "As my english is the best", I forgot the NOT part of the sentence... hehe ;)

06/03/08 12:20:35 changed by Guilherme M. Gondim <semente@taurinus.org>

  • cc set to semente@taurinus.org.

06/04/08 15:02:33 changed by marinho

  • owner changed from nobody to marinho.
  • status changed from new to assigned.

Fixed patch wish key based on hash. This is because cache doesn't accept keys with 250+ characters.

06/07/08 12:30:37 changed by brosner

  • version changed from newforms-admin to SVN.

06/20/08 06:39:20 changed by marinho

Since I lost the detailed text that I wrote in the Django Snippet told above (that has been deleted), I am writing below exemples of use of the method.

Overriding get_query_set

This is an example for use cacheable request everytime.

class CategoryManager(models.Manager):
    def get_query_set(self):
        q = super(CategoryManager, self).get_query_set()

        return q.cache(300)

Single use

This is an example for use cacheable request just for that line.

lista = Video.all().cache(300)

06/20/08 06:45:26 changed by telenieko

  • stage changed from Unreviewed to Accepted.
  • milestone set to post-1.0.

You should:

  • Remove the "#-----"
  • Add test cases to test this new behaviour, like:
    • Run a query cached,
    • Modify objects,
    • Rerun the query cached, see it's not changed
    • Rerun without .cache(), see it's changed
  • Patch the documentation to report about this method.

Anyway I think that should go in post-1.0 ;)

And I'm not sure if running a query without cache should invalidate caches of that same query, but that could waste resources checking if cache exists for every query, so better forget.

07/16/08 06:04:14 changed by marinho

@telenieko

Thanks for advices :) I known that I need to improve the patch, but I'm waiting for the merge of 1.0-final (with newforms-admin) to do this :)

07/25/08 07:59:44 changed by marinho

  • attachment method_cache.diff added.

Updated patch with cache method

07/25/08 08:01:37 changed by marinho

Tests and documentation done. Also the proxy method in Manager class

07/28/08 01:31:48 changed by rossp

  • cc changed from semente@taurinus.org to semente@taurinus.org, ross@rossp.org.

09/15/08 08:53:11 changed by marinho

Two other approaches to do that:

but I still prefer the approach of this ticket, in my opinion is more simple and powerfull :)

09/16/08 22:26:47 changed by mtredinnick

  • stage changed from Accepted to Design decision needed.
  • milestone deleted.

I don't think this has a place in core at the moment. There are a lot of trade-offs you have to know about when doing caching like this (e.g. it assumes the data isn't changing rapidly enough for the caching to get unacceptably stale). It's already pretty easy to cache queries when you want them to be cached (and caching querysets with a key so they can be looked up later is probably better in most cases).

My feeling is that this is probably better done for now as a third-party thing that is invoked via a manager that provides a custom QuerySet? class. Making it work smoothly with other custom Queryset / Query classes might require a little work and that's something you can bring up on django-dev if it turns out to be hard (it would be nice to make it possible). But tying the caching stuff into the SQL producing stuff so tightly doesn't sit comfortably with me.

Leaving open for the time being in the hopes that another core developer will comment one way or the other, but I'm giving it a -1 at the moment.

01/23/09 20:27:09 changed by anonymous

  • cc changed from semente@taurinus.org, ross@rossp.org to semente@taurinus.org, ross@rossp.org, msaelices@yaco.es.

01/23/09 21:43:34 changed by msaelices

  • attachment method_cache_r9787.diff added.

Patch that work with django r9787 revision

(follow-up: ↓ 14 ) 01/23/09 21:55:12 changed by msaelices

I think Malcolmn is right. You cannot rely on a cache that you cannot control for invalidation. But I think this patch maybe can be a good idea to get a default cache system in Django ORM.

I will introduce what I think is the main problem with this patch: invalidation.

It would be great that, for example, in a FooModel saving, you could flush from cache all possible querysets from FooModel. Then it would be fantastic if you can do something like:

FooModel.objects.delete()

But with this patch this cannot be possible, because it uses hash cache key, and for memcached you cannot flush a bunch of keys (without hacking things). Therefore, if you have several views that uses FooModel? querysets, like that:

# view 1
FooModel.objects.all()
# view 2
FooModel.objects.filter(published=True)

Every previous queryset has a diferent SQL sentence, and this means a different hash and cache key. I think a valid approach to get Manager.invalidate method could be something like:

  1. For every cache() call, besides generate a hashed cache key, you can add key value into a register cached in memcached also (but with a larger timeout) that maintains all cached keys from this manager.
  2. When invalidate() get called, this method will iterate into its cached register and will flush all querysets.

¿what is your opinion about this?

(in reply to: ↑ 13 ) 01/23/09 22:00:33 changed by msaelices

Replying to msaelices:

{{{ #!python FooModel?.objects.delete() }}}

Sorry, i means FooModel.objects.invalidate()

01/24/09 08:10:35 changed by msaelices

  • attachment method_cache_with_invalidation_r9787.diff added.

Caching with manager invalidation

01/24/09 08:18:12 changed by msaelices

My english, also is NOT best, so I prefer upload a patch with all I want to say implemented :)

This is a example test:

>>> from django.db import connection
>>> from beach.models import Beach
>>> Beach.objects.all().cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
6
>>> Beach.objects.all().cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
6
>>> Beach.objects.filter(status='published').cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
7
>>> Beach.objects.filter(status='published').cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
7
>>> Beach.objects.flush_cache() # invalidate all queries from this model managers
>>> Beach.objects.all().cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
8
>>> Beach.objects.filter(status='published').cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
9

01/24/09 10:12:15 changed by marinho

Malcolm, I see your point of view.

Maybe there is a better way to make a cache for ORM objects, and since I started to use Django, I'm thankful to core developers because in 95% of times you made amazing good choices, and this is why Django is amazing as it is :)

But I agree to msaelices. I'd like to have a default and strong way to cache query results, and share this among all users online, and I didn't see until now a better solution using third part or Manager inheritance, because there are some times when Django ORM ignores _default_manager and use its own Memory class. Then, a default cache way would resolve this.

And I like msaelices's idea for invalidation.

But since is a design decision, I have just followed your thoughts and waited for your decisions :)

01/24/09 10:14:18 changed by marinho

sorry, when I said "Memory class" I was meaning "Manager class"

02/27/09 21:49:04 changed by mtredinnick

All the latest reasons I've seen to include this are because "it's a compromise because of <some bigger problem>". So fix the real issue, rather than trying to put in a compromise. Sure, it's harder. But it's much more worthwhile in the long run and helps us work out whether it really is a bigger problem, or not really a problem and what the possible alternatives are.

For example, maybe an identity manager, a la #17 (or, at least some variation on #17, since that ticket wanders all over the place) is the solution. Maybe the sort of thing we want to do for Django 1.2 or 1.3 allowing specifying an alternative default QuerySet class is the right solution. Maybe it's something else.

Right now, though, I'm unconvinced that we're forced to compromise yet.


Add/Change #7338 (Method .cache(timeout) in QuerySet)




Change Properties
Action