Code

Opened 6 years ago

Closed 3 years ago

#7338 closed New feature (wontfix)

Method .cache(timeout) in QuerySet

Reported by: marinho Owned by: marinho
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: semente@…, ross@…, msaelices@…, ramusus@… Triage Stage: Design decision needed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I made these changes in the QuerySet and sql.Query classes in the database component of newforms-admin branch.

I added a new method called .cache(timeout) to use like we use filter(), exclude(), select_related(), etc. that if used, enable the sql.Query class to stores the SQL query result in the cache.

This improved my systems very well, because using it, I avoid to make several queries with the same SQL sentence in the same time from the database. As we know, memory is faster than database/filesystem, so, if you use memcached or locmem backends, is probably that you can can enhance your system using this method.

As my english is the best, I published a snippet that can clear for who not understand correctly what I meant: http://www.djangosnippets.org/snippets/777/

Attachments (3)

method_cache.diff (7.7 KB) - added by marinho 6 years ago.
Updated patch with cache method
method_cache_r9787.diff (3.2 KB) - added by msaelices 5 years ago.
Patch that work with django r9787 revision
method_cache_with_invalidation_r9787.diff (5.2 KB) - added by msaelices 5 years ago.
Caching with manager invalidation

Download all attachments as: .zip

Change History (25)

comment:1 Changed 6 years ago by marinho

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I wrote wrong this part: "As my english is the best", I forgot the NOT part of the sentence... hehe ;)

comment:2 Changed 6 years ago by Guilherme M. Gondim <semente@…>

  • Cc semente@… added

comment:3 Changed 6 years ago by marinho

  • Owner changed from nobody to marinho
  • Status changed from new to assigned

Fixed patch wish key based on hash. This is because cache doesn't accept keys with 250+ characters.

comment:4 Changed 6 years ago by brosner

  • Version changed from newforms-admin to SVN

comment:5 Changed 6 years ago by marinho

Since I lost the detailed text that I wrote in the Django Snippet told above (that has been deleted), I am writing below exemples of use of the method.

Overriding get_query_set

This is an example for use cacheable request everytime.

class CategoryManager(models.Manager):
    def get_query_set(self):
        q = super(CategoryManager, self).get_query_set()

        return q.cache(300)

Single use

This is an example for use cacheable request just for that line.

lista = Video.all().cache(300)

comment:6 Changed 6 years ago by telenieko

  • milestone set to post-1.0
  • Triage Stage changed from Unreviewed to Accepted

You should:

  • Remove the "#-----"
  • Add test cases to test this new behaviour, like:
    • Run a query cached,
    • Modify objects,
    • Rerun the query cached, see it's not changed
    • Rerun without .cache(), see it's changed
  • Patch the documentation to report about this method.

Anyway I think that should go in post-1.0 ;)

And I'm not sure if running a query without cache should invalidate caches of that same query, but that could waste resources checking if cache exists for every query, so better forget.

comment:7 Changed 6 years ago by marinho

@telenieko

Thanks for advices :) I known that I need to improve the patch, but I'm waiting for the merge of 1.0-final (with newforms-admin) to do this :)

Changed 6 years ago by marinho

Updated patch with cache method

comment:8 Changed 6 years ago by marinho

Tests and documentation done. Also the proxy method in Manager class

comment:9 Changed 6 years ago by rossp

  • Cc ross@… added

comment:10 Changed 6 years ago by marinho

Two other approaches to do that:

but I still prefer the approach of this ticket, in my opinion is more simple and powerfull :)

comment:11 Changed 6 years ago by mtredinnick

  • milestone post-1.0 deleted
  • Triage Stage changed from Accepted to Design decision needed

I don't think this has a place in core at the moment. There are a lot of trade-offs you have to know about when doing caching like this (e.g. it assumes the data isn't changing rapidly enough for the caching to get unacceptably stale). It's already pretty easy to cache queries when you want them to be cached (and caching querysets with a key so they can be looked up later is probably better in most cases).

My feeling is that this is probably better done for now as a third-party thing that is invoked via a manager that provides a custom QuerySet class. Making it work smoothly with other custom Queryset / Query classes might require a little work and that's something you can bring up on django-dev if it turns out to be hard (it would be nice to make it possible). But tying the caching stuff into the SQL producing stuff so tightly doesn't sit comfortably with me.

Leaving open for the time being in the hopes that another core developer will comment one way or the other, but I'm giving it a -1 at the moment.

comment:12 Changed 5 years ago by anonymous

  • Cc msaelices@… added

Changed 5 years ago by msaelices

Patch that work with django r9787 revision

comment:13 follow-up: Changed 5 years ago by msaelices

I think Malcolmn is right. You cannot rely on a cache that you cannot control for invalidation. But I think this patch maybe can be a good idea to get a default cache system in Django ORM.

I will introduce what I think is the main problem with this patch: invalidation.

It would be great that, for example, in a FooModel saving, you could flush from cache all possible querysets from FooModel. Then it would be fantastic if you can do something like:

FooModel.objects.delete()

But with this patch this cannot be possible, because it uses hash cache key, and for memcached you cannot flush a bunch of keys (without hacking things). Therefore, if you have several views that uses FooModel querysets, like that:

# view 1
FooModel.objects.all()
# view 2
FooModel.objects.filter(published=True)

Every previous queryset has a diferent SQL sentence, and this means a different hash and cache key. I think a valid approach to get Manager.invalidate method could be something like:

  1. For every cache() call, besides generate a hashed cache key, you can add key value into a register cached in memcached also (but with a larger timeout) that maintains all cached keys from this manager.
  2. When invalidate() get called, this method will iterate into its cached register and will flush all querysets.

¿what is your opinion about this?

comment:14 in reply to: ↑ 13 Changed 5 years ago by msaelices

Replying to msaelices:

FooModel.objects.delete()

Sorry, i means FooModel.objects.invalidate()

Changed 5 years ago by msaelices

Caching with manager invalidation

comment:15 Changed 5 years ago by msaelices

My english, also is NOT best, so I prefer upload a patch with all I want to say implemented :)

This is a example test:

>>> from django.db import connection
>>> from beach.models import Beach
>>> Beach.objects.all().cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
6
>>> Beach.objects.all().cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
6
>>> Beach.objects.filter(status='published').cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
7
>>> Beach.objects.filter(status='published').cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
7
>>> Beach.objects.flush_cache() # invalidate all queries from this model managers
>>> Beach.objects.all().cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
8
>>> Beach.objects.filter(status='published').cache(timeout=3600)
[<Beach: Aguadulce>, ...]
>>> len(connection.queries)
9

comment:16 Changed 5 years ago by marinho

Malcolm, I see your point of view.

Maybe there is a better way to make a cache for ORM objects, and since I started to use Django, I'm thankful to core developers because in 95% of times you made amazing good choices, and this is why Django is amazing as it is :)

But I agree to msaelices. I'd like to have a default and strong way to cache query results, and share this among all users online, and I didn't see until now a better solution using third part or Manager inheritance, because there are some times when Django ORM ignores _default_manager and use its own Memory class. Then, a default cache way would resolve this.

And I like msaelices's idea for invalidation.

But since is a design decision, I have just followed your thoughts and waited for your decisions :)

comment:17 Changed 5 years ago by marinho

sorry, when I said "Memory class" I was meaning "Manager class"

comment:18 Changed 5 years ago by mtredinnick

All the latest reasons I've seen to include this are because "it's a compromise because of <some bigger problem>". So fix the real issue, rather than trying to put in a compromise. Sure, it's harder. But it's much more worthwhile in the long run and helps us work out whether it really is a bigger problem, or not really a problem and what the possible alternatives are.

For example, maybe an identity manager, a la #17 (or, at least some variation on #17, since that ticket wanders all over the place) is the solution. Maybe the sort of thing we want to do for Django 1.2 or 1.3 allowing specifying an alternative default QuerySet class is the right solution. Maybe it's something else.

Right now, though, I'm unconvinced that we're forced to compromise yet.

comment:19 Changed 5 years ago by ramusus

  • Cc ramusus@… added

comment:20 Changed 5 years ago by ed@…

I agree with the invalidation problem. And as other have pointed out, there are lots of things you need to know about the application to know which is the best approach to invalidate. I do think that if the invalidation problem is solved, it's definitely a good idea to add this feature. Here are some requirements off the top of my head for proper invalidation requirements.

  1. Never invalidate option... have separate process refresh the cache on schedule (if something has been updated). Performance is most important feature here.
  2. Invalidate on any delete/update/insert (I think this should be the default). In this scenario data consistency is the most important feature.
  3. It's OK to have stale data, just allow it to expire and reload on use. Watch out for dog pile.
  4. Allow for flushing.
  5. Allow for automatic upgrade from memcache to local file cache if result set it too large (I currently use a hack in the cache system to handle this). Too large might be under the 1mb memcache limit.
  6. More advanced feature that allows for certain specific records to trigger invalidation. For example, if you query 5 models, only 1 of them should be an instant invalidate. The rest should just allow something like option 1 to run.

I'd love to hear what requirements others think this feature should have. I've spent considerable time optimizing applications and integrating caching and having something that easily supports this out of the box for me would be great!

comment:21 Changed 3 years ago by lukeplant

  • Severity set to Normal
  • Type set to New feature

comment:22 Changed 3 years ago by Alex

  • Easy pickings unset
  • Resolution set to wontfix
  • Status changed from assigned to closed
  • UI/UX unset

wontfixing this: there are tons of project level decisions that Django can't enforce, about how to do invlidation, how to interact with non-Django DB access, and other concerns. Plus it's not really appropriate for the ORM to know about the cache, and it can live easily outside of Django (and there are several solutions that do this). Basically what Malcolm said in his first comment.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.