Opened 18 years ago

Closed 12 years ago

Last modified 4 years ago

#17 closed New feature (wontfix)

Metasystem optimization: Share select_related in memory

Reported by: Adrian Holovaty Owned by: Philippe Raoult
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: feature caching
Cc: miracle2k, Mike Scott, dcramer@…, brosner@…, paching@…, wbyoung@…, jesse.lovelace@…, calvin@…, David Larlet, haavikko@…, matt@…, romme@…, django@…, Trevor Caira, datavortex@…, Michael Malone, Alexander Koshelev, xiterrex@…, Craig de Stigter, eallik@…, dan.fairs@…, miloslav.pojman@…, James Pic Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

When using select_related, each cache is stored separately in memory. For
example, each Choice object here has its Poll cached, but each of those caches
is a separate object in memory. It would require less memory if the caches were
references to the same Poll object.

>>> from djangomodels.polls import choices
>>> choice_list = choices.get_list(poll__id__exact=90, select_related=True)
>>> id(choice_list[0]._poll_cache)
-156344020
>>> id(choice_list[1]._poll_cache)
-156344084

Attachments (9)

patch (4.0 KB) - added by (removed) 17 years ago.
rough stab at instance caching
patch2.diff (4.0 KB) - added by David Cramer <dcramer@…> 17 years ago.
corrected an issue with pk_val not being set
instance-caching-2.patch (20.9 KB) - added by (removed) 17 years ago.
instance-caching-v2
instance-caching-3.patch (11.2 KB) - added by (removed) 17 years ago.
instance caching v3; add appropriate cache cleaning to Model.delete, and tweak queries delete_objects to prune the cache while it's wiping pk instances. Only potential issue I'm aware of at this point is if there are multiple pk's for a record, which is screwy behaviour for django in general.
5737_instance_caching.diff (8.6 KB) - added by Brian Rosner <brosner@…> 16 years ago.
updated patch to #5737 with some minor edits
instance-caching-4.patch (12.7 KB) - added by (removed) 16 years ago.
split against 5737; preserving test updates (thus tests pass), and xrange updates (as said, modifying 4 lines away).
instance-caching-5.patch (12.8 KB) - added by (removed) 16 years ago.
add in a class attr able to disable instance caching for that model.
17_no_tests.diff (10.8 KB) - added by Philippe Raoult 16 years ago.
cleanup and comments. Not working but comments welcome
17.diff (19.5 KB) - added by Philippe Raoult 16 years ago.
added tests, and style issues

Download all attachments as: .zip

Change History (73)

comment:1 Changed 17 years ago by Here

Type: enhancementdefect

comment:2 Changed 17 years ago by Gary Wilson <gary.wilson@…>

Triage Stage: UnreviewedAccepted

Barring any gotchas, I don't think anyone would argue with this.

comment:3 Changed 17 years ago by (removed)

Cc: ferringb@… added

Worth noting this also will speed things up pretty nicely for mass instance creation- Model.init

Downside to such instance caching is that currently with django, you can get away with modifying an instance and decide to not commit the changes- if it's a shared instance, that no longer flies. Have a patch locally that does caching, but had a few failures test wise.

Still any interest? If so, can dig it out and update it.

comment:4 Changed 17 years ago by Chris Beaven

Can't harm to attach your patch to this ticket, even if it still needs work.

Changed 17 years ago by (removed)

Attachment: patch added

rough stab at instance caching

comment:5 Changed 17 years ago by (removed)

Offhand, that patch breaks a couple of tests; screws up a couple of tests that rely on checking the # of queries to see if things behaved (those tests are slightly broke imo anyways, since they're not checking *what* the queries were about, just that the db was touched N times).

Basically consists of two changes;
change 1: modify Model metaclass overriding default call to do a weakref cache lookup, returning the instance if it's already available; additionally, modify the save method to update the cache for new instances post commit so that they're reused.

change 2: mangle fields.ReverseSingleRelatedObjectDescriptor inserting a cache lookup; did it this way since bastardizing the query api didn't seem wise, but changing record_inst.foreign_key to try and use cached instances (thus avoiding hitting the db) *is* useful.

Mind you, been a while since I've done serious testing against this, but recall change #2 being particularly useful for select_related; reduction of 32s to 23s for 53k record set joining in around 20 unique foreign keys. 14.6k records with a simple display, reduction from 11.8s to 5.4s; mainly via again avoiding the overkill of select_related, and via dodging Model.init; *very* heavy function call runtime wise.

Fair warning also; works for my own uses, but definitely needs heavier testing (and unittests). Basically is a proof of concept.

comment:6 Changed 17 years ago by Chris Beaven

Has patch: set
Needs tests: set
Patch needs improvement: set

comment:7 Changed 17 years ago by David Cramer <dcramer@…>

Cc: dcramer@… added

We are running this live on curse-gaming.com now -- in a brief set of tests I ran we gained 30% speed and 80% more efficiency on memory use in our apps. We ran into one issue w/ a set of models doing .select_related() (at a depth of 2) but it wasn't serious for us (as we don't do 2nd depth select_related's.. i dont recommend anyone to do it) and this change is VERY important for load optimization.

comment:8 Changed 17 years ago by Brian Rosner <brosner@…>

Cc: brosner@… added

Changed 17 years ago by David Cramer <dcramer@…>

Attachment: patch2.diff added

corrected an issue with pk_val not being set

comment:9 Changed 17 years ago by David Cramer <dcramer@…>

There is still an issue with delete objects. We are implementing patch #1796 as I was told it may help resolve this issue (and we also needed to resolve the issue it addresses).

The current problem is it's trying to .delete() it twice (at least it seems) so when the second time comes around there is no id attribute set so it throws an AssertionError

comment:10 in reply to:  9 Changed 17 years ago by (removed)

re: patch2.diff, that's broke- note

self.__instance_cache__.setdefault(getattr(self, self._meta.pk.attname, self))

that setdefault's a None in place. Should be

self.__instance_cache__[getattr(self, self._meta.pk.attname)] = self

instead; Forces the cache to reuse the last saved version whenever the next request comes in.

Aside from that, patch has issues with json/xml serialization bits which basically just requires disabling instance caching for generation of the 'detached' instance that haven't yet been saved.

Replying to David Cramer <dcramer@gmail.com>:

There is still an issue with delete objects. We are implementing patch #1796 as I was told it may help resolve this issue (and we also needed to resolve the issue it addresses).

The current problem is it's trying to .delete() it twice (at least it seems) so when the second time comes around there is no id attribute set so it throws an AssertionError

Got a test case? I'm offline for the next 36 hours, but have got a local patch (will post when the system is online) passing all unittests at this point- haven't seen anything delete related in local fixing.

Changed 17 years ago by (removed)

Attachment: instance-caching-2.patch added

instance-caching-v2

comment:11 Changed 17 years ago by anonymous

Cc: paching@… added

Changed 17 years ago by (removed)

Attachment: instance-caching-3.patch added

instance caching v3; add appropriate cache cleaning to Model.delete, and tweak queries delete_objects to prune the cache while it's wiping pk instances. Only potential issue I'm aware of at this point is if there are multiple pk's for a record, which is screwy behaviour for django in general.

comment:12 Changed 17 years ago by anonymous

Cc: wbyoung@… added

comment:13 Changed 16 years ago by anonymous

Cc: jesse.lovelace@… added

comment:14 Changed 16 years ago by Brian Rosner <brosner@…>

I have updated this patch to trunk. I made some minor changes to some naming of variables and comments. The test suite errors with 4 errors related to data from the instance being a byte-string and not unicode. I searched through all the code and could not find why. This doesn't have anything to do with how the files are being saved or accessed? Perhaps it is just some setting on my side. Once I can grab some more time I will write up some more tests for this patch to ensure it works better.

Changed 16 years ago by Brian Rosner <brosner@…>

Attachment: 5737_instance_caching.diff added

updated patch to #5737 with some minor edits

comment:15 Changed 16 years ago by Brian Rosner <brosner@…>

Oh and I wanted to mention that I removed some changes in areas that really don't pertain to what this ticket is trying to accomplish. While the changes were better they are not really related.

comment:16 in reply to:  15 Changed 16 years ago by (removed)

Patch needs improvement: unset

Replying to Brian Rosner <brosner@gmail.com>:

Oh and I wanted to mention that I removed some changes in areas that really don't pertain to what this ticket is trying to accomplish. While the changes were better they are not really related.

Only unrelated changes (patch wise) you removed were conversion of range in delete to xrange; those might as well stay (already tweaking that area of the file).

The tests changes that were dropped (forced conversion of a few test args up front) however *are* required- run the tests, it'll hork. They're needed- that's more a sign that the misc fields are screwed up since they're not forcing unicode conversion, but that's a seperate mess only exposed by tests.

Either way, a version of this is maintained in a bzr branch at http://pkgcore.org/~ferringb/bzr/instance-caching/ ; using it for curse these days, so I keep it reasonably within trunk (week lag is usually the most).

Related to that, please don't do misc var renames when correcting for bit rot- not trying to be possessive, moreso it's extra noise when trying to nail down exactly what was updated between patch versions, and changes the api the functionality provides when kwargs are mangled.

Finally, dropping the "patch needs improvement"; not really relevant at this point (my earlier comments have long since been fixed).

Changed 16 years ago by (removed)

Attachment: instance-caching-4.patch added

split against 5737; preserving test updates (thus tests pass), and xrange updates (as said, modifying 4 lines away).

Changed 16 years ago by (removed)

Attachment: instance-caching-5.patch added

add in a class attr able to disable instance caching for that model.

comment:17 Changed 16 years ago by Simon G. <dev@…>

Needs tests: unset
Triage Stage: AcceptedReady for checkin

This looks good, tests and everything, and would be a fantastic improvment, so I'm bumping this to RFC.

comment:18 Changed 16 years ago by Jacob

Needs documentation: set
Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

This is a very good idea, but isn't quite ready for prime-time. Some things that need fixing:

  • The class attribute feels hackish, and is badly named. I'd prefer something simpler like Model.objects.select_related(cache=False) -- any reason not to do this?
  • A note needs to be added to the docs about this behavior so that people don't get bitten by stale select related stuff.
  • There's a lot of dense code here without docstrings/comments; I can barely follow what's going on. Some more explanation of what's happening would be very nice with a feature of this complexity.


comment:19 Changed 16 years ago by Philippe Raoult

The class attribute allows very fine-grained control over caching: you can select which classes will be cached. But a cache=False argument to select_related would be a nice complement.

comment:20 Changed 16 years ago by Philippe Raoult

Owner: changed from nobody to Philippe Raoult

Changed 16 years ago by Philippe Raoult

Attachment: 17_no_tests.diff added

cleanup and comments. Not working but comments welcome

comment:21 Changed 16 years ago by Philippe Raoult

My patch addresses cleans up and documents the doc, and adds way more tests. It also removed unrelated changes that were present in the original patch. It's still subject to jacob's points 1 and 2, namely how to configure the cache and lack of user doc. Note that by default the cache is disabled so it's safe.

Changed 16 years ago by Philippe Raoult

Attachment: 17.diff added

added tests, and style issues

comment:22 Changed 16 years ago by Philippe Raoult

Keywords: feature caching added

comment:23 Changed 16 years ago by Philippe Raoult

Status: newassigned

comment:24 Changed 16 years ago by James Bennett

See also #3369, which was closed in favor of this ticket.

comment:25 Changed 16 years ago by Bastian Kleineidam <calvin@…>

Cc: calvin@… added

comment:26 Changed 16 years ago by Philippe Raoult

See #5514 for a use case (besides the performance/memory gains).

comment:27 Changed 16 years ago by David Larlet

Cc: larlet@… added

comment:28 Changed 16 years ago by Michael Axiak

I was talking with PhiR in the IRC channel. I would just like to mention two issues:

  1. Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).
  2. Cache cleanup (garbage collection) -- Maybe we can attach a cleanup of the local cache to the response signal. This should be thought through though.

comment:29 in reply to:  28 ; Changed 16 years ago by anonymous

Replying to axiak:

I was talking with PhiR in the IRC channel. I would just like to mention two issues:

  1. Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).

Each thread gets it's own view of the cache- trying to allow instance sharing across threads means that thread 1 (which is doing an edit that will ultimately fail) is screwing with the data thread 2 (which is doing a simple view rendering) uses, meaning bad mojo... and if someone suggests locking to try and duck that, I wedgie them, that's even worse mojo ;)

  1. Cache cleanup (garbage collection) -- Maybe we can attach a cleanup of the local cache to the response signal. This should be thought through though.

The cache is a WeakValueDictionary... meaning weak ref'ing of the instances. This means there is no garbage collection issues- if a cycle exists to hold it in memory, it would exist without this code anyways.

comment:30 in reply to:  29 Changed 16 years ago by Michael Axiak

Replying to anonymous:

Replying to axiak:

I was talking with PhiR in the IRC channel. I would just like to mention two issues:

  1. Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).

Each thread gets it's own view of the cache- trying to allow instance sharing across threads means that thread 1 (which is doing an edit that will ultimately fail) is screwing with the data thread 2 (which is doing a simple view rendering) uses, meaning bad mojo... and if someone suggests locking to try and duck that, I wedgie them, that's even worse mojo ;)

As I understand it the storage is part of the class object, which is global, no? So currently the patch does not behave like this and this will need to be addressed.

[...]
The cache is a WeakValueDictionary... meaning weak ref'ing of the instances. This means there is no garbage collection issues- if a cycle exists to hold it in memory, it would exist without this code anyways.

Ah, didn't see that.

comment:31 Changed 16 years ago by Philippe Raoult

If you are interested in this ticket you should check out:

http://code.djangoproject.com/wiki/DjangoSpecifications/Core/SingleInstance

If you notice something that this page doesn't mention please add it.

comment:32 Changed 15 years ago by anonymous

milestone: 1.0
Version: queryset-refactor

comment:33 Changed 15 years ago by anonymous

milestone: 1.0
Version: queryset-refactor

comment:34 Changed 15 years ago by oliver@…

Cc: oliver@… added

comment:35 Changed 15 years ago by mrts

milestone: post-1.0
Version: SVN

Not in scope for 1.0.

comment:36 Changed 15 years ago by Matti Haavikko

Cc: haavikko@… added

comment:37 Changed 15 years ago by anonymous

Cc: Mike Scott added

comment:38 Changed 15 years ago by mattrussell

Cc: matt@… added

comment:39 Changed 15 years ago by (removed)

Cc: ferringb@… removed

comment:40 Changed 15 years ago by RommeDeSerieux

Cc: romme@… added

comment:42 Changed 15 years ago by David Cramer

Ive worked up a quick "SingletonModel" piece of code. It allows you to define a model, by inheriting from SingletonModel, as a Singleton. It works almost identically to this patch, barring the removal of some unneeded code.

http://github.com/dcramer/django-singletons/tree/master

comment:43 Changed 15 years ago by David Cramer

Just a note, django-singletons was renamed to django-idmapper: http://github.com/dcramer/django-idmapper/tree/master

My mistake for the incorrect terminology. It *only* provides an Identity Mapper, just as this patch does.

comment:44 Changed 15 years ago by Greg Wogan-Browne

Cc: django@… added

comment:45 Changed 15 years ago by Trevor Caira

Cc: Trevor Caira added

comment:46 Changed 15 years ago by (none)

milestone: post-1.0

Milestone post-1.0 deleted

comment:47 Changed 15 years ago by David Larlet

Cc: David Larlet added; larlet@… removed

comment:48 Changed 15 years ago by datavortex

Cc: datavortex@… added

comment:49 Changed 14 years ago by Michael Malone

Cc: Michael Malone added

comment:50 Changed 14 years ago by Alexander Koshelev

Cc: Alexander Koshelev added

comment:51 Changed 14 years ago by Guillermo Gutiérrez

Cc: xiterrex@… added

comment:52 Changed 14 years ago by miracle2k

Cc: miracle2k added

comment:53 Changed 13 years ago by Craig de Stigter

Cc: Craig de Stigter added

comment:54 Changed 13 years ago by Erik Allik

Cc: eallik@… added

comment:55 Changed 13 years ago by Dan Fairs

Cc: dan.fairs@… added

comment:56 Changed 13 years ago by Miloslav Pojman

Cc: miloslav.pojman@… added

comment:57 Changed 13 years ago by Łukasz Rekucki

Severity: normalNormal
Type: defectCleanup/optimization

comment:58 Changed 13 years ago by Jannis Leidel

Type: Cleanup/optimizationNew feature

This is really a new feature.

comment:59 Changed 13 years ago by Oliver Beattie

Cc: oliver@… removed

comment:60 Changed 12 years ago by Jacob

Easy pickings: unset
Resolution: wontfix
Status: assignedclosed
UI/UX: unset

I've spoken to Russ, and he concurs with my decision that this would fundamentally change the ORM, and it can't be part of Django core.

However, it's been proven that this is possible (albeit tricky) in external packages -- JohhnyCache, for example. Concrete suggestions about how to make things like Johnny easier are super-welcome, but for clarity those should get fresh new tickets.

comment:61 Changed 5 years ago by James Pic

Do you remember the discussion you had with Russ about this ?

Is this good for re-opening ?

  • The ORM is much cleaner now than 7 years ago,
  • Many great ORMs have a builtin Identity Map, why can't Django's ?

Identity Map is a common programming pattern:

Last edited 5 years ago by James Pic (previous) (diff)

comment:62 Changed 5 years ago by James Pic

Cc: James Pic added

comment:64 Changed 5 years ago by James Pic

Thanks for the heads up Tim.

Last edited 4 years ago by James Pic (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top