Code

Opened 9 years ago

Closed 3 years ago

#17 closed New feature (wontfix)

Metasystem optimization: Share select_related in memory

Reported by: adrian Owned by: PhiR
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: feature caching
Cc: miracle2k, micsco, dcramer@…, brosner@…, paching@…, wbyoung@…, jesse.lovelace@…, calvin@…, david, haavikko@…, matt@…, romme@…, django@…, trevor, datavortex@…, mmalone, alexkoshelev, xiterrex@…, cdestigter, eallik@…, dan.fairs@…, miloslav.pojman@… Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

When using select_related, each cache is stored separately in memory. For
example, each Choice object here has its Poll cached, but each of those caches
is a separate object in memory. It would require less memory if the caches were
references to the same Poll object.

>>> from djangomodels.polls import choices
>>> choice_list = choices.get_list(poll__id__exact=90, select_related=True)
>>> id(choice_list[0]._poll_cache)
-156344020
>>> id(choice_list[1]._poll_cache)
-156344084

Attachments (9)

patch (4.0 KB) - added by (removed) 7 years ago.
rough stab at instance caching
patch2.diff (4.0 KB) - added by David Cramer <dcramer@…> 7 years ago.
corrected an issue with pk_val not being set
instance-caching-2.patch (20.9 KB) - added by (removed) 7 years ago.
instance-caching-v2
instance-caching-3.patch (11.2 KB) - added by (removed) 7 years ago.
instance caching v3; add appropriate cache cleaning to Model.delete, and tweak queries delete_objects to prune the cache while it's wiping pk instances. Only potential issue I'm aware of at this point is if there are multiple pk's for a record, which is screwy behaviour for django in general.
5737_instance_caching.diff (8.6 KB) - added by Brian Rosner <brosner@…> 7 years ago.
updated patch to #5737 with some minor edits
instance-caching-4.patch (12.7 KB) - added by (removed) 7 years ago.
split against 5737; preserving test updates (thus tests pass), and xrange updates (as said, modifying 4 lines away).
instance-caching-5.patch (12.8 KB) - added by (removed) 7 years ago.
add in a class attr able to disable instance caching for that model.
17_no_tests.diff (10.8 KB) - added by PhiR 7 years ago.
cleanup and comments. Not working but comments welcome
17.diff (19.5 KB) - added by PhiR 7 years ago.
added tests, and style issues

Download all attachments as: .zip

Change History (69)

comment:1 Changed 8 years ago by Here

  • Type changed from enhancement to defect

comment:2 Changed 7 years ago by Gary Wilson <gary.wilson@…>

  • Triage Stage changed from Unreviewed to Accepted

Barring any gotchas, I don't think anyone would argue with this.

comment:3 Changed 7 years ago by (removed)

  • Cc ferringb@… added

Worth noting this also will speed things up pretty nicely for mass instance creation- Model.init

Downside to such instance caching is that currently with django, you can get away with modifying an instance and decide to not commit the changes- if it's a shared instance, that no longer flies. Have a patch locally that does caching, but had a few failures test wise.

Still any interest? If so, can dig it out and update it.

comment:4 Changed 7 years ago by SmileyChris

Can't harm to attach your patch to this ticket, even if it still needs work.

Changed 7 years ago by (removed)

rough stab at instance caching

comment:5 Changed 7 years ago by (removed)

Offhand, that patch breaks a couple of tests; screws up a couple of tests that rely on checking the # of queries to see if things behaved (those tests are slightly broke imo anyways, since they're not checking *what* the queries were about, just that the db was touched N times).

Basically consists of two changes;
change 1: modify Model metaclass overriding default call to do a weakref cache lookup, returning the instance if it's already available; additionally, modify the save method to update the cache for new instances post commit so that they're reused.

change 2: mangle fields.ReverseSingleRelatedObjectDescriptor inserting a cache lookup; did it this way since bastardizing the query api didn't seem wise, but changing record_inst.foreign_key to try and use cached instances (thus avoiding hitting the db) *is* useful.

Mind you, been a while since I've done serious testing against this, but recall change #2 being particularly useful for select_related; reduction of 32s to 23s for 53k record set joining in around 20 unique foreign keys. 14.6k records with a simple display, reduction from 11.8s to 5.4s; mainly via again avoiding the overkill of select_related, and via dodging Model.init; *very* heavy function call runtime wise.

Fair warning also; works for my own uses, but definitely needs heavier testing (and unittests). Basically is a proof of concept.

comment:6 Changed 7 years ago by SmileyChris

  • Has patch set
  • Needs tests set
  • Patch needs improvement set

comment:7 Changed 7 years ago by David Cramer <dcramer@…>

  • Cc dcramer@… added

We are running this live on curse-gaming.com now -- in a brief set of tests I ran we gained 30% speed and 80% more efficiency on memory use in our apps. We ran into one issue w/ a set of models doing .select_related() (at a depth of 2) but it wasn't serious for us (as we don't do 2nd depth select_related's.. i dont recommend anyone to do it) and this change is VERY important for load optimization.

comment:8 Changed 7 years ago by Brian Rosner <brosner@…>

  • Cc brosner@… added

Changed 7 years ago by David Cramer <dcramer@…>

corrected an issue with pk_val not being set

comment:9 follow-up: Changed 7 years ago by David Cramer <dcramer@…>

There is still an issue with delete objects. We are implementing patch #1796 as I was told it may help resolve this issue (and we also needed to resolve the issue it addresses).

The current problem is it's trying to .delete() it twice (at least it seems) so when the second time comes around there is no id attribute set so it throws an AssertionError

comment:10 in reply to: ↑ 9 Changed 7 years ago by (removed)

re: patch2.diff, that's broke- note

self.__instance_cache__.setdefault(getattr(self, self._meta.pk.attname, self))

that setdefault's a None in place. Should be

self.__instance_cache__[getattr(self, self._meta.pk.attname)] = self

instead; Forces the cache to reuse the last saved version whenever the next request comes in.

Aside from that, patch has issues with json/xml serialization bits which basically just requires disabling instance caching for generation of the 'detached' instance that haven't yet been saved.

Replying to David Cramer <dcramer@gmail.com>:

There is still an issue with delete objects. We are implementing patch #1796 as I was told it may help resolve this issue (and we also needed to resolve the issue it addresses).

The current problem is it's trying to .delete() it twice (at least it seems) so when the second time comes around there is no id attribute set so it throws an AssertionError

Got a test case? I'm offline for the next 36 hours, but have got a local patch (will post when the system is online) passing all unittests at this point- haven't seen anything delete related in local fixing.

Changed 7 years ago by (removed)

instance-caching-v2

comment:11 Changed 7 years ago by anonymous

  • Cc paching@… added

Changed 7 years ago by (removed)

instance caching v3; add appropriate cache cleaning to Model.delete, and tweak queries delete_objects to prune the cache while it's wiping pk instances. Only potential issue I'm aware of at this point is if there are multiple pk's for a record, which is screwy behaviour for django in general.

comment:12 Changed 7 years ago by anonymous

  • Cc wbyoung@… added

comment:13 Changed 7 years ago by anonymous

  • Cc jesse.lovelace@… added

comment:14 Changed 7 years ago by Brian Rosner <brosner@…>

I have updated this patch to trunk. I made some minor changes to some naming of variables and comments. The test suite errors with 4 errors related to data from the instance being a byte-string and not unicode. I searched through all the code and could not find why. This doesn't have anything to do with how the files are being saved or accessed? Perhaps it is just some setting on my side. Once I can grab some more time I will write up some more tests for this patch to ensure it works better.

Changed 7 years ago by Brian Rosner <brosner@…>

updated patch to #5737 with some minor edits

comment:15 follow-up: Changed 7 years ago by Brian Rosner <brosner@…>

Oh and I wanted to mention that I removed some changes in areas that really don't pertain to what this ticket is trying to accomplish. While the changes were better they are not really related.

comment:16 in reply to: ↑ 15 Changed 7 years ago by (removed)

  • Patch needs improvement unset

Replying to Brian Rosner <brosner@gmail.com>:

Oh and I wanted to mention that I removed some changes in areas that really don't pertain to what this ticket is trying to accomplish. While the changes were better they are not really related.

Only unrelated changes (patch wise) you removed were conversion of range in delete to xrange; those might as well stay (already tweaking that area of the file).

The tests changes that were dropped (forced conversion of a few test args up front) however *are* required- run the tests, it'll hork. They're needed- that's more a sign that the misc fields are screwed up since they're not forcing unicode conversion, but that's a seperate mess only exposed by tests.

Either way, a version of this is maintained in a bzr branch at http://pkgcore.org/~ferringb/bzr/instance-caching/ ; using it for curse these days, so I keep it reasonably within trunk (week lag is usually the most).

Related to that, please don't do misc var renames when correcting for bit rot- not trying to be possessive, moreso it's extra noise when trying to nail down exactly what was updated between patch versions, and changes the api the functionality provides when kwargs are mangled.

Finally, dropping the "patch needs improvement"; not really relevant at this point (my earlier comments have long since been fixed).

Changed 7 years ago by (removed)

split against 5737; preserving test updates (thus tests pass), and xrange updates (as said, modifying 4 lines away).

Changed 7 years ago by (removed)

add in a class attr able to disable instance caching for that model.

comment:17 Changed 7 years ago by Simon G. <dev@…>

  • Needs tests unset
  • Triage Stage changed from Accepted to Ready for checkin

This looks good, tests and everything, and would be a fantastic improvment, so I'm bumping this to RFC.

comment:18 Changed 7 years ago by jacob

  • Needs documentation set
  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

This is a very good idea, but isn't quite ready for prime-time. Some things that need fixing:

  • The class attribute feels hackish, and is badly named. I'd prefer something simpler like Model.objects.select_related(cache=False) -- any reason not to do this?
  • A note needs to be added to the docs about this behavior so that people don't get bitten by stale select related stuff.
  • There's a lot of dense code here without docstrings/comments; I can barely follow what's going on. Some more explanation of what's happening would be very nice with a feature of this complexity.


comment:19 Changed 7 years ago by PhiR

The class attribute allows very fine-grained control over caching: you can select which classes will be cached. But a cache=False argument to select_related would be a nice complement.

comment:20 Changed 7 years ago by PhiR

  • Owner changed from nobody to PhiR

Changed 7 years ago by PhiR

cleanup and comments. Not working but comments welcome

comment:21 Changed 7 years ago by PhiR

My patch addresses cleans up and documents the doc, and adds way more tests. It also removed unrelated changes that were present in the original patch. It's still subject to jacob's points 1 and 2, namely how to configure the cache and lack of user doc. Note that by default the cache is disabled so it's safe.

Changed 7 years ago by PhiR

added tests, and style issues

comment:22 Changed 7 years ago by PhiR

  • Keywords feature caching added

comment:23 Changed 7 years ago by PhiR

  • Status changed from new to assigned

comment:24 Changed 7 years ago by ubernostrum

See also #3369, which was closed in favor of this ticket.

comment:25 Changed 6 years ago by Bastian Kleineidam <calvin@…>

  • Cc calvin@… added

comment:26 Changed 6 years ago by PhiR

See #5514 for a use case (besides the performance/memory gains).

comment:27 Changed 6 years ago by david

  • Cc larlet@… added

comment:28 follow-up: Changed 6 years ago by axiak

I was talking with PhiR in the IRC channel. I would just like to mention two issues:

  1. Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).
  2. Cache cleanup (garbage collection) -- Maybe we can attach a cleanup of the local cache to the response signal. This should be thought through though.

comment:29 in reply to: ↑ 28 ; follow-up: Changed 6 years ago by anonymous

Replying to axiak:

I was talking with PhiR in the IRC channel. I would just like to mention two issues:

  1. Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).

Each thread gets it's own view of the cache- trying to allow instance sharing across threads means that thread 1 (which is doing an edit that will ultimately fail) is screwing with the data thread 2 (which is doing a simple view rendering) uses, meaning bad mojo... and if someone suggests locking to try and duck that, I wedgie them, that's even worse mojo ;)

  1. Cache cleanup (garbage collection) -- Maybe we can attach a cleanup of the local cache to the response signal. This should be thought through though.

The cache is a WeakValueDictionary... meaning weak ref'ing of the instances. This means there is no garbage collection issues- if a cycle exists to hold it in memory, it would exist without this code anyways.

comment:30 in reply to: ↑ 29 Changed 6 years ago by axiak

Replying to anonymous:

Replying to axiak:

I was talking with PhiR in the IRC channel. I would just like to mention two issues:

  1. Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).

Each thread gets it's own view of the cache- trying to allow instance sharing across threads means that thread 1 (which is doing an edit that will ultimately fail) is screwing with the data thread 2 (which is doing a simple view rendering) uses, meaning bad mojo... and if someone suggests locking to try and duck that, I wedgie them, that's even worse mojo ;)

As I understand it the storage is part of the class object, which is global, no? So currently the patch does not behave like this and this will need to be addressed.

[...]
The cache is a WeakValueDictionary... meaning weak ref'ing of the instances. This means there is no garbage collection issues- if a cycle exists to hold it in memory, it would exist without this code anyways.

Ah, didn't see that.

comment:31 Changed 6 years ago by PhiR

If you are interested in this ticket you should check out:

http://code.djangoproject.com/wiki/DjangoSpecifications/Core/SingleInstance

If you notice something that this page doesn't mention please add it.

comment:32 Changed 6 years ago by anonymous

  • milestone set to 1.0
  • Version set to queryset-refactor

comment:33 Changed 6 years ago by anonymous

  • milestone 1.0 deleted
  • Version queryset-refactor deleted

comment:34 Changed 6 years ago by oliver@…

  • Cc oliver@… added

comment:35 Changed 6 years ago by mrts

  • milestone set to post-1.0
  • Version set to SVN

Not in scope for 1.0.

comment:36 Changed 6 years ago by haavikko

  • Cc haavikko@… added

comment:37 Changed 6 years ago by anonymous

  • Cc micsco added

comment:38 Changed 6 years ago by mattrussell

  • Cc matt@… added

comment:39 Changed 6 years ago by (removed)

  • Cc ferringb@… removed

comment:40 Changed 5 years ago by RommeDeSerieux

  • Cc romme@… added

comment:42 Changed 5 years ago by dcramer

Ive worked up a quick "SingletonModel" piece of code. It allows you to define a model, by inheriting from SingletonModel, as a Singleton. It works almost identically to this patch, barring the removal of some unneeded code.

http://github.com/dcramer/django-singletons/tree/master

comment:43 Changed 5 years ago by dcramer

Just a note, django-singletons was renamed to django-idmapper: http://github.com/dcramer/django-idmapper/tree/master

My mistake for the incorrect terminology. It *only* provides an Identity Mapper, just as this patch does.

comment:44 Changed 5 years ago by wogan

  • Cc django@… added

comment:45 Changed 5 years ago by trevor

  • Cc trevor added

comment:46 Changed 5 years ago by anonymous

  • milestone post-1.0 deleted

Milestone post-1.0 deleted

comment:47 Changed 5 years ago by david

  • Cc david added; larlet@… removed

comment:48 Changed 5 years ago by datavortex

  • Cc datavortex@… added

comment:49 Changed 5 years ago by mmalone

  • Cc mmalone added

comment:50 Changed 5 years ago by alexkoshelev

  • Cc alexkoshelev added

comment:51 Changed 4 years ago by terrex

  • Cc xiterrex@… added

comment:52 Changed 4 years ago by miracle2k

  • Cc miracle2k added

comment:53 Changed 4 years ago by cdestigter

  • Cc cdestigter added

comment:54 Changed 3 years ago by RaceCondition

  • Cc eallik@… added

comment:55 Changed 3 years ago by danfairs

  • Cc dan.fairs@… added

comment:56 Changed 3 years ago by mila

  • Cc miloslav.pojman@… added

comment:57 Changed 3 years ago by lrekucki

  • Severity changed from normal to Normal
  • Type changed from defect to Cleanup/optimization

comment:58 Changed 3 years ago by jezdez

  • Type changed from Cleanup/optimization to New feature

This is really a new feature.

comment:59 Changed 3 years ago by obeattie

  • Cc oliver@… removed

comment:60 Changed 3 years ago by jacob

  • Easy pickings unset
  • Resolution set to wontfix
  • Status changed from assigned to closed
  • UI/UX unset

I've spoken to Russ, and he concurs with my decision that this would fundamentally change the ORM, and it can't be part of Django core.

However, it's been proven that this is possible (albeit tricky) in external packages -- JohhnyCache, for example. Concrete suggestions about how to make things like Johnny easier are super-welcome, but for clarity those should get fresh new tickets.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.