#17 closed New feature (wontfix)
Metasystem optimization: Share select_related in memory
Reported by: | Adrian Holovaty | Owned by: | Philippe Raoult |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | feature caching |
Cc: | miracle2k, Mike Scott, dcramer@…, brosner@…, paching@…, wbyoung@…, jesse.lovelace@…, calvin@…, David Larlet, haavikko@…, matt@…, romme@…, django@…, Trevor Caira, datavortex@…, Michael Malone, Alexander Koshelev, xiterrex@…, Craig de Stigter, eallik@…, dan.fairs@…, miloslav.pojman@…, James Pic | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
When using select_related
, each cache is stored separately in memory. For
example, each Choice object here has its Poll cached, but each of those caches
is a separate object in memory. It would require less memory if the caches were
references to the same Poll object.
>>> from djangomodels.polls import choices >>> choice_list = choices.get_list(poll__id__exact=90, select_related=True) >>> id(choice_list[0]._poll_cache) -156344020 >>> id(choice_list[1]._poll_cache) -156344084
Attachments (9)
Change History (73)
comment:1 by , 18 years ago
Type: | enhancement → defect |
---|
comment:2 by , 18 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 18 years ago
Cc: | added |
---|
Worth noting this also will speed things up pretty nicely for mass instance creation- Model.init
Downside to such instance caching is that currently with django, you can get away with modifying an instance and decide to not commit the changes- if it's a shared instance, that no longer flies. Have a patch locally that does caching, but had a few failures test wise.
Still any interest? If so, can dig it out and update it.
comment:4 by , 18 years ago
Can't harm to attach your patch to this ticket, even if it still needs work.
comment:5 by , 18 years ago
Offhand, that patch breaks a couple of tests; screws up a couple of tests that rely on checking the # of queries to see if things behaved (those tests are slightly broke imo anyways, since they're not checking *what* the queries were about, just that the db was touched N times).
Basically consists of two changes;
change 1: modify Model metaclass overriding default call to do a weakref cache lookup, returning the instance if it's already available; additionally, modify the save method to update the cache for new instances post commit so that they're reused.
change 2: mangle fields.ReverseSingleRelatedObjectDescriptor inserting a cache lookup; did it this way since bastardizing the query api didn't seem wise, but changing record_inst.foreign_key to try and use cached instances (thus avoiding hitting the db) *is* useful.
Mind you, been a while since I've done serious testing against this, but recall change #2 being particularly useful for select_related; reduction of 32s to 23s for 53k record set joining in around 20 unique foreign keys. 14.6k records with a simple display, reduction from 11.8s to 5.4s; mainly via again avoiding the overkill of select_related, and via dodging Model.init; *very* heavy function call runtime wise.
Fair warning also; works for my own uses, but definitely needs heavier testing (and unittests). Basically is a proof of concept.
comment:6 by , 18 years ago
Has patch: | set |
---|---|
Needs tests: | set |
Patch needs improvement: | set |
comment:7 by , 18 years ago
Cc: | added |
---|
We are running this live on curse-gaming.com now -- in a brief set of tests I ran we gained 30% speed and 80% more efficiency on memory use in our apps. We ran into one issue w/ a set of models doing .select_related() (at a depth of 2) but it wasn't serious for us (as we don't do 2nd depth select_related's.. i dont recommend anyone to do it) and this change is VERY important for load optimization.
comment:8 by , 18 years ago
Cc: | added |
---|
follow-up: 10 comment:9 by , 18 years ago
There is still an issue with delete objects. We are implementing patch #1796 as I was told it may help resolve this issue (and we also needed to resolve the issue it addresses).
The current problem is it's trying to .delete() it twice (at least it seems) so when the second time comes around there is no id attribute set so it throws an AssertionError
comment:10 by , 18 years ago
re: patch2.diff, that's broke- note
self.__instance_cache__.setdefault(getattr(self, self._meta.pk.attname, self))
that setdefault's a None in place. Should be
self.__instance_cache__[getattr(self, self._meta.pk.attname)] = self
instead; Forces the cache to reuse the last saved version whenever the next request comes in.
Aside from that, patch has issues with json/xml serialization bits which basically just requires disabling instance caching for generation of the 'detached' instance that haven't yet been saved.
Replying to David Cramer <dcramer@gmail.com>:
There is still an issue with delete objects. We are implementing patch #1796 as I was told it may help resolve this issue (and we also needed to resolve the issue it addresses).
The current problem is it's trying to .delete() it twice (at least it seems) so when the second time comes around there is no id attribute set so it throws an AssertionError
Got a test case? I'm offline for the next 36 hours, but have got a local patch (will post when the system is online) passing all unittests at this point- haven't seen anything delete related in local fixing.
comment:11 by , 17 years ago
Cc: | added |
---|
by , 17 years ago
Attachment: | instance-caching-3.patch added |
---|
instance caching v3; add appropriate cache cleaning to Model.delete, and tweak queries delete_objects to prune the cache while it's wiping pk instances. Only potential issue I'm aware of at this point is if there are multiple pk's for a record, which is screwy behaviour for django in general.
comment:12 by , 17 years ago
Cc: | added |
---|
comment:13 by , 17 years ago
Cc: | added |
---|
comment:14 by , 17 years ago
I have updated this patch to trunk. I made some minor changes to some naming of variables and comments. The test suite errors with 4 errors related to data from the instance being a byte-string and not unicode. I searched through all the code and could not find why. This doesn't have anything to do with how the files are being saved or accessed? Perhaps it is just some setting on my side. Once I can grab some more time I will write up some more tests for this patch to ensure it works better.
by , 17 years ago
Attachment: | 5737_instance_caching.diff added |
---|
updated patch to #5737 with some minor edits
follow-up: 16 comment:15 by , 17 years ago
Oh and I wanted to mention that I removed some changes in areas that really don't pertain to what this ticket is trying to accomplish. While the changes were better they are not really related.
comment:16 by , 17 years ago
Patch needs improvement: | unset |
---|
Replying to Brian Rosner <brosner@gmail.com>:
Oh and I wanted to mention that I removed some changes in areas that really don't pertain to what this ticket is trying to accomplish. While the changes were better they are not really related.
Only unrelated changes (patch wise) you removed were conversion of range in delete to xrange; those might as well stay (already tweaking that area of the file).
The tests changes that were dropped (forced conversion of a few test args up front) however *are* required- run the tests, it'll hork. They're needed- that's more a sign that the misc fields are screwed up since they're not forcing unicode conversion, but that's a seperate mess only exposed by tests.
Either way, a version of this is maintained in a bzr branch at http://pkgcore.org/~ferringb/bzr/instance-caching/ ; using it for curse these days, so I keep it reasonably within trunk (week lag is usually the most).
Related to that, please don't do misc var renames when correcting for bit rot- not trying to be possessive, moreso it's extra noise when trying to nail down exactly what was updated between patch versions, and changes the api the functionality provides when kwargs are mangled.
Finally, dropping the "patch needs improvement"; not really relevant at this point (my earlier comments have long since been fixed).
by , 17 years ago
Attachment: | instance-caching-4.patch added |
---|
split against 5737; preserving test updates (thus tests pass), and xrange updates (as said, modifying 4 lines away).
by , 17 years ago
Attachment: | instance-caching-5.patch added |
---|
add in a class attr able to disable instance caching for that model.
comment:17 by , 17 years ago
Needs tests: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
This looks good, tests and everything, and would be a fantastic improvment, so I'm bumping this to RFC.
comment:18 by , 17 years ago
Needs documentation: | set |
---|---|
Patch needs improvement: | set |
Triage Stage: | Ready for checkin → Accepted |
This is a very good idea, but isn't quite ready for prime-time. Some things that need fixing:
- The class attribute feels hackish, and is badly named. I'd prefer something simpler like
Model.objects.select_related(cache=False)
-- any reason not to do this? - A note needs to be added to the docs about this behavior so that people don't get bitten by stale select related stuff.
- There's a lot of dense code here without docstrings/comments; I can barely follow what's going on. Some more explanation of what's happening would be very nice with a feature of this complexity.
comment:19 by , 17 years ago
The class attribute allows very fine-grained control over caching: you can select which classes will be cached. But a cache=False argument to select_related would be a nice complement.
comment:20 by , 17 years ago
Owner: | changed from | to
---|
by , 17 years ago
Attachment: | 17_no_tests.diff added |
---|
cleanup and comments. Not working but comments welcome
comment:21 by , 17 years ago
My patch addresses cleans up and documents the doc, and adds way more tests. It also removed unrelated changes that were present in the original patch. It's still subject to jacob's points 1 and 2, namely how to configure the cache and lack of user doc. Note that by default the cache is disabled so it's safe.
comment:22 by , 17 years ago
Keywords: | feature caching added |
---|
comment:23 by , 17 years ago
Status: | new → assigned |
---|
comment:25 by , 17 years ago
Cc: | added |
---|
comment:27 by , 17 years ago
Cc: | added |
---|
follow-up: 29 comment:28 by , 17 years ago
I was talking with PhiR in the IRC channel. I would just like to mention two issues:
- Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).
- Cache cleanup (garbage collection) -- Maybe we can attach a cleanup of the local cache to the response signal. This should be thought through though.
follow-up: 30 comment:29 by , 17 years ago
Replying to axiak:
I was talking with PhiR in the IRC channel. I would just like to mention two issues:
- Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).
Each thread gets it's own view of the cache- trying to allow instance sharing across threads means that thread 1 (which is doing an edit that will ultimately fail) is screwing with the data thread 2 (which is doing a simple view rendering) uses, meaning bad mojo... and if someone suggests locking to try and duck that, I wedgie them, that's even worse mojo ;)
- Cache cleanup (garbage collection) -- Maybe we can attach a cleanup of the local cache to the response signal. This should be thought through though.
The cache is a WeakValueDictionary... meaning weak ref'ing of the instances. This means there is no garbage collection issues- if a cycle exists to hold it in memory, it would exist without this code anyways.
comment:30 by , 17 years ago
Replying to anonymous:
Replying to axiak:
I was talking with PhiR in the IRC channel. I would just like to mention two issues:
- Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).
Each thread gets it's own view of the cache- trying to allow instance sharing across threads means that thread 1 (which is doing an edit that will ultimately fail) is screwing with the data thread 2 (which is doing a simple view rendering) uses, meaning bad mojo... and if someone suggests locking to try and duck that, I wedgie them, that's even worse mojo ;)
As I understand it the storage is part of the class object, which is global, no? So currently the patch does not behave like this and this will need to be addressed.
[...]
The cache is a WeakValueDictionary... meaning weak ref'ing of the instances. This means there is no garbage collection issues- if a cycle exists to hold it in memory, it would exist without this code anyways.
Ah, didn't see that.
comment:31 by , 17 years ago
If you are interested in this ticket you should check out:
http://code.djangoproject.com/wiki/DjangoSpecifications/Core/SingleInstance
If you notice something that this page doesn't mention please add it.
comment:32 by , 16 years ago
milestone: | → 1.0 |
---|---|
Version: | → queryset-refactor |
comment:33 by , 16 years ago
milestone: | 1.0 |
---|---|
Version: | queryset-refactor |
comment:34 by , 16 years ago
Cc: | added |
---|
comment:36 by , 16 years ago
Cc: | added |
---|
comment:37 by , 16 years ago
Cc: | added |
---|
comment:38 by , 16 years ago
Cc: | added |
---|
comment:39 by , 16 years ago
Cc: | removed |
---|
comment:40 by , 16 years ago
Cc: | added |
---|
comment:41 by , 16 years ago
People who follow this might be interested in http://lazypython.blogspot.com/2008/11/building-simple-identity-map-in-django.html and http://lazypython.blogspot.com/2008/12/few-more-thoughts-on-identity-mapper.html .
comment:42 by , 16 years ago
Ive worked up a quick "SingletonModel" piece of code. It allows you to define a model, by inheriting from SingletonModel, as a Singleton. It works almost identically to this patch, barring the removal of some unneeded code.
comment:43 by , 16 years ago
Just a note, django-singletons was renamed to django-idmapper: http://github.com/dcramer/django-idmapper/tree/master
My mistake for the incorrect terminology. It *only* provides an Identity Mapper, just as this patch does.
comment:44 by , 16 years ago
Cc: | added |
---|
comment:45 by , 16 years ago
Cc: | added |
---|
comment:47 by , 16 years ago
Cc: | added; removed |
---|
comment:48 by , 16 years ago
Cc: | added |
---|
comment:49 by , 15 years ago
Cc: | added |
---|
comment:50 by , 15 years ago
Cc: | added |
---|
comment:51 by , 15 years ago
Cc: | added |
---|
comment:52 by , 15 years ago
Cc: | added |
---|
comment:53 by , 14 years ago
Cc: | added |
---|
comment:54 by , 14 years ago
Cc: | added |
---|
comment:55 by , 14 years ago
Cc: | added |
---|
comment:56 by , 14 years ago
Cc: | added |
---|
comment:57 by , 14 years ago
Severity: | normal → Normal |
---|---|
Type: | defect → Cleanup/optimization |
comment:59 by , 14 years ago
Cc: | removed |
---|
comment:60 by , 13 years ago
Easy pickings: | unset |
---|---|
Resolution: | → wontfix |
Status: | assigned → closed |
UI/UX: | unset |
I've spoken to Russ, and he concurs with my decision that this would fundamentally change the ORM, and it can't be part of Django core.
However, it's been proven that this is possible (albeit tricky) in external packages -- JohhnyCache, for example. Concrete suggestions about how to make things like Johnny easier are super-welcome, but for clarity those should get fresh new tickets.
comment:61 by , 6 years ago
Do you remember the discussion you had with Russ about this ?
Is this good for re-opening ?
- The ORM is much cleaner now than 7 years ago,
- Many great ORMs have a builtin Identity Map, why can't Django's ?
Identity Map is a common programming pattern:
comment:62 by , 6 years ago
Cc: | added |
---|
Barring any gotchas, I don't think anyone would argue with this.