Django

Ticket #5420 (assigned)

Opened 1 year ago

Last modified 2 months ago

Allow database API users to specify the fields to exclude in a SELECT statement

Reported by: adrian
Assigned to: jacob (accepted)
Milestone: post-1.0
Component: Database layer (models, ORM)
Version: SVN
Keywords: qs-rf
Cc: msaelices, ferringb@gmail.com, marinho, semente@taurinus.org, research@einfallsreich.net
Triage Stage: Accepted
Has patch: 1
Needs documentation: 0
Needs tests: 0
Patch needs improvement: 1

Description

This one will help people use their databases more efficiently.

The Django ORM should allow users to specify a list of field names to *exclude* from a QuerySet. If a user attempts to access one of those excluded fields on the resulting model instance, the field will be loaded lazily via a separate query.

This is useful when you know you absolutely will not need to use a particular field in your template, so there's no point in SELECTing that data. This saves memory, and it saves on bandwidth between the database server and the Web server.

Example:

from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=32)
    age = models.IntegerField()
    hometown = models.CharField(max_length=32)
    is_cool = models.BooleanField()

# My instinct is to call this hide(), but I'm sure there's a better name for it.
>>> p = Person.objects.hide('hometown', 'is_cool').get(name='John Lennon')
>>> p.id
3
>>> p.name
u'John Lennon'

# Does a query to get "hometown", because it was hidden from the QuerySet.
# 'SELECT hometown FROM person WHERE id=3;'
>>> p.hometown
u'Liverpool'

# Does a query to get "is_cool", because it was hidden from the QuerySet.
# 'SELECT is_cool FROM person WHERE id=3;'
>>> p.is_cool
True

In the case of lazily loaded fields, the lazy loading *only* applies to the particular field. E.g., when I accessed p.hometown in the above example, it did *not* also lazily load the rest of the hidden fields ("is_cool").
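The per-field lazy loading described above can be sketched in plain Python with a descriptor. This is a minimal, hypothetical model of the idea, not Django's code: `FakeDB`, `DeferredField`, `Person` and `get_with_hidden()` are illustrative names, the SQL is only logged rather than executed, `id` is assumed to be the primary key, and the sample row is invented for the demo.

```python
class FakeDB:
    """Hypothetical in-memory stand-in for a database that logs each query."""

    def __init__(self, rows):
        self.rows = rows  # {primary key: {column: value}}
        self.log = []     # SQL text of every "executed" query

    def fetch(self, table, columns, pk):
        self.log.append(f"SELECT {', '.join(columns)} FROM {table} WHERE id={pk};")
        row = self.rows[pk]
        return {col: row[col] for col in columns}


class DeferredField:
    """Non-data descriptor: loads its own column the first time it is read."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Fetch only this column; the other hidden fields remain unloaded.
        row = instance._db.fetch(instance._table, [self.name], instance.id)
        # Cache in the instance dict. Because this is a non-data descriptor,
        # later reads find the cached value there and never reach __get__.
        instance.__dict__[self.name] = row[self.name]
        return row[self.name]


class Person:
    _table = "person"
    FIELDS = ("id", "name", "age", "hometown", "is_cool")
    name = DeferredField()
    age = DeferredField()
    hometown = DeferredField()
    is_cool = DeferredField()

    def __init__(self, db, **loaded):
        self._db = db
        self.__dict__.update(loaded)  # values returned by the initial query


def get_with_hidden(db, pk, *hidden):
    """Hypothetical equivalent of Person.objects.hide(*hidden).get(id=pk)."""
    columns = [f for f in Person.FIELDS if f not in hidden]
    return Person(db, **db.fetch(Person._table, columns, pk))


db = FakeDB({3: {"id": 3, "name": "John Lennon", "age": 40,
                 "hometown": "Liverpool", "is_cool": True}})
p = get_with_hidden(db, 3, "hometown", "is_cool")
print(p.name)      # already loaded by the first query, no extra query
print(p.hometown)  # runs a second query for "hometown" only
print(db.log)
```

Reading `p.hometown` issues one extra single-column query, and `is_cool` stays unloaded until it is accessed on its own.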

We should also provide the inverse of hide() -- perhaps called expose()? -- which would take a list of field names to *include* rather than *exclude*. This would be an opt-in instead of an opt-out.
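The difference between the opt-out hide() and the opt-in expose() comes down to how the column list for the initial SELECT is computed. The helper below is a hypothetical sketch of that step (the name `select_columns` and its parameters are illustrative, not part of any proposed API); it assumes the primary key must always be selected so that deferred fields can be fetched later.

```python
def select_columns(all_fields, pk="id", hide=(), expose=None):
    """Return the columns for the initial SELECT.

    hide:   opt-out, every field except these (the proposed hide()).
    expose: opt-in, only these fields (the proposed expose()).
    The primary key is always kept so deferred fields can be loaded later.
    """
    if hide and expose is not None:
        raise ValueError("use either hide or expose, not both")
    unknown = (set(hide) | set(expose or ())) - set(all_fields)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if expose is not None:
        wanted = set(expose) | {pk}
        return [f for f in all_fields if f in wanted]
    hidden = set(hide)
    return [f for f in all_fields if f == pk or f not in hidden]


FIELDS = ("id", "name", "age", "hometown", "is_cool")
print(select_columns(FIELDS, hide=("hometown", "is_cool")))
print(select_columns(FIELDS, expose=("name",)))
```

Both calls name the fields explicitly; the only difference is whether the list is subtracted from, or intersected with, the model's full field list.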

Attachments

queryset_fields_trunk.diff (29.7 kB) - added by dogada on 02/14/08 04:44:47.
QuerySet patch: adds fields() and improves values() methods

Change History

09/13/07 10:54:37 changed by adrian

  • needs_better_patch changed.
  • stage changed from Unreviewed to Accepted.
  • needs_tests changed.
  • needs_docs changed.

09/14/07 15:03:42 changed by durdinator

  • owner changed from nobody to durdinator.

09/14/07 17:29:23 changed by durdinator

  • keywords set to qs-rf.

09/14/07 19:44:31 changed by durdinator

  • owner changed from durdinator to nobody.

09/14/07 19:47:47 changed by durdinator

Not much point tackling this now if the queryset-refactoring work is coming soon; it'd just break whatever patch was made for this.

01/29/08 14:34:02 changed by Alex

Should the "expose" method replace values()? As I see it, the main difference is that expose() seems to still return an object, whereas values() returns a dictionary.

01/29/08 16:58:23 changed by msaelices

  • cc set to msaelices.

02/01/08 04:20:19 changed by ferringb@gmail.com

  • cc changed from msaelices to msaelices, ferringb@gmail.com.

02/13/08 01:18:06 changed by dogada

  • owner changed from nobody to dogada.
  • status changed from new to assigned.

02/14/08 04:44:47 changed by dogada

  • attachment queryset_fields_trunk.diff added.

QuerySet patch: adds fields() and improves values() methods

02/14/08 04:51:36 changed by dogada

  • has_patch set to 1.

I added a patch, http://code.djangoproject.com/attachment/ticket/5420/queryset_fields_trunk.diff, that implements QuerySet.fields(*fields, **related_fields) and makes it possible to load only some of the fields of the master model and of related models. It allows tuning of object-list queries when only a limited subset of the fields is needed, which improves general performance and decreases database load. As a side effect, the patch also implements selecting fields from related models in QuerySet.values(): the method's signature changes from values(*fields) to values(*fields, **related_fields), but the change is backward compatible.

The patch doesn't implement lazy field loading; see more details about this and other issues at: http://www.mysoftparade.com/blog/django_orm_performance_patch/

02/14/08 05:38:55 changed by mtredinnick

  • needs_better_patch set to 1.

Thanks for the patch. However, patches against trunk in this area of the code aren't very useful at the moment, since it has been hugely rewritten on the queryset-refactor branch, and that is what will be merged into trunk. So if you want to tackle this problem, please write a patch against queryset-refactor. Patches against trunk aren't worth considering at this point in time, because they will be quite different from the final version.

It's a very complete patch, with tests and documentation, however there are some problems (aside from being against code that is scheduled for removal).

  1. The fields() addition probably isn't the nicest way to do this. Adrian called it exclude(), although defer() is probably a better name. You just pass this function a list of fields to exclude. Your fields() call seems to take more than just that: filters and all sorts of things. I think you've stuck a little too closely to the raw SQL, rather than letting the user specify another modifier on the QuerySet (a list of fields not to include when the query is eventually executed). Your examples don't look as clear as the alternative approach.
  2. The fields we don't immediately load should be deferred (lazy-loaded), not omitted altogether. So when you access those attributes, they should be loaded transparently. This means you can load with some fields being deferred, but still safely pass that object around everywhere and if something else does need to access one of the expensive fields, it will be loaded on demand.
  3. There's a big change to ValuesQuerySet in here which seems unnecessary. This really shouldn't be touching that at all (it looks like you might be trying to use some common stuff, but really the commonality is in the normal QuerySet loading path). ValuesQuerySets already have a way to not load particular fields (you just don't list them). No changes needed there.
  4. I'm fairly sure Jacob Kaplan-Moss is a fair way along writing the full version of the functionality needed by this ticket for queryset-refactor. So if you'd like to just wait a little bit, we should have something in-tree shortly that implements this particular feature.
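Point 1 above describes defer() as just another modifier on the QuerySet: it records which fields to leave out, and the SQL is only built when the query is eventually executed. The sketch below illustrates that style with a hypothetical `Query` class (not Django's QuerySet or its internals); it assumes `id` is the primary key and is always selected.

```python
class Query:
    """Hypothetical, minimal lazy query object; SQL is built only on execution."""

    def __init__(self, table, fields, where=(), deferred=frozenset()):
        self.table = table
        self.fields = tuple(fields)
        self.where = tuple(where)
        self.deferred = frozenset(deferred)

    def _clone(self, **changes):
        # Each modifier returns a new Query, leaving the original untouched.
        attrs = dict(table=self.table, fields=self.fields,
                     where=self.where, deferred=self.deferred)
        attrs.update(changes)
        return Query(**attrs)

    def filter(self, **lookups):
        return self._clone(where=self.where + tuple(sorted(lookups.items())))

    def defer(self, *names):
        # Only records the names; nothing touches the database here.
        return self._clone(deferred=self.deferred | set(names))

    def as_sql(self):
        cols = [f for f in self.fields if f == "id" or f not in self.deferred]
        sql = f"SELECT {', '.join(cols)} FROM {self.table}"
        if self.where:
            sql += " WHERE " + " AND ".join(f"{k} = %s" for k, _ in self.where)
        return sql, [v for _, v in self.where]


base = Query("person", ["id", "name", "age", "hometown", "is_cool"])
q = base.filter(name="John Lennon").defer("hometown", "is_cool")
print(q.as_sql())
```

Because defer() only records state, it can appear anywhere in the chain, and `base.defer(...).filter(...)` produces the same SQL as `base.filter(...).defer(...)`.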

So, thanks a lot (seriously!) for taking the time to write the patch. However, in its current form this won't be applied. As mentioned, though, I believe Jacob's pretty close to having something that works the way we're after, so hold tight and you won't have to write it all again.

02/14/08 10:29:24 changed by dogada

fields() is a sibling of values(): it shares the same concept, and it's easy to switch the output format from model instances to dictionaries and vice versa. IMHO that's better than having two different methods in the API, like show() and hide(). Also, using show() and hide() for model fields but values(*args) for dictionary fields would look a bit strange.
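The argument that fields() and values() share one concept can be made concrete: both select the same columns and differ only in how each result row is wrapped. In this hypothetical sketch, `rows_as_dicts()`, `rows_as_objects()` and `Partial` are illustrative names, not code from the attached patch.

```python
def rows_as_dicts(columns, rows):
    """values()-style output: one dict per row."""
    return [dict(zip(columns, row)) for row in rows]


class Partial:
    """Hypothetical model-like object holding only the selected columns."""

    def __init__(self, **values):
        self.__dict__.update(values)


def rows_as_objects(columns, rows, cls=Partial):
    """fields()-style output: one partially populated object per row."""
    return [cls(**d) for d in rows_as_dicts(columns, rows)]


columns = ["id", "name"]
rows = [(3, "John Lennon")]
dicts = rows_as_dicts(columns, rows)
objs = rows_as_objects(columns, rows)
print(dicts, objs[0].name)
```

Note that an unselected field on a `Partial` simply raises AttributeError; it is omitted rather than deferred, which is the behaviour point 2 of the review above objects to.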

I created the patch for our project because we have serious performance problems with the SQL queries generated by the current Django ORM, and I'm sharing it because I think it makes the Django ORM much more efficient than it is now. We can use our patch and wait for the alternative solution from Jacob Kaplan-Moss, but please make lazy loading of fields optional or on-request, because Django model instances already have signal-related overhead, and adding new features to models may soon make Django models resemble slow J2EE Entity Beans. Thanks.

02/14/08 10:31:42 changed by dogada

  • owner deleted.
  • status changed from assigned to new.

05/07/08 16:27:50 changed by dcramer

http://groups.google.com/group/django-developers/browse_thread/thread/2d4a8f5a4b399ed0

I threw this up the other day. I much prefer some sort of extension to values() (as, really, it means the same thing) over new methods for hide/expose. If nothing else, a .fields() method, but then we're just replicating what values() does (with partial objects instead of a dictionary).

06/06/08 14:49:44 changed by marinho

  • cc changed from msaelices, ferringb@gmail.com to msaelices, ferringb@gmail.com, marinho.

06/06/08 15:24:45 changed by Guilherme M. Gondim <semente@taurinus.org>

  • cc changed from msaelices, ferringb@gmail.com, marinho to msaelices, ferringb@gmail.com, marinho, semente@taurinus.org.

06/08/08 12:51:27 changed by jacob

  • owner set to jacob.
  • status changed from new to assigned.

06/16/08 14:37:22 changed by jacob

  • milestone set to 1.0.

07/01/08 07:27:02 changed by anonymous

  • cc changed from msaelices, ferringb@gmail.com, marinho, semente@taurinus.org to msaelices, ferringb@gmail.com, marinho, semente@taurinus.org, research@einfallsreich.net.

08/22/08 17:40:04 changed by ubernostrum

  • milestone changed from 1.0 to post-1.0.

Since this is a feature request, and we're past the feature-freeze point for 1.0... punt.

