
Opened 3 years ago

Last modified 4 days ago

#16735 assigned New feature

Queryset values should be aliasable

Reported by: alex.latchford@… Owned by: nate_b
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: queryset, alias, values
Cc: django@…, bendavis78 Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Student.objects.all().values('name', 'mother__name', 'class__teacher__name')
> QD {'name': 'Freddie', 'mother__name': 'Helen', 'class__teacher__name': 'Mr Williams'}

This query doesn't fully demonstrate the problem, but it's the best example I can think of for now. Essentially, I'd like to be able to alias these values so that they are easy to distinguish or can be shortened.

I envisage something like this:

Student.objects.all().values(student_name='name', mother_name='mother__name', teacher_name='class__teacher__name')
> QD {'student_name': 'Freddie', 'mother_name': 'Helen', 'teacher_name': 'Mr Williams'}

You should also be able to leave arguments unaliased if the standard name will suffice.
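
For illustration, the same renaming can be done after the fact in plain Python. This is only a minimal sketch, assuming the rows are ordinary dicts like those values() returns; alias_rows is a hypothetical helper and not part of Django:

```python
def alias_rows(rows, **aliases):
    """Rename keys in each row dict; `aliases` maps new_name -> original key.

    Keys that are not aliased keep their standard name.
    """
    # Invert the mapping so each original key can be looked up directly.
    renames = {original: new for new, original in aliases.items()}
    for row in rows:
        yield {renames.get(key, key): value for key, value in row.items()}


# Stand-in for the result of Student.objects.all().values(...)
rows = [
    {'name': 'Freddie', 'mother__name': 'Helen', 'class__teacher__name': 'Mr Williams'},
]
aliased = list(
    alias_rows(rows, mother_name='mother__name', teacher_name='class__teacher__name')
)
```

Here 'name' is left alone because no alias was given for it, which is the "standard name will suffice" case.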

Many thanks,
Alex

Attachments (2)

column_alias.diff (33.0 KB) - added by nate_b 2 years ago.
16735-2.patch (33.0 KB) - added by nate_b 2 years ago.
Corrected an oversight when aliasing columns - previous patch would alias columns generically when it didn't need to.


Change History (20)

comment:1 Changed 3 years ago by aaugustin

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

Interesting idea. From a quick inspection of the source (ValuesQuerySet) this looks doable.

Changed 2 years ago by nate_b

comment:2 Changed 2 years ago by nate_b

  • Has patch set
  • Owner changed from nobody to nate_b
  • Status changed from new to assigned

This patch is the most direct way I could see of adding this feature.

It passes all run tests in ./runtests.py --settings=test_sqlite

I tried to be careful, but if it is missing anything, I'd be happy to take another look at it.

comment:3 Changed 2 years ago by lrekucki

  • Patch needs improvement set

While adding this to values() is perfectly fine, doing the same with values_list() feels weird. Also, it's buggy:

# from ValuesListQuerySet in the patch
fields = list(self._fields) + self._aliased_fields.keys()

This makes the order of values in returned tuples depend on order of dictionary keys, which is undefined. Just try:

print Model.objects.values_list(a2="field", a3="other_field")
# on Python 2.7 and 3.2 values should be reversed.

Changed 2 years ago by nate_b

Corrected an oversight when aliasing columns - previous patch would alias columns generically when it didn't need to.

comment:4 Changed 2 years ago by nate_b

  • Triage Stage changed from Accepted to Design decision needed

Ostensibly, I agree with you - it's a bit odd, and I don't presume to know why one would use it. Initially, when I added that to values_list(), it was to "harmonize" it with the features added to values(). However, similar behavior results when you annotate or aggregate on a values_list() call - the aliased names are added in the order of the keys.

So, I would propose one of three solutions:

  1. Remove aliased names in the values_list() call entirely, replaced with a dummy empty dictionary.
  2. Force the sorting of the aliases, presumably alphabetically; this would also suggest repairing this defect with annotations and aggregations, which may not be quite so simple.
  3. Leave as is in my updated patch, with indeterminate order returned.
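
Option 2 could be sketched roughly as follows. This is plain Python rather than the patch's code: tuple_field_order is a hypothetical helper, and its two arguments merely stand in for self._fields and self._aliased_fields.

```python
def tuple_field_order(fields, aliased_fields):
    """Return column names in a deterministic order for values_list() tuples.

    Positional fields keep the order in which they were given; aliases
    follow, sorted alphabetically, so the result never depends on the
    iteration order of the alias dictionary.
    """
    return list(fields) + sorted(aliased_fields)


# The aliases are deliberately supplied "out of order" here.
order = tuple_field_order(('name',), {'a3': 'other_field', 'a2': 'field'})
```

Sorting trades the caller's intended order for a stable one, so it fixes the indeterminacy but may still surprise anyone who expects keyword order to be preserved.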

I will be happy to implement whichever one seems best to a core developer. As such, I'm setting this to DDN.

comment:5 Changed 2 years ago by akaariai

What is the rationale of doing this for values_list()? If I am not mistaken, you will get a list back anyways. So the aliases are used for what?

The .annotate() + values_list() seems to be a bug. Basically, multiple annotates in one .annotate() call have indeterminate order, which results in indeterminate order in the values_list(). I bet some users will be hit if/when Python randomizes the hashing algorithm. So, indeterminate order is not good. Unfortunately I don't see a fix other than disallowing multiple annotates in one call. Which is backwards incompatible.

In addition, a benchmark is needed to show that this doesn't slow down fetching large sets of objects from the DB. .values() is mostly an optimization, so it should remain as fast as possible. Maybe django-bench already contains a suitable benchmark?

comment:6 follow-up: Changed 22 months ago by UloPe

  • Cc django@… added

I would disagree with the observation that .values() is just an optimization. In combination with .annotate() it is vital if you need to annotate / group by multiple fields. This is also one of the use cases where the missing aliases are most painful.

Example:

>>> Region.objects.values(
    "name", 
    "orders__items__product__category__parent__name"
).annotate(quantity = Sum("orders__items__quantity"))
[
    {'quantity': 10, 'name': u'...', 'orders__items__product__category__parent__name': u'Something'},
    {'quantity': 20, 'name': u'...', 'orders__items__product__category__parent__name': u'Something else'},
]

comment:7 Changed 20 months ago by jrs_66@…

I'm guessing that the seemingly trivial, and widely used, concept of aliasing values() lists is still not possible? Ever try a union of self joined tables? Is there any hope of this being put in place in the future?

comment:8 in reply to: ↑ 6 ; follow-up: Changed 17 months ago by Karthik

Replying to UloPe:

I agree. I'm reading this thread because I have this exact same problem at the moment.

comment:9 follow-up: Changed 13 months ago by aaugustin

  • Triage Stage changed from Design decision needed to Accepted

Unless I missed something, there isn't any objection to adding this feature to values.

comment:10 in reply to: ↑ 8 Changed 11 months ago by erik.telepovsky

Replying to Karthik:

I agree as well. It is important to have alias functionality in values() method.

comment:11 Changed 11 months ago by akaariai

Quick observation: it might be better to have the aliasing feature as separate queryset operation:

Region.objects.alias(
    orders__items__product__category__parent__name="parentname"
).values(
    "name", "parentname"
).annotate(quantity=Sum("orders__items__quantity"))

Why this way? There are a couple of other places where aliasing could be useful. For example, in multijoin situations you could explicitly define whether you want to filter on the same join or on different joins by using the same alias or two different aliases (currently the rule is "if filtered in the same .filter() call, then same joins, else different joins"). Also, this way you might be able to inject extra SQL directly into the query:

Region.objects.alias(
    somesql=RawSQL("case when somecol > 0 then 1 else -1 end")
).order_by('somesql')

Of course, this ticket should not try to do more than the bare minimum to get aliasing to work for .values(). The point is that aliasing could be useful in other operations, too, so let's be prepared for that.

comment:12 in reply to: ↑ 9 Changed 4 months ago by Wraithan

Replying to aaugustin:

Unless I missed something, there isn't any objection to adding this feature to values.

So my understanding is that this ticket is being held back by touching values_list as well as values. So a patch only affecting values (as well as docs and tests of course) is what is needed to get accepted?

comment:13 Changed 4 months ago by russellm

@Wraithan As far as I can make out, yes. I can't see any benefit to having API consistency between values() and values_list() - if only because values_list() already accepts a keyword argument, which introduces a whole world of pain in the API (what if you want an alias called flat?).
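
The keyword clash can be shown with a toy function. This is a hedged sketch in Python 3 syntax: the signature below is hypothetical and only mimics the idea of an existing flat option sitting next to alias keywords, it is not Django's actual values_list().

```python
def values_list(*fields, flat=False, **aliases):
    """Hypothetical signature, used only to show how keywords bind."""
    return {'fields': fields, 'flat': flat, 'aliases': aliases}


# 'flat' is consumed by the existing option, so it can never reach **aliases.
call = values_list('name', flat='some_field', total='orders__total')
```

Any alias that shares a name with an existing option is silently swallowed by that option, which is the "world of pain" in question.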

The "values() is an optimisation" argument doesn't hold water for me. Yes, it's an optimisation. The source of the optimisation is passing less information on the wire during the database query (i.e., only returning two of 15 fields). An alias is already being used for this operation - it just isn't user specified. Making it user specified means a single dictionary lookup in a couple of key locations, which is a minor change in implementation with huge usability benefits.
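
As a rough illustration of the "single dictionary lookup" point, here is a sketch in plain Python. It is not the ORM's code: rows_to_dicts and the shape of its aliases argument are assumptions made for the example.

```python
def rows_to_dicts(field_names, rows, aliases=None):
    """Turn row tuples into dicts, keyed by alias where one was given.

    `aliases` maps field path -> user-chosen name (hypothetical shape).
    """
    aliases = aliases or {}
    # One dictionary lookup per column, done once up front, not once per row.
    keys = [aliases.get(name, name) for name in field_names]
    return [dict(zip(keys, row)) for row in rows]


result = rows_to_dicts(
    ['name', 'mother__name'],
    [('Freddie', 'Helen')],
    aliases={'mother__name': 'mother_name'},
)
```

Because the key names are resolved before iterating over the rows, the per-row cost in this sketch is the same with or without aliases.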

So - clean up the patch and we can get this into trunk.

comment:14 Changed 4 months ago by mjtamlyn

Agreed this should be possible.

As a side note Russ - I've found the main optimisation for using values was in fact avoiding calling Model.__init__ a few thousand times, not the lack of data on the wire.

comment:15 Changed 8 weeks ago by django@…

I've been wanting something like this for a while. Here's another use case: sending emails with pre-defined, admin-editable text.

text = Settings.objects.get(name='email_text').value  # text is something like "Hi {name}, you are {age} years old. You work at {job}"

email_texts = [text.format(**kw) for kw in Person.objects.all().values(name='name', age='age', job="job__name")]

comment:16 Changed 5 weeks ago by bendavis78

  • Cc bendavis78 added
  • Version changed from 1.3 to master

I've created a branch on github for a fix against the latest version of django: https://github.com/bendavis78/django/tree/issues/16735

The patch can be found here:
https://github.com/bendavis78/django/commit/cdff83bed850a631e7c6d3cb12359b9b1d3e9bc4

This patch only changes values() and not values_list().

comment:17 Changed 4 days ago by paveluc.alexandr@…

Will this patch be included in next Django release?

comment:18 Changed 4 days ago by charettes

Unfortunately this feature didn't make it to the 1.7.x branch before it was feature frozen.
