#34552 closed New feature (needsinfo)
Delaying get_from_clause call as much as possible.
Reported by: | Matthew Schinckel | Owned by: | Matthew Schinckel |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.2 |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette, Florian Apolloner | Triage Stage: | Unreviewed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I've got a couple of projects that supplement ORM functionality. One in particular is [django-shared-property](https://github.com/schinckel/django-shared-property), which allows a field to be defined on the model, that will automatically be added as an annotation, and as such can be filtered/aggregated.
However, it's also "live" in the sense that changes made after loading from the database to dependencies of the property will be reflected.
For example:
class Ordering(models.Model): a = models.IntegerField() b = models.IntegerField() class MyModel(models.Model): ord = models.ForeignKey(Ordering) @shared_property(models.F('ord__a') / models.F('ord__b')) def ordering(self): return self.ord.a / self.ord.b
In this case, you can filter/order by ordering
, but also if you changed ord.a
, then ordering
would update accordingly, without having to save the object and refetch.
Anyway, long story short, this code works amazingly well without any changes to the Django code, except for one edge case: when a shared_property refers to another model (which on it's own works), but you then filter on this shared_property (which again, works) and then do a .count()
or .exists()
.
MyModel.objects.filter(ordering__gte=10).count()
It turns out that because of the way the ORM compiler works, get_from_clause
is called before any filters are applied. Which normally works okay, but in this case results in the table not being available.
A solution I came up with was to delay the evaluation of get_from_clause
until after the where/having clauses have been processed, allowing them to flag tables that need to be referenced.
As such, I'd like to propose the change in https://github.com/django/django/pull/14683, which passes django tests (and also fixes the problem for me).
Change History (7)
comment:1 by , 18 months ago
comment:2 by , 18 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 18 months ago
From Luke's links in the django-developers mailing list post, I wonder if we _should_ be building the references in the reverse order to the order-of-operations:
- DISTINCT
- SELECT
- HAVING
- GROUP BY
- WHERE
- FROM
(I'm not sure if that is a reasonable thing to do, or even necessary. Personally, I'd be super happy with just my single change being applied, but would also be happy to have a further discussion).
comment:4 by , 18 months ago
Cc: | added |
---|
comment:5 by , 18 months ago
If you need these changes it's likely because you are performing expression resolving during the compilation phase which is known to cause subtle issues with regards to join reuse and compilation in general e.g. what if your newly referenced expressions trigger aggregation or filter against window functions, now you have an improper order_by
It feels like this feature should be implementable by using QuerySet.alias
if the latter was implemented in a way that deferred any form of joining until the alias is first referenced. Am I right in the assessment that you implemented your expression resolving in as_sql
because you wanted to avoid unnecessary JOINs and that if QuerySet.alias
allowed you to stash ordering=models.F('ord__a') / models.F('ord__b')
without joining into ord__
until ordering
is referenced then you could implement the same feature by simply having shared_property
augment the managers querysets with the alias you defined?
comment:6 by , 18 months ago
Resolution: | → needsinfo |
---|---|
Status: | assigned → closed |
Marking as "needsinfo" per Simon's comment.
comment:7 by , 18 months ago
Hi Simon, thanks for your input.
It's been ages since I actually implemented it, and the resolving of the expression during the compilation phase was (IIRC) more a pragmatic thing: Field instances are not resolved (because they already belong to a model/query), instead they provide a "Col" instance to the query directly.
Normally, during a .annotate(x=expression)
, the expression is resolved at this point. However, because the query object does not exist at the point where we build the Col instance, I'm not able to force a join (if one does not exist) at that point.
In some sense part of this idea is to provide syntactic sugar for "always apply an annotation", but it's a bit more than that (in the sense that it also recalculates the value in python upon reference). As such I'm not exactly sure if being able to apply the alias to the queryset(s) would be sufficient.
Anyway, it's time to put my kids to bed; I'll try to have a bit more of a think about it later.
Also referenced briefly on https://groups.google.com/g/django-developers/c/X2XqA_S3IRA/m/5dje0pZXBwAJ