Opened 5 years ago

Last modified 8 months ago

#28822 new New feature

Add DBCalculatedField to model to annotate models automatically

Reported by: Ilya Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Ryan Hiebert, Shai Berger, Roberto Maurizzi, Matthijs Kooijman, Diego Lima Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Ilya)

Models are ultimate source of business knowledge. We must encourage people to use them for it instead of spread logic over views, managers
and "utility" methods.

We have customer model and there is knowledge: People are allowed to drink when they are 18+

So, we implement this logic in model to use it in templates and other tools:

class Customer(models.Model):
    first_name = models.CharField(max_length=255)
    last_name = models.CharField(max_length=255)
    age = models.IntegerField()

    @property
    def allowed_to_drink(self):
        return self.age >= 18

Cool, but what if want to query database for only people who are allowed to drink?

models.Customer.objects.filter(age__gt=18)

We now have this logic written twice in two different places.

One can use managers with annotate:

class DrinkersManager(models.Manager):
    def get_queryset(self):
        return models.query.QuerySet(self.model, using=self._db).annotate(
            allowed_to_drink=ExpressionWrapper(Q(age__gt=18), models.BooleanField()))


class Customer(models.Model):
    first_name = models.CharField(max_length=255)
    last_name = models.CharField(max_length=255)
    age = models.IntegerField()

    objects = DrinkersManager()

We now can do both: use it as .filter(allowed_to_drink=False) and use it for templates: customer.allowed_to_drink.

But why do we define all "physical" fields in model and "calculated" field as keyword argument (!) inside of different (!!) class?
Why do we need class at all?

What we suggest:

class Customer(models.Model):
    first_name = models.CharField(max_length=255)
    last_name = models.CharField(max_length=255)
    age = models.IntegerField()
    allowed_to_drink = models.DBCalculatedField(Q(age__gt=18), models.BooleanField())


You just add this field and all queries to this model are annotated automatically, so you will have model field and query field.

I believe this syntax is much more clear and we have consensus about in on group:
https://groups.google.com/forum/#!topic/django-developers/ADSuUUuZp3Q
We may also have

    full_name = models.DBCalculatedField(F('first_name') + ' ' + F('last_name'), models.CharField()) 

And for local calculation (may be used by people who want to use this logic without of db or before saving)

    allowed_to_drink = models.DBCalculatedField(Q(age__gt=18), models.BooleanField(), local=lambda c: c.age > 18) 

# or 
@allowed_to_drink.local
def allowed_to_drink_local(self):
    return self.age > 18

Since knowledge is expressed by Django expression language it is possible to generate "local calculation" automatically
(you just need to evalute this simple language), but many people in group believe it is not safe since DB may use different logic which may be
hard to mimic (expecially in database-agnostic way). Tool for "automatic local calculation" may be created as external lib, not part of Django itself (one tool for each database probably)

Change History (9)

comment:1 Changed 5 years ago by Ilya

Description: modified (diff)

comment:2 Changed 5 years ago by Ryan Hiebert

Cc: Ryan Hiebert added

comment:3 Changed 5 years ago by Shai Berger

Cc: Shai Berger added
Triage Stage: UnreviewedAccepted

I don't really agree with the description's opening statement, that as much logic as possible should live within models. But the feature, as described above, has gained support and the design as described above is mostly in consensus.

The issue with safety of local calculation is about edge cases. Examples include different precisions and rounding rules for numeric calculations; different timezones between web-server and database-server for date calculations; empty string treated as null on Oracle, while '' != None in Python; locale (which, on MySql for example, can be defined per column) affecting the result of string order comparisons on the database but not in Python, and probably more. I believe that it is very easy to be naive about this, and this naivety may cause hard-to-debug data corruptions, so I'd really like the naming in the API to push this point (e.g. by using a name like estimate instead of local above).

comment:4 Changed 5 years ago by Ilya

Looks like #28826 needs to be fixed first.

Last edited 5 years ago by Tim Graham (previous) (diff)

comment:5 Changed 3 years ago by Roberto Maurizzi

Cc: Roberto Maurizzi added

comment:6 Changed 2 years ago by Matthijs Kooijman

Cc: Matthijs Kooijman added

The idea in this issue has been bouncing around in my head as well for a while. I'm not entirely sure if *all* annotations should *always* be enabled, as suggested here, but I do see great merit in having annotations defined declaratively outside of a specific queryset instance. Because:

  • These annotations can be enabled using an automatic mechanism on a queryset (e.g. qs.add_annotation('foo')).
  • These annotations can be automatically enabled by using them in a filter, another annotation, etc., e.g. qs.filter(foo__gt=1) would load the foo annotation automatically, instead of having to do qs.add_foo().filter(foo__gt=1).
  • These annotations can also be explicitly referenced and resolved to their query expression, for use as part of other expressions, queries, etc. I'm not entirely sure how this could be used yet, but I have the suspicion that making annotations named objects (outside of model instances) would allow for more reuse and composition.

For the last two points, I can see particular merit in situations where you cannot currently specify a custom queryset, such as cross-relation lookups, conditions on constraints, aggregrations, FilteredRelation, etc.

I originally also saw great merit in being able to calculate annotations in Python if they were not calculated during the query already (since this, in some cases, can remove the need to decide which annotations are needed beforehand). However, as extensively debated in the thread linked above this would have problems wrt performance (e.g. having to do subqueries or joins in Python) and correctness (subtle differences in database and Python handling of calculations), so I'm not so sure whether this would be feasible (but also less needed, if managing annotations is easier).

In the mailing list thread above, SQLAlchemy's "[Hybrid Attributes](https://docs.sqlalchemy.org/en/13/orm/extensions/hybrid.html)" were also mentioned, which fulfill a similar use. They are actually quite interesting, but also quite different from how Django does things. Hybrid attributes are like property getter methods, which work (I think) as normal python code on an instance, but can also be accessed on the class to return an ORM query expression equivalent with the python code in the method, to be used in e.g. queries. It looks like they do some kind of DSL-like evaluation, or maybe even AST-analysis of the python code for this conversion, pretty nifty. The local-vs-db problem seems to be handled by defaulting to an automatic conversion, but allow overriding the db expression for more complicated cases. They even allow writing such hybrid attributes (with some additional declarations), which is also nice. Regardless, this does not seem like something that is easily implemented for Django, or fits the current ORM structure well, so probably not so useful here and now.

Last edited 2 years ago by Matthijs Kooijman (previous) (diff)

comment:7 Changed 2 years ago by Cefas Rodrigues Freire

There still a community interest around this topic?

The idea of this ticket seems great to me. In fact, I currently started to implement the described functionality.

I wonder though if a possible patch would be accepted. So I can continue to code without no doubt of time-wasting.

comment:8 Changed 16 months ago by Diego Lima

Cc: Diego Lima added

comment:9 Changed 8 months ago by Matthijs Kooijman

I found one more case where this feature would be tremendously useful: Admin filtering on annotations. Currently, you can very easily filter on fields in the admin, using ModelAdmin.list_filter, but to filter on an annotation, you need to implement your own SimpleListFilter that handles everything, potentially duplicating a lot of code that is already in FieldListFilter and its subclasses (I tried some wrapping and using FieldListFilter.create passing a field instance, which worked for a choice field but failed for AllValuesFieldListFilter because it insists looking up the reverse field path for the filtered field).

If DBCalculatedFields (or something similar) would be implemented, it would become possible (possibly with some minor changes to the admin list filter code, but maybe not even that) to just specify the name of the annotion inModelAdmin.list_filter and have a filter interface just like for a real field.

Note: See TracTickets for help on using tickets.
Back to Top