Opened 3 years ago

Last modified 2 months ago

#28822 new New feature

Add DBCalculatedField to model to annotate models automatically

Reported by: Ilya Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: Ryan Hiebert, Shai Berger, Roberto Maurizzi, Matthijs Kooijman Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Ilya)

Models are ultimate source of business knowledge. We must encourage people to use them for it instead of spread logic over views, managers
and "utility" methods.

We have customer model and there is knowledge: People are allowed to drink when they are 18+

So, we implement this logic in model to use it in templates and other tools:

class Customer(models.Model):
    first_name = models.CharField(max_length=255)
    last_name = models.CharField(max_length=255)
    age = models.IntegerField()

    def allowed_to_drink(self):
        return self.age >= 18

Cool, but what if want to query database for only people who are allowed to drink?


We now have this logic written twice in two different places.

One can use managers with annotate:

class DrinkersManager(models.Manager):
    def get_queryset(self):
        return models.query.QuerySet(self.model, using=self._db).annotate(
            allowed_to_drink=ExpressionWrapper(Q(age__gt=18), models.BooleanField()))

class Customer(models.Model):
    first_name = models.CharField(max_length=255)
    last_name = models.CharField(max_length=255)
    age = models.IntegerField()

    objects = DrinkersManager()

We now can do both: use it as .filter(allowed_to_drink=False) and use it for templates: customer.allowed_to_drink.

But why do we define all "physical" fields in model and "calculated" field as keyword argument (!) inside of different (!!) class?
Why do we need class at all?

What we suggest:

class Customer(models.Model):
    first_name = models.CharField(max_length=255)
    last_name = models.CharField(max_length=255)
    age = models.IntegerField()
    allowed_to_drink = models.DBCalculatedField(Q(age__gt=18), models.BooleanField())

You just add this field and all queries to this model are annotated automatically, so you will have model field and query field.

I believe this syntax is much more clear and we have consensus about in on group:!topic/django-developers/ADSuUUuZp3Q
We may also have

    full_name = models.DBCalculatedField(F('first_name') + ' ' + F('last_name'), models.CharField()) 

And for local calculation (may be used by people who want to use this logic without of db or before saving)

    allowed_to_drink = models.DBCalculatedField(Q(age__gt=18), models.BooleanField(), local=lambda c: c.age > 18) 

# or 
def allowed_to_drink_local(self):
    return self.age > 18

Since knowledge is expressed by Django expression language it is possible to generate "local calculation" automatically
(you just need to evalute this simple language), but many people in group believe it is not safe since DB may use different logic which may be
hard to mimic (expecially in database-agnostic way). Tool for "automatic local calculation" may be created as external lib, not part of Django itself (one tool for each database probably)

Change History (7)

comment:1 Changed 3 years ago by Ilya

Description: modified (diff)

comment:2 Changed 3 years ago by Ryan Hiebert

Cc: Ryan Hiebert added

comment:3 Changed 3 years ago by Shai Berger

Cc: Shai Berger added
Triage Stage: UnreviewedAccepted

I don't really agree with the description's opening statement, that as much logic as possible should live within models. But the feature, as described above, has gained support and the design as described above is mostly in consensus.

The issue with safety of local calculation is about edge cases. Examples include different precisions and rounding rules for numeric calculations; different timezones between web-server and database-server for date calculations; empty string treated as null on Oracle, while '' != None in Python; locale (which, on MySql for example, can be defined per column) affecting the result of string order comparisons on the database but not in Python, and probably more. I believe that it is very easy to be naive about this, and this naivety may cause hard-to-debug data corruptions, so I'd really like the naming in the API to push this point (e.g. by using a name like estimate instead of local above).

comment:4 Changed 3 years ago by Ilya

Looks like #28826 needs to be fixed first.

Last edited 3 years ago by Tim Graham (previous) (diff)

comment:5 Changed 15 months ago by Roberto Maurizzi

Cc: Roberto Maurizzi added

comment:6 Changed 5 months ago by Matthijs Kooijman

Cc: Matthijs Kooijman added

The idea in this issue has been bouncing around in my head as well for a while. I'm not entirely sure if *all* annotations should *always* be enabled, as suggested here, but I do see great merit in having annotations defined declaratively outside of a specific queryset instance. Because:

  • These annotations can be enabled using an automatic mechanism on a queryset (e.g. qs.add_annotation('foo')).
  • These annotations can be automatically enabled by using them in a filter, another annotation, etc., e.g. qs.filter(foo__gt=1) would load the foo annotation automatically, instead of having to do qs.add_foo().filter(foo__gt=1).
  • These annotations can also be explicitly referenced and resolved to their query expression, for use as part of other expressions, queries, etc. I'm not entirely sure how this could be used yet, but I have the suspicion that making annotations named objects (outside of model instances) would allow for more reuse and composition.

For the last two points, I can see particular merit in situations where you cannot currently specify a custom queryset, such as cross-relation lookups, conditions on constraints, aggregrations, FilteredRelation, etc.

I originally also saw great merit in being able to calculate annotations in Python if they were not calculated during the query already (since this, in some cases, can remove the need to decide which annotations are needed beforehand). However, as extensively debated in the thread linked above this would have problems wrt performance (e.g. having to do subqueries or joins in Python) and correctness (subtle differences in database and Python handling of calculations), so I'm not so sure whether this would be feasible (but also less needed, if managing annotations is easier).

In the mailing list thread above, SQLAlchemy's "[Hybrid Attributes](" were also mentioned, which fulfill a similar use. They are actually quite interesting, but also quite different from how Django does things. Hybrid attributes are like property getter methods, which work (I think) as normal python code on an instance, but can also be accessed on the class to return an ORM query expression equivalent with the python code in the method, to be used in e.g. queries. It looks like they do some kind of DSL-like evaluation, or maybe even AST-analysis of the python code for this conversion, pretty nifty. The local-vs-db problem seems to be handled by defaulting to an automatic conversion, but allow overriding the db expression for more complicated cases. They even allow writing such hybrid attributes (with some additional declarations), which is also nice. Regardless, this does not seem like something that is easily implemented for Django, or fits the current ORM structure well, so probably not so useful here and now.

Last edited 5 months ago by Matthijs Kooijman (previous) (diff)

comment:7 Changed 2 months ago by Cefas Rodrigues Freire

There still a community interest around this topic?

The idea of this ticket seems great to me. In fact, I currently started to implement the described functionality.

I wonder though if a possible patch would be accepted. So I can continue to code without no doubt of time-wasting.

Note: See TracTickets for help on using tickets.
Back to Top