Opened 3 months ago

Last modified 3 months ago

#28822 new New feature

Add DBCalculatedField to model to annotate models automatically

Reported by: Ilya
Owned by: nobody
Component: Database layer (models, ORM)
Version: master
Severity: Normal
Keywords:
Cc: Ryan Hiebert, Shai Berger
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by Ilya)

Models are the ultimate source of business knowledge. We should encourage people to keep that knowledge there instead of spreading logic across views, managers,
and "utility" methods.

Say we have a Customer model and a business rule: people are allowed to drink once they are 18 or older.

So we implement this logic in the model, to use it in templates and other tools:

class Customer(models.Model):
    first_name = models.CharField(max_length=255)
    last_name = models.CharField(max_length=255)
    age = models.IntegerField()

    def allowed_to_drink(self):
        return self.age >= 18

Cool, but what if we want to query the database for only the people who are allowed to drink?

Customer.objects.filter(age__gte=18)

We now have this logic written twice, in two different places.

One can use managers with annotate:

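from django.db import models
from django.db.models import ExpressionWrapper, Q

class DrinkersManager(models.Manager):
    def get_queryset(self):
        # Annotate every row with the same rule the model method encodes (18 or older).
        return super().get_queryset().annotate(
            allowed_to_drink=ExpressionWrapper(Q(age__gte=18), output_field=models.BooleanField()))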

class Customer(models.Model):
    first_name = models.CharField(max_length=255)
    last_name = models.CharField(max_length=255)
    age = models.IntegerField()

    objects = DrinkersManager()

We can now do both: filter with .filter(allowed_to_drink=False) and use customer.allowed_to_drink in templates.

But why do we define all the "physical" fields in the model, and the "calculated" field as a keyword argument (!) inside a different (!!) class?
Why do we need a separate class at all?

What we suggest:

class Customer(models.Model):
    first_name = models.CharField(max_length=255)
    last_name = models.CharField(max_length=255)
    age = models.IntegerField()
    allowed_to_drink = models.DBCalculatedField(Q(age__gte=18), models.BooleanField())

You just add this field, and all queries on this model are annotated automatically, so you get both a model field and a query field.
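As a rough, framework-free sketch of the "annotated automatically" step: a default manager could scan the model class for calculated fields and collect them into keyword arguments for annotate(). The names CalculatedField and collect_annotations below are hypothetical, and plain strings stand in for real Django expressions and field types; nothing here is an existing Django API.

```python
class CalculatedField:
    """Hypothetical marker holding a query expression and its output type."""

    def __init__(self, expression, output_field):
        self.expression = expression
        self.output_field = output_field


def collect_annotations(model_cls):
    """Return {name: expression} for each CalculatedField declared directly on the class.

    vars() only sees the class's own attributes, so inherited fields are ignored here.
    """
    return {
        name: attr.expression
        for name, attr in vars(model_cls).items()
        if isinstance(attr, CalculatedField)
    }


class Customer:
    # Strings stand in for real Django fields and expressions (assumption).
    age = "IntegerField"
    allowed_to_drink = CalculatedField("age >= 18", "BooleanField")


annotations = collect_annotations(Customer)
print(annotations)
```

In a real implementation the collected mapping would presumably be passed to .annotate(**annotations) inside the default manager's get_queryset().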

I believe this syntax is much clearer, and we have consensus about it on the group:!topic/django-developers/ADSuUUuZp3Q
We may also have

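    # Concat (django.db.models.functions) joins strings portably; F() + ' ' emits SQL "+", which is not string concatenation on most backends.
    full_name = models.DBCalculatedField(Concat('first_name', Value(' '), 'last_name'), models.CharField())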

And for local calculation (useful for people who want to use this logic without the database, or before saving):

    allowed_to_drink = models.DBCalculatedField(Q(age__gte=18), models.BooleanField(), local=lambda c: c.age >= 18)

# or
def allowed_to_drink_local(self):
    return self.age >= 18

Since the knowledge is expressed in Django's expression language, it is possible to generate the "local calculation" automatically
(you just need to evaluate this simple language), but many people in the group believe it is not safe, since the database may use different logic which may be
hard to mimic (especially in a database-agnostic way). A tool for "automatic local calculation" could be created as an external library rather than as part of Django itself (probably one tool per database).
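To make "evaluate this simple language" concrete, here is a minimal sketch of such an external tool, assuming only a handful of comparison lookups. The function name estimate and the LOOKUPS table are hypothetical; the sketch does not use Django and does not attempt to mirror any particular database's semantics.

```python
import operator
from types import SimpleNamespace

# Hypothetical mapping from lookup suffixes to Python operators.
LOOKUPS = {
    "exact": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def estimate(obj, **conditions):
    """Evaluate Django-style lookups such as age__gte=18 against a plain object.

    All conditions are ANDed together, as with Q(**conditions).
    """
    for key, expected in conditions.items():
        field, _, lookup = key.partition("__")
        compare = LOOKUPS[lookup or "exact"]  # a bare field name means "exact"
        if not compare(getattr(obj, field), expected):
            return False
    return True


adult = SimpleNamespace(age=18)
minor = SimpleNamespace(age=17)
print(estimate(adult, age__gte=18))  # True
print(estimate(minor, age__gte=18))  # False
```

Even this toy version diverges from a database: comparing None with 18 raises TypeError in Python, whereas SQL would yield NULL. That is the kind of mismatch the safety concern is about.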

Change History (4)

comment:1 Changed 3 months ago by Ilya

Description: modified (diff)

comment:2 Changed 3 months ago by Ryan Hiebert

Cc: Ryan Hiebert added

comment:3 Changed 3 months ago by Shai Berger

Cc: Shai Berger added
Triage Stage: Unreviewed → Accepted

I don't really agree with the description's opening statement, that as much logic as possible should live within models. But the feature, as described above, has gained support and the design as described above is mostly in consensus.

The issue with safety of local calculation is about edge cases. Examples include different precisions and rounding rules for numeric calculations; different timezones between web-server and database-server for date calculations; empty string treated as null on Oracle, while '' != None in Python; locale (which, on MySQL for example, can be defined per column) affecting the result of string order comparisons on the database but not in Python, and probably more. I believe that it is very easy to be naive about this, and this naivety may cause hard-to-debug data corruption, so I'd really like the naming in the API to push this point (e.g. by using a name like estimate instead of local above).
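A few of these edge cases can be illustrated from the Python side alone. In the sketch below, the "database-like" values are simulated with the standard library under stated assumptions (a backend that rounds halves away from zero, treats '' as NULL, or uses a case-insensitive collation); they are not output from any real database.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rounding: Python's built-in round() sends ties to the nearest even integer.
python_rounded = round(2.5)
# Assumption: a backend that rounds ties away from zero, modelled with Decimal.
db_like_rounded = int(Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP))

# Empty strings: Python keeps '' and None distinct.
python_is_null = "" is None
# Assumption: a backend that stores '' as NULL, as described for Oracle above.
db_like_is_null = "" in ("", None)

# Ordering: Python compares code points, so uppercase sorts before lowercase.
names = ["apple", "Banana"]
python_order = sorted(names)
# Assumption: a case-insensitive collation, modelled with str.casefold.
db_like_order = sorted(names, key=str.casefold)

print(python_rounded, db_like_rounded)    # 2 3
print(python_is_null, db_like_is_null)    # False True
print(python_order, db_like_order)        # ['Banana', 'apple'] ['apple', 'Banana']
```

Each pair disagrees, so a local calculation built on the left-hand values would silently differ from what the query returns.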

comment:4 Changed 3 months ago by Ilya

Looks like #28826 needs to be fixed first.

Last edited 3 months ago by Tim Graham (previous) (diff)