Opened 2 days ago

Last modified 12 hours ago

#35575 assigned New feature

Add support for constraint validation on GeneratedFields

Reported by: Mark Gensler Owned by: Mark Gensler
Component: Database layer (models, ORM) Version: 5.0
Severity: Normal Keywords: generatedfield uniqueconstraint checkconstraint
Cc: Mariusz Felisiak, Lily Foote, Simon Charette Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Mark Gensler)

GeneratedFields on a model instance are not recalculated during Model.full_clean(). Therefore, if a GeneratedField is included in the *expressions or fields parameters of a UniqueConstraint or CheckConstraint in the instance's Model._meta, the new values of the GeneratedField attribute are not used by BaseConstraint.validate().

Instead, the model instance's GeneratedField attribute will return the value which corresponds to the most recent refresh of the instance from the database. If the instance is new and has not yet been saved, the following error is raised:

AttributeError: Cannot read a generated field from an unsaved model.

An IntegrityError is correctly raised when calling Model.save() if any constraints involving GeneratedFields are violated.

Instead, the recalculated values for GeneratedFields should be used when checking constraints.

This ticket was raised after fixing the bug in #35560 with PR https://github.com/django/django/pull/18309.

Change History (10)

comment:1 by Mark Gensler, 2 days ago

I'm happy to start working on this. I will raise a PR and add failing tests to catch the bug shortly.

Before writing any code to address this, I'd like to discuss where best to apply the fix. I see two possible options:

  1. Recalculate GeneratedField attribute values on the fly as part of calling Constraint.validate(). This will leave the instance attributes unchanged until Model.save() is called.
  2. Recalculate GeneratedField attributes on the model instance during Model.full_clean(), prior to calls to Constraint.validate(), and irrespective of whether relevant constraints are present.

The second option would allow further checks against the new GeneratedField value of the instance to be made during validation. On the other hand, the first option would leave the instance's attribute returning the value currently returned/stored by the database.

Which would be best?

comment:2 by Mark Gensler, 2 days ago

Description: modified (diff)

comment:3 by Mark Gensler, 2 days ago

Description: modified (diff)

in reply to:  1 ; comment:4 by Sarah Boyce, 2 days ago

Cc: Mariusz Felisiak Lily Foote Simon Charette added
Summary: Model.full_clean() does not recalculate GeneratedFields prior to validating model constraintsAdd support for constraint validation on GeneratedFields
Triage Stage: UnreviewedAccepted
Type: BugNew feature

Replying to Mark Gensler:

Before writing any code to address this, I'd like to discuss where best to apply the fix. I see two possible options:

  1. Recalculate GeneratedField attribute values on the fly as part of calling Constraint.validate(). This will leave the instance attributes unchanged until Model.save() is called.
  2. Recalculate GeneratedField attributes on the model instance during Model.full_clean(), prior to calls to Constraint.validate(), and irrespective of whether relevant constraints are present.

My gut says option 1

in reply to:  4 comment:5 by Mark Gensler, 2 days ago

Replying to Sarah Boyce:

My gut says option 1

Yes, that seems the simpler approach which directly addresses this problem and no more. Also from the definition of a GeneratedField in the docs:

A field that is always computed based on other fields in the model. This field is managed and updated by the database itself.

I'll work on option 1.

comment:6 by Simon Charette, 2 days ago

I feel like we should avoid the naive approach of calling refresh_from_db(fields=generated_fields) as that will mutate the in-memory instance and leave it in this form as well as perform an extra query per constraint.

What should be done IMO is for the query performed in UniqueConstraint to call replace_expressions({F(gfield.name): gfield.expression.replace_expressions(....) for gfield in generated_field}) so that will make

class Contributor(models.Model):
    first_name = models.TextField()
    last_name = models.TextField()
    full_name = models.GeneratedField(
        Lower(Concat("first_name", models.Value(" "), "last_name"))
    )

    class Meta:
        constraints = {
            UniqueConstraint(names="unique_full_name", MD5("full_name"))
        }

Contributor(first_name='Mark', last_name='Gensler').full_clean()

result in the following the query, assuming first_name and last_name are available, would be

SELECT 1 FROM contributor
WHERE md5(lower("first_name" || ' ' || "last_name")) = md5(lower('Mark' || ' ' || 'Gensler'))

The more we push to the database the less likely we are to run into race conditions and serde roudtrip issues.

Note that ExclusionConstraint is affected as well and that the logic added here will need to be made so constraints that include generated fields or have expressions with references to generated fields that themselves references exclude when provided will also need to skip validate. See the whole flatten logic for reference which that likely need to be updated to have F("some_generated_field") flattened in a way that returns all the fields referenced by it.

Last edited 2 days ago by Simon Charette (previous) (diff)

comment:7 by Sarah Boyce, 39 hours ago

I also think it's worth highlighting that this has come from us testing a PR and not "organically". I don't think this is a quick win and we should probably confirm that this is wanted, otherwise it might be best to document that it isn't supported.

in reply to:  7 comment:8 by Mark Gensler, 34 hours ago

Replying to Sarah Boyce:

I also think it's worth highlighting that this has come from us testing a PR and not "organically". I don't think this is a quick win and we should probably confirm that this is wanted, otherwise it might be best to document that it isn't supported.

How would we do that? Shall I raise a discussion on the django forum?

I don't see any gap in pure functionality at present. Any expression which is defined for a GeneratedField could also be written directly as an input to the CheckConstraint / UniqueConstraint / ExclusionConstraint, rather than by using the GeneratedField as shorthand for that expression. This change would simply allow a more concise and readable definition of the constraint and assist with DRY. Or is there another advantage which I'm missing?

If this isn't required, the only change necessary would be to prevent a GeneratedField appearing in the expression of any constraints. Otherwise a stale value would be returned for the GeneratedField attribute of an instance during constraint.validate(). This wouldn't be strictly backwards compatible, but users could re-write any existing constraints which use a GeneratedField.

*EDIT* Actually rather than preventing GeneratedFields from appearing in constraints, the system could call checks.Warning().

Replying to Simon Charette:

Thanks for the examples and for raising ExclusionConstraint, I had missed that. I'll investigate a solution using the outline you provided.

Last edited 34 hours ago by Mark Gensler (previous) (diff)

comment:9 by Simon Charette, 32 hours ago

FWIW I gave a shot at what I described above to ensure that it was implementable and not too invasive and it was relatively straightforward. I don't see why we shouldn't support constraints over GeneratedField knowing that.

in reply to:  9 comment:10 by Mark Gensler, 12 hours ago

Replying to Simon Charette:

FWIW I gave a shot at what I described above to ensure that it was implementable and not too invasive and it was relatively straightforward. I don't see why we shouldn't support constraints over GeneratedField knowing that.

Thanks Simon, I will use what you've done as a starting point for the PR.

Note: See TracTickets for help on using tickets.
Back to Top