Opened 5 months ago

Closed 4 months ago

#35575 closed New feature (fixed)

Add support for constraint validation on GeneratedFields

Reported by: Mark Gensler Owned by: Mark Gensler
Component: Database layer (models, ORM) Version: 5.0
Severity: Normal Keywords: generatedfield uniqueconstraint checkconstraint
Cc: Mariusz Felisiak, Lily Foote, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Mark Gensler)

GeneratedFields on a model instance are not recalculated during Model.full_clean(). Therefore, if a GeneratedField is included in the *expressions or fields parameters of a UniqueConstraint or CheckConstraint in the instance's Model._meta, the new values of the GeneratedField attribute are not used by BaseConstraint.validate().

Instead, the model instance's GeneratedField attribute will return the value which corresponds to the most recent refresh of the instance from the database. If the instance is new and has not yet been saved, the following error is raised:

AttributeError: Cannot read a generated field from an unsaved model.

An IntegrityError is correctly raised when calling Model.save() if any constraints involving GeneratedFields are violated.

Instead, the recalculated values for GeneratedFields should be used when checking constraints.

This ticket was raised after fixing the bug in #35560 with PR https://github.com/django/django/pull/18309.

Change History (18)

comment:1 by Mark Gensler, 5 months ago

I'm happy to start working on this. I will raise a PR and add failing tests to catch the bug shortly.

Before writing any code to address this, I'd like to discuss where best to apply the fix. I see two possible options:

  1. Recalculate GeneratedField attribute values on the fly as part of calling Constraint.validate(). This will leave the instance attributes unchanged until Model.save() is called.
  2. Recalculate GeneratedField attributes on the model instance during Model.full_clean(), prior to calls to Constraint.validate(), and irrespective of whether relevant constraints are present.

The second option would allow further checks against the new GeneratedField value of the instance to be made during validation. On the other hand, the first option would leave the instance's attribute returning the value currently returned/stored by the database.

Which would be best?

comment:2 by Mark Gensler, 5 months ago

Description: modified (diff)

comment:3 by Mark Gensler, 5 months ago

Description: modified (diff)

in reply to:  1 ; comment:4 by Sarah Boyce, 5 months ago

Cc: Mariusz Felisiak Lily Foote Simon Charette added
Summary: Model.full_clean() does not recalculate GeneratedFields prior to validating model constraintsAdd support for constraint validation on GeneratedFields
Triage Stage: UnreviewedAccepted
Type: BugNew feature

Replying to Mark Gensler:

Before writing any code to address this, I'd like to discuss where best to apply the fix. I see two possible options:

  1. Recalculate GeneratedField attribute values on the fly as part of calling Constraint.validate(). This will leave the instance attributes unchanged until Model.save() is called.
  2. Recalculate GeneratedField attributes on the model instance during Model.full_clean(), prior to calls to Constraint.validate(), and irrespective of whether relevant constraints are present.

My gut says option 1

in reply to:  4 comment:5 by Mark Gensler, 5 months ago

Replying to Sarah Boyce:

My gut says option 1

Yes, that seems the simpler approach which directly addresses this problem and no more. Also from the definition of a GeneratedField in the docs:

A field that is always computed based on other fields in the model. This field is managed and updated by the database itself.

I'll work on option 1.

comment:6 by Simon Charette, 5 months ago

I feel like we should avoid the naive approach of calling refresh_from_db(fields=generated_fields) as that will mutate the in-memory instance and leave it in this form as well as perform an extra query per constraint.

What should be done IMO is for the query performed in UniqueConstraint to call replace_expressions({F(gfield.name): gfield.expression.replace_expressions(....) for gfield in generated_field}) so that will make

class Contributor(models.Model):
    first_name = models.TextField()
    last_name = models.TextField()
    full_name = models.GeneratedField(
        Lower(Concat("first_name", models.Value(" "), "last_name"))
    )

    class Meta:
        constraints = {
            UniqueConstraint(names="unique_full_name", MD5("full_name"))
        }

Contributor(first_name='Mark', last_name='Gensler').full_clean()

result in the following the query, assuming first_name and last_name are available, would be

SELECT 1 FROM contributor
WHERE md5(lower("first_name" || ' ' || "last_name")) = md5(lower('Mark' || ' ' || 'Gensler'))

The more we push to the database the less likely we are to run into race conditions and serde roudtrip issues.

Note that ExclusionConstraint is affected as well and that the logic added here will need to be made so constraints that include generated fields or have expressions with references to generated fields that themselves references exclude when provided will also need to skip validate. See the whole flatten logic for reference which that likely need to be updated to have F("some_generated_field") flattened in a way that returns all the fields referenced by it.

Last edited 5 months ago by Simon Charette (previous) (diff)

comment:7 by Sarah Boyce, 5 months ago

I also think it's worth highlighting that this has come from us testing a PR and not "organically". I don't think this is a quick win and we should probably confirm that this is wanted, otherwise it might be best to document that it isn't supported.

in reply to:  7 comment:8 by Mark Gensler, 5 months ago

Replying to Sarah Boyce:

I also think it's worth highlighting that this has come from us testing a PR and not "organically". I don't think this is a quick win and we should probably confirm that this is wanted, otherwise it might be best to document that it isn't supported.

How would we do that? Shall I raise a discussion on the django forum?

I don't see any gap in pure functionality at present. Any expression which is defined for a GeneratedField could also be written directly as an input to the CheckConstraint / UniqueConstraint / ExclusionConstraint, rather than by using the GeneratedField as shorthand for that expression. This change would simply allow a more concise and readable definition of the constraint and assist with DRY. Or is there another advantage which I'm missing?

If this isn't required, the only change necessary would be to prevent a GeneratedField appearing in the expression of any constraints. Otherwise a stale value would be returned for the GeneratedField attribute of an instance during constraint.validate(). This wouldn't be strictly backwards compatible, but users could re-write any existing constraints which use a GeneratedField.

*EDIT* Actually rather than preventing GeneratedFields from appearing in constraints, the system could call checks.Warning().

Replying to Simon Charette:

Thanks for the examples and for raising ExclusionConstraint, I had missed that. I'll investigate a solution using the outline you provided.

Last edited 5 months ago by Mark Gensler (previous) (diff)

comment:9 by Simon Charette, 5 months ago

FWIW I gave a shot at what I described above to ensure that it was implementable and not too invasive and it was relatively straightforward. I don't see why we shouldn't support constraints over GeneratedField knowing that.

in reply to:  9 comment:10 by Mark Gensler, 5 months ago

Replying to Simon Charette:

FWIW I gave a shot at what I described above to ensure that it was implementable and not too invasive and it was relatively straightforward. I don't see why we shouldn't support constraints over GeneratedField knowing that.

Thanks Simon, I will use what you've done as a starting point for the PR.

comment:11 by Mark Gensler, 5 months ago

Has patch: set

comment:12 by Mark Gensler, 5 months ago

I have added a draft PR but there are still a couple of outstanding issues. I thought the PR would probably be a better place for review/discussion.

Simon, would you be able to take a look? Thanks Mark

comment:13 by Simon Charette, 4 months ago

The patch looks ready to be reviewed to me. I requested that a release note be added for 5.0.X just in case we wanted to backport but in the light of #35560 disabling validation at first I think we should just consider this a new feature for inclusion in 5.2.x. That would also gives us enough time to properly address #34871 by then without restricting ourselves to solutions that are not to invasive for a backport. Given any of the GeneratedField validation can be emulated with expression constraints as Mark pointed out it should allow for users to lean into that in the mean time.

IMO the branch demonstrates that enabling this feature is achievable without too much boilerplate an ensures that GeneratedField works just like any other Field with regards to Meta.constraints instead of crashing with IntegrityError.

comment:14 by Mariusz Felisiak, 4 months ago

I think we should just consider this a new feature for inclusion in 5.2.x.

For me, definitely a new feature.

comment:15 by Sarah Boyce, 4 months ago

Patch needs improvement: set

comment:16 by Sarah Boyce, 4 months ago

Patch needs improvement: unset

comment:17 by Sarah Boyce, 4 months ago

Triage Stage: AcceptedReady for checkin

comment:18 by Sarah Boyce <42296566+sarahboyce@…>, 4 months ago

Resolution: fixed
Status: assignedclosed

In 2281286:

Fixed #35575 -- Added support for constraint validation on GeneratedFields.

Note: See TracTickets for help on using tickets.
Back to Top