Opened 13 months ago

Last modified 8 months ago

#34943 assigned New feature

Support passing unique constraint names to bulk_create().

Reported by: Alex Vandiver Owned by: HAMA Barhamou
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: bulk insert update upsert
Cc: Vitor Pereira, Chih Sean Hsu, Simon Charette, Ülgen Sarıkavak Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Mariusz Felisiak)

#31685 added support for INSERT ... ON CONFLICT(col1, col2) DO UPDATE SET ... In some cases, however, the unique constraint may be on computed columns, which cannot be used with unique_fields because they are directly quoted.

For instance:

from django.db.models.functions import Upper
UserTopic.objects.bulk_create(
  [ut],
  update_conflicts=True,
  update_fields=['last_updated','visibility_policy'],
  unique_fields=['user_profile_id','stream_id',Upper('topic_name')],
)

...fails because UserTopic has no field named 'Upper(F(topic_name))'.

And:

UserTopic.objects.bulk_create(
  [ut],
  update_conflicts=True,
  update_fields=['last_updated','visibility_policy'],
  unique_fields=['user_profile_id','stream_id','upper(topic_name)'],
)

...fails similarly, with UserTopic has no field named 'upper(topic_name)'.

It would be a useful feature to be able to handle these cases.

Change History (11)

comment:1 by Mariusz Felisiak, 13 months ago

Cc: Vitor Pereira Chih Sean Hsu added
Description: modified (diff)
Summary: Support computed unique fields for INSERT ON CONFLICT ... UPDATESupport passing database expressions to bulk_create()'s unique_fields.
Triage Stage: UnreviewedAccepted
Type: UncategorizedNew feature

Sounds reasonable.

comment:2 by Alex Vandiver, 13 months ago

Keywords: bulk insert update upsert added

comment:3 by Simon Charette, 13 months ago

Maybe the would be an opportunity to deprecate/rename the unique_fields kwarg name to unique_expressions or unique_together to denote that not only fields are acceptable here.

Or we could take an hybrid approach like we did with Index and Constraint where either allow update_fields or update_expressions to be specified.

I get a sense that we should have preferred a unique_constraint kwarg that points to the name of an entry of Meta.constraints or the name of a Field(unique) in the first place instead of forcing users to specify the exact same index definition they have in their model, Meta.unique_together, or Meta.constraints instead.

For backends that support the ON CONFLICT ON CONSTRAINT clause we could have simply used it for the constraint name specified otherwise we know what the index expression associated with the specified constraint is so we can generate the proper SQL.

In other words, for a model like

class UserTopic(models.Model):
    ...
    class Meta:
        constraints = [
            UniqueConstraint(
                'user_profile_id','stream_id',Upper('topic_name'),  
                name='unique_profile_stream_topic'
            )
        ]

The call to bulk_create could simply have been

UserTopic.objects.bulk_create(
    [ut],
    update_fields=['last_updated','visibility_policy'],
    unique_constraint='unique_profile_stream_topic',
)
Version 0, edited 13 months ago by Simon Charette (next)

comment:4 by Mariusz Felisiak, 13 months ago

Summary: Support passing database expressions to bulk_create()'s unique_fields.Support passing unique constraint names to bulk_create().

Great suggestion, Simon!

comment:5 by Sujay, 13 months ago

Owner: changed from nobody to Sujay
Status: newassigned

comment:6 by Alex Vandiver, 13 months ago

It's worth noting that [PostgreSQL's INSERT documentation](https://www.postgresql.org/docs/current/sql-insert.html) suggests against ON CONFLICT ON CONSTRAINT, preferring the raw list of expressions:

It is often preferable to use unique index inference rather than naming a constraint directly using ON CONFLICT ON CONSTRAINT constraint_name. Inference will continue to work correctly when the underlying index is replaced by another more or less equivalent index in an overlapping way, for example when using CREATE UNIQUE INDEX ... CONCURRENTLY before dropping the index being replaced.

That is, naming a specific constraint is in general more brittle, since it depends on the specific name, rather than naming the effect of the constraint.

In our particular use case, we also currently only have a unique index, not a constraint -- and I'm not sure how well doing that migration would go. But that's a separate issue.

comment:7 by Simon Charette, 13 months ago

Cc: Simon Charette added

Makes sense.

I guess that from an application maintenance perspective there are still benefits in using references to already defined constraint though, think of urlpattern names, annotations, lookups.

To take back the example above nothing would prevent the ORM from simply resolving unique_constraint='unique_profile_stream_topic' to ('user_profile_id', 'stream_id', Upper('topic_name')) and avoid the database level usage of ON CONFLICT ON CONSTRAINT if that's discouraged.

In other words, the aliasing capabilities could be solely done at the application level assuming users opt-in into the usage of Meta.constraints. That would also happen to address the issue involved in dealing with the difference between unique indexes and constraints as Postgres treat them differently.

in reply to:  3 comment:8 by HAMA Barhamou, 9 months ago

Replying to Simon Charette:

Maybe the would be an opportunity to deprecate/rename the unique_fields kwarg name to unique_expressions or unique_together to denote that not only fields are acceptable here.

Or we could take an hybrid approach like we did with Index and Constraint where either allow update_fields or update_expressions to be specified.

I get a sense that we should have preferred a unique_constraint kwarg that points to the name of an entry of Meta.constraints or the name of a Field(unique) in the first place instead of forcing users to specify the exact same index definition they have in their model, Meta.unique_together, or Meta.constraints instead.

For backends that support the ON CONFLICT ON CONSTRAINT clause we could have simply used it for the constraint name specified otherwise we know what the index expression associated with the specified constraint is so we can generate the proper SQL.

In other words, for a model like

class UserTopic(models.Model):
    ...
    class Meta:
        constraints = [
            UniqueConstraint(
                'user_profile_id','stream_id',Upper('topic_name'),  
                name='unique_profile_stream_topic'
            )
        ]

The call to bulk_create could simply have been

UserTopic.objects.bulk_create(
    [ut],
    update_fields=['last_updated','visibility_policy'],
    unique_constraint='unique_profile_stream_topic',
)

---

My recommendation would be to introduce a unique_constraint: str | tuple[str | Expression] kwarg and deprecate unique_fields. When provided a str it would be a reference to a UniqueConstraint by .name and when it's a tuple it would be expected to be a index expression where str are resolved to field names.

I guess the str -> UniqueConstraint.name part could be a distinct feature request but it seems related to this issue given the only way Django supports creating unique constraints composed of expressions is through Meta.constaints and having to define the exact expression twice is error prone.

Hi, If you depreciated unique_fields, what would that mean for ticket #34277 https://code.djangoproject.com/ticket/34277, which I'm working on?

Last edited 9 months ago by HAMA Barhamou (previous) (diff)

comment:9 by Sujay, 9 months ago

Owner: Sujay removed
Status: assignednew

comment:10 by Ülgen Sarıkavak, 8 months ago

Cc: Ülgen Sarıkavak added

comment:11 by HAMA Barhamou, 8 months ago

Owner: set to HAMA Barhamou
Status: newassigned

I will make a proposal to work on the ticket for GSoC 2024.

Note: See TracTickets for help on using tickets.
Back to Top