Opened 6 years ago

Closed 6 years ago

Last modified 2 years ago

#30108 closed Cleanup/optimization (fixed)

Include FK constraints when adding columns in PostgreSQL

Reported by: Dan Tao Owned by: nobody
Component: Migrations Version: dev
Severity: Normal Keywords:
Cc: Dan Tao, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Dan Tao)

On the Bitbucket team, we have been stung by Django's default behavior, when adding a nullable ForeignKey field to a large, high-traffic table, of doing so in two separate statements:

ALTER TABLE [...] ADD COLUMN [...]
ALTER TABLE [...] ADD CONSTRAINT [...] FOREIGN KEY [...] REFERENCES [...]

The reason this is problematic is that, again for a very large table with many millions of rows and significant contention, the statement to add the constraint will acquire a lock and perform a full table scan to validate that there are no invalid foreign keys in the table.

In PostgreSQL, if you execute ADD COLUMN with a REFERENCES clause, it can bypass the table scan since there cannot possibly be any invalid foreign keys in the table.

We have written a custom operation to do just that whenever we encounter this scenario. Without it, we simply would not be able to add columns with foreign key constraints to some of our larger tables, other than by executing the SQL manually and then faking the migration. It would be nice if Django could take the more optimal approach by default, at least for the PostgreSQL database backend. (I'm not familiar enough with the other database backends to know whether this is also a problem for them.)

Change History (11)

comment:1 by Dan Tao, 6 years ago

I've taken a look at the code already, and I believe it's almost there.

The BaseDatabaseSchemaEditor class includes the attribute sql_create_inline_fk, None by default, which is appended to a column definition when creating a table. It is not included when adding a field, but as far as I can tell that is simply an oversight.

Somewhat surprisingly, this attribute is only defined on the sqlite3 backend. (Why there was any need to optimize this particular backend, I couldn't say.)

I attempted to make the following changes:

  1. Update the logic in BaseDatabaseSchemaEditor.add_field to append sql_create_inline_fk (if defined) to the column definition.
  2. Add the sql_create_inline_fk attribute to the postgresql backend class.

Unfortunately this ended up causing test failures because there are test models defined with a cyclic dependency. It seems that SQLite doesn't mind if you create a table with a foreign key to another table that doesn't exist yet, but PostgreSQL does.

psycopg2.ProgrammingError: relation "mutually_referential_child" does not exist

I'm not sure what the solution is. Perhaps the code could be refactored so that the postgresql backend only applies the sql_create_inline_fk optimization in add_field (which is when this scenario can occur) and not in create_model (when it shouldn't matter, because a table scan on an empty table is not problematic).

comment:2 by Dan Tao, 6 years ago

Cc: Dan Tao added
Description: modified (diff)
Has patch: set

comment:3 by Simon Charette, 6 years ago

Patch needs improvement: set
Triage Stage: UnreviewedAccepted

First, thanks for the very detailed report and the patch.

For posterity here are similar comments to the ones I left on your Github PR.

Somewhat surprisingly, this attribute is only defined on the sqlite3 backend. (Why there was any need to optimize this particular backend, I couldn't say.)

That's because SQLite doesn't support ALTER TABLE ADD CONSTRAINT; the table has to be rebuilt on column addition so the only way to define foreign constraints is by inlining them.

I'm not sure what the solution is. Perhaps the code could be refactored so that the postgresql backend only applies the sql_create_inline_fk optimization in add_field (which is when this scenario can occur) and not in create_model (when it shouldn't matter, because a table scan on an empty table is not problematic).

I think it should be done for all backends that support it on field addition. As you've said it's not that useful in table creation cases anyway because the table will be empty so the operation will be really fast. That would also be really hard to support for circular CreateModel across apps. To summarize I think it's a great addition for AddField (and probably AlterField?) but it's likely not worth the complexity it would incur for CreateModel.

I left a few comments on your PR for improvements, mainly around adding support for MySQL and Oracle as well. Please uncheck patch needs improvement once you've addressed them so the patch appears in the review queue.

comment:4 by Dan Tao, 6 years ago

To summarize I think it's a great addition for AddField (and probably AlterField?) but it's likely not worth the complexity it would incur for CreateModel.

I think we're agreed on this. The benefit for CreateModel is negligible, especially compared to the complexity of the refactor that would be necessary to support it.

mainly around adding support for MySQL and Oracle

I am happy to do this. My only concern is that I am nowhere near an expert in either of these databases (not that I am an expert in PostgreSQL, either, but everything's relative!). From a cursory investigation, it seems that the MySQL syntax for doing this is a bit different and involves basically combining two statements (an ADD COLUMN and an ADD CONSTRAINT) with a comma. The Oracle syntax seems quite similar to the PostgreSQL syntax.

Both the MySQL and the Oracle versions require that you specify a name for the constraint, whereas it appears PostgreSQL handles that automatically. Passing this along in the base schema editor will require a bit more refactoring, though it looks very doable.

I'll do my best, and then we'll see how you feel about how things turn out. I'm wondering, in the event we run into further complications with MySQL and/or Oracle (for example, if you are not happy with the way I refactor the code to support them), if you'd be opposed to deferring those backends for now so that someone with more expertise in them can contribute the necessary changes? Perhaps this will be moot—I'm just preparing for the worst :)

And one more thing: would you prefer that we strive to keep the majority of the discussion here on the ticket, or in the pull request? I'm assuming we should keep it mostly here aside from comments that are specific to details of the code; but if I've got that backwards I'm happy to continue the discussion on the PR.

comment:5 by Simon Charette, 6 years ago

Cc: Simon Charette added

I'll do my best, and then we'll see how you feel about how things turn out. I'm wondering, in the event we run into further complications with MySQL and/or Oracle (for example, if you are not happy with the way I refactor the code to support them), if you'd be opposed to deferring those backends for now so that someone with more expertise in them can contribute the necessary changes? Perhaps this will be moot—I'm just preparing for the worst :)

I'm not opposed to that at all. Let's see how complex that is and if it doesn't work out it can be revisited later.

And one more thing: would you prefer that we strive to keep the majority of the discussion here on the ticket, or in the pull request? I'm assuming we should keep it mostly here aside from comments that are specific to details of the code; but if I've got that backwards I'm happy to continue the discussion on the PR.

I think keeping all of the non-review related discussion here is more valuable given the ticket already has a lot of context.

comment:6 by Dan Tao, 6 years ago

Patch needs improvement: unset

comment:7 by Simon Charette, 6 years ago

Patch needs improvement: set

Left a few comments for improvement but should be RFC once they are addressed.

comment:8 by Dan Tao, 6 years ago

Patch needs improvement: unset

comment:9 by Simon Charette, 6 years ago

Triage Stage: AcceptedReady for checkin

RFC except for some cosmetic changes (quote types) that can be handled by the committer.

comment:10 by Tim Graham <timograham@…>, 6 years ago

Resolution: fixed
Status: newclosed

In 738faf9:

Fixed #30108 -- Allowed adding foreign key constraints in the same statement that adds a field.

comment:11 by Simon Charette, 2 years ago

FWIW this might have had some unintended side effects on MySQL, we're still investigating if it's the origin of the issue we're running into.

When combining multiple operations in a single ALTER STATEMENT MySQL will pick a single ALGORITHM (the way it performs the operation) that must be compatible with each operations.

In most cases an ADD COLUMN operation can use the INSTANT algorithm while the ADD CONSTRAINT requires the usage of INPLACE or even COPY when foreign key checks are enabled which is the default. From MariaDB's docs (which seem to be more helpful on the subject) it seems that this is required because MySQL cannot create indexes INSTANTly.

This means that combining operations now forces the usage of the COPY algorithm for both the column addition and foreign key constraint creation which we're still trying to determine if it might play a role in making some of our schema changes more problematic since we upgraded to 3.2 LTS.

In summary the statement issues on ForeignKey addition before this change on recent versions of MySQL were along the lines of

-- < Django 3.0 with explicit ALGORITHM
ALTER TABLE foo ADD COLUMN bar_id integer, ALGORITHM=INSTANT;
ALTER TABLE foo ADD CONSTRAINT bar_id_fk FOREIGN KEY foo(bar_id) REFERENCES bar(id), ALGORITM=COPY;

-- >= Django 3.0 with explicit ALGORITHM
ALTER TABLE foo ADD COLUMN bar_id integer, ADD CONSTRAINT bar_id_fk FOREIGN KEY foo(bar_id) REFERENCES bar(id), ALGORITM=COPY;

On a side note we've noticed that having the MySQL backend wrap foreign key constraint creation with SET foreign_key_check=0 and SET foreign_key_check=1 will allow the usage of ALGORITHM=INPLACE which might be a desirable behaviour to add in core since the schema editor can guarantee that no constraint violations will happen in the current session while adding the constraint.

In other words, toggling foreign_key_check allows for either

ALTER TABLE foo ADD COLUMN bar_id integer, ALGORITHM=INSTANT;
SET foreign_key_checks=0;
ALTER TABLE foo ADD CONSTRAINT bar_id_fk FOREIGN KEY foo(bar_id) REFERENCES bar(id), ALGORITM=INPLACE;
SET foreign_key_checks=1;
-- OR
SET foreign_key_checks=0;
ALTER TABLE foo ADD COLUMN bar_id integer, ADD CONSTRAINT bar_id_fk FOREIGN KEY foo(bar_id) REFERENCES bar(id), ALGORITM=INPLACE;
SET foreign_key_checks=1;
Last edited 2 years ago by Simon Charette (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top