Opened 5 years ago

Last modified 19 months ago

#30108 closed Cleanup/optimization

Include FK constraints when adding columns in PostgreSQL — at Version 2

Reported by: Dan Tao Owned by: nobody
Component: Migrations Version: dev
Severity: Normal Keywords:
Cc: Dan Tao, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Dan Tao)

On the Bitbucket team, we have been stung by Django's default behavior, when adding a nullable ForeignKey field to a large, high-traffic table, of doing so in two separate statements:

ALTER TABLE [...] ADD COLUMN [...]
ALTER TABLE [...] ADD CONSTRAINT [...] FOREIGN KEY [...] REFERENCES [...]

The reason this is problematic is that, again for a very large table with many millions of rows and significant contention, the statement to add the constraint will acquire a lock and perform a full table scan to validate that there are no invalid foreign keys in the table.

In PostgreSQL, if you execute ADD COLUMN with a REFERENCES clause, it can bypass the table scan since there cannot possibly be any invalid foreign keys in the table.

We have written a custom operation to do just that whenever we encounter this scenario. Without it, we simply would not be able to add columns with foreign key constraints to some of our larger tables, other than by executing the SQL manually and then faking the migration. It would be nice if Django could take the more optimal approach by default, at least for the PostgreSQL database backend. (I'm not familiar enough with the other database backends to know whether this is also a problem for them.)

Change History (2)

comment:1 by Dan Tao, 5 years ago

I've taken a look at the code already, and I believe it's almost there.

The BaseDatabaseSchemaEditor class includes the attribute sql_create_inline_fk, None by default, which is appended to a column definition when creating a table. It is not included when adding a field, but as far as I can tell that is simply an oversight.

Somewhat surprisingly, this attribute is only defined on the sqlite3 backend. (Why there was any need to optimize this particular backend, I couldn't say.)

I attempted to make the following changes:

  1. Update the logic in BaseDatabaseSchemaEditor.add_field to append sql_create_inline_fk (if defined) to the column definition.
  2. Add the sql_create_inline_fk attribute to the postgresql backend class.

Unfortunately this ended up causing test failures because there are test models defined with a cyclic dependency. It seems that SQLite doesn't mind if you create a table with a foreign key to another table that doesn't exist yet, but PostgreSQL does.

psycopg2.ProgrammingError: relation "mutually_referential_child" does not exist

I'm not sure what the solution is. Perhaps the code could be refactored so that the postgresql backend only applies the sql_create_inline_fk optimization in add_field (which is when this scenario can occur) and not in create_model (when it shouldn't matter, because a table scan on an empty table is not problematic).

comment:2 by Dan Tao, 5 years ago

Cc: Dan Tao added
Description: modified (diff)
Has patch: set
Note: See TracTickets for help on using tickets.
Back to Top