hash() in creating db names is unpredictable, and unnecessary
|Reported by:||Ned Batchelder||Owned by:||elbarto|
|Has patch:||no||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
In this code fragment from django/core/management/sql.py:
# For MySQL, r_name must be unique in the first 64 characters. # So we are careful with character usage here. r_name = '%s_refs_%s_%x' % (r_col, col, abs(hash((r_table, table)))) final_output.append(style.SQL_KEYWORD('ALTER TABLE') + ' %s ADD CONSTRAINT %s FOREIGN KEY (%s) REFERENCES %s (%s)%s;' % \ (qn(r_table), truncate_name(r_name, connection.ops.max_name_length()), qn(r_col), qn(table), qn(col), connection.ops.deferrable_sql()))
The r_name is computed with a hash() component. This creates a shorter name, but hash() is not guaranteed to be stable across versions of Python or across 32- and 64-bit architectures. This makes the SQL produced by a model unpredictable, and a development machine and production machine in the same shop could disagree on names.
This became a problem for us when we had to write db upgrades on one machine and apply them to another. We use the SQL generation to verify that the upgrade was successful, and the difference in the names made the validation fail.
The r_name variable is eventually passed into truncate_name anyway: this function does the right thing, shortening the name by using md5 to produce a repeatable hash from the text of the name.
If we remove the hash() from the name generation, we'll have repeatable names.