TextField with unique (or in unique_together) constraint breaks for large inputs in Postgres
|Reported by:||jorn||Owned by:||nobody|
|Component:||Database layer (models, ORM)||Version:||1.2|
|Severity:||Normal||Keywords:||postgresql, index, textfield|
|Has patch:||no||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
i have a model like this one:
class foo(Model): a = CharField(max_length=8) b = TextField() class Meta: unique_together = (("a", "b"),)
Now this leads to a sql create table statement like the following in Postgresql:
CREATE TABLE "myapp_foo" ( "id" serial NOT NULL PRIMARY KEY, "a" varchar(8) NOT NULL, "b" text NOT NULL, UNIQUE ("a", "b") );
The problem here is that b is a textfield which can get arbitrarily big (that's what they are there for), while values in a btree index can not:
If the input for b is above a certain size postgresql will complain:
django.db.utils.DatabaseError: index row size 3032 exceeds maximum 2712 for index "myapp_a_key" HINT: Values larger than 1/3 of a buffer page cannot be indexed. Consider a function index of an MD5 hash of the value, or use full text indexing.
The problem here is that the used unique constraint as of Postgres 8.4.5 uses a btree internally, which as stated above doesn't work with values above a certain size.
The solution for this problem would be: If a TextField has a / is part of a unique constraint or has an explicit index_db=True: _always_ index an md5 or better sha256 sum of TextField columns only (note that sha256 is in pgcrypto module only).
In this case:
UNIQUE ("a", md5("b"))
As mentioned this also applies to the universal index_db=True field argument.