Opened 15 years ago
Last modified 9 years ago
#14904 new Bug
TextField with unique (or in unique_together) constraint breaks for large inputs in Postgres
| Reported by: | jorn | Owned by: | nobody |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 1.2 |
| Severity: | Normal | Keywords: | postgresql, index, textfield |
| Cc: | Triage Stage: | Accepted | |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Hi,
i have a model like this one:
class foo(Model):
a = CharField(max_length=8)
b = TextField()
class Meta:
unique_together = (("a", "b"),)
Now this leads to a sql create table statement like the following in Postgresql:
CREATE TABLE "myapp_foo" (
"id" serial NOT NULL PRIMARY KEY,
"a" varchar(8) NOT NULL,
"b" text NOT NULL,
UNIQUE ("a", "b")
);
The problem here is that b is a textfield which can get arbitrarily big (that's what they are there for), while values in a btree index can not:
If the input for b is above a certain size postgresql will complain:
django.db.utils.DatabaseError: index row size 3032 exceeds maximum 2712 for index "myapp_a_key" HINT: Values larger than 1/3 of a buffer page cannot be indexed. Consider a function index of an MD5 hash of the value, or use full text indexing.
The problem here is that the used unique constraint as of Postgres 8.4.5 uses a btree internally, which as stated above doesn't work with values above a certain size.
The solution for this problem would be: If a TextField has a / is part of a unique constraint or has an explicit index_db=True: _always_ index an md5 or better sha256 sum of TextField columns only (note that sha256 is in pgcrypto module only).
In this case:
UNIQUE ("a", md5("b"))
As mentioned this also applies to the universal index_db=True field argument.
Change History (4)
comment:1 by , 15 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|
comment:2 by , 15 years ago
| Severity: | → Normal |
|---|---|
| Type: | → Bug |
comment:3 by , 14 years ago
| Easy pickings: | unset |
|---|---|
| UI/UX: | unset |
Have the same problem with mysql,
_mysql_exceptions.OperationalError: (1071, 'Specified key was too long; max key length is 767 bytes')
With
name = models.CharField(null=True, blank=True, max_length=300)
user = models.ForeignKey(User,null=True, blank=True)
class Meta :
unique_together = (('name', 'user'),)
comment:4 by , 9 years ago
Have also observed this with Django 1.8 and Postgresql 9.3
I'm linking references for fixes / workarounds found in a quick search:
- http://stackoverflow.com/questions/20725671/how-to-truncate-column-in-order-to-create-indexes
- https://www.postgresql.org/message-id/AANLkTikG_nHARKr9qeQXpS7Q6QXgJvuFUi6Wxpd0o7H7%40mail.gmail.com
- https://www.postgresql.org/docs/current/static/indexes-opclass.html
At the moment, it looks like any workaround involves writing some raw SQL. Digging through Django's code, indices are created by django.db.backends.base.schema.BaseDatabaseSchemaEditor._sql_create_index() That's pretty far outside the public API.
It would be helpful to have a complete simple sample case description especially a hint at how much text content should be assigned to the field for this condition triggering, and thus can be reproduced and/or tested against other postgresql versions.