Opened 11 years ago
Last modified 5 years ago
#24082 new Bug
Unique=True on TextField or CharField should not create an index
| Reported by: | djbug | Owned by: | |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 1.7 |
| Severity: | Normal | Keywords: | db-indexes |
| Cc: | Shai Berger, Simon Charette, emorley@…, Mariusz Felisiak, Phil Krylov, Semyon Pupkov, Can Sarıgöl, Peter Thomassen | Triage Stage: | Accepted |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I've experienced this with PostgreSQL but I suspect it could be true for other databases too.
PostgreSQL docs say:
there's no need to manually create indexes on unique columns; doing so would just duplicate the automatically-created index.
Further the docs say:
The index covers the columns that make up the [...] unique constraint [...] and is the mechanism that enforces the constraint.
However this model in Django with unique=True on a TextField creates an index on top of the unique constraint.
class Book(models.Model):
name = models.TextField(unique=True)
creates following table & constraints in PostgreSQL:
CREATE TABLE book (
id integer NOT NULL,
name text NOT NULL,
);
ALTER TABLE ONLY book ADD CONSTRAINT book_name_key UNIQUE (name);
CREATE INDEX book_name_like ON book USING btree (name text_pattern_ops);
Please correct me if I'm wrong. My conclusion is that databases enforce unique constraint by way of an index. Adding another index is a waste. There's some mention of this fact in an old bug report (comment 3 & comment 6 ) but it looks like the issue got dropped.
I've also verified this with the following create table statement in PostgreSQL (no explicit index). A SELECT on name uses an index scan instead of a sequential scan (which means there's an implicit index). So in this case, Django doesn't need to add a CREATE INDEX statement.
CREATE TABLE book (
id serial primary key,
name text UNIQUE
);
However, if the justification to add a second index is to use text_pattern_ops ( Bug Report 12234 ) then it might be more efficient to interpret a unique=True in the above table as
CREATE TABLE book (
id serial primary key,
name text
);
CREATE UNIQUE INDEX book_name_like ON book USING btree (name text_pattern_ops);
i.e. no UNIQUE constraint in the table, only a UNIQUE INDEX.
Change History (19)
comment:1 by , 11 years ago
| Description: | modified (diff) |
|---|
comment:2 by , 11 years ago
| Description: | modified (diff) |
|---|
comment:3 by , 11 years ago
| Description: | modified (diff) |
|---|
follow-up: 5 comment:4 by , 11 years ago
follow-up: 8 comment:5 by , 11 years ago
Imposing uniqueness on text columns is almost always nonsense.
For PostgreSQL, CharField maps to varchar(n) and TextField maps to text. According to the official docs, there's no performance difference between the two. Most DBA recommend text over varchar(n) (Ref 1) (Ref 2). That said, the above discussion about unique & index is true for CharField too as far as Django is concerned.
the issue here really is whether
unique=Trueimpliesdb_index=True. This holds for most field types, so I think most people would expect it to hold for text fields as well.
Could you explain why should unique=True imply creation of an index explicitly ? AFAIK, this would duplicate the index in PostgreSQL (& a few other databases according to the old bug reports mentioned above). If we remove this explicit index, there's no drop in performance because internally the db maintains an index for a unique constraint.
You should get the behavior you expect, probably, with
unique=True, db_index=False. If this indeed gives you what you want, I'd resolve this ticket by adding an admonition about this in theTextFielddocumentation.
I've tried your suggestion. For a TextFIeld, the following two generate the same constraint & index:
TextField( unique=True, db_index=False)
and
TextField( unique=True)
follow-up: 7 comment:6 by , 11 years ago
In my opinion it would be better to not have unique=True imply db_index=True. How the database enforces the constraint seems more like an implementation detail than something Django should guarantee.
However, this is the currently documented behavior ("Note that when unique is True you don’t need to specify db_index, because unique implies the creation of an index"), so there would be a backwards-compatibility problem in changing it.
The fact that you can't prevent the creation of the extra index with an explicit db_index=False does feel like a bug, though, and something that could be fixed without a deprecation cycle.
As for the suggestion to use a single text_pattern_ops unique index to serve double duty, the PostgreSQL documentation makes it clear that that will not suffice to support all of Django's lookup types: "Note that you should also create an index with the default operator class if you want queries involving ordinary <, <=, >, or >= comparisons to use an index. Such queries cannot use the xxx_pattern_ops operator classes."
follow-up: 9 comment:7 by , 11 years ago
Replying to marfire:
However, this is the currently documented behavior ("Note that when
uniqueisTrueyou don’t need to specifydb_index, becauseuniqueimplies the creation of an index"), so there would be a backwards-compatibility problem in changing it.
The fact that you can't prevent the creation of the extra index with an explicit
db_index=Falsedoes feel like a bug, though, and something that could be fixed without a deprecation cycle.
Agreed about backward compatibility & having db_index=False as a fix.
As for the suggestion to use a single
text_pattern_opsunique index to serve double duty, the PostgreSQL documentation makes it clear that that will not suffice to support all of Django's lookup types: "Note that you should also create an index with the default operator class if you want queries involving ordinary <, <=, >, or >= comparisons to use an index. Such queries cannot use the xxx_pattern_ops operator classes."
A single text_pattern_ops index works for both LIKE & = queries. But it won't work for other operators like < <= etc.. But, I feel that, it would be an overkill to create 2 indexes by default. It should be left to the user to decide if they want double index or a single one with whichever operator class they want.
comment:8 by , 11 years ago
| Cc: | added |
|---|---|
| Triage Stage: | Unreviewed → Accepted |
First of all, +1 everything marfire said.
Replying to djbug:
Imposing uniqueness on text columns is almost always nonsense.
For PostgreSQL,
CharFieldmaps tovarchar(n)andTextFieldmaps totext. According to the official docs, there's no performance difference between the two. Most DBA recommendtextovervarchar(n)(Ref 1) (Ref 2).
Well, I rephrase: s/text column/TextField/. TextFields are for big blobs of text, not identifiers.
That said, the above discussion about unique & index is true for
CharFieldtoo as far as Django is concerned.
the issue here really is whether
unique=Trueimpliesdb_index=True. This holds for most field types, so I think most people would expect it to hold for text fields as well.
Could you explain why should
unique=Trueimply creation of an index explicitly ?
It generally does not. An index is created explicitly only to support LIKE operations (as you noted, #12234).
You should get the behavior you expect, probably, with
unique=True, db_index=False. If this indeed gives you what you want, I'd resolve this ticket by adding an admonition about this in theTextFielddocumentation.
I've tried your suggestion. For a
TextFIeld, the following two generate the same constraint & index:
TextField( unique=True, db_index=False)and
TextField( unique=True)
I'm accepting based on this (and documentation should be added as well).
comment:9 by , 11 years ago
Replying to djbug:
A single
text_pattern_opsindex works for bothLIKE&=queries. But it won't work for other operators like< <= etc.. But, I feel that, it would be an overkill to create 2 indexes by default. It should be left to the user to decide if they want double index or a single one with whichever operator class they want.
I disagree; it is much easier to notice a superfluous index, or an explicitly-removed one (once db_index=False works), than one that is partly-missing (will work for anything except LIKE). If you're at the point where a superfluous index is hurting your performance, you're likely to be paying much closer attention to your database anyway; a missing index will hurt you much earlier.
comment:10 by , 11 years ago
In reply to shaib:
TextFields are for big blobs of text, not identifiers.
I feel that's not the case with PostgreSQL. Whatever I've read so far pushes me to use TextField for all types of text.
I'll add one more thing to the above discussion:
Based on my tests, the text_pattern_ops does not use index scan with < <= > >= operators. So the point about requiring 2 index is questionable. It's possible that I'm doing something wrong and would love to have someone verify this before coming to a conclusion.
According to PG docs, the following operators are supported by text_pattern_ops:
"btree";"text_pattern_ops";"=(text,text)" "btree";"text_pattern_ops";"~<~(text,text)" "btree";"text_pattern_ops";"~<=~(text,text)" "btree";"text_pattern_ops";"~>=~(text,text)" "btree";"text_pattern_ops";"~>~(text,text)"
comment:11 by , 10 years ago
| Cc: | added |
|---|
comment:12 by , 10 years ago
| Keywords: | db-indexes added |
|---|
comment:13 by , 9 years ago
| Cc: | added |
|---|
comment:14 by , 9 years ago
| Cc: | added |
|---|---|
| Owner: | changed from to |
| Status: | new → assigned |
comment:15 by , 8 years ago
| Owner: | removed |
|---|---|
| Status: | assigned → new |
comment:16 by , 8 years ago
| Cc: | added |
|---|
comment:17 by , 7 years ago
| Cc: | added |
|---|
comment:18 by , 7 years ago
| Cc: | added |
|---|
comment:19 by , 5 years ago
| Cc: | added |
|---|
Imposing uniqueness on text columns is almost always nonsense (and doesn't work at all on Oracle). If it weren't for backwards compatibility, I'd suggest we flag it as an error.
That said, the issue here really is whether
unique=Trueimpliesdb_index=True. This holds for most field types, so I think most people would expect it to hold for text fields as well.You should get the behavior you expect, probably, with
unique=True, db_index=False. If this indeed gives you what you want, I'd resolve this ticket by adding an admonition about this in theTextFielddocumentation.