Code

Opened 6 years ago

Closed 3 years ago

Last modified 3 years ago

#8901 closed Bug (fixed)

Django's guessed postgresql sequence name is incorrect if the resulting sequence name is longer than max_identifier_length

Reported by: adam@… Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: postgresql
Cc: aball@…, Skaffen, LvanderRee Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

The autogenerated sequence name for a primary key in postgresql is "tablename_pkname_seq". last_insert_id() constructs this name from the tablename and the pkname, but it doesn't correctly truncate it if it's too long. The sequence name must be <= 63 characters. If it's too long, then the tablename and pkname are truncated to 29 characters each. It's possible for the tablename to be 28 characters and the pkname to be 30 characters. 30 is greater than 29, but the whole thing is exactly 63 characters, so no truncation is needed. I've included a patch against Django 1.0. It's been tested on postgres 8.1 on Windows. I have not tried any other versions.

Attachments (5)

postgres_sequence_name.patch (1.0 KB) - added by adam@… 6 years ago.
postgresql-get-sequence-name.diff (3.0 KB) - added by lemuelf 5 years ago.
Added a get_sequence_name function that queries postgresql function pg_get_serial_sequence and updated last_insert_id, sql_flush, and sequence_reset_sql methods.
django8901_msh_patch1.diff (7.6 KB) - added by Skaffen 4 years ago.
Use pg_get_serial_sequence to fetch sequence name for a given serial column (includes test patches)
8901-r13323.diff (8.4 KB) - added by russellm 4 years ago.
Cleaup of Skaffen's patch
8901-fixindextruncation.diff (951 bytes) - added by Skaffen 4 years ago.
Call truncate_name in postgresql index creation to avoid letting postgres truncate the name badly

Download all attachments as: .zip

Change History (32)

Changed 6 years ago by adam@…

comment:1 Changed 6 years ago by mtredinnick

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

Nice catch. This patch does need a comment, but the one you've got is confusing, since it leaves out the reason for what's going on. A comment that just explains why we're doing the truncating is better. It also looks very, very fragile: is the way this sequence name is constructed documented anywhere? I'd like some confirmation that we aren't going to be chasing versions of PostgreSQL forever with changing things here (particularly since 8.1 is now a couple of versions behind the latest). It might be more robust to also change the SQL creation code to explicitly specify the sequence name, as we do with a lot of index names to ensure they fit inside the 63 character limit.

Anyway, we should definitely fix the problem here, including doing so for existing setups. Precisely how needs a bit of research to make sure it's robust.

comment:2 follow-up: Changed 5 years ago by vad

Probably it would be better to use postgres' pg_get_serial_sequence('tablename', 'pkname')

comment:3 Changed 5 years ago by aball

  • Cc aball added

comment:4 Changed 5 years ago by aball

  • Cc aball@… added; aball removed

Changed 5 years ago by lemuelf

Added a get_sequence_name function that queries postgresql function pg_get_serial_sequence and updated last_insert_id, sql_flush, and sequence_reset_sql methods.

comment:5 Changed 5 years ago by lemuelf

  • Keywords postgresql added
  • Needs tests set
  • Summary changed from last_insert_id() for postgres fails when the autoincrement sequence name is too long. to Django's guessed postgresql sequence name is incorrect if the resulting sequence name is longer than max_identifier_length
  • Version changed from 1.0 to SVN

comment:6 in reply to: ↑ 2 ; follow-up: Changed 5 years ago by lemuelf

Replying to vad:

Probably it would be better to use postgres' pg_get_serial_sequence('tablename', 'pkname')

Yes, it seems that The Way to do it is through postgres' pg_get_serial_sequence function because aside from the kinda tricky way postgres truncates the table or field name for the sequence name, it also adds a counter to the end of the sequence name (after truncating more characters) if the implicit sequence name would result in a name collision. Also, I wasn't able to find any documentation explaining how postgresql generates the sequence names.

comment:7 Changed 4 years ago by Skaffen

Just to add a note to say that I'm now hitting this problem in an application after upgrading to Django 1.2 from Django 1.1.

The reason I've started hitting it after migration is that ManyRelatedManager._add_items has changed how it works. Looking at the code it used to do the inserts itself into the m2m table and didn't care about the id column values were for those new entries so it never called last_insert_id. In Django 1.2 it looks like it uses a model save to create those entries and that will call last_insert_id. So if the length of your app name, model name and m2m field name combined plus the suffix of "_id_seq" are greater than 63 characters you will get an error.

So if anyone using postgresql and django are getting an error of 'relation "..." does not exist' related to a SELECT CURRVAL when saving a model with m2m relationships after upgrading to Django 1.2, they've probably now hit this issue too. #

I guess there's more chance people will hit this bug now as there's a greater chance of model name + m2m field name being long enough to trigger this (previously it would have only affected people who have defined very long model names?).

comment:8 Changed 4 years ago by Skaffen

BTW - having looked at the postgresql-get-sequence-name.diff patch I notice that it does a query in a function called get_sequence_name to get the sequence name which it then shovels into the call to select CURRVAL, which doesn't seem ideal. Instead it would be better to rework things slightly to just inline the call to pg_get_serial_sequence, e.g.:
SELECT CURRVAL(pg_get_serial_sequence('tablename','id'));

comment:9 Changed 4 years ago by Skaffen

Note that pg_get_serial_sequence doesn't exist prior to postgres version 8 - I'm not sure what version of postgresql Django is supposed to support (a quick scout through the docs didn't really help). My guess, since I can only see references to differences in different version 8 minor releases, is that it's only been used/tested against version 8. Since version 8 was released in 2005 hopefully everyone who is using something as modern as Django 1.x is also on a version 8 of postgres :).

Changed 4 years ago by Skaffen

Use pg_get_serial_sequence to fetch sequence name for a given serial column (includes test patches)

comment:10 Changed 4 years ago by Skaffen

I've had a go at a patch myself to fix the issue and have included some tests (I wasn't sure where to put them, so apologies if I've not added them into the best place). I've not submitted any code for django before so apologies for any newbie errors :). I figure the tests ought to behave on any database back-end assuming the back-end doesn't have any issues with auto field sequences on long model names or with long pk fields, so I haven't made them run only on postgres. I haven't included a test for the fix to sequence_reset_sql since there is no existing test to cover that which I could see in the test suites and the command doesn't run the sql it just displays it. I have tested the output by hand tho' to make sure it's valid - I guess the only way to do a unit test is either to verify the sql put out is what's expected or else actually try running the sql (if there is any for the back-end) - perhaps someone else could do that if it's necessary (or is a by-eye or by-hand check sufficient for that one)?

Changed 4 years ago by russellm

Cleaup of Skaffen's patch

comment:11 Changed 4 years ago by russellm

  • Needs tests unset
  • Patch needs improvement set

@Skaffen - I've attached a slightly cleaned up patch (mostly style changes for consistency, like naming of tables, reusing existing models, and trimming some comments). I've also slightly changed the flush tests -- it turns out that calling flush is an extremely expensive operation when you have a full test suite. We've just gone through the process of removing a half a dozen calls to flush, and halved the time it takes to run the full test suite as a result. So -- I've modified the test to exercise the bits of the flush that matter from the point of view of this patch.

Other than those relatively minor changes, your patch looks good. However, there's one issue preventing me from committing this to trunk: When you run the tests with this patch applied, there is an error raised during test setup:

...
Installing index for admin.LogEntry model
Installing index for backends.VeryLongModelNameZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ_m2m_also_quite_long_zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz model
Failed to install index for backends.VeryLongModelNameZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ_m2m_also_quite_long_zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz model: relation "backends_verylongmodelnamezzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzf21a" already exists

I'm guessing that this is some sort of confusion relating to the fact that the very long table and the very long m2m table have the same truncated name, or possibly that the key names on those tables are being truncated.

This setup error cause any tests to fail, but I'm hesitant to commit a patch that causes a visible error during test setup unless there's an unavoidable reason for it. I can't see an obvious cause, so I'll throw it back to you for some deeper analysis.

comment:12 Changed 4 years ago by Skaffen

@russellm - ah, I had verbose logging on when I ran the test and missed that in the scroll of stuff as I was mainly looking for when the test ran!

It looks like it's a repeat of #12977 - the postgresql back-end has its own version of sql_indexes_for_field and it looks like that fix didn't get applied to the postgresql backend version. I guess the postgresql specific one is there to add in the extra "like" indexes - I guess the author of that postgresql backend method decided it was tidier to duplicate the whole method's functionality and add in the extra stuff rather than calling the base method and then having to duplicate a chunk of the code anyway to calculate the index SQL for the "like" indexes.

Easy fix, I think, to add in a call to truncate_name around the index name. I'll add a patch covering just that bit shortly - I'm not sure if you'd want to raise it as a separate ticket or whether it's ok to fix as part of this one. I'm also not sure whether you care about adding in a test to cover that case - it seems that there was no test added as part of #12977 which is perhaps why this didn't get caught. I guess it may be sufficient that my horribly long model name will now trigger this issue and the messages during setup will at least indicate something is going wrong so that'll serve as sufficient for regression testing? :) The one extra thing that might be worth doing is adding something like this to VeryLongModelNameZZZZZZ in the test models...

varchar_thats_also_long_zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz = models.CharField(max_length=20)

... just to exercise the code path in the postgresql backend sql_indexes_for_field that adds the "like" index for the long name too. I'll leave that up to you to add in if you think it's worth doing here!

Changed 4 years ago by Skaffen

Call truncate_name in postgresql index creation to avoid letting postgres truncate the name badly

comment:13 Changed 4 years ago by Skaffen

  • Cc Skaffen added

comment:14 Changed 4 years ago by russellm

  • Resolution set to fixed
  • Status changed from new to closed

(In [13328]) Fixed #8901 -- Corrected the PostgreSQL sequence reset code when field identifiers exceed the maximum identifier length. Thanks to adam@… for the report, and Matt Hoskins for the patch.

comment:15 Changed 4 years ago by russellm

(In [13348]) Refs #8901 -- Reverted r13328 because the patch imposes a minimum version requirement of Postgres 8.0; we can't impose that on Django 1.2, so we need to wait until the 1.3 is branched before we can apply this change.

comment:16 Changed 4 years ago by russellm

  • milestone set to 1.3
  • Patch needs improvement unset
  • Resolution fixed deleted
  • Status changed from closed to reopened

For personal reference - when we reapply this, we should also add documentation that notes the new minimum database version.

comment:17 Changed 4 years ago by russellm

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [13363]) Fixed #8901 -- Reapplied r13328 (with extra docs) now that we have a 1.3 development branch.

comment:18 Changed 4 years ago by guettli

  • Cc hv@… added

comment:19 Changed 4 years ago by LvanderRee

Hi all,

After developing for several years in C# (.Net and Mono) and Java, and for the last 4 years using and patching Symfony and Propel in Php, I have now started developing in Python (worked my way through diveIntoPython already) and now working on my first Django project. To make things somewhat more challenging I am also using PostgreSQL for the first time after working for years with MySql (and MsSql).

But even though there are many new things I experience, I do think I am running into a similar problem using Django 1.2.1 and PostgreSQL 8.4.4 as described above in yet another situation.

I am using a legacy database from the sql-ledger project. I inspected this database with Django to generate my model and adjusted the code as described (changed order, defined primary keys and relations). After that I enabled the django-admin and so far so good!

Now the problem:

From what I can see a database table I am working on has been defined like this:

CREATE TABLE project
(
  id integer DEFAULT nextval('id'::regclass),
  projectnumber text,
  description text,
  startdate date,
  enddate date,
  parts_id integer,
  production double precision DEFAULT 0,
  completed double precision DEFAULT 0,
  customer_id integer
)
WITH (
  OIDS=FALSE
);
-- Index: project_id_key

-- DROP INDEX project_id_key;

CREATE INDEX project_id_key
  ON project
  USING btree
  (id);

-- Index: projectnumber_key

-- DROP INDEX projectnumber_key;

CREATE UNIQUE INDEX projectnumber_key
  ON project
  USING btree
  (projectnumber);

I was unfamiliar with using a function to calculate the "auto-increment"-value/sequence for the PK, but as far as I understand the nextval method is used to calculate new primary keys. I think the provided 'id' field has been used for all tables and is not specifically bound to this table, I only haven't figured out what this ::regclass does.

More importantly however is that I can see this is being done differently from how it is being retrieved in Django, with:

cursor.execute("SELECT CURRVAL('\"%s_%s_seq\"')" % (table_name, pk_name))

Which is resulting in the same error 'relation "..." does not exist' as people experience above, when I am creating new records (from both admin and python-console saving a new Project instance). So this time it is not because of truncating field-names, but the sequence that is requested for currval does not exists, since the people at sql-ledger seem to have defined a completely different name for the sequence (am I right?).

I can fix this by patching operations.py with:

cursor.execute("SELECT CURRVAL(pg_get_serial_sequence('%s','%s'))" % (table_name, pk_name)) 

as described above.

So maybe this bug should also be solved (somehow) for 1.2.x as well, since not all legacy databases can currently be managed... I haven't got a clue however how to fix this for PG<8.x

comment:20 in reply to: ↑ 6 Changed 4 years ago by lukeplant

As Russell explained, backporting that patch would force the 1.2.X branch to require Postgres < 8.0, which we can't do. Without 8.0, it looks like we can't fix this robustly due to the issue brought up by lemuelf above.

So the solution to this is to update to trunk, or maintain your own patch I'm afraid. FWIW, trunk is usually pretty stable.

comment:21 Changed 4 years ago by LvanderRee

  • Cc LvanderRee added

Thanks for the info.

I noticed that for this patch Postgres should be 8.0 or bigger, but since I found another usecase which causes problems I wondered if there might be the need for another solution, although I cannot think of any either... However since I am new to both Python/Django and Postgres I cannot tell if there isn't any other solution at all.

The patch however fine by me, and actually running 1.3 (trunk) is no problem for me either. So your answer is good (and fast) enough for me!

comment:22 follow-up: Changed 3 years ago by HM

  • Resolution fixed deleted
  • Status changed from closed to reopened

This fix partially fixes #1946 but does not take into account tables that uses postgres' own inheritance scheme since
pg_get_serial_sequence() only checks the table and not its parents, if any. I have a database backend in #1946 which recurisvely looks up parents in order to find the sequence name. I'm starting to think the easiest/best way is to add a function to postgresql itself that actually looks up the real sequence name. Currently that still means an extra backend is needed though, but the only difference from the standard backend is that the line

cursor.execute("SELECT CURRVAL(pg_get_serial_sequence('%s','%s'))" % (table_name, pk_name)) 

would use a different function.

comment:23 Changed 3 years ago by lukeplant

  • Severity set to Normal
  • Type set to Bug

comment:24 Changed 3 years ago by patchhammer

  • Easy pickings unset
  • Patch needs improvement set

8901-fixindextruncation.diff fails to apply cleanly on to trunk

comment:25 in reply to: ↑ 22 Changed 3 years ago by ramiro

  • Resolution set to fixed
  • Status changed from reopened to closed
  • UI/UX unset

Replying to HM:

This fix partially fixes #1946 but does not take into account tables that uses postgres' own inheritance scheme

I don't think it is correct to reopen this ticket for that particular legacy Postgresq DB setup. The commit that closed this ticket fixed the issue reported here and both together are a self-contained piece.

That issue had its own ticket (#1946, that got closed in favor of #13295) but if it hadn't then a new ticket should have been opened instead of reopening this one.

Closing.

comment:26 Changed 3 years ago by jacob

  • milestone 1.3 deleted

Milestone 1.3 deleted

comment:27 Changed 3 years ago by guettli

  • Cc hv@… removed

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.