Opened 4 years ago
Last modified 2 months ago
#32577 assigned New feature
Add support for `UUIDAutoField` `DEFAULT_AUTO_FIELD`
Reported by: | Tomasz Wójcik | Owned by: | Tomasz Wójcik |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | uuid |
Cc: | Diego Lima, Brian Helba, raydeal, johnthagen, Michael Kuron, Alireza Safari, Petr Přikryl, Wilson E. Husin, Luiz Braga, Dmytro Litvinov, Olivier Dalang, Ülgen Sarıkavak | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Default auto field introduced in #31007, as seen in the docs, is BigAutoField
.
class BigAutoField(AutoFieldMixin, BigIntegerField): def get_internal_type(self): return 'BigAutoField' def rel_db_type(self, connection): return BigIntegerField().db_type(connection=connection)
Many projects prefer UUID field over big int for pk. So far we had to use a mixin for that but I am under the impression DEFAULT_AUTO_FIELD
exists to lift this burden.
I'd expect something like this to work (additional adjustments might be needed)
from django.db import models from django.db.models.fields import AutoFieldMixin class UUIDAutoField(AutoFieldMixin, models.UUIDField): def get_internal_type(self): return 'UUIDAutoField' def rel_db_type(self, connection): return models.UUIDField().db_type(connection=connection)
But if I use this for DEFAULT_AUTO_FIELD
I get the error
ValueError: Primary key 'common.fields.UUIDAutoField' referred by DEFAULT_AUTO_FIELD must subclass AutoField.
I noticed two things I want to share.
First, the default auto field BigAutoField
is not subclassing AutoField
so it's not consistent with the error message.
Second, AutoField
is subclassing IntegerField
. If I understand this correctly, users are limited to subclass IntegerField
for DEFAULT_AUTO_FIELD
. If so, there's no way to implement UUIDAutoField
as DEFAULT_AUTO_FIELD
.
I understand auto fields need to share a common interface but having to subclass IntegerField
is a big constraint.
I'm happy to open a PR once I know what's the desired functionality. Possible solutions I see:
- enforce subclassing of
AutoFieldMixin
insteadAutoField
- I don't think this field has to be very generic because DBs pk types are very limited. As far as I know, only ints and UUIDs make sense for pk. Maybe adding
UUIDAutoField
to django fields would make sense. That way it'd have to enforce subclass ofAutoField
orUUIDAutoField
Mentioned also in here, here and here.
Django==3.2rc1
Change History (35)
comment:1 by , 4 years ago
Summary: | Don't enforce `DEFAULT_AUTO_FIELD` to subclass from `AutoField` → Add support for `UUIDAutoField` `DEFAULT_AUTO_FIELD` |
---|
comment:2 by , 4 years ago
Description: | modified (diff) |
---|
comment:3 by , 4 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Uncategorized → New feature |
comment:4 by , 4 years ago
I understand it's out of scope for 3.2 as you already have a RC though I think docs could mention it has to subclass AutoField
.
The type of this implicit primary key can now be controlled via the DEFAULT_AUTO_FIELD setting and AppConfig.default_auto_field attribute. No more needing to override primary keys in all models.
This could be more specific as it doesn't apply to UUID pk projects.
I'm glad it's accepted as a feature request, thanks. I'm looking forward to hearing how you want this to be tackled. And as I said, I'm happy to open a PR as I want to use it with UUIDs in my next project.
comment:5 by , 4 years ago
Cc: | added |
---|
follow-up: 7 comment:6 by , 4 years ago
I don't think this field has to be very generic because DBs pk types are very limited. As far as I know, only ints and UUIDs make sense for pk.
While I'm not sure it would make sense in an AutoField
, string primary keys are possible. For example, you could use a language code (en
, fr
, de
, etc) as a primary key.
follow-up: 8 comment:7 by , 4 years ago
Replying to Ian Foote:
I don't think this field has to be very generic because DBs pk types are very limited. As far as I know, only ints and UUIDs make sense for pk.
While I'm not sure it would make sense in an
AutoField
, string primary keys are possible. For example, you could use a language code (en
,fr
,de
, etc) as a primary key.
Sure, my phrasing wasn't the best. What I meant is I think users only use numeric types and uuids for generic app-wide primary keys so I don't think it's necessary to allow other types for AutoField
, such as string. So instead of refactoring the entire AutoField
, another type - UUIDAutoField
could be added.
follow-up: 9 comment:8 by , 4 years ago
Replying to Tomasz Wójcik:
What I meant is I think users only use numeric types and uuids for generic app-wide primary keys so I don't think it's necessary to allow other types for
AutoField
, such as string. So instead of refactoring the entireAutoField
, another type -UUIDAutoField
could be added.
Could I argue that IDs based on human-readable strings, like coolname, are a valid use case?
It's easier to reference '"soft-cuddly-shrew-of-expertise" than "a1e1b1c5-5d5e-4eef-9ee9-e74e87ce71d6" or "2467584".
Would this be seen as bad practice and we should use a slugfield that's not the primary key for something like this instead?
comment:9 by , 4 years ago
Replying to Antonin Grêlé:
Replying to Tomasz Wójcik:
What I meant is I think users only use numeric types and uuids for generic app-wide primary keys so I don't think it's necessary to allow other types for
AutoField
, such as string. So instead of refactoring the entireAutoField
, another type -UUIDAutoField
could be added.
Could I argue that IDs based on human-readable strings, like coolname, are a valid use case?
It's easier to reference '"soft-cuddly-shrew-of-expertise" than "a1e1b1c5-5d5e-4eef-9ee9-e74e87ce71d6" or "2467584".
Would this be seen as bad practice and we should use a slugfield that's not the primary key for something like this instead?
Popular use case for a "readable uuid" like that is a randomly generated username. It's usually not a pk, just has a unique constraint, because you often perform many operations against users table and you don't want this table to be slow on joins. For other cases, why does it need to be readable at all?
To my understanding there's no significant performance hit from using UUID as a primary key. It is 4x larger than int32 (128bit). So also index is bigger, cache will store less data etc.
To my knowledge, varchar is a completely different story. It's way bigger and more difficult to sort, because it's very difficult to compare.
follow-up: 12 comment:11 by , 4 years ago
I've been involved in some of the changes and discussions related to AutoField
improvements and DEFAULT_AUTO_FIELD
and wanted to clarify a few things, as well as what needs to be done to achieve the creation of UUIDAutoField
.
Some points about DEFAULT_AUTO_FIELD
which was added in b5e12d490af3debca8c55ab3c1698189fdedbbdb:
- The current implementation restricts to values that are a subclass of
AutoField
. (See here.) - It was named
DEFAULT_AUTO_FIELD
to indicate that it referred to the automatically generatedid
field.- This was instead of
DEFAULT_AUTOFIELD
which would imply that it could only ever be related toAutoField
. - Another suggestion had been
DEFAULT_MODEL_PK
, but we weren't ready to allow other fields withoutAutoField
-like semantics.
- This was instead of
- We could lift the restriction to allow other non-
AutoField
types, but the concept of an automatically generatedid
field needs to be decoupled fromAutoField
a bit.
Some history and thoughts about AutoField
:
- Much of the logic related to the automatically generated
id
field is hardwired to look forAutoField
subclasses. This needs to be fixed. AutoField
has essentially always been like anIntegerField
, but historically copy-pasted some of that rather than inherit fromIntegerField
directly.BigAutoField
inherited fromAutoField
, but overrode with parts ofBigIntegerField
, but it didn't originally inherit fromBigIntegerField
.- As a result, these
*AutoFields
s didn't support the system checks or validators of their*IntegerField
equivalents. - I addressed some of this in 21e559495b8255bba1e8a4429cd083246ab90457 for #29979:
- Moved the logic about automatically generated fields into
AutoFieldMixin
. - Fixed the
*AutoField
classes to inherit fromAutoFieldMixin
and their appropriate*IntegerField
type. - Added
AutoFieldMeta
to maintain backward compatibility forisinstance(..., AutoField)
checks and provide a hook point for deprecation of it in the future.
- Moved the logic about automatically generated fields into
One more thing to mention about an automatically generated id
field is that the value is generated by the database and returned - it must not be generated in Python.
What needs to be done to get to UUIDAutoField
? Well, here is what I think the path is...
- Add a
db_generated
/is_auto
(or another better name) flag toAutoFieldMixin
. (See my comment. I'm not convinced #24042 should have been closed.) - Replace
isinstance(..., AutoField)
checks with the new flag. (I seem to recall thatisinstance(..., AutoFieldMixin)
wasn't really favoured.)- This will also open up
DEFAULT_AUTO_FIELD
to a wider range of non-integer automatically generated field types.
- This will also open up
- Deprecate
isinstance
(andissubclass
?) checks onAutoField
withinAutoFieldMeta
. (The metaclass can be scrapped when the deprecation period ends.) - Implement support for
.db_default
so that we can specify a function for generating a UUID on the database rather than in Python, e.g. RandomUUID. - Make
AutoFieldMixin
part of the public API and document it so that others can use it for their own non-integer, non-UUID implementations. - Finally, implement
UUIDAutoField
itself. This is a start, there will be more pieces to the puzzle:
class UUIDAutoField(AutoFieldMixin, UUIDField): def __init__(self, *args, **kwargs): kwargs['db_default'] = RandomUUID() super().__init__(*args, **kwargs) def deconstruct(self): name, path, args, kwargs = super().deconstruct() del kwargs['db_default'] return name, path, args, kwargs def get_internal_type(self): return 'UUIDAutoField' def rel_db_type(self, connection): return UUIDField().db_type(connection=connection)
It is likely that AutoFieldMixin.get_db_prep_value()
would need attention because of connection.ops.validate_autopk_value()
.
Note that this field would also be limited to databases that have a function to generate a UUID.
As Mariusz said, implementing a non-integer auto field such as UUIDAutoField
would close #17337.
Here are some responses to some of the other things mentioned in the comments above.
First, the default auto field BigAutoField is not subclassing AutoField so it's not consistent with the error message.
This is a little confusing as it is done a bit magically for now - see the bit about AutoFieldMeta
above.
There was, however, a bug for subclasses of SmallAutoField
and BigAutoField
which was fixed by 45a58c31e64dbfdecab1178b1d00a3803a90ea2d.
While I'm not sure it would make sense in an AutoField, string primary keys are possible. For example, you could use a language code (en, fr, de, etc) as a primary key.
Yes, you can create a textual primary key, but you have to override id
for every model, e.g.
id = models.CharField(max_length=64, primary_key=True)
It is unlikely that it'd make sense as an "AutoField
" unless you can generate the value on the database-side.
Could I argue that IDs based on human-readable strings, like coolname, are a valid use case?
It's easier to reference '"soft-cuddly-shrew-of-expertise" than "a1e1b1c5-5d5e-4eef-9ee9-e74e87ce71d6" or "2467584".
Would this be seen as bad practice and we should use a slugfield that's not the primary key for something like this instead?
Again, unless this were generated by the database, you'd probably be better off doing something like:
def generate_coolname(): ... # Implementation here. class MyModel(models.Model): id = models.CharField(default=generate_coolname, max_length=64, primary_key=True)
Guaranteeing uniqueness is always a fun challenge...
To my understanding there's no significant performance hit from using UUID as a primary key. It is 4x larger than int32 (128bit). So also index is bigger, cache will store less data etc.
I dredged up some articles on this topic I recalled seeing a while back:
- https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/
- https://www.2ndquadrant.com/en/blog/sequential-uuid-generators-ssd/
And as I said, I'm happy to open a PR as I want to use it with UUIDs in my next project.
You can use UUID primary keys today. You just have to define a custom primary key so that the auto field isn't generated:
class MyModel(models.Model): id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
Granted, this is less convenient, but it is possible and documented.
comment:12 by , 3 years ago
Replying to Mariusz Felisiak:
Note: Fixing this ticket will also resolve #17337.
This would be an interesting development, and it will be great to make some progress on the nonrel front.
Replying to Nick Pope:
- It was named
DEFAULT_AUTO_FIELD
to indicate that it referred to the automatically generatedid
field.
- This was instead of
DEFAULT_AUTOFIELD
which would imply that it could only ever be related toAutoField
.- Another suggestion had been
DEFAULT_MODEL_PK
, but we weren't ready to allow other fields withoutAutoField
-like semantics.- We could lift the restriction to allow other non-
AutoField
types, but the concept of an automatically generatedid
field needs to be decoupled fromAutoField
a bit.
While the situation currently doesn't support it, having a DEFAULT_MODEL_PK
which accepts any field which sets primary_key=True
and unique=True
would be very very useful as there are lots of use cases, such as legacy and interoperability scenarios, where having the primary key generation logic in python is going to be necessary, and its good to hear that the situation is "were not ready" not "we don't want to do this".
- Add a
db_generated
/is_auto
(or another better name) flag toAutoFieldMixin
. (See my comment. I'm not convinced #24042 should have been closed.)
Reading through that, I agree, it feels like closing the ticket was based on the smallest possible interpretation of the issue.
Note that this field would also be limited to databases that have a function to generate a UUID.
This is another one of reasons why I think DEFAULT_MODEL_PK
(and thus lifting the need to have the database generate the primary key value), should be the ultimate goal, as someone maintaining a ULID Field library for use as a primary key, even if I go the extra mile and write/maintain multiple independent database functions/stored procedures (and all the extra management code to ensure they are loaded, updated, etc) for each supported database (currently Sqlite, MySQL, MariaDB, PostgreSQL, CockroachDB, with MSSQL on the horizon), I would "still" like to support generating the ULID value in python code for the broadest possible compatibility.
comment:13 by , 3 years ago
Cc: | added |
---|
comment:14 by , 3 years ago
Cc: | added |
---|
comment:15 by , 3 years ago
Cc: | added |
---|
comment:16 by , 3 years ago
In my opinion allowing users to use uuid as DEFAULT_AUTO_FIELD isn't good unless top db providers (Oracle, MS SQL, IBM, Postgres, SQLite) will support it in db as a kind of sequence and they will introduce uuid as auto primary key type.
DEFAULT_AUTO_FIELD is dedicated to create id field in every model. I can imagine some situation where UUID is better than INT as Primary Key, but in most cases it isn't. If it was, the top db providers would implement it a long time ago.
UUID is random generated value by default and it doesn't guarantee unique values like a sequence based on INT does. Generation a value is slower than from sequence.
AutoField in Django that can be DEFAULT_AUTO_FIELD uses db sequence that gives 100% unique values.
https://franckpachot.medium.com/uuid-aka-guid-vs-oracle-sequence-number-ab11aa7dbfe7
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::p11_question_id:2570145300346198113
If UUIDAutoField is created in Django based on @Nick Pope proposal (RandomUUID), it will be worth to add in documentation that it doesn't qurantee that value is unique.
But, definitely, it is worth to allow users to create a custom AutoField based on any other field type (as reported in #17337), maybe except Text/Boolean types, and use it as PK. Anyone can then create their own UUIDAutoField and take responsibility for uniques.
See last comment here
"So yes, having a sequence PK (that stays hidden) and a GUID UK (that customers see) can be a good option"
comment:17 by , 3 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:18 by , 2 years ago
Cc: | added |
---|
comment:19 by , 2 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:20 by , 2 years ago
Cc: | added |
---|
follow-up: 22 comment:21 by , 2 years ago
"So yes, having a sequence PK (that stays hidden) and a GUID UK (that customers see) can be a good option"
It depends, on centralized databases this may be fine because everything is local, however on distributed systems (for example cockroachdb), having to handle incremental integer is a much slower (and exponentially slower with more nodes), because you basically need to coordinate all the nodes that can write data to stop, get the last sequence value, insert your line, increment the sequence and then unlock it (so you lock insert on the table during this time, even on others nodes from the cluster).
Example: https://github.com/cockroachdb/cockroach/issues/41258
comment:22 by , 2 years ago
Replying to Mathieu Poussin:
It depends, on centralized databases this may be fine because everything is local, however on distributed systems (for example cockroachdb), having to handle incremental integer is a much slower (and exponentially slower with more nodes), because you basically need to coordinate all the nodes that can write data to stop, get the last sequence value, insert your line, increment the sequence and then unlock it (so you lock insert on the table during this time, even on others nodes from the cluster).
Example: https://github.com/cockroachdb/cockroach/issues/41258
Good point, in distributed system it could be slower.
Maybe it is only cocroachdb problem? Why algorithm of coordination of nodes for sequence is slow? is it much slower then uuid or only little?
There is more questions to ask:
- What happen if in more than one node uuid will be the same for different data? do they have mechanism to sort it out?
- Is there more "writes" or "reads" from db? is "writing" really too slow in my project?
- why chose distributed database (cockroachdb) instead PostgresSQL (with sharding)?
and more questions, depending on project, to chose db, design correct db structure with correct field types, and avoid headache in future.
comment:23 by , 20 months ago
Cc: | added |
---|
comment:24 by , 19 months ago
Cc: | added |
---|
comment:25 by , 18 months ago
Cc: | added |
---|
comment:26 by , 13 months ago
Cc: | added |
---|
comment:27 by , 9 months ago
Small note that we might see a renewed interest for this feature with the upcoming support for UUIDv7 in Postgres.
Not that it changes a lot of the arguments made here against allowing this to be default but it seems that UUIDv7, due to fact it creates less sparse and fragmented btrees, performs reasonably well when compared to bigint.
comment:28 by , 7 months ago
Cc: | added |
---|
follow-up: 32 comment:29 by , 4 months ago
After 3 years I've decided to give it a try. It's very WIP, but I'd appreciate if someone was able to advise if this approach is feasible.
My reasoning:
- I still haven't found if all those
if isinstance(obj, AutoField)
can be migrated toAutoFieldMixin
check, my tests seem to pass - if it has to be generated in the DB, then PostgreSQL is the only DB (I think), that supports generating UUIDs, without any DB extensions
- if that's correct, then I'm not convinced if this implementation of the Postgres-only UUIDv4-db-generated-pk field should be part of Django or rather an external library. As a dev I wouldn't mind, but it might be confusing for others that are not using PostgreSQL and are looking for a similar solution
Please advise.
comment:30 by , 4 months ago
Cc: | added |
---|
comment:31 by , 4 months ago
Has patch: | set |
---|---|
Version: | 3.2 → dev |
follow-up: 34 comment:32 by , 4 months ago
Patch needs improvement: | set |
---|
Replying to Tomasz Wójcik:
After 3 years I've decided to give it a try. It's very WIP, but I'd appreciate if someone was able to advise if this approach is feasible.
Thank you for giving this a try!
Please note that the PR has test errors and lint issues, these should be addressed before progressing with the review. Once those are solved, please update the flags in this ticket following the contributing guidelines so this PR gets added back to the "branches needing review" section of the Django Developer Dahsboard.
comment:33 by , 3 months ago
Keywords: | uuid added |
---|---|
Owner: | set to |
Patch needs improvement: | unset |
Status: | new → assigned |
comment:34 by , 3 months ago
Hey Natalia Bidart, thanks for getting back to me. The CI is green, I can see it in the "Patches pending review" section of the dashboard. See you in the PR!
comment:35 by , 2 months ago
Patch needs improvement: | set |
---|
OK, yes, part of the discussion was possible extension to other kinds if field, so we can accept this as a new feature.
(To be clear, I consider it out-of-scope for the original ticket, and so not a release blocker for 3.2)