Opened 6 months ago

Last modified 4 months ago

#32577 new New feature

Add support for `UUIDAutoField` `DEFAULT_AUTO_FIELD`

Reported by: Tomasz Wójcik Owned by: nobody
Component: Database layer (models, ORM) Version: 3.2
Severity: Normal Keywords:
Cc: Diego Lima Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Tomasz Wójcik)

Default auto field introduced in #31007, as seen in the docs, is BigAutoField.

class BigAutoField(AutoFieldMixin, BigIntegerField):
    def get_internal_type(self):
        return 'BigAutoField'

    def rel_db_type(self, connection):
        return BigIntegerField().db_type(connection=connection)

Many projects prefer UUID field over big int for pk. So far we had to use a mixin for that but I am under the impression DEFAULT_AUTO_FIELD exists to lift this burden.

I'd expect something like this to work (additional adjustments might be needed)

from django.db import models
from django.db.models.fields import AutoFieldMixin


class UUIDAutoField(AutoFieldMixin, models.UUIDField):
    def get_internal_type(self):
        return 'UUIDAutoField'

    def rel_db_type(self, connection):
        return models.UUIDField().db_type(connection=connection)

But if I use this for DEFAULT_AUTO_FIELD I get the error

ValueError: Primary key 'common.fields.UUIDAutoField' referred by DEFAULT_AUTO_FIELD must subclass AutoField.

I noticed two things I want to share.

First, the default auto field BigAutoField is not subclassing AutoField so it's not consistent with the error message.

Second, AutoField is subclassing IntegerField. If I understand this correctly, users are limited to subclass IntegerField for DEFAULT_AUTO_FIELD. If so, there's no way to implement UUIDAutoField as DEFAULT_AUTO_FIELD.

I understand auto fields need to share a common interface but having to subclass IntegerField is a big constraint.

I'm happy to open a PR once I know what's the desired functionality. Possible solutions I see:

  • enforce subclassing of AutoFieldMixin instead AutoField
  • I don't think this field has to be very generic because DBs pk types are very limited. As far as I know, only ints and UUIDs make sense for pk. Maybe adding UUIDAutoField to django fields would make sense. That way it'd have to enforce subclass of AutoField or UUIDAutoField

Mentioned also in here, here and here.

Django==3.2rc1

Change History (12)

comment:1 Changed 6 months ago by Tomasz Wójcik

Summary: Don't enforce `DEFAULT_AUTO_FIELD` to subclass from `AutoField`Add support for `UUIDAutoField` `DEFAULT_AUTO_FIELD`

comment:2 Changed 6 months ago by Tomasz Wójcik

Description: modified (diff)

comment:3 Changed 6 months ago by Carlton Gibson

Triage Stage: UnreviewedAccepted
Type: UncategorizedNew feature

OK, yes, part of the discussion was possible extension to other kinds if field, so we can accept this as a new feature.

(To be clear, I consider it out-of-scope for the original ticket, and so not a release blocker for 3.2)

comment:4 Changed 6 months ago by Tomasz Wójcik

I understand it's out of scope for 3.2 as you already have a RC though I think docs could mention it has to subclass AutoField.

The type of this implicit primary key can now be controlled via the DEFAULT_AUTO_FIELD setting and AppConfig.default_auto_field attribute. No more needing to override primary keys in all models.

This could be more specific as it doesn't apply to UUID pk projects.


I'm glad it's accepted as a feature request, thanks. I'm looking forward to hearing how you want this to be tackled. And as I said, I'm happy to open a PR as I want to use it with UUIDs in my next project.

comment:5 Changed 6 months ago by Diego Lima

Cc: Diego Lima added

comment:6 Changed 6 months ago by Ian Foote

I don't think this field has to be very generic because DBs pk types are very limited. As far as I know, only ints and UUIDs make sense for pk.

While I'm not sure it would make sense in an AutoField, string primary keys are possible. For example, you could use a language code (en, fr, de, etc) as a primary key.

comment:7 in reply to:  6 ; Changed 6 months ago by Tomasz Wójcik

Replying to Ian Foote:

I don't think this field has to be very generic because DBs pk types are very limited. As far as I know, only ints and UUIDs make sense for pk.

While I'm not sure it would make sense in an AutoField, string primary keys are possible. For example, you could use a language code (en, fr, de, etc) as a primary key.

Sure, my phrasing wasn't the best. What I meant is I think users only use numeric types and uuids for generic app-wide primary keys so I don't think it's necessary to allow other types for AutoField, such as string. So instead of refactoring the entire AutoField, another type - UUIDAutoField could be added.

comment:8 in reply to:  7 ; Changed 5 months ago by Antonin Grêlé

Replying to Tomasz Wójcik:

What I meant is I think users only use numeric types and uuids for generic app-wide primary keys so I don't think it's necessary to allow other types for AutoField, such as string. So instead of refactoring the entire AutoField, another type - UUIDAutoField could be added.

Could I argue that IDs based on human-readable strings, like coolname, are a valid use case?

It's easier to reference '"soft-cuddly-shrew-of-expertise" than "a1e1b1c5-5d5e-4eef-9ee9-e74e87ce71d6" or "2467584".

Would this be seen as bad practice and we should use a slugfield that's not the primary key for something like this instead?

comment:9 in reply to:  8 Changed 5 months ago by Tomasz Wójcik

Replying to Antonin Grêlé:

Replying to Tomasz Wójcik:

What I meant is I think users only use numeric types and uuids for generic app-wide primary keys so I don't think it's necessary to allow other types for AutoField, such as string. So instead of refactoring the entire AutoField, another type - UUIDAutoField could be added.

Could I argue that IDs based on human-readable strings, like coolname, are a valid use case?

It's easier to reference '"soft-cuddly-shrew-of-expertise" than "a1e1b1c5-5d5e-4eef-9ee9-e74e87ce71d6" or "2467584".

Would this be seen as bad practice and we should use a slugfield that's not the primary key for something like this instead?

Popular use case for a "readable uuid" like that is a randomly generated username. It's usually not a pk, just has a unique constraint, because you often perform many operations against users table and you don't want this table to be slow on joins. For other cases, why does it need to be readable at all?

To my understanding there's no significant performance hit from using UUID as a primary key. It is 4x larger than int32 (128bit). So also index is bigger, cache will store less data etc.

To my knowledge, varchar is a completely different story. It's way bigger and more difficult to sort, because it's very difficult to compare.

comment:10 Changed 5 months ago by Mariusz Felisiak

Note: Fixing this ticket will also resolve #17337.

comment:11 Changed 5 months ago by Nick Pope

I've been involved in some of the changes and discussions related to AutoField improvements and DEFAULT_AUTO_FIELD and wanted to clarify a few things, as well as what needs to be done to achieve the creation of UUIDAutoField.


Some points about DEFAULT_AUTO_FIELD which was added in b5e12d490af3debca8c55ab3c1698189fdedbbdb:

  • The current implementation restricts to values that are a subclass of AutoField. (See here.)
  • It was named DEFAULT_AUTO_FIELD to indicate that it referred to the automatically generated id field.
    • This was instead of DEFAULT_AUTOFIELD which would imply that it could only ever be related to AutoField.
    • Another suggestion had been DEFAULT_MODEL_PK, but we weren't ready to allow other fields without AutoField-like semantics.
  • We could lift the restriction to allow other non-AutoField types, but the concept of an automatically generated id field needs to be decoupled from AutoField a bit.

Some history and thoughts about AutoField:

  • Much of the logic related to the automatically generated id field is hardwired to look for AutoField subclasses. This needs to be fixed.
  • AutoField has essentially always been like an IntegerField, but historically copy-pasted some of that rather than inherit from IntegerField directly.
  • BigAutoField inherited from AutoField, but overrode with parts of BigIntegerField, but it didn't originally inherit from BigIntegerField.
  • As a result, these *AutoFieldss didn't support the system checks or validators of their *IntegerField equivalents.
  • I addressed some of this in 21e559495b8255bba1e8a4429cd083246ab90457 for #29979:
    • Moved the logic about automatically generated fields into AutoFieldMixin.
    • Fixed the *AutoField classes to inherit from AutoFieldMixin and their appropriate *IntegerField type.
    • Added AutoFieldMeta to maintain backward compatibility for isinstance(..., AutoField) checks and provide a hook point for deprecation of it in the future.

One more thing to mention about an automatically generated id field is that the value is generated by the database and returned - it must not be generated in Python.


What needs to be done to get to UUIDAutoField? Well, here is what I think the path is...

  • Add a db_generated/is_auto (or another better name) flag to AutoFieldMixin. (See my comment. I'm not convinced #24042 should have been closed.)
  • Replace isinstance(..., AutoField) checks with the new flag. (I seem to recall that isinstance(..., AutoFieldMixin) wasn't really favoured.)
    • This will also open up DEFAULT_AUTO_FIELD to a wider range of non-integer automatically generated field types.
  • Deprecate isinstance (and issubclass?) checks on AutoField within AutoFieldMeta. (The metaclass can be scrapped when the deprecation period ends.)
  • Implement support for .db_default so that we can specify a function for generating a UUID on the database rather than in Python, e.g. RandomUUID.
    • Work is ongoing in this PR to address #470 which will implement this.
    • It is also possible that this PR to address #30032 is an acceptable alternative path.
  • Make AutoFieldMixin part of the public API and document it so that others can use it for their own non-integer, non-UUID implementations.
  • Finally, implement UUIDAutoField itself. This is a start, there will be more pieces to the puzzle:
class UUIDAutoField(AutoFieldMixin, UUIDField):

    def __init__(self, *args, **kwargs):
        kwargs['db_default'] = RandomUUID()
        super().__init__(*args, **kwargs)

    def deconstruct(self):
        name, path, args, kwargs = super().deconstruct()
        del kwargs['db_default']
        return name, path, args, kwargs

    def get_internal_type(self):
        return 'UUIDAutoField'

    def rel_db_type(self, connection):
        return UUIDField().db_type(connection=connection)

It is likely that AutoFieldMixin.get_db_prep_value() would need attention because of connection.ops.validate_autopk_value().

Note that this field would also be limited to databases that have a function to generate a UUID.

As Mariusz said, implementing a non-integer auto field such as UUIDAutoField would close #17337.


Here are some responses to some of the other things mentioned in the comments above.

First, the default auto field BigAutoField is not subclassing AutoField so it's not consistent with the error message.

This is a little confusing as it is done a bit magically for now - see the bit about AutoFieldMeta above.
There was, however, a bug for subclasses of SmallAutoField and BigAutoField which was fixed by 45a58c31e64dbfdecab1178b1d00a3803a90ea2d.

While I'm not sure it would make sense in an AutoField, string primary keys are possible. For example, you could use a language code (en, fr, de, etc) as a primary key.

Yes, you can create a textual primary key, but you have to override id for every model, e.g.

id = models.CharField(max_length=64, primary_key=True)

It is unlikely that it'd make sense as an "AutoField" unless you can generate the value on the database-side.

Could I argue that IDs based on human-readable strings, like ​coolname, are a valid use case?

It's easier to reference '"soft-cuddly-shrew-of-expertise" than "a1e1b1c5-5d5e-4eef-9ee9-e74e87ce71d6" or "2467584".

Would this be seen as bad practice and we should use a slugfield that's not the primary key for something like this instead?

Again, unless this were generated by the database, you'd probably be better off doing something like:

def generate_coolname():
    ...  # Implementation here.

class MyModel(models.Model):
    id = models.CharField(default=generate_coolname, max_length=64, primary_key=True)

Guaranteeing uniqueness is always a fun challenge...

To my understanding there's no significant performance hit from using UUID as a primary key. It is 4x larger than int32 (128bit). So also index is bigger, cache will store less data etc.

I dredged up some articles on this topic I recalled seeing a while back:

And as I said, I'm happy to open a PR as I want to use it with UUIDs in my next project.

You can use UUID primary keys today. You just have to define a custom primary key so that the auto field isn't generated:

class MyModel(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)

Granted, this is less convenient, but it is possible and documented.

comment:12 in reply to:  11 Changed 4 months ago by Samuel Bishop

Replying to Mariusz Felisiak:

Note: Fixing this ticket will also resolve #17337.

This would be an interesting development, and it will be great to make some progress on the nonrel front.


Replying to Nick Pope:

  • It was named DEFAULT_AUTO_FIELD to indicate that it referred to the automatically generated id field.
    • This was instead of DEFAULT_AUTOFIELD which would imply that it could only ever be related to AutoField.
    • Another suggestion had been DEFAULT_MODEL_PK, but we weren't ready to allow other fields without AutoField-like semantics.
  • We could lift the restriction to allow other non-AutoField types, but the concept of an automatically generated id field needs to be decoupled from AutoField a bit.

While the situation currently doesn't support it, having a DEFAULT_MODEL_PK which accepts any field which sets primary_key=True and unique=True would be very very useful as there are lots of use cases, such as legacy and interoperability scenarios, where having the primary key generation logic in python is going to be necessary, and its good to hear that the situation is "were not ready" not "we don't want to do this".

  • Add a db_generated/is_auto (or another better name) flag to AutoFieldMixin. (See my comment. I'm not convinced #24042 should have been closed.)

Reading through that, I agree, it feels like closing the ticket was based on the smallest possible interpretation of the issue.

Note that this field would also be limited to databases that have a function to generate a UUID.

This is another one of reasons why I think DEFAULT_MODEL_PK (and thus lifting the need to have the database generate the primary key value), should be the ultimate goal, as someone maintaining a ULID Field library for use as a primary key, even if I go the extra mile and write/maintain multiple independent database functions/stored procedures (and all the extra management code to ensure they are loaded, updated, etc) for each supported database (currently Sqlite, MySQL, MariaDB, PostgreSQL, CockroachDB, with MSSQL on the horizon), I would "still" like to support generating the ULID value in python code for the broadest possible compatibility.

Note: See TracTickets for help on using tickets.
Back to Top