Opened 2 years ago

Closed 7 months ago

Last modified 5 months ago

#19463 closed New feature (fixed)

Add UUID Field to core

Reported by: guettli
Owned by: mjtamlyn
Component: Database layer (models, ORM)
Version: master
Severity: Normal
Keywords:
Cc: trbs@…, matt@…, mike@…, glic3rinu, cyphase@…, jonathan+django@…, tomek@…, saxix.rome@…, loic@…, galuszkak@…, ashwoods, anubhav9042@…, lukas-hetzenecker
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

From the django-developers discussion, December 2012:

If someone can come up with a good patch I'd be fine considering it for core.

Jacob (Kaplan-Moss)

Related: #4682 was closed five years ago.

I (Thomas Güttler) want to moderate this ticket, but won't create a patch.

Change History (26)

comment:1 Changed 2 years ago by trbs

  • Cc trbs@… added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 2 years ago by claudep

Note that in databases other than PostgreSQL, it might be desirable to store the UUID value internally as binary rather than as a char, both for performance reasons and for compatibility with Postgres' uuid type (stored as a 128-bit binary value). So we might need to solve #2417 beforehand...
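To make the size difference concrete, here is a minimal sketch using only Python's standard `uuid` module. It shows no Django API; the variable names are illustrative, and the sample value is the one used later in this ticket.

```python
import uuid

value = uuid.UUID("f47ac10b-58cc-4372-a567-0e02b2c3d479")

# Three common representations of the same 128-bit value.
as_binary = value.bytes  # 16 raw bytes, the same width as a native 128-bit type
as_hex = value.hex       # 32 hexadecimal characters, no hyphens
as_text = str(value)     # 36 characters, including the hyphens

print(len(as_binary), len(as_hex), len(as_text))  # prints: 16 32 36

# Each representation converts back to the same UUID.
from_binary = uuid.UUID(bytes=as_binary)
from_hex = uuid.UUID(hex=as_hex)
```

The binary form is half the size of the unhyphenated char form, which is the storage argument made above.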

comment:3 Changed 2 years ago by schinckel

  • Cc matt@… added

One thing I found with my UUIDField is that I needed to supply code to enable South to (a) handle this field type on migrations, and (b) prevent it trying to create a default at the time a migration is run.

Specifically, https://bitbucket.org/schinckel/django-uuidfield/commits/69f7c0cdf91d28da2cceaff6f46ece34f733b560 shows how to do this.

I would assume we wouldn't want any code that provides data to South in Django core, so perhaps we would need to ensure that South ships a release around the time this patch is included.

comment:4 Changed 2 years ago by carbonXT

  • Cc mike@… added

comment:5 Changed 2 years ago by akaariai

I've been thinking that we would likely want to have a new field type: GeneratedField. This is like AutoField - the field gets a value on save() if it doesn't already have a value, and this field type is always a primary key (I am not 100% sure of the PK requirement, but it could simplify things). GeneratedField would have a backing field (the db storage type) and some generator, where the generator could fetch the value from DB using RETURNING, could generate the value in Python (like default, but with access to connection), or it could fetch the value after save from the DB (AutoField does this using select currval(someseq) on some backends).

I think such a field type would cover a lot of requests we have currently - unsigned serial fields, tiny/big/...integer serial fields, UUID fields (no matter what the UUID generator function is), and likely some more.

I don't know how hard such a field will be to write, or what the exact API should be - so this is mostly hand waving at the moment. Still, it seems there are only two public API places where this would affect current code - model.save() and bulk_create(), so it seems this should not be totally out of reach as a feature.
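As a rough illustration of the idea only, the following pure-Python sketch models the "generate a value on save unless one is already set" behaviour. `GeneratedValue` and `value_for_save` are hypothetical names invented here, not a Django API, and the database-side strategies mentioned above (RETURNING, currval) are not modelled.

```python
import itertools
import uuid
from typing import Any, Callable, Optional


class GeneratedValue:
    """Hypothetical stand-in for the proposed GeneratedField (not a Django API)."""

    def __init__(self, generator: Callable[[], Any]) -> None:
        self.generator = generator

    def value_for_save(self, current: Optional[Any]) -> Any:
        # Keep an explicitly assigned value; otherwise ask the generator.
        return current if current is not None else self.generator()


# Two interchangeable generators: a UUID and a sequence-like counter.
uuid_pk = GeneratedValue(uuid.uuid4)
_counter = itertools.count(1)
serial_pk = GeneratedValue(lambda: next(_counter))

first = serial_pk.value_for_save(None)    # generated from the counter
second = serial_pk.value_for_save(None)   # next counter value
explicit = serial_pk.value_for_save(42)   # already set, so left untouched
generated_uuid = uuid_pk.value_for_save(None)
```

The point of the sketch is that the storage type and the generator are independent, which is why one field type could cover both serial-style and UUID-style keys.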

comment:6 Changed 2 years ago by akaariai

  • Triage Stage changed from Unreviewed to Accepted

Quoting Jacob from the recent django-developers discussion: "If someone can come up with a good patch I'd be fine considering it for core.".

So, marking as accepted based on that.

comment:7 Changed 2 years ago by glic3rinu

  • Cc glic3rinu added

comment:8 Changed 2 years ago by cyphase

  • Cc cyphase@… added

comment:9 Changed 2 years ago by jonathan

  • Cc jonathan+django@… added

comment:10 Changed 23 months ago by oinopion

  • Cc tomek@… added

comment:11 Changed 22 months ago by saxix

  • Cc saxix.rome@… added

comment:12 Changed 22 months ago by loic84

  • Cc loic@… added

Big +1 on @akaariai's GeneratedField idea.

For example I use extensively what I call a "readable unique ID", similar to YouTube video IDs (i.e. "sc5vraPpTcA"), for which I made a custom Field. It functions like a UUID but trades the creation convenience (guaranteed uniqueness) for usage convenience (being able to read it out loud, shorter URL, etc.). A GeneratedField would allow me to implement that cleanly.
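A minimal sketch of such an ID generator, assuming 8 random bytes encoded as URL-safe base64. The helper names are hypothetical, and the `existing` set stands in for whatever uniqueness check the database would provide.

```python
import secrets


def make_short_id(num_bytes: int = 8) -> str:
    """Return a random URL-safe ID; 8 random bytes encode to 11 characters."""
    return secrets.token_urlsafe(num_bytes)


def make_unique_short_id(existing: set, max_attempts: int = 10) -> str:
    # Uniqueness is not guaranteed by construction, so retry on a collision.
    for _ in range(max_attempts):
        candidate = make_short_id()
        if candidate not in existing:
            existing.add(candidate)
            return candidate
    raise RuntimeError("could not find a free ID")


used_ids = set()
short_id = make_unique_short_id(used_ids)
```

With only 64 random bits the collision check matters more than it does for a 128-bit UUID, which is the trade-off described above.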

That said, some databases have native support for UUIDs and it's pretty much the standard for sharding, so we could have the generic GeneratedField and a UUIDField subclass.

I'd work on a patch with some guidance from @akaariai.

Last edited 22 months ago by loic84

comment:13 Changed 17 months ago by galuszkak

  • Cc galuszkak@… added
  • Version changed from 1.4 to master

comment:14 Changed 14 months ago by ashwoods

  • Cc ashwoods added

comment:15 Changed 14 months ago by mjtamlyn

  • Owner changed from nobody to mjtamlyn
  • Status changed from new to assigned

For postgres at least, this will form part of my upcoming work on django.contrib.postgres. Support for bigserial is also likely to come in with that, so a more general base class for AutoField might be useful. That said, a UUIDField does not always want to be autogenerated (unlike an autoincrementing field, which probably should be): it is a reasonable use case for an API client to generate a uuid (using the uuid4 approach, which has a very high probability of avoiding clashes) and expect that to be saved by a Django-backed API.

Supporting a simple UUIDField(default=uuid.uuid4) should be a good start.
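The behaviour of a callable default can be sketched without Django. Here a dataclass `default_factory` stands in for `default=uuid.uuid4`; this is an analogy chosen for illustration, not how Django implements field defaults, and `Record` is a hypothetical class.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Record:
    # The factory is called once per instance, like a callable field default.
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    name: str = ""


# No id supplied: the default generates one.
server_side = Record(name="created on the server")

# An API client generates the id itself and expects it to be kept.
client_id = uuid.uuid4()
client_side = Record(id=client_id, name="created by a client")
```

Passing the function rather than calling it is what gives every instance its own value, while still letting a caller supply one.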

comment:16 Changed 12 months ago by japrogramer@…

I have written a UUID field for Django that supports 1.7 and its features (migrations, serialization, etc.).
The field can be set with a UUID instance or with a str, hyphenated or not. It can also be created from bytes if that is needed. It can auto-generate the UUID (uuid4) and supports the other variants that Python's uuid module offers (1, 3, 4, 5). Queries work with either str or UUID instances but not with bytes, because who is ever going to query by the bytes, am I right? https://github.com/japrogramer/django-uuid-contour
P.S.
Many tests are included, and it supports Python 3.4 ;)
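For readers unfamiliar with the accepted input forms, the normalization can be sketched with the standard `uuid` module alone. `to_uuid` is a hypothetical helper written for this illustration; it is not taken from the linked package.

```python
import uuid
from typing import Union


def to_uuid(value: Union[uuid.UUID, str, bytes]) -> uuid.UUID:
    """Normalize the supported input forms to a uuid.UUID instance."""
    if isinstance(value, uuid.UUID):
        return value
    if isinstance(value, bytes):
        return uuid.UUID(bytes=value)  # requires exactly 16 bytes
    return uuid.UUID(value)  # accepts hex with or without hyphens


reference = uuid.UUID("f47ac10b-58cc-4372-a567-0e02b2c3d479")
from_hyphenated = to_uuid("f47ac10b-58cc-4372-a567-0e02b2c3d479")
from_plain_hex = to_uuid("f47ac10b58cc4372a5670e02b2c3d479")
from_raw_bytes = to_uuid(reference.bytes)
```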

comment:17 Changed 9 months ago by mjtamlyn

  • Has patch set

comment:18 Changed 9 months ago by mjtamlyn

PR updated to be in core rather than contrib.postgres.

comment:19 Changed 8 months ago by akaariai

  • Patch needs improvement set

There seems to be one issue that needs solving: should we use SubfieldBase or not? SubfieldBase is used so that the field's to_python method is called any time a value is assigned to a model instance. In particular this happens when setting a value in model.__init__. So, if a database value is just bytes or a string, then when the model is initialized from the database we correctly get a UUID instance in the uuid field because to_python is called.

There isn't any field in core that uses SubfieldBase. There are some disadvantages when using SubfieldBase:

  1. It doesn't work when using .values('uuid_field')
  2. There is a small performance penalty when setting the field value; in particular, model.__init__ will be 10-20% slower for each field that uses SubfieldBase.
  3. Fields with SubfieldBase work a bit differently from other core fields. SubfieldBase fields do value conversion on assignment, so:
     >>> s = SomeModel()
     >>> s.uuid_field = "f47ac10b-58cc-4372-a567-0e02b2c3d479"
     >>> s.uuid_field
     UUID('f47ac10b-58cc-4372-a567-0e02b2c3d479')  # when using SubfieldBase
     'f47ac10b-58cc-4372-a567-0e02b2c3d479'        # when not using SubfieldBase
    

Now, one could consider this to be a feature. But, no other field in core or contrib does this kind of conversion on assignment, so we should avoid this if possible.
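The conversion-on-assignment behaviour can be imitated in plain Python with a descriptor. This is only a sketch of the mechanism under discussion, not Django's SubfieldBase code; `ConvertOnAssign`, `uuid_to_python` and both model-like classes are hypothetical.

```python
import uuid


class ConvertOnAssign:
    """Descriptor that runs a to_python function on every assignment."""

    def __init__(self, to_python):
        self.to_python = to_python

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        # The extra call here is the per-assignment cost mentioned above.
        instance.__dict__[self.name] = self.to_python(value)


def uuid_to_python(value):
    if value is None or isinstance(value, uuid.UUID):
        return value
    return uuid.UUID(value)


class WithConversion:
    uuid_field = ConvertOnAssign(uuid_to_python)


class WithoutConversion:
    uuid_field = None


converted = WithConversion()
converted.uuid_field = "f47ac10b-58cc-4372-a567-0e02b2c3d479"

plain = WithoutConversion()
plain.uuid_field = "f47ac10b-58cc-4372-a567-0e02b2c3d479"
```

After these assignments `converted.uuid_field` is a `uuid.UUID` while `plain.uuid_field` is still the original string, which is the inconsistency with other core fields described in point 3.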

Other ways forward are:

  1. Add a more generic field value conversion framework: add field.from_db_value(value, connection). This is a larger amount of work, but is needed in any case. This solution would work in .values(), and it would also be considerably faster than the current SubfieldBase way of doing things. Unfortunately this means that we can't merge this ticket before we have added the from_db_value method.
  2. Use backend-specific converters. Unfortunately it seems one needs to create custom compilers for each backend (see django/db/backends/oracle/compiler.py for example).

So, in the end there seem to be just two choices: wait for field.from_db_value() or use SubfieldBase (with the possibility of removing use of SubfieldBase when field.from_db_value is introduced).
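To show why conversion at load time also covers .values()-style queries, here is a sketch using the standard `sqlite3` module. The `from_db_value(value, connection)` signature follows the proposal above; the table, the hex storage format and everything else are assumptions made for this illustration, not Django's implementation.

```python
import sqlite3
import uuid


def from_db_value(value, connection):
    """Convert a stored 32-character hex string to uuid.UUID; pass None through."""
    # connection is unused here; it is kept to mirror the proposed signature.
    if value is None:
        return None
    return uuid.UUID(hex=value)


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE item (uuid_field char(32))")
original = uuid.uuid4()
connection.execute("INSERT INTO item VALUES (?)", (original.hex,))

# Conversion is applied per column as rows are read, so it works even when
# no model instance is ever constructed.
rows = [
    {"uuid_field": from_db_value(raw, connection)}
    for (raw,) in connection.execute("SELECT uuid_field FROM item")
]
connection.close()
```

Because the conversion runs once per value read, rather than on every attribute assignment, it avoids both the .values() gap and the assignment-time overhead listed earlier.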

I'll mark "patch needs improvement", for lack of a better marker, to show that this isn't ready for merge before we agree on a solution to the SubfieldBase issue.

comment:20 Changed 8 months ago by coder9042

  • Cc @… added

comment:21 Changed 8 months ago by coder9042

  • Cc anubhav9042@… added; @… removed

comment:22 Changed 8 months ago by timgraham

  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

comment:23 Changed 7 months ago by lukas-hetzenecker

  • Cc lukas-hetzenecker added

comment:24 Changed 7 months ago by Marc Tamlyn <marc.tamlyn@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In ed7821231b7dbf34a6c8ca65be3b9bcbda4a0703:

Fixed #19463 -- Added UUIDField

Uses native support in postgres, and char(32) on other backends.

comment:25 Changed 5 months ago by deronnax

What about MariaDB, which now supports UUID?

comment:26 Changed 5 months ago by charettes

@deronnax please open a new feature request instead of commenting on a closed ticket.
