Multi-Column Primary Key support
Relational database designs use a set of columns as the primary key for a table. When this set includes more than one column, it is known as a “composite” or “compound” primary key. (For more on the terminology, here is an article discussing database keys).
Currently Django models only support a single column in this set, denying many designs where the natural primary key of a table is multiple columns. Django currently can't work with these schemas; they must instead introduce a redundant single-column key (a “surrogate” key), forcing applications to make arbitrary and otherwise-unnecessary choices about which key to use for the table in any given instance.
This page discusses how to have Django support these composite primary keys. There are a lot of details to get right here, but done right, it would allow for more flexibility and potential simplicity in data modeling.
Current Status
Current state is that the issue is accepted/assigned and being worked on, and there is a partial implementation at http://github.com/dcramer/django-compositepks. The implementation allows having composite primary keys. However, support for composite keys is missing in ForeignKey and RelatedManager. As a consequence, it isn't possible to navigate relationships from models that have a composite primary key.
Discussions:
- http://groups.google.com/group/django-developers/browse_thread/thread/eac7c296c3797f5a for David Cramer's initial patch
- http://groups.google.com/group/django-developers/browse_thread/thread/4b2370a0652d9135 for the composite foreign key API design
Major Issues
Taken from Jacob's comment on Ticket 373 There are three basic problems in dealing with composite primary keys in Django.
- A number of APIs use "obj._meta.pk" to access the primary key, on the assumption it is a single field (for example, to do "pk=whatever" lookups). A composite PK implementation would need to emulate this in some way to avoid breaking everything.
- A number of things use (content_type_id, object_pk) tuples to refer to some object -- look at the comment framework, or the admin log API. Again, a composite PK system would need to somehow not break this.
- Admin URLs; they're of the form "/app_label/module_name/pk/"; there would need to be a way to map URLs to objects with a set of columns for the primary key.
mjm: I'm not very familiar with Django's internals, but I suspect the first point's answer lies somewhere near the observation that a CK needs a compound data structure.
bignose: In Python, the obvious choice is to make the primary key a set (a set of columns, in the model definition; a set of values, for a given instance). A compound key does not have an implied order, and a Python set is the obvious type to represent this.
gcc: @bignose: there is often only one correct order. RITA uses (site_id,local_id) as a composite PK, you cannot reverse them! The values might sometimes accidentally be the same too, e.g. (1801,1801) is a valid RITA ID that might happen one day. A tuple sounds more appropriate to me.
mjm: In my work using sequences I chose - not without some pain - to make all keys be sequences (and all keys' names were sequences, too). This is at least consistent, and the changes were straightforward. The fact that strings are themselves sequences actually made it more difficult, since overlooked cases would (often) misbehave rather than tossing an exception immediately - a point in favor of using mappings, maybe.
mjm: As for admin URLs, obviously a choice has to be made, and some existing cases will change. I wouldn't call the result "broken", since only hand-generated admin URLs would have a problem (well, bookmarked ones, it occurs to me, if anyone actually does that). Is anyone really concerned about making admin URLs slightly less easily hand-generatable?
bignose: The solution should at least retain the RESTful property that the URL is transparently correlated to what is being referenced. That's true of admin URLs now, and it is valuable that it be retained when changed to accommodate compound keys.
Proposed Solutions
Proposed solutions for the admin URL issue:
"/app_label/module/pk1,pk2/" -- does not support text columns which include a comma. URL encoding to the rescue!
"/app_label/module/pk1/pk2/" -- With support for a generic view on just "/pk1/" (like the date based generic view)
mjm: From Adrian's remark about how greedy the admin URL matching is, I take it that the entire tail is taken to be the key matter. This wouldn't change: the admin code decides how it will recognize key part boundaries, and it generates URLs that conform to that. The discussion about allowing a partial key spec to select a set (rather than a single record) seems to me a diversion at this level. It does point out a [some?] non-unique selection criteria that might be especially useful to have supported automatically. I would expect a selection based on matching one (or some) of a CK's columns to look the same as if that selection were configured for non-PK column(s), I think.
Alternative methods
- notnotpeter: Currently, you can "fake" it by declaring one of the keys to be primary in Django and adding a unique constraint to the model. (needs more info...examples?)
- mjm: This only works when there is a non-compound unique key to use, if I understand what's being proposed here. As such, it may be workable as a way to squeeze a design that naturally has CKs into Django, but it's of no use for working with an existing schema that has only the CK.
- djansoft: It can be done using unique-together.
- Tobu: You can't use just one key. A use case for illustration: a NamespacedTag model with a CK made of a namespace and a name. The primary key can't be just the namespace or just the name since that would be ambiguous. The only solution (and I dislike it since it is bad modelling and forces database accesses to be serialized on a counter) is to use an AutoField.
- gcc: @notnotpeter, @djansoft: in RITA, there would be thousands of records with the same site_id or the same local_id in some tables. Neither column could sensibly be made the unique key. similarly, my current project has a lesson translation table where the key is (lesson_id, language).
- toszter: Call me nutty, but why not declare any number of columns as primary key in your model, which django then uses to create a "hidden" pk_composite column that's just a hash of the three pk values in the row? When you do a lookup on a pk, you assume the ENTIRE combination of values is the primary key, nothing more or less. So when you look up according to the values needed to create the full pk_composite, the hash always computes and if the hash algorithm is public, can be used off the URL, the querystring in db lookups, and just about anywhere. Seems pretty simple in my mind but then again I am not going into great detail here.
- gcc: @toszter: the reverse lookup (hash -> matching records) is extremely expensive when the table is large. e.g. in RITA, you might have to compute the hash of a million records every time you want to retrieve one, i.e. every time you access http://rita.example.com/admin/transaction/<hash>/
Links / Notes
http://code.djangoproject.com/ticket/373 Rails/ActiveRecord doesn't support this natively, but http://compositekeys.rubyforge.org/ is an add-on which does DataMapper http://datamapper.org is an alternative to ActiveRecord that allows Rails to work with compound keys.