Code

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#14820 closed (fixed)

Use `TextField` instead of `PositiveIntegerField` in docs and examples for generic relations.

Reported by: mrmachine Owned by: gabrielhurley
Component: Documentation Version: master
Severity: Keywords: generic relation genericforeignkey object_id type textfield sprintdec2010
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

The docs state that the object_id field on a generic model that is linked back to with GenericForeignKey fields must be the same type, usually PositiveIntegerField. This is not entirely true since [12353] which coerces the primary key value to the same type as the object_id field on the generic model.

This means that TextField should be the preferred type for object_id fields on generic models, because it will allow models with integer and non-integer (anything that can be coerced to a string) to link back to the generic model in question.

Django actually uses TextField (I presume, for this reason) for its own internal generic models (LogEntry and BaseCommentAbstractModel), so it makes sense that we should tell users how to avoid this old limitation and set a best practice by example.

Attachments (2)

14820-textfield-generic-relations-r14785.diff (8.1 KB) - added by mrmachine 3 years ago.
14820-textfield-generic-relations-r15449.diff (7.7 KB) - added by mrmachine 3 years ago.

Download all attachments as: .zip

Change History (22)

comment:1 Changed 3 years ago by mrmachine

  • Needs documentation unset
  • Needs tests unset
  • Owner changed from nobody to mrmachine
  • Patch needs improvement unset

I'll update with a patch shortly.

comment:2 Changed 3 years ago by mrmachine

  • Needs documentation set

comment:3 Changed 3 years ago by gabrielhurley

I'm not convinced that the docs should be amended as suggested. While I agree that for maximum flexibility a TextField will serve best, in the vast majority of cases a PositiveIntegerField is appropriate.

At best I could see a note in the docs indicating that generic reusable apps should use a TextField given that object_id's may be arbitrary strings.

The argument here comes down to the meaning of the word "usually". The "usual" case is that people have not overridden the default integer id behavior and a PositiveIntegerField is appropriate. Thus I am -1 on a change to the docs that says *all* cases should use a TextField for object_id.

comment:4 Changed 3 years ago by mrmachine

  • Has patch set
  • Needs documentation unset

I've uploaded a patch which uses TextField in the example instead of PositiveIntegerField and explains the limitation properly, which is to say that the object_id field doesn't have to be the same type as the primary key fields on the related models, but that it must be possible to coerce the primary key values to the same type as the object_id field with the field's get_db_prep_value method.

If you agree that TextField serves best for maximum flexibility, and the existing "majority of cases" function equally as well with TextField, but additionally the second most common case (CharField primary keys) also works, why wouldn't we want users to go with TextField from the get go?

comment:5 Changed 3 years ago by mrmachine

The patch also updates the rest of the file to wrap at 80 columns.

comment:6 Changed 3 years ago by mrmachine

  • Owner mrmachine deleted

comment:7 Changed 3 years ago by lrekucki

How about adding a section on GFKs in reusable apps at the bottom, instead? Also, while the statement about values coercible values is certainly accurate, it's not very helpful to the reader. Docs on get_db_prep_value won't help much either, so some example would be good.

comment:8 Changed 3 years ago by gabrielhurley

  • milestone set to 1.3
  • Patch needs improvement set
  • Triage Stage changed from Unreviewed to Accepted

Being realistic, whatever ends up in the example there is going to be copied and pasted and become the official recommendation despite whatever is said around it.

I suppose one of my problems with recommending the use of a TextField is that you're implicitly trading wider compatibility for a performance penalty (varying by DB): text columns often use more memory, can be less performant when returned by DB queries, and can have indexing woes depending on the database configuration... the best practice performance-wise is to use a field of the appropriate type (CharField, IntegerField, or PositiveIntegerField) if you know for a fact that the values will always be of that type.

What it all boils down to is that there isn't one right answer here.

If someone feels they have enough to say to write a whole section on best practices for using GFK's in reusable apps I'd love to see it.

If not, then I'd rather see this patch in reverse: keep the primary recommendation as "use a field of the appropriate type" and add a big solid note that authors of reusable apps may want to use a TextField in order to support both integer and non-integer PK's. The point is definitely important, though; I think it could have an admonition directive to call attention to it.

And in the end, I firmly believe there are more people reading this section of the docs just trying to make their own Django app work than there are folks trying to write reusable apps that are having bugs because their GFK didn't support a non-integer PK.

Changed 3 years ago by mrmachine

comment:9 Changed 3 years ago by mrmachine

  • Keywords sprintdec2010 added
  • Patch needs improvement unset

I've updated the patch as per your feedback. I don't have the setup to compile the docs and view the resulting HTML version, so I'm not 100% certain that I've formatted it correctly with regards to the admonition and versionchanged directives.

I changed the example to use IntegerField instead of PositiveIntegerField because the default for models is actually IntegerField, and PositiveIntegerField just adds some extra validation to that which isn't really relevant here.

I also use CharField in the the example scenario, as I believe it should not have the same database performance issues as TextField while still offering compatibility with integer and character primary keys, but also still mention that TextField provides maximum flexibility when you don't know which models will be related to yours, but with TextField comes the performance considerations you mentioned.

comment:10 Changed 3 years ago by claudep

I suggest also to add a note about adding an index to object_id. I just realized that slow queries in my app were related to the fact that this column was not indexed. I'm sure that most applications will benefit from this column being indexed.

comment:11 Changed 3 years ago by PaulM

This really is a bit of a sticky issue.

In the vast majority of cases, PositiveIntegerField is the correct choice [1]. In particular, it's going to be faster, support indexing, and generally use less memory.

TextField is really the wrong choice most of the time. Most backends implement it using "blob" storage or similar, which uses significantly more memory for smaller items, doesn't support efficient indexing, and is almost always less performant than the other options.

CharField is an interesting compromise. It is much closer to (Positive)IntegerField in terms of performance, but again, this varies on a backend basis. More problematically, it requires a specified length, which is going to vary significantly depending on the ID scheme in use. If we pick a "safe" recommended value of say, 128 characters, we waste significant database space on some backends, especially if the actual addressing scheme in use is a monotonically increasing positive integer (and we still don't support users who, for whatever reason, want to use 129 character values).

I agree that encouraging third-party developers to build reusable applications which will work across a broad variety of databases is a very good goal. In particular, as Django becomes easier to use with non-traditional databases, this sort of RDBMS-centric thinking will crop up more often. However, in this case I think that leaving the example with a PositiveIntegerField is probably the best option. It encourages new developers toward the choice which will be most performant (and compatible) in the current majority of cases. I also strongly support the addition of the pullout highlighting the issue, but cannot endorse generically recommending a Text or CharField in all situations.

I'd like to hear Alex Gaynor's thoughts on this issue.

[1] because on some backends, IntegerField actually is signed (not just checked in Django), and negative values are not a standard usage here.

comment:12 Changed 3 years ago by mrmachine

Thanks for your feedback, PaulM.

As far as I can tell, using PositiveIntegerField is the same as IntegerField, except for some additional form field validation when the model is used in a ModelForm class. I'm happy to leave the examples as using PositiveIntegerField if it really makes a practical difference anywhere, but I'm not sure that it does.

There won't be any performance benefit, because it's still the same integer field at the database backend. Only when people use the model in a ModelForm class AND a database backend that doesn't support negative integer primary keys (could you elaborate on this point?), they'll get some additional validation on that form field to prevent them typing in negative values. On the flip side, anyone using the model in a ModelForm and a database backend that DOES support negative integer primary keys will get some additional validation that they might not want.

At the end of the day, I think the form field validation for object_id fields is a non-issue. I think that in most cases, people will not assign values directly through model forms, and if they do they'll probably add their own validation to ensure that the value references an actual object in the target model (not just that it is positive). Especially if integer fields on some database backends do and some don't support negative integers, we should leave this validation out of it.

All I really care about though is adding a note that character and text fields can be used for greater compatibility, for authors of generic apps that won't know what types of objects people want to link their models through generic relations. If changing the examples from IntegerField back to PositiveIntegerField will get this change committed, I'm all for it :)

comment:13 Changed 3 years ago by Alex

Setting aside my hate for GenericForeignKeys for a moment. I'd say leave it as a PositiveIntegerField but throw a note below it.

comment:14 Changed 3 years ago by gabrielhurley

My bigger concern goes back to the fact that what goes into this section of the docs becomes a sort of "official recommendation". TextField is off the table, and I don't mind CharField, but Paul's concern about having to choose a length for a CharField is a valid one, and I don't have an answer there. What's the right compromise between "generic" and performant here?

comment:15 Changed 3 years ago by russellm

The compromise is to say, in no uncertain terms, "there's no such thing as a right answer here". Generic foreign keys are, by their very nature, hideously inefficient. It is absolutely application dependent what constitutes an 'appropriate' datatype, or an appropriate size if a CharField is warranted. I don't think we can do any better than to say "These examples use a PositiveIntegerField, which will be sufficient for storing any normal Django model with an integer primary key. However, you need to check your own requirements..."

Changed 3 years ago by mrmachine

comment:16 Changed 3 years ago by mrmachine

I've updated the patch to use PositiveIntegerField in the examples. Any objections, changes or additions to the admonition underneath?

comment:17 Changed 3 years ago by claudep

Sorry to chime in again, but I would still propose to add db_index=True in the TaggedItem example (object_id = models.PositiveIntegerField(db_index=True)). It seems to me good practice to add index on fields used to join tables, isn't it?

comment:18 Changed 3 years ago by gabrielhurley

  • Owner set to gabrielhurley
  • Resolution set to fixed
  • Status changed from new to closed

In [15545]:

Fixed #14820 -- Added more information to the generic relation docs regarding different choices for storing PK references for a GenericForeignKey. Thanks to mrmachine for the all the work on the patch.

comment:19 Changed 3 years ago by gabrielhurley

In [15546]:

[1.2.X] Fixed #14820 -- Added more information to the generic relation docs regarding different choices for storing PK references for a GenericForeignKey. Thanks to mrmachine for the all the work on the patch.

Backport of [15545] from trunk.

comment:20 Changed 3 years ago by jacob

  • milestone 1.3 deleted

Milestone 1.3 deleted

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.