Opened 3 years ago

Last modified 8 months ago

#24317 new Cleanup/optimization

Deprecate field.rel, replace it with real field instances

Reported by: Anssi Kääriäinen Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: cmawebsite@…, mmitar@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The problem is that when using get_fields(), you'll get either a field.rel instance (for reverse side of user defined fields), or a real field instance (for example ForeignKey). These behave differently, so that the user must always remember which one he is dealing with. This creates lots of non-necessary conditioning in multiple places of Django.

For example, the select_related descent has one branch for descending foreign keys and one to one fields, and another branch for descending to reverse one to one fields. Conceptually both one to one and reverse one to one fields are very similar, so this complication is non-necessary.

The idea is to deprecate field.rel, and instead add field.remote_field. The remote_field is just a field subclass, just like everything else in Django.

The benefits are:

  • Conceptual simplicity - dealing with fields and rels is non-necessary, and confusing. Everything from get_fields() should be a field.
  • Code simplicity - no special casing based on if a given relation is described by a rel or not
  • Code reuse - ReverseManyToManyField is in most regard exactly like ManyToManyField

The expected problems are mostly from 3rd party code. Users of _meta that already work on expectation of getting rel instances will likely need updating. Those users who subclass Django's fields (or duck-type Django's fields) will need updating. Examples of such projects include django-rest-framework and django-taggit.

Very work-in-progress code available from https://github.com/akaariai/django/tree/virtual_rel_field

Change History (15)

comment:1 Changed 3 years ago by Tim Graham

Triage Stage: UnreviewedAccepted

comment:2 Changed 3 years ago by Anssi Kääriäinen

My current thoughts of how relation fields should work is:

A relation in Django consits of:

  • The field itself
  • A descriptor to access the objects of the relation
  • The descriptor might need a custom manager
  • Possibly a remote field (the field to travel the relation in other direction)
  • The remote field can contain a descriptor and a manager
  • For deprecation period, field.rel is something like the remote field, but without actually being a field instance.

The loading order is as follows:

  • The relation field is created as part of loading the class (or completely separately by migrations for example).
  • The relation field is added to the class (contribute_to_class is called). The field is added to model's _meta.
  • When both the origin and the target classes are present, the remote_field is created and the descriptors are added. The remote field is added to the target class' _meta (note: the target class' _meta wasn't touched before the remote fields refactor!)
  • For migrations it is possible that a model is replaced live in the app-cache. For example, assume model Author is changed, and it is thus reloaded. Model Book has foreign key to Author, so its reverse field must be recreated in the Author model, too. The way this is done is that we collect all fields that have been auto-created as relationships into the Author model, and recreate the related field once Author has been reloaded.

I also thought of making field.rel actually a field subclass. But I'm not too enthusiastic of doing this, this ties the new implementations to the old Rel API. It seems better to write the new remote fields from scratch, and then make them as much as possible Rel like for deprecation period (the problem being that field.rel instances are what you get from _meta.get_field[s]() calls).

Note that even if this ticket is accepted, this work is still in early stages, and it isn't that clear how we actually want to do this. Maybe a DEP is in order for this? However, it is clear to me that we want to do *some* cleanup. It isn't nice at all that you can get instances that implement different APIs from get_field() and friends.

comment:3 Changed 3 years ago by Collin Anderson

This should fix #897.

comment:4 Changed 3 years ago by Collin Anderson

Cc: cmawebsite@… added

comment:5 Changed 3 years ago by Anssi Kääriäinen

I've been working on this, with an approach where I completely deprecate field.rel with new reverse field instances.

Unfortunately this seems to be an dead end for multiple reasons:

  • Patch implementing this would be huge: I'm currently at around 1700 lines changed, and I still have probably at least 500 lines to change. These changes aren't systematic or easy to verify.
  • There is a huge risk of regressions, because of the size of the patch, and because it is nearly impossible to verify changes in the related fields code.
  • 3rd party software likely relies a lot on field.rel introspection.

So, instead of trying to introduce a new field instance I think a better approach is to do incremental changes to the code. The idea is to continuously keep the field.rel as backwards compatible as possible while making it more like a relation field. The end result should be a situation where we can make field.rel a real field subclass.

comment:6 Changed 3 years ago by Anssi Kääriäinen

Every time I start working on any larger refactor I end up breaking Django's own code so badly I have to restart my work.

So, in short, the only sensible way forward is to do minor changes to the related fields API over time, and use deprecations where possible.

The main goals are:

  • Clean up the main API, that is the API defined by ForeignObject
  • Make the "Rel" instances respect the same API
  • Finally make "Rel" instances to be full Field instances (including registration to models)

The first step is to investigate all the "related_fields, foreign_related_fields, ..." attributes. It might suffice we have just one attribute: data_fields. To get what related_fields currently is, use zip(self.data_fields, self.remote_field.data_fields). The rest of the attributes can be constructed from the zipped version, self.data_fields or self.remote_field.data_fields.

Any takers for this clean-up? If not, then I'll tackle it myself when time permits.

comment:9 Changed 2 years ago by Anssi Kääriäinen

The advice I can give is that you should:

  1. Find places where rield.remote_field responds to different API than Field.
  2. Fix these one at a time while trying to have backwards compat, even if the API isn't public.

In addition, simplifications to the APIs are welcome, as is a high level documentation of how related fields actually work.

We need to try to keep backwards compat as many projects are forced to use the private APIs.

But most of all, do small incremental changes.

comment:11 in reply to:  9 Changed 2 years ago by Asif Saifuddin Auvi

Replying to akaariai:

The advice I can give is that you should:

  1. Find places where rield.remote_field responds to different API than Field.
  2. Fix these one at a time while trying to have backwards compat, even if the API isn't public.

In addition, simplifications to the APIs are welcome, as is a high level documentation of how related fields actually work.

We need to try to keep backwards compat as many projects are forced to use the private APIs.

But most of all, do small incremental changes.

is this PR related to this ticket?

https://github.com/django/django/pull/4241/files

comment:12 Changed 23 months ago by Thomas Güttler

I hope I am not coming to late here.

Example: Model Detail has a ForeignKey to Master.

If I call get_fields() on Master. I want to see class names which everyone immediately understands.

Up to now you get something like "ManyToOneRel to Detail". The meaning if it is correct, but not immediately clear.

What about "ReverseForeignKey"?

Related: http://dba.stackexchange.com/questions/126116/name-of-foreignkey-back-relation/126157#126157

comment:13 Changed 23 months ago by Sven R. Kunze

@guettli

Even shorter: BackReference?

comment:14 Changed 23 months ago by Tim Graham

I believe the idea is that if you access a reverse foreign key, you'll simply get the foreign key field of the other model.

comment:15 Changed 21 months ago by Mitar

Cc: mmitar@… added

comment:16 Changed 10 months ago by Asif Saifuddin Auvi

should the virtual field #16508 need to be addressed first? My analysis seems that should be first

comment:17 Changed 8 months ago by Gabriel Canto de Souza

I also had a huge difficulty sorting out the fields of ._meta.get_fields() , what I was trying to achieve was get ONLY the reverse fields (because in our context every relationship is bidirectional, and everything is generic/auto generated)

This gets really complicated with m2m that contain through tables, because not only you get the reverse m2m remote_field, but also the m2o relationship with the through table, so no only you have to figure out these are reverse, but group both the m2m and m2o that revert to the same relationship.

This is the code I managed to come up with to get only the reverse rels:

 #I'm sure there's a better way of doing this
 #This is the documented way of getting reverse fields, but it also gets ManyToOne rel with the through table of forward m2m rels, so we have to filter them
 reverse_fields = [f for f in opts.get_fields() if f.auto_created and not f.concrete]
 reverse_m2m = [f for f in reverse_fields if f.__class__.__name__ == 'ManyToManyRel' and f.on_delete == None and not getattr(model,f.field.name,False)]
 #Also filtering the ManyToOne rel with the through table of the reverse m2m rels, because we need to differentiate them from reverse ManyToOne with other models(not through)
 reverse_m2m_through = [f for f in reverse_fields if f.__class__.__name__ == 'ManyToOneRel'  and f.related_model in [f.through for f in reverse_m2m if getattr(f,'through',False)]]
 forward_m2m = [f.remote_field for f in opts.get_fields() if not f.auto_created and f.many_to_many]
 #Filtering the ManyToOne rels with the through table of forwards m2m rels, same logic as above
 forward_m2m_through = [f for f in reverse_fields if f.__class__.__name__ == 'ManyToOneRel'  and f.related_model in [f.through for f in forward_m2m if getattr(f,'through',False)]]
 reverse_fields_final = itertools.chain(reverse_m2m, [f for f in reverse_fields if f not in itertools.chain(reverse_m2m, reverse_m2m_through, forward_m2m, forward_m2m_through)])

I propose to add a new attribute is_reverse = True/False to facilitate this, of course this solution works in my context point of view, there might be a better approach to this

comment:18 in reply to:  17 Changed 8 months ago by Asif Saifuddin Auvi

Replying to Gabriel Canto de Souza:

I also had a huge difficulty sorting out the fields of ._meta.get_fields() , what I was trying to achieve was get ONLY the reverse fields (because in our context every relationship is bidirectional, and everything is generic/auto generated)

This gets really complicated with m2m that contain through tables, because not only you get the reverse m2m remote_field, but also the m2o relationship with the through table, so no only you have to figure out these are reverse, but group both the m2m and m2o that revert to the same relationship.

This is the code I managed to come up with to get only the reverse rels:

 #I'm sure there's a better way of doing this
 #This is the documented way of getting reverse fields, but it also gets ManyToOne rel with the through table of forward m2m rels, so we have to filter them
 reverse_fields = [f for f in opts.get_fields() if f.auto_created and not f.concrete]
 reverse_m2m = [f for f in reverse_fields if f.__class__.__name__ == 'ManyToManyRel' and f.on_delete == None and not getattr(model,f.field.name,False)]
 #Also filtering the ManyToOne rel with the through table of the reverse m2m rels, because we need to differentiate them from reverse ManyToOne with other models(not through)
 reverse_m2m_through = [f for f in reverse_fields if f.__class__.__name__ == 'ManyToOneRel'  and f.related_model in [f.through for f in reverse_m2m if getattr(f,'through',False)]]
 forward_m2m = [f.remote_field for f in opts.get_fields() if not f.auto_created and f.many_to_many]
 #Filtering the ManyToOne rels with the through table of forwards m2m rels, same logic as above
 forward_m2m_through = [f for f in reverse_fields if f.__class__.__name__ == 'ManyToOneRel'  and f.related_model in [f.through for f in forward_m2m if getattr(f,'through',False)]]
 reverse_fields_final = itertools.chain(reverse_m2m, [f for f in reverse_fields if f not in itertools.chain(reverse_m2m, reverse_m2m_through, forward_m2m, forward_m2m_through)])

I propose to add a new attribute is_reverse = True/False to facilitate this, of course this solution works in my context point of view, there might be a better approach to this

I am preparing a Draft DEP for addressing ORM relations API related issues here https://github.com/django/deps/pull/39

It's still work in progress but you certainly add your thoughts/feedbacks.

Note: See TracTickets for help on using tickets.
Back to Top