Opened 9 years ago

Closed 8 years ago

Last modified 8 years ago

#26749 closed Cleanup/optimization (fixed)

Preserve behavior of use_for_related_field during deprecation

Reported by: Julien Hartmann Owned by: nobody
Component: Database layer (models, ORM) Version: 1.10
Severity: Normal Keywords: manager
Cc: Loic Bistuer, Shai Berger Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I am in the process of testing Django 1.10 and upgrading the third-party package I maintain, django-hvad.

This ticket is related to the two following commits:

The issue I have with the way inheritance and customization now works is it is very hard to have a different manager as the base_manager and the one used for related field descriptors.

Why the need?

Django-hvad is one of the modules that change the default querysets. It does it by replacing the default manager with one that brings extra features. Namely, automatic joining onto a translations model.

Having those features available on related fields is essential to provide a consistent interface, and after years of having it, it is now essential to backward compatibility as well.

It does not break the basic rule of “a base manager must not hide objects”, as it merely adds automatic select_related() rules and iterator() encapsulation.

Yet, those extra features cannot be on the base_manager, due to the way ORM internals use it (for instance, triggering additional joins on a delete would not work).

Why the ticket?

Up to Django 1.9, this was possible using use_for_related_fields = True, and fixing the _base_manager in the class_prepared hook. Not clean but workable.

With the new API though, all the manager machinery uses property descriptors on the Options class, making it difficult and fragile to override. And at the same time, using use_for_related_fields got deprecated, which means there is no option to use a different manager as the base manager and as the related field descriptor's manager anymore.

Unless there is another way to do this?

Change History (23)

comment:1 by Tim Graham, 9 years ago

Cc: Loic Bistuer added

comment:2 by Loic Bistuer, 9 years ago

At some point in the design of the new API we had another model option called related_manager_name to do basically what OP is asking (discussions at https://github.com/django/django/pull/6175). There was some design decisions to make on whether it should only affect to-one rels, or every rels, and on whether we should introduce even more granularity with options like to_one_related_manager_name and to_many_related_manager_name.

comment:3 by Carl Meyer, 9 years ago

Triage Stage: UnreviewedAccepted

After discussion with loic in IRC, I think it makes sense to go ahead and add a single related_manager_name option. That would cover this use case (and all the ones I can think of), and wouldn't preclude later layering on many_related_manager_name or single_related_manager_name if someone finds compelling needs for them later on.

comment:4 by Julien Hartmann, 9 years ago

Could it be possible to clarify the intended use of base_manager? All over Django codebase, it's only used at 4 points:

  1. when finding out the list of related objects in the deletion collector
  2. when saving an existing model, the internal _update method is called on a queryset created by te base manager.
  3. when saving a newly created model, the internal _insert method is called on the base manager.
  4. in the contrib.contenttypes application, for ̀GenericForeignKey`.
  1. And well, since the changes of the linked commits, it's now also used for related fields, but hopefully that should be gone or overridable soon.

My understanding is use cases 1-3 are internal uses, and are the intended users of the base manager.

On the other hand, use cases 4-5 expose the manager to user code, and should be overridable to offer custom behavior to user code without altering ORM internals.

The change you suggest, with related_manager_name, seems in line with my understanding? If so, should GenericForeignKey use it as well?

(by the way, in the default manager documentation, I think it should say default_manager_name instead of base_manager_name?)

Last edited 9 years ago by Tim Graham (previous) (diff)

comment:5 by Loic Bistuer, 9 years ago

Could it be possible to clarify the intended use of exhaustive?

I was originally planning to document the exhaustive list of base manager usages, but Tim actually wrote the docs for that patch and went with a terser description. I'm not against adding a bit more details, patch welcome.

since the changes of the linked commits, it's now also used for related fields, but hopefully that should be gone or overridable soon

The previous implementation of one-related descriptors weren't representative of intent:

In practice use_for_related_fields meant "the default manager can also be used as base manager" since the introduction of _base_manager, as demonstrated by your need to monkeypatch _base_manager back to a plain manager during the class preparation. Also _base_manager was introduced to generalize the use_for_related_fields mechanism as documented in the historical note that was commited the same day _base_manager was introduced.

My understanding is use cases 1-3 are internal uses, and are the intended users of the base manager.

I don't see a distinction between internal and external use. There is nothing internal about the concept of inserting data and/or wanting to guarantee unfiltered results (e.g. dumpdata, collection for deletion, etc.), and for that you'd want to use a plain manager. Granted those are low-level operations that most projects don't need to worry about, but 3rd-party libraries are definitely among the intended users.

The change you suggest, with related_manager_name, seems in line with my understanding? If so, should GenericForeignKey use it as well?

Definitely, GenericForeignKey and GenericRelation actually don't pick the most adequate managers due to their reliance on the get_object_for_this_type and get_all_objects_for_this_type utilities which use _base_manager. I'm planning to undocument these and remove every usage in Django (deprecating them would break too much working code) in favor of inlining their code.

(by the way, in the ​default manager documentation, I thinkg it should say default_manager_name instead of base_manager_name?)

Well spotted, want to make a patch?

comment:6 by Julien Hartmann, 9 years ago

Before changing anything, I must be certain to fully understand.

What I meant by "internal" is the way the manager is used does not expose it to user code in cases 1-3. In those use cases, the manager provides a service to Django ORM, which could completely be implemented with no manager interaction, user code would not notice.
→ On the other hand, in cases 4-5, Django simply picks the manager and passes it to user code, which makes it possible for the manager to expose additional features to user code.

So we have two very different use patterns. Therefore it makes sense to separate them — related_manager and base_manager. Actually, I'd even say that related_manager is closer to default_manager than base_manager in purpose and use pattern.


Let's get back to base_manager. Actually, cases 1-3 are not the whole story, as you pointed out. It also serves the purpose of providing a unfiltered-access-to-model guarantee.

I believe that's a big red flag. Because those two purposes are different: cases 1-3 are consumed by the ORM while the unfiltered guarantee is consumed by user code.

Now, this unfiltered-guarantee is clearly defined and definitely a good addition to the public side of the API. No problem with that. Now, though, the purpose of customizing the inner workings of the ORM seems to be inconsistently served:

  • Overriding insertion:
    • Either override MyModel._do_insert
    • or just let it forward the call to the base_manager and override MyBaseManager._insert
  • Overriding updates:
    • Override Queryset.update (for queryset-based updates)
    • and either:
      • override MyModel._do_update
      • or let it forward the call to base_manager's queryset and override QuerySet._update
  • Overriding deletion:
    • If object can be fast deleted, override base_manager's queryset _raw_delete method.
    • otherwise, no luck, you cannot override deletion, it is a hardcoded sql.DeleteQuery(model).

There is no object clearly responsible for updating the database. Sometimes the queryset does it, sometimes the base_manager, sometimes the deletion collector bypasses both. Sometimes the calls go through the model, with a default implementation that simply forwards them and sometimes they don't.

Also, though base_manager_name is becoming a public API, doing anything useful with it requires overriding private, undocumented API methods.


So, the final breakdown, as I understand it at the moment, is as such:

  • default_manager: makes it possible to expose additional features to user code, in the general case.
  • related_manager: makes it possible to expose additional features to user code, on related fields.
  • base_manager: must provide the unfiltered, untampered guarantee for use in things such as dumpdata.
  • unspecified API clinging to base_manager and living in undocumented, private API methods of the manager or its queryset: can be used to override low-level operations. That is, cases 1-3.

I would advocate splitting base_manager and the unspecified API, for several reasons:

  1. it would make the base_manager's role and responsibilities clearly bound and defined.
  2. there is no reason for those methods to live on the manager/queryset in the first place.
  3. the fact they live on the manager(/manager's queryset) means every manager has them, yet they are only ever used/usable on the base manager(/manager's queryset).
  4. the unspecified API is inconsistent and incomplete anyway, certainly not mature enough to go public with the base_manager.

Namely, I'd completely remove Manager._insert and QuerySet._update and let Model._do_insert and Model._do_update do the job.


I hope this didn't sound too bold, I'm trying my best to make my meaning unambiguous, somewhat at the expense of political correctness.
If you think this has some value, I can try to make a proof of concept patch.

comment:7 by Shai Berger, 9 years ago

If I understand correctly, this is an accepted ticket asking for design changes in a new feature. Shouldn't it be a blocker?

comment:8 by Shai Berger, 9 years ago

Cc: Shai Berger added

comment:9 by Julien Hartmann, 9 years ago

I was taking a shot at adding the related_manager_name.

I was wondering what it should default to, if unspecified: the base_manager or default_manager? To me, default_manager makes more sense, but base_manager is closer to former behavior.

comment:10 by Loic Bistuer, 9 years ago

@spectras, I already have a branch that introduce related_manager_name, but there are a bunch of design decisions to make that I'm not comfortable making this late in the 1.10 cycle. I don't want to introduce new APIs in a rush.

Instead I'd like to adapt the current deprecation shim for use_for_related_field so django-hvad can use it to work on Django 1.10, and then introduce definitive APIs in Django 1.11.

comment:11 by Julien Hartmann, 9 years ago

Alright, well maybe it's better to just summarize what is needed:

If user code did not change any manager, then:
→ hvad must override the default manager and the manager used by related fields, for an hvad-enabled one.

And in all cases, even if the user did override a manager, then:
→ hvad must ensure the base manager is not hvad-enabled: isinstance(model._base_manager, TranslationManager) is False).

This is because hvad-enabled manager is not suitable as a base manager. Though it does provide the unfiltered-guarantee, it breaks because it alters the behavior of the deletion collector.

I'll investigate whether it's possible to prevent that in a future version, but that's probably a full rewrite with backward incompatibilities, something not happening in time for Django 1.10.

comment:12 by Loic Bistuer, 9 years ago

@spectras, could you please try https://github.com/django/django/pull/6825.

Then on django-hvad:

  • Model._plain_manager = models.Manager().
  • Instead of patching Model._base_manager patch Model._meta.base_manager_name = '_plain_manager'.
  • TranslationManager.silence_use_for_related_fields_deprecation = True

comment:13 by Julien Hartmann, 9 years ago

Passes the test suite. I'll fiddle with it a bit more, but unless I find odd corner cases, it's good.

I like the way the PR only changes the deprecation path, allowing a clean base to refine the API on next release.

comment:14 by Loic Bistuer, 9 years ago

I guess the downside of this approach is that it'll trigger a migration for each model. Think it's acceptable?

comment:15 by Julien Hartmann, 8 years ago

The migration process has been very well streamlined since South functionality has been merged into Django, so yes — I'll make sure to mention it in the release notes. That will be the opportunity to include another change that requires a migration as well.

comment:16 by Julien Hartmann, 8 years ago

As the release is getting closer, I'm coming back here to know whether the POC fix will be merged into stable/1.10.x?

comment:17 by Tim Graham, 8 years ago

Type: UncategorizedCleanup/optimization

I expect it should be, but the patch is missing a test. If you want to try writing one, that would be helpful.

comment:18 by Julien Hartmann, 8 years ago

Can do.
I checked current tests, this corner case is not covered. I'll try to add that today.

comment:19 by Julien Hartmann, 8 years ago

I opened another pull request as I cannot update loic's one. It is here: https://github.com/django/django/pull/6889
I'm open to all suggestions to improve it.

comment:20 by Tim Graham, 8 years ago

Has patch: set
Patch needs improvement: set

comment:21 by Tim Graham, 8 years ago

Patch needs improvement: unset
Summary: Different manager for _base_manager and related descriptorsPreserve behavior of use_for_related_field during deprecation
Triage Stage: AcceptedReady for checkin

comment:22 by Tim Graham <timograham@…>, 8 years ago

Resolution: fixed
Status: newclosed

In f4afb85d:

Fixed #26749 -- Preserved behavior of use_for_related_field during deprecation.

comment:23 by Tim Graham <timograham@…>, 8 years ago

In 39c25b7:

[1.10.x] Fixed #26749 -- Preserved behavior of use_for_related_field during deprecation.

Backport of f4afb85d7e900123fa8f88110adc011ab184e153 from master

Note: See TracTickets for help on using tickets.
Back to Top