Code

Opened 10 months ago

Closed 9 months ago

Last modified 2 months ago

#20625 closed New feature (fixed)

Custom Chainable QuerySets

Reported by: danols Owned by: loic84
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: QuerySet, models.Manager, chainable
Cc: loic@…, mike@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Provide a mechanism for defining chainable custom QuerySet directly through a subclassed models.Manager definition instead of the current approach that requires a custom models.Manager + QuerySet pair. This ticket potentially replaces/complements https://code.djangoproject.com/ticket/16748.

I suggest that we allow defining chainable querysets by allowing the django programmer to add a chainable == True attribute to the custom queryset method. The proposed syntax would look as shown below and a working github pull request will be submitted shortly; thoughts and feedback welcomed.

class OfferManager(models.Manager):
    """ Example of a chainable custom query set """
    
	...
    
	QUERYSET_PUBLIC_KWARGS = {'status__gte': STATUS_ENABLED}
    QUERYSET_ACTIVE_KWARGS = {'status': STATUS_ENABLED}
	
	...
    
	def public(self):
        """ Returns all entries accessible through front end site"""
        return self.all().filter(...)
    public.chainable = True 	# instructs to dynamically tranplat this method onto
                                # returned QuerySet as <queryset>.public(...) 
								# effectively providing chainable custom QuerySets

    def active(self):
        """ Returns offers that are open to negotiation """
        return self.public().filter(**OfferManager.QUERYSET_ACTIVE_KWARGS)
									# an example of how to reffer to OfferManager
									# constants as 'self' context changes
    active.chainable = True
    ...

Attachments (0)

Change History (37)

comment:1 Changed 10 months ago by danols

  • Needs documentation unset
  • Needs tests unset
  • Owner changed from nobody to danols
  • Patch needs improvement unset
  • Status changed from new to assigned

comment:2 Changed 10 months ago by danols

Initial pull request created: https://github.com/django/django/pull/1284

comment:3 Changed 10 months ago by danols

  • Has patch set

Properly formatted usage code example:

class OfferManager(models.Manager):
    """ Example of a chainable custom query set """
    
    ...
    
    QUERYSET_PUBLIC_KWARGS = {'status__gte': STATUS_ENABLED}
    QUERYSET_ACTIVE_KWARGS = {'status': STATUS_ENABLED}
    
    ...
    
    def public(self):
        """ Returns all entries accessible through front end site"""
        return self.all().filter(...)
    public.chainable = True     # instructs to dynamically tranplat this method onto
                                # returned QuerySet as <queryset>.public(...) 
                                # effectively providing chainable custom QuerySets

    def active(self):
        """ Returns offers that are open to negotiation """
        return self.public().filter(**OfferManager.QUERYSET_ACTIVE_KWARGS)
                                    # an example of how to reffer to OfferManager
                                    # constants as 'self' context changes
    active.chainable = True
    ...

comment:4 Changed 10 months ago by lukeplant

  • Triage Stage changed from Unreviewed to Accepted

I'm accepting the ticket in principle - we need a solution to this problem, evidenced by the number of solutions/complaints out there:

https://github.com/zacharyvoase/django-qmethod
http://stackoverflow.com/questions/4576622/in-django-can-you-add-a-method-to-querysets
http://djangosnippets.org/snippets/562/
http://djangosnippets.org/snippets/734/
http://djangosnippets.org/snippets/2114/
http://djangosnippets.org/snippets/2117/

And I don't think having multiple solutions to this problem helps people - it just adds more-than-one-way-to-do-it.

However, I'm not at the moment accepting the approach given in the ticket (neither am I rejecting it). We need to consider the pros/cons of the different methods in the existing solutions above.

For the solution in the PR above, I'd say the major disadvantage is that you have methods on the Manager which are not being used as Manager methods, but callables that could be attached to either Managers or QuerySets. This means surprises w.r.t to self (as documented in the example), and it would be easy to think it is working (because it works without chaining), or to have nasty surprises if you call methods that exist on QuerySet but have been overridden on the Manager subclass.

comment:5 Changed 10 months ago by loic84

  • Cc loic@… added

Overlapping ticket with related discussion: #16748

I see two main patterns in these attempts:

  1. Those that make it easier to provide a custom QuerySet:

https://github.com/jtillman/django/compare/CustomQuerySet
https://github.com/django/django/pull/317
http://djangosnippets.org/snippets/734/

  1. Those that pass custom Manager methods onto the QuerySet:

https://github.com/django/django/pull/1284/
https://github.com/zacharyvoase/django-qmethod
http://djangosnippets.org/snippets/562/

I personnaly favor the second option because it results in an API where both Model.objects.custom_method() and Model.objects.all().custom_method() work, which is what you've come to expect with the methods from the stock QuerySet.

Before we worry about the implementation we'll have to settle on the general idea.

comment:6 Changed 10 months ago by akaariai

I believe one problem with the proposed way is that chainable manager methods are lost when the QuerySet switches its class. The switching happens as side effect of certain calls (for example .values(), .values_list()). Also, creating dynamic classes as done in the patch is likely somewhat expensive, and will also likely break pickling (as the dynamic class can't be found from module level). Both of the latter problems are solvable, first one by caching the dynamic class, second one by altering QuerySet.reduce().

Other solutions to this problem are to make Manager itself a QuerySet (or subclass of it). I tried this some time ago and it was evident this path has a lot of problems ahead. Yet another solution is to call the chainable manager methods from the manager, but explicitly pass the queryset into the method, something like

class MyManager:
    @chainable
    def published(self, chain_to, other_params...):
        return chain_to.filter(...)

with some meta-magic (or just plain old-style wrapper function) the chain_to could be automatically created as get_queryset() if called from manager, but if called from qs, then chained would be the qs instance. However it will be a bit confusing to call this method with objects.published(datetime.now())... Alternative is to not magically provide the chain_to qs when called from manager - instead the user should create the chain_to if it is None.

A big +1 to the idea, but there needs to be a bit more discussion about what way to solve this.

My opinion is that if it is possible to get rid of the Manager class completely, or make it QuerySet subclass, that would be optimal path. Unfortunately this seems hard to do.

comment:7 Changed 10 months ago by loic84

akaariai's idea of having Manager inherit from QuerySet made me think of another option.

We can copy QuerySet methods onto the Manager.

That's similar to what we manually do to proxy all the current QuerySet methods, we'd just have to do it dynamically.

The end user's code would look like this:

class MyQueryset(QuerySet):
    def is_published(self):
        pass

class MyModel(Model):
    objects = Manager(queryset_class=MyQueryset)

>>> hasattr(MyModel.objects, 'is_published')
True

comment:8 Changed 10 months ago by loic84

Proof of concept: https://github.com/loic/django/compare/ticket20625

I wanted to leverage the Django test suite, so I tested my concept on the actual Manager class, the complete suite is passing.

We might not be able to do such refactor in the wild (removing all the proxy methods) since calls to super wouldn't work anymore, but the idea is there.

comment:9 Changed 10 months ago by charettes

FWIW here's what I've been using for the past years.

It doesn't account for QuerySet class changes thought (values_list) and should be adapted to make sure Model.objects.delete() can't be call directly.

comment:10 Changed 10 months ago by danols

From the semantic/syntax angle I don't believe we should pursue the idea of combining QuerySet and models.Manager into one as to me even though they do overlap in functionality they make sense to remain distinct, and the more I think about creating custom QuerySets on the models.Manager level the more I realize it is not the the right approach; it is just what we are familiar with and thought through the Django documentation.

I think the `objects=Manager(queryset_class=...) is the correct way , I would say all or almost all of the custom managers I have written should actually have been custom QuerySets. So I endorse loic84 approach (but would like to see if we could omit the need for the '.manager' attribute).

So Djangonauts - speak up and let's get a consensus going :)

comment:11 Changed 10 months ago by loic84

@charettes' proposal is pretty much identical to mine (as opposed to the 2 patterns described in comment:5), except for the factory which if I understand correctly wouldn't be needed anymore if this made it into core.

I picked an opt-in strategy with QuerySet.method.manager=True but we could just as well pick the following opt-out strategy:

  • Accept by default if QuerySet.method.manager is not present and the method name doesn't start with an underscore (thus protecting private and magic methods)
  • QuerySet.method.manager=False for public methods that don't make sense at the Manager level (i.e. QuerySet.delete())
  • QuerySet.method.manager=True to opt in for private methods.

Edit: Implementation of the opt-out strategy https://github.com/loic/django/compare/ticket20625_optout.

Last edited 10 months ago by loic84 (previous) (diff)

comment:12 Changed 10 months ago by charettes

@loic84 patch is looking good!

The only issue left is the QuerySet class changes, e.g. values() which returns a ValuesQuerySet instance that wouldn't inherit for the custom QuerySet subclass.

comment:13 Changed 10 months ago by loic84

@charettes: It's not a trivial issue! I pushed an extra commit to the branch where I play with dynamic types, everything but pickling works. Any suggestion?

comment:14 Changed 10 months ago by loic84

Added another commit which fixes pickling, I believe this POC is as good as it can be, we now have to decide whether it's the approach we want to pursue.

If we decide to go for it, we need to settle on the scope (do we try to deprecate all legacy proxy methods or not) and I'll turn it into a proper patch with tests and docs.

For the record, I did try having Manager inherit from QuerySet, it's quite hard because of all the magic methods. I also believe it's undesirable in term of API as there are plenty of QuerySet methods - the private ones in particular - that we don't want on the manager.

comment:15 Changed 10 months ago by loic84

  • Needs documentation set
  • Needs tests set
  • Patch needs improvement set

I've tried to come up with a strategy to deprecate all the legacy proxy methods.

The issue at hand is custom managers in the wild that call super in overridden methods.

@charettes mentioned on IRC that __getattribute__ would enable calls to super but that we wouldn’t be able to differentiate a normal access (which shouldn't go through the legacy method) and calls to super which should trigger the deprecation warning.

We could have a metaclass, similar to RenameMethodsBase which adds back legacy methods for every method with the same name in the MRO: if our method is ever called (by super) we trigger the deprecation warning.

I ran into a problem when trying to deprecate Manager.create() and Manager.get_or_create() with RelatedManager (and similar):

https://gist.github.com/loic/5861660
https://github.com/django/django/blob/master/tests/multiple_database/tests.py#L907-L921

If RelatedManager doesn't call super, we'll never reach custom managers' create() and get_or_create(). I don't see a solution to that other than making a backward incompatible change. Or we could elect a few proxy methods that shouldn’t be deprecated (like Manager.all(), see comment in the patch).

Any suggestions?

If we can't reach consensus on the deprecation plan I suggest we move on with this ticket and open a new one regarding the deprecation itself.

comment:16 Changed 10 months ago by charettes

@loic84 I agree that deprecation/removal of Manager proxy methods shouldn't block this ticket and discussion should be moved to another ticket once we manage to come up with a solution for this one.

We should play safe and avoid messing with Manager's MRO too much in the 1.6 release.

Lets see if #15363 behaves correctly first to avoid flooding users with deprecation warnings all over the place.

comment:17 Changed 10 months ago by loic84

  • Owner changed from danols to loic84

To summarize a bit:

  • The latest patch should cover all concerns raised so far: it handles specialized querysets such as ValuesQuerySet, doesn't break pickling, cruft-free methods definitions, sane defaults (no private or magic methods), a mechanism to opt-out or force-in certain methods and last but not least, it's robust enough to sustain the complete Django test suite while replacing almost 90% of Manager's public API.
  • We settled that deprecating legacy proxy methods is out of scope for this ticket while acknowledging that it's desirable in the long run. We'll open a dedicated ticket once we land this feature.
  • Since we are not deprecating the legacy methods, we need a dedicated test suite.
  • 1.6b1 has landed, so we can resume work targeting master.

Where are we in term of consensus? I'll turn this into an RFC patch as soon as I get a green flag on the implementation.

Latest implementation can be found here: https://github.com/loic/django/compare/ticket20625_optout.

Last edited 10 months ago by loic84 (previous) (diff)

comment:18 Changed 10 months ago by akaariai

I have tried various versions of making Manager a subclass of QuerySet. The most promising approach was to use a temporary class for hiding QuerySet only functionality from the manager, and drop the temp class on first clone. So:

  1. On creation of manager, create a dynamic class and add HideQuerySetMethods to the __bases__ of the dynamic class. Effect is that for example .delete() calls HideQuerySetMethods.delete(), not QuerySet.delete().
  2. On first clone, switch the class to the original class, that is drop the HideQuerySetMethods class from __bases__.

I got the solution to almost working state, but it required too many kluges. I ended up deleting the branch as there was just too many little problems to make it work cleanly.

I have seen the idea of this ticket discussed multiple times. Every time some solutions are offered, but the solutions always have some problematic cases that end up stalling the development effort. I think the solution in this ticket is clean, and makes it easy enough to add queryset methods. So, my opinion is that we should go forward with it.

I wouldn't actually remove the proxy methods at all. Removing them doesn't seem worth the effort, the maintenance burden of the methods is very small. If they are going to be removed, then all must be removed in one go. It is just weird if some of the proxy methods are still there because Django needs the super() method to be present, but other are removed.

It seems #17270 is a duplicate, I guess that one could be closed as this ticket seems to have more momentum at the moment.

comment:19 Changed 10 months ago by loic84

@akaariai: I agree with the all or nothing stance regarding the method deprecation, and I also agree that the maintenance burden is rather small.

Just a clarification, Django itself doesn't need the calls to super(), the problem is backward compatibility.

I'll tackle docs and tests tomorrow, hopefully we'll have an RFC patch pretty soon.

comment:20 Changed 10 months ago by akaariai

Sorry for the Django super() requirement confusion.

While writing my previous post I got an idea. I did some quick testing and it seems the idea needs to be presented. There are two key things in the idea:

  • Managers are created by CustomQuerySet.manager_cls() method - this is a factory constructing a new manager class.
  • The new manager class has dynamically created proxy methods for those methods in the CustomQuerySet that should be present in the manager.

So, you could do something like this:

class MyQuerySet(models.QuerySet):
    def bar(self):
        return ...
    bar.manager = True

class MyModel(Model):
    objects = MyQuerySet.as_manager()

Now, MyModel.objects is a custom Manager subclass which has a proxy method for bar(). And it really *has the method*, so super() works correctly.

The API might not be optimal, but the idea that proxy methods are automatically created on dynamically created Manager classes should be considered carefully. In my opinion it offers better semantics than the __getattr__ based approach.

I wrote a very quick proof-of-concept patch. It is available from: https://github.com/akaariai/django/tree/manager_factory. The only interesting part of the patch is the method at https://github.com/akaariai/django/commit/283af213d1873cefd642b6d9aeb97f285ef227c4#L2R44

comment:21 Changed 10 months ago by loic84

@akaariai: I'm glad you presented this idea; I find as_manager() very elegant as it removes all the boilerplate from Manager while remaining backward compatible, it's probably more efficient as well.

Its semantic is already used throughout Django (View.as_view(), Form.as_p(), etc.) so people should feel right at home.

Regarding the copy logic, I see you've used the "opt in" strategy from my initial patch, as per @danols' suggestion I like better the "opt out with smart defaults" one.

I had a stab at it: https://github.com/loic/django/compare/loic:ticket20625_copy

class MyQuerySet(models.QuerySet):
    # Private methods are ignored by default.
    def _foo(self):
        return ...

    # Public methods are copied by default.
    def foo(self):
        return ...

    # Opt in.
    def _bar(self):
        return ...
    _bar.manager = True

    # Opt out.
    def bar(self):
        return ...
    bar.manager = False

class MyModel(Model):
    objects = MyQuerySet.as_manager()

comment:22 Changed 10 months ago by akaariai

Yes, I like the smart defaults idea more than opt-in.

The only ugly case in the as_manager() approach seems to be that if you want manager-only methods you will need to create a Manager class, and then use MyQuerySet.as_manager(base_cls=MyManager) (or alternatively subclass MyQuerySet.manager_cls()). This isn't as-pretty-as-possible. But, in general having methods on the qs even if they only make sense as manager methods isn't too bad. Django has multiple examples of this, qs.create() doesn't actually make sense, but the create on qs doesn't hurt anybody either.

I am sure that if one defines the same method on both the qs and the manager in complex enough ways you can break super(). This seems to be a case of "don't do that".

I think it is now the time to complete the patch.

comment:23 Changed 10 months ago by loic84

@akaariai: wouldn't that address the issue?

class MyManager(Manager):
    def manager_only():
        pass

class MyQuerySet(QuerySet):
    manager_class = MyManager
    
    def manager_and_queryset_method():
        pass

class MyModel(Model):
    objects = MyQuerySet.as_manager()

So basically rename manager_cls() to get_manager_class() and have it use a cls.manager_class (or cls.base_manager_class) attribute.

If we go for that we should probably remove Manager.queryset_class (edit: by remove I mean, rename to _queryset_class) since it would become an antipattern.

Are you planning to complete the patch or should I do it?

Since we are replacing all the legacy methods with this technique there is not much left to do in term of testing: basically just the copying logic.

Last edited 10 months ago by loic84 (previous) (diff)

comment:24 Changed 10 months ago by loic84

Implementation of the idea above with accompanying (quick and dirty) test at https://github.com/loic/django/commit/2e129ae8aba862054373a3c5789493bc456d1ed7.

comment:25 Changed 10 months ago by akaariai

Seems good to me.

If you want a custom manager, but not queryset you can use subclass Manager approach. If you want custom queryset with methods added to the manager, you can use custom queryset + as_manager(). If you want both, you will need to have both Manager and QuerySet subclass, and use QuerySet.manager_class to tie them together. This feels clean and correct.

I think one area in need of testing is what happens when you define a custom method on both Manager and QuerySet class. I am sure that complex scenarios will break, but that is OK. This is mostly knowing what will fail, and if possible having informative error messages for the failing cases.

I am currently on holiday so I am not willing to spend too much time to work on Django, feel free to go for the final patch.

comment:26 Changed 10 months ago by loic84

The only issue I can think of is that we would override the Manager method when it is defined on both.

I've just addressed that in https://github.com/loic/django/commit/b1d6d2a0cbdb5ad7decc2cf6fc8b8dde960061cf.

I previously shielded Manager.all() from this issue with manager=False so it's no longer necessary. I believe Manager.all() and RelatedManager.create()|get_or_create() are in the situation you've described and it seems to work well.

I'll start working on an RFC patch, hopefully I'll get pretty close by tonight.

Last edited 10 months ago by loic84 (previous) (diff)

comment:27 Changed 10 months ago by loic84

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

PR https://github.com/django/django/pull/1328

Docs probably need more love but that gives us something to work with.

I had to change the implementation of get_manager_class() because it relied on __dict__ which doesn't work with inheritance.

I tried using dir() and the previous callable() test and while it worked for most purposes it also produced a few false positive; for example base_manager_class was picked up as a callable and made it into new_methods.

In the end I use inspect.getmembers() with a ismethod predicate and the result has been very robust.

comment:28 Changed 10 months ago by loic84

For the record, we lost support for the specialized querysets with the last implementation; I've added a new commit to the PR to address that.

There is also a couple of minor/cosmetic pending questions in the comments on the PR.

comment:29 Changed 10 months ago by carbonXT

  • Cc mike@… added

comment:30 Changed 10 months ago by akaariai

A quick skim of the patch didn't reveal anything surprising. I don't see anything that needs to be addressed before commit. The minor issues mentioned in the PR reviews seem minor enough to be decided at commit time.

If somebody can review the docs part that would help a lot. I am not too good with documentation changes myself.

comment:31 Changed 9 months ago by timo

Docs look good to me.

comment:32 Changed 9 months ago by loic84

Squashed down to a single commit that applies on top of the latest master. Hopefully this is RFC.

comment:33 Changed 9 months ago by loic84

Rebased to remove the new proxy method introduced by #20429.

comment:34 Changed 9 months ago by loic84

Anyone would volunteer for a merge? I rebase the branch to master daily but it doesn't seem merging is on anyone's todo list.

comment:35 Changed 9 months ago by akaariai

I'll be back to work next week. I will try to merge this one then unless somebody beats me to it.

comment:36 Changed 9 months ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 31fadc120213284da76801cc7bc56e9f32d7281b:

Fixed #20625 -- Chainable Manager/QuerySet methods.

Additionally this patch solves the orthogonal problem that specialized
QuerySet like ValuesQuerySet didn't inherit from the current QuerySet
type. This wasn't an issue until now because we didn't officially support
custom QuerySet but it became necessary with the introduction of this new
feature.

Thanks aaugustin, akaariai, carljm, charettes, mjtamlyn, shaib and timgraham
for the reviews.

comment:37 Changed 2 months ago by Tim Graham <timograham@…>

In 36f260341a1720d7a38e482234c84ff8e73eab85:

Extended the release notes for chainable Manager/QuerySet methods.

Refs #20625.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.