|Reported by:||brantley (deadwisdom@…||Owned by:||adrian|
|Component:||Database layer (models, ORM)||Version:|
|Has patch:||no||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
As it has been said before, the conflict between the rational SQL language and the procedural Python language mandates a perfect querying method to be impossible. However, there are some things we can do to make it nicer.
Imagine a new method called "query", or what have you, that makes the querying process easier to read and simpler.
Firstly, the idea has been set forth, as in ticket #851, to use the comparison operators already defined in Python, i.e. == < <= > >= !=, instead of the underscore keywords we have now. The problem with doing this is that you need an object that can emulate numeric types. There are few ways of getting this object out, the most obvious is to have a class, for instance named "Field", that will intialize with a name argument. This can then interact with the operator. Like so:
polls.query(Field('question') == 'What are you searching for?')
This is fine, and readable. But it forces us to put our field in a string and feels less OO, than it could. In ticket #851, it is suggested to make a 'q' object that returns a Field object for any property that is referenced on it. But I feel that 'q' isn't a very descriptive name, so I propose replacing it with 'field'. So field.question is identical to the above Field('question').
polls.query(field.question == 'What are you searching for?')
Even better, if the pluralization of models was dropped, and you could access Poll, then you could do Poll.question, which seems the most natural.
Ands / Ors
Now, we must think of how to perform an OR operation. The obvious answer is to allow '|' to modify what we already have:
choices.query(field.choice == 'A choice' | field.votes > 4)
Unfortunately, due to the operator precedence in Python, this doesn't work how we'd expect. Python thinks that it should 'A choice' | field.votes before applying the comparison operators, which doesn't make any sense. The solution is to put parenthesis around everything:
choices.query((field.choice == 'A choice') | (field.votes > 4))
But this will quickly lead to user error, and becomes more difficult to read.
So instead of thinking of it from an SQL perspective, which is to OR items in our WHERE clause. Let's think of it from a Python perspective. Results should be able to be added together, just like lists:
results = choices.query(field.choice == 'A choice') results += choices.query(field.votes > 4)
This seems intuitive; we add together our results, which effectively makes an "OR". However, it seems to add overhead. It looks like two calls, and then adding the list, which is much slower than one call with an OR. But here is where the magic comes in. Instead of calling the SQL each time the query is performed, we call it only when it is needed, like so:
results = choices.query(field.choice == 'A choice') results += choices.query(field.votes > 4) print results # Database is queried here. The SQL result: SELECT * FROM `choices` WHERE ((choice = 'A choice') OR (votes > 4))
The Query object, which is a returned from choices.query, acts just like a list, defining __getitem__, __iter__, and __len__. So only when any of these methods is called, does it finally query the database. Until then, one can add two queries together, or do a subquery:
results = choices.query(field.choice == 'A choice') popular_choices = results.sub_query(field.votes > 10) print popular_choices # Database is finally queried The SQL result: SELECT * FROM `documents` WHERE ((votes > 10) AND (choice = 'A choice'))
As you can see here, even though there were two queries, only the sub-query hit the database, because the first one "results" was never used.
ANDs can be done by specifying multiple requirements in the query:
popular_choices = choices.query(field.choice == 'A choice', field.votes > 10) print popular_choices # Database is finally queried The SQL result: SELECT * FROM `documents` WHERE (choice = 'A choice' AND votes > 10)
Extra options like sorting are done, intuitively, before the query is even done:
popular_choices = choices.query(field.choice == 'A choice', field.votes > 10) popular_choices.sort('-choice') print popular_choices # Database is finally queried The SQL result: SELECT * FROM `documents` WHERE (choice = 'A choice' AND votes > 10) ORDER BY choice DESC
Starts-with, contains, ends-with, are done like so:
obstaining = choices.query(field.choice.startswith("I don't know"))
Query objects can hit the database artificially with fetch():
results = choices.query(field.choice == 'A choice') results.fetch() # Database is finally queried
I'm not sure that this should replace the current system, as making this device of the API backwards incompatible would be a real pain in the *ss. Certainly that's a decision for the powers that be. But I do think that it should at least be added to the current system, with the documentation reflecting this process, rather than the current. And I would be willing to create the necessary patch and what not; I'm not just talkin' the talk.
A test scenario is attached to this ticket, that prints out the SQL rather than actually querying a database.