#11863 closed (fixed)
Add a method to the orm to create Model instances from raw sql queries
Reported by: | Sean O'Connor | Owned by: | Sean O'Connor |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Keywords: | ||
Cc: | Alexander Koshelev, info@…, miracle2k, Robin Breathe, jdunck@… | Triage Stage: | Unreviewed |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
There should be a mechanism to easily convert the result of a raw sql query into model instances. It is currently possible to write code to generate model instances from raw sql query results but its a fair about of boring boiler plate code.
Attachments (5)
Change History (34)
comment:1 by , 15 years ago
Status: | new → assigned |
---|---|
Version: | → SVN |
comment:2 by , 15 years ago
comment:3 by , 15 years ago
That's assuming that the order of the fields coming back from your query match the order of your model's fields. There's value in providing something which can be a bit more robust and flexible (i.e. adding "extra" fields as properties. Additionally providing a supported and documented method of handling this process for newer users holds significant value.
comment:4 by , 15 years ago
If it's possible to write this generically at all, then you don't actually have to write any boilerplate - you can already write a utility function that will do it for any model (and it doesn't have to be method on Model). It also sounds like you've got specific requirements in mind with your extra features. Why don't you come up with a function that does what you want, and then we can see if it's likely to get into Django? It may be that *some* of the things you want would definitely *not* make it, in which case there is no point adding a feature that is not even sufficient for the one person who's asked for it.
comment:5 by , 15 years ago
So after some testing and research here's an example of the API I'd propose:
import ResultHandler from library import Author, Book # Basic use query = "select id, first_name, last_name from library_author" authors = ResultHandler(Author, query) # Extended use query = """select id, first_name as first, last_name as last, count(select * from library_book...) as book_count from library_author where first ilike %s and last ilike %s""" params = ('John', 'Doe') translations = ( ('first', 'first_name'), ('last', 'last_name'), ) authors = ResultHandler(Author, query, params, translations)
In the basic use, authors would be a iterable which would return normal model instances. The extended use example authors would be an iterable of model instances with an extra property called book_count. Obviously this example is a bit contrived since this could be accomplished via the ORM but it illustrates the API I'm proposing.
I've confirmed that it is possible to figure out the result field to model field translation independent of field ordering.
Let me know what you think of the API and it's appropriateness for Django proper. I'd be happy to package it in the form of a Django patch or a third party app once my implementation is complete.
comment:6 by , 15 years ago
Had a discussion with RKM and worked out a bit of a variation which should work. Unfortunately the final implementation will depend on some decisions being made around multi-db support so it will be a little while before a full patch will be ready.
comment:7 by , 15 years ago
For anybody interested you can track my progress at http://github.com/SeanOC/django. Once the code reaches a more officially reviewable state I will post a patch here.
by , 15 years ago
Attachment: | version_1.diff added |
---|
comment:8 by , 15 years ago
Has patch: | set |
---|---|
Needs documentation: | set |
comment:9 by , 15 years ago
Cc: | added |
---|
comment:10 by , 15 years ago
Cc: | added |
---|
comment:11 by , 15 years ago
Thanks for the promising good start, Sean! I'm going to tell you what's wrong with your patch in a sec, but really this is a great beginning, so thanks.
So, here's my review. First, things to fix:
- Making
kwargs
andannotations
available to the test suite is kinda gross. I don't think the tests really need to know that info: they just need to validate that the raw query works. - The signature of
raw()
will, I think, promote bad SQL practice and could lead to SQL injection. Specifically, the standard DB-API interface isexecute(query, params)
, but forraw()
it'd beraw(query, params=params)
. That's a minor difference, butraw(query % params)
is now shorter, so people'll use it. Please change the API ofraw()
to beraw(query, params, **kwargs)
. - Please spell-check your comments.
- Naming nit: no need for the
Exception
part ofInsufficiantFieldsException
;InsufficiantFields
will suffice. Ditto forInvalidQueryException
. - When you check for a
SELECT
query, make sure tostrip()
the query string (query.py line 57). - Please use fixtures in tests instead of that big
setUp
.
Second, some questions and further things to think about;
- Do you think
RawQuerySet
should cache results the same wayQuerySet
does? I think not, but we'll need to explain why that'd be. - Have you thought about integrating this with the lazy field support (
QuerySet.defer()
/QS.only()
)? Seems like a logical thing to support. - Is
cursor.description
supported across all Django's supported database backends (I think it's not). If not, the same API can still work -- probably by supplying a completetranslations
dict -- but we need to avoid erroring out ifdescription
isn't available. - This has some implications w/r/t Alex Gaynor's multi-db work. You might want to try to get his thoughts on how you're handling
connection
and how that'll need to change.
Again, thanks -- this is good work, and I think it's pretty close to okay for checkin.
comment:12 by , 15 years ago
Thanks for the feedback Jacob! I'm going to be swamped with work for the next week or two but getting these worked into the patch are the first thing on my queue once work lets up a bit. In the mean time here's some quick responses:
- Making kwargs and annotations available to the test suite is kinda gross. I don't think the tests really need to know that info: they just need to validate that the raw query works.
I tend to agree, the main reason I did this was during initial development there were situations where real fields were being added as annotations instead of being passed into the __init__
of the model. Since this could have some significant side effects I wanted to make sure that I caught it happening going forward and this was the easiest way I thought of to test it. I'll rework it so that those are no longer unneccesarrily exposed and develop a test with a model which will error if it doesn't receive all of it's fields in it's __init__
.
- The signature of raw() will, I think, promote bad SQL practice and could lead to SQL injection. Specifically, the standard DB-API interface is execute(query, params), but for raw()it'd be raw(query, params=params). That's a minor difference, but raw(query % params) is now shorter, so people'll use it. Please change the API of raw() to be raw(query, params, kwargs).
Good call. Part of the goal here was to keep things relatively non-SQL specific so that the .raw() method could be implemented for non-relational back-ends. That being said I think the query+params best practice is common enough across systems which would allow something resembling a raw query that this makes sense.
- Please spell-check your comments.
- Naming nit: no need for the Exception part of InsufficiantFieldsException; InsufficiantFields will suffice. Ditto for InvalidQueryException.
- When you check for a SELECT query, make sure to strip() the query string (query.py line 57).
- Please use fixtures in tests instead of that big setUp.
Will do on all of these
- Do you think RawQuerySet should cache results the same way QuerySet does? I think not, but we'll need to explain why that'd be.
I've gone a bit back and forth on this one. To be honest the main reason it doesn't right now is I've just been focusing on getting things working and getting the API hammered out. At this point I'm inclined to try and be consistent with QuerySet's behavior and to cache results. Then if somebody wants to hit the database again, they would need to be explicit and call raw() again to get a new RawQuerySet. If anybody has a strong feeling/argument either way I'd be very happy to hear it. This will probably be one of the last bits I implement in the patch.
- Have you thought about integrating this with the lazy field support (QuerySet.defer()/QS.only())? Seems like a logical thing to support.
Yes I have but so far I've really focused on just getting things to work and to get the API worked out. I've basically been treating this as a "nice to have" aspect to the patch. Once I get your other recommendations worked in and a first pass at the documentation complete I'll dig into what work would be required to implement this.
- Is cursor.description supported across all Django's supported database backends (I think it's not). If not, the same API can still work -- probably by supplying a complete translations dict -- but we need to avoid erroring out if description isn't available.
That's a very good question. I'll try to do some testing/research to find out. Either way I'll look at ways to handle cursor.description not being supported in a particular backend.
- This has some implications w/r/t Alex Gaynor's multi-db work. You might want to try to get his thoughts on how you're handling connection and how that'll need to change.
Alex, Russel, and I talked about this a bit during the Djangocon sprints. Their advice at the time was to go forward with developing against Django trunk. Once Alex's multi-db work hits trunk we'll look at what will be needed to make sure that .raw() uses the right connection to get a cursor. In generally tho it should be a relatively small change. To be safe I'll be sure that when I go to work in these recommendations, I'll do a review of how I'm handling the cursor and start to plan for the change.
comment:13 by , 15 years ago
Sounds good; I'll keep an eye out for your new patch(es).
I think the best plan for now, then, is to try to get this done in as simple way as possible -- no caching, no lazy fields. We can always add those things in later if it seems they're needed.
comment:14 by , 15 years ago
Another problem to think about:
The tests currently fail against SQLite because RawQuery.__len__
returns cursor.rowcount
, which is quirky on SQLite (it returns -1 if the query hasn't been executed yet). We'll need to figure out a way around that. It's likely that the solution involves caching the results so that calling __len__
and then __iter__
(or vice versa) doesn't cause the query to run twice.
comment:15 by , 15 years ago
Cc: | added |
---|
by , 15 years ago
Attachment: | version_2.diff added |
---|
comment:16 by , 15 years ago
I've attached an updated version of the patch which should apply cleanly to the current head of trunk ([11743]). Here's a list of the updates in the patch:
- kwargs and annotations are now no longer exposed for the unit tests. Proper handling of arguments vs. annotations is tested via tests in the init method of one of the test models.
- The signature of .raw() has been fixed as suggested.
- Comments have had their spelling errors fixed.
- Exception names have been fixed.
- The query type check now performs a .strip() to avoid errors due to silly whitespace.
- Tests now load test data using fixtures instead of a big setUp method.
- Tests now run clean against Postgres, MySQL, and SQLite (I haven't had time to setup an oracle db to test against yet).
- sql.RawQuery now does result caching and will use the len() of the cached results when SQLite is quirky with cursor.rowcount.
I've also done some research into cursor.description behavior across DB-API implementations. As far as I can tell, the only oddness is that SQLite only populates the first value of the seven tuple expected by the DB-API spec. I've observed this behavior and it is mentioned in the docs. Fortunately, this isn't a problem for us right now since the first value (column name) is the only one we care about.
Something which definitely could use some review and feedback is the result caching. Currently the caching is performed in sql.query.RawQuery and it will call cursor.fetchall() as soon as data is required. It may make sense to move the caching up to models.query.RawQueryset to save the work of building model instances with each iteration. The problem with moving the caching up to RawQueryset is that then mulitple iterations and reliable use of len() in RawQuery won't be possible. Accordingly it may make sense to cache in both. It may also make sense to move to a more "lazy" population of the cache using cursor.fetchone() or cursor.fetchmultiple() but I'm not sure that the gains would be worth the added complexity.
From here the main thing I need to work on for this patch to be fully ready is documentation. Any feedback on this latest patch would be greatly appreciated.
comment:17 by , 15 years ago
For anybody who was following this work on github, I needed to change my workflow a bit. You can now find the code at http://github.com/SeanOC/django/tree/ticket11863.
comment:18 by , 15 years ago
Looking great so far. Just one comment - the default repr of a RawQuerySet is a little jarring, I thought for a moment something was broken or there was a logging message left in the code. I'd suggest something like <RawQuerySet: "select * from blog_entry"> instead.
comment:19 by , 15 years ago
Cc: | added |
---|
by , 15 years ago
Attachment: | 11863-rc1.diff added |
---|
comment:20 by , 15 years ago
I've attached the patch I'm proposing for addition. The change I've made from Sean's most recent patch:
- Added documentation.
- Made
raw()
queries lazy -- they're not executed until iterated over. - Made
translations
into a dict instead of list-of-2-tuples. - And some misc. cleanups to docstrings, variable names, etc.
comment:21 by , 15 years ago
Cc: | added |
---|
by , 15 years ago
Attachment: | 11863-rc2.diff added |
---|
comment:22 by , 15 years ago
There is a minor typo at line 188 of the rc3 patch. I guess it should say:
'connection.cursor directly for other types of queries.')
comment:23 by , 15 years ago
Russ - it appears your latest patch is missing the tests. If you'll upload a new one I'll take a look and then commit it, or else feel free to commit yourself.
comment:24 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:25 by , 15 years ago
Maybe I'm missing it, or maybe the intent is to add it later, but I couldn't find docs/topics/db/raw.txt.
comment:26 by , 15 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I believe this is broken, I am seeing a TypeError when evaluating a raw query:
TypeError: iter() returned non-iterator of type 'CursorWrapper'
Fix here:
http://github.com/miracle2k/django/commit/20985dbb958ebdcf5f2b5c5492f07b5258a0b5ff
comment:27 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Yes, that's a real issue, Russell and I identified it last night. Can we get a seperate ticket for it though?
Just one line: