Opened 6 years ago

Closed 5 years ago

Last modified 4 years ago

#11863 closed (fixed)

Add a method to the orm to create Model instances from raw sql queries

Reported by: seanoc Owned by: seanoc
Component: Database layer (models, ORM) Version: master
Severity: Keywords:
Cc: alexkoshelev, info@…, miracle2k, isometry, jdunck@… Triage Stage: Unreviewed
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

There should be a mechanism to easily convert the result of a raw sql query into model instances. It is currently possible to write code to generate model instances from raw sql query results but its a fair about of boring boiler plate code.

Attachments (5)

version_1.diff (14.0 KB) - added by seanoc 6 years ago.
version_2.diff (16.9 KB) - added by seanoc 5 years ago.
11863-rc1.diff (25.9 KB) - added by jacob 5 years ago.
11863-rc2.diff (25.3 KB) - added by jacob 5 years ago.
11863-rc3.diff (27.4 KB) - added by russellm 5 years ago.
Same as RC2, but with deferred fields.

Download all attachments as: .zip

Change History (34)

comment:1 Changed 6 years ago by seanoc

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Status changed from new to assigned
  • Version set to SVN

comment:2 Changed 6 years ago by dc

Just one line:

cursor.execute(query, args)
values = cursor.fetchrow()

model = Model(*values)  # "boring boiler plate code"

comment:3 Changed 6 years ago by seanoc

That's assuming that the order of the fields coming back from your query match the order of your model's fields. There's value in providing something which can be a bit more robust and flexible (i.e. adding "extra" fields as properties. Additionally providing a supported and documented method of handling this process for newer users holds significant value.

comment:4 Changed 6 years ago by lukeplant

If it's possible to write this generically at all, then you don't actually have to write any boilerplate - you can already write a utility function that will do it for any model (and it doesn't have to be method on Model). It also sounds like you've got specific requirements in mind with your extra features. Why don't you come up with a function that does what you want, and then we can see if it's likely to get into Django? It may be that *some* of the things you want would definitely *not* make it, in which case there is no point adding a feature that is not even sufficient for the one person who's asked for it.

comment:5 Changed 6 years ago by seanoc

So after some testing and research here's an example of the API I'd propose:

import ResultHandler
from library import Author, Book

# Basic use
query = "select id, first_name, last_name from library_author"
authors = ResultHandler(Author, query)

# Extended use
query = """select id, first_name as first, last_name as last, count(select * from library_book...) as book_count 
           from library_author where first ilike %s and last ilike %s"""
params = ('John', 'Doe')
translations = (
    ('first', 'first_name'),
    ('last', 'last_name'),
)                
authors = ResultHandler(Author, query, params, translations)

In the basic use, authors would be a iterable which would return normal model instances. The extended use example authors would be an iterable of model instances with an extra property called book_count. Obviously this example is a bit contrived since this could be accomplished via the ORM but it illustrates the API I'm proposing.

I've confirmed that it is possible to figure out the result field to model field translation independent of field ordering.

Let me know what you think of the API and it's appropriateness for Django proper. I'd be happy to package it in the form of a Django patch or a third party app once my implementation is complete.

comment:6 Changed 6 years ago by seanoc

Had a discussion with RKM and worked out a bit of a variation which should work. Unfortunately the final implementation will depend on some decisions being made around multi-db support so it will be a little while before a full patch will be ready.

comment:7 Changed 6 years ago by seanoc

For anybody interested you can track my progress at http://github.com/SeanOC/django. Once the code reaches a more officially reviewable state I will post a patch here.

Changed 6 years ago by seanoc

comment:8 Changed 6 years ago by seanoc

  • Has patch set
  • Needs documentation set

I've added an initial patch for review. This patch should work against trunk ([11597]) and will line-up with the code on github. At this time there is no documentation completed but there are basic unit tests written. Any feedback would be awesome.

comment:9 Changed 6 years ago by alexkoshelev

  • Cc alexkoshelev added

comment:10 Changed 5 years ago by anonymous

  • Cc info@… added

comment:11 Changed 5 years ago by jacob

Thanks for the promising good start, Sean! I'm going to tell you what's wrong with your patch in a sec, but really this is a great beginning, so thanks.

So, here's my review. First, things to fix:

  • Making kwargs and annotations available to the test suite is kinda gross. I don't think the tests really need to know that info: they just need to validate that the raw query works.
  • The signature of raw() will, I think, promote bad SQL practice and could lead to SQL injection. Specifically, the standard DB-API interface is execute(query, params), but for raw()it'd be raw(query, params=params). That's a minor difference, but raw(query % params) is now shorter, so people'll use it. Please change the API of raw() to be raw(query, params, **kwargs).
  • Please spell-check your comments.
  • Naming nit: no need for the Exception part of InsufficiantFieldsException; InsufficiantFields will suffice. Ditto for InvalidQueryException.
  • When you check for a SELECT query, make sure to strip() the query string (query.py line 57).
  • Please use fixtures in tests instead of that big setUp.


Second, some questions and further things to think about;

  • Do you think RawQuerySet should cache results the same way QuerySet does? I think not, but we'll need to explain why that'd be.
  • Have you thought about integrating this with the lazy field support (QuerySet.defer()/QS.only())? Seems like a logical thing to support.
  • Is cursor.description supported across all Django's supported database backends (I think it's not). If not, the same API can still work -- probably by supplying a complete translations dict -- but we need to avoid erroring out if description isn't available.
  • This has some implications w/r/t Alex Gaynor's multi-db work. You might want to try to get his thoughts on how you're handling connection and how that'll need to change.


Again, thanks -- this is good work, and I think it's pretty close to okay for checkin.

comment:12 Changed 5 years ago by seanoc

Thanks for the feedback Jacob! I'm going to be swamped with work for the next week or two but getting these worked into the patch are the first thing on my queue once work lets up a bit. In the mean time here's some quick responses:

  • Making kwargs and annotations available to the test suite is kinda gross. I don't think the tests really need to know that info: they just need to validate that the raw query works.

I tend to agree, the main reason I did this was during initial development there were situations where real fields were being added as annotations instead of being passed into the __init__ of the model. Since this could have some significant side effects I wanted to make sure that I caught it happening going forward and this was the easiest way I thought of to test it. I'll rework it so that those are no longer unneccesarrily exposed and develop a test with a model which will error if it doesn't receive all of it's fields in it's __init__.

  • The signature of raw() will, I think, promote bad SQL practice and could lead to SQL injection. Specifically, the standard DB-API interface is execute(query, params), but for raw()it'd be raw(query, params=params). That's a minor difference, but raw(query % params) is now shorter, so people'll use it. Please change the API of raw() to be raw(query, params, kwargs).

Good call. Part of the goal here was to keep things relatively non-SQL specific so that the .raw() method could be implemented for non-relational back-ends. That being said I think the query+params best practice is common enough across systems which would allow something resembling a raw query that this makes sense.

  • Please spell-check your comments.
  • Naming nit: no need for the Exception part of InsufficiantFieldsException; InsufficiantFields will suffice. Ditto for InvalidQueryException.
  • When you check for a SELECT query, make sure to strip() the query string (query.py line 57).
  • Please use fixtures in tests instead of that big setUp.

Will do on all of these

  • Do you think RawQuerySet should cache results the same way QuerySet does? I think not, but we'll need to explain why that'd be.

I've gone a bit back and forth on this one. To be honest the main reason it doesn't right now is I've just been focusing on getting things working and getting the API hammered out. At this point I'm inclined to try and be consistent with QuerySet's behavior and to cache results. Then if somebody wants to hit the database again, they would need to be explicit and call raw() again to get a new RawQuerySet. If anybody has a strong feeling/argument either way I'd be very happy to hear it. This will probably be one of the last bits I implement in the patch.

  • Have you thought about integrating this with the lazy field support (QuerySet.defer()/QS.only())? Seems like a logical thing to support.

Yes I have but so far I've really focused on just getting things to work and to get the API worked out. I've basically been treating this as a "nice to have" aspect to the patch. Once I get your other recommendations worked in and a first pass at the documentation complete I'll dig into what work would be required to implement this.

  • Is cursor.description supported across all Django's supported database backends (I think it's not). If not, the same API can still work -- probably by supplying a complete translations dict -- but we need to avoid erroring out if description isn't available.

That's a very good question. I'll try to do some testing/research to find out. Either way I'll look at ways to handle cursor.description not being supported in a particular backend.

  • This has some implications w/r/t Alex Gaynor's multi-db work. You might want to try to get his thoughts on how you're handling connection and how that'll need to change.

Alex, Russel, and I talked about this a bit during the Djangocon sprints. Their advice at the time was to go forward with developing against Django trunk. Once Alex's multi-db work hits trunk we'll look at what will be needed to make sure that .raw() uses the right connection to get a cursor. In generally tho it should be a relatively small change. To be safe I'll be sure that when I go to work in these recommendations, I'll do a review of how I'm handling the cursor and start to plan for the change.

comment:13 Changed 5 years ago by jacob

Sounds good; I'll keep an eye out for your new patch(es).

I think the best plan for now, then, is to try to get this done in as simple way as possible -- no caching, no lazy fields. We can always add those things in later if it seems they're needed.

comment:14 Changed 5 years ago by jacob

Another problem to think about:

The tests currently fail against SQLite because RawQuery.__len__ returns cursor.rowcount, which is quirky on SQLite (it returns -1 if the query hasn't been executed yet). We'll need to figure out a way around that. It's likely that the solution involves caching the results so that calling __len__ and then __iter__ (or vice versa) doesn't cause the query to run twice.

comment:15 Changed 5 years ago by miracle2k

  • Cc miracle2k added

Changed 5 years ago by seanoc

comment:16 Changed 5 years ago by seanoc

I've attached an updated version of the patch which should apply cleanly to the current head of trunk ([11743]). Here's a list of the updates in the patch:

  • kwargs and annotations are now no longer exposed for the unit tests. Proper handling of arguments vs. annotations is tested via tests in the init method of one of the test models.
  • The signature of .raw() has been fixed as suggested.
  • Comments have had their spelling errors fixed.
  • Exception names have been fixed.
  • The query type check now performs a .strip() to avoid errors due to silly whitespace.
  • Tests now load test data using fixtures instead of a big setUp method.
  • Tests now run clean against Postgres, MySQL, and SQLite (I haven't had time to setup an oracle db to test against yet).
  • sql.RawQuery now does result caching and will use the len() of the cached results when SQLite is quirky with cursor.rowcount.

I've also done some research into cursor.description behavior across DB-API implementations. As far as I can tell, the only oddness is that SQLite only populates the first value of the seven tuple expected by the DB-API spec. I've observed this behavior and it is mentioned in the docs. Fortunately, this isn't a problem for us right now since the first value (column name) is the only one we care about.

Something which definitely could use some review and feedback is the result caching. Currently the caching is performed in sql.query.RawQuery and it will call cursor.fetchall() as soon as data is required. It may make sense to move the caching up to models.query.RawQueryset to save the work of building model instances with each iteration. The problem with moving the caching up to RawQueryset is that then mulitple iterations and reliable use of len() in RawQuery won't be possible. Accordingly it may make sense to cache in both. It may also make sense to move to a more "lazy" population of the cache using cursor.fetchone() or cursor.fetchmultiple() but I'm not sure that the gains would be worth the added complexity.

From here the main thing I need to work on for this patch to be fully ready is documentation. Any feedback on this latest patch would be greatly appreciated.

comment:17 Changed 5 years ago by seanoc

For anybody who was following this work on github, I needed to change my workflow a bit. You can now find the code at http://github.com/SeanOC/django/tree/ticket11863.

comment:18 Changed 5 years ago by simon

Looking great so far. Just one comment - the default repr of a RawQuerySet is a little jarring, I thought for a moment something was broken or there was a logging message left in the code. I'd suggest something like <RawQuerySet: "select * from blog_entry"> instead.

comment:19 Changed 5 years ago by isometry

  • Cc isometry added

Changed 5 years ago by jacob

comment:20 Changed 5 years ago by jacob

I've attached the patch I'm proposing for addition. The change I've made from Sean's most recent patch:

  • Added documentation.
  • Made raw() queries lazy -- they're not executed until iterated over.
  • Made translations into a dict instead of list-of-2-tuples.
  • And some misc. cleanups to docstrings, variable names, etc.

Changed 5 years ago by jacob

comment:22 Changed 5 years ago by bjunix

There is a minor typo at line 188 of the rc3 patch. I guess it should say:

'connection.cursor directly for other types of queries.')

comment:23 Changed 5 years ago by jacob

Russ - it appears your latest patch is missing the tests. If you'll upload a new one I'll take a look and then commit it, or else feel free to commit yourself.

Changed 5 years ago by russellm

Same as RC2, but with deferred fields.

comment:24 Changed 5 years ago by jacob

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [11921]) Fixed #11863: added a Model.objects.raw() method for executing raw SQL queries and yield models.

See docs/topics/db/raw.txt for details.

Thanks to seanoc for getting the ball rolling, and to Russ for wrapping things up.

comment:25 Changed 5 years ago by anonymous

Maybe I'm missing it, or maybe the intent is to add it later, but I couldn't find docs/topics/db/raw.txt.

comment:26 Changed 5 years ago by miracle2k

  • Resolution fixed deleted
  • Status changed from closed to reopened

I believe this is broken, I am seeing a TypeError when evaluating a raw query:

TypeError: iter() returned non-iterator of type 'CursorWrapper'

Fix here:

http://github.com/miracle2k/django/commit/20985dbb958ebdcf5f2b5c5492f07b5258a0b5ff

comment:27 Changed 5 years ago by Alex

  • Resolution set to fixed
  • Status changed from reopened to closed

Yes, that's a real issue, Russell and I identified it last night. Can we get a seperate ticket for it though?

comment:28 Changed 5 years ago by russellm

For the record - [11968] fixed the problem reported by miracle2k.

comment:29 Changed 4 years ago by jacob

  • milestone 1.2 deleted

Milestone 1.2 deleted

Note: See TracTickets for help on using tickets.
Back to Top