Django

Code

Ticket #3214 (reopened)

Opened 3 years ago

Last modified 4 days ago

[patch] raw sql file doesn't recognize quotes correctly

Reported by: shaunc <shaun@cuttshome.net> Assigned to: nobody
Milestone: Component: Database layer (models, ORM)
Version: SVN Keywords: rawsql
Cc: shaun@cuttshome.net, sam@robots.org.uk, andrehcampos@gmail.com, aseering@mit.edu Triage Stage: Accepted
Has patch: 1 Needs documentation: 0
Needs tests: 1 Patch needs improvement: 0

Description

Suppose I have an app foo with a model Bar. in foo/sql/bar.postgresql_psycopg2.sql I have:

CREATE OR REPLACE FUNCTION test() RETURNS trigger LANGUAGE plpythonu AS $$
  '''this will have problems if it contains an embedded ;
  '''
$$;

This generates the following error on syncdb:

sql /* cranedata.db.search.sql.searchmodel.postgresql_psycopg2
 *
 * Setup for 'SearchModel':
 *   want to have a trigger on SearchModels which sets up a trigger
 *   on models searched, such that when rows of these models are
 *   modified, text fields are automatically indexed.
 */

sql CREATE OR REPLACE FUNCTION test() RETURNS trigger LANGUAGE plpythonu AS $$
  '''this will have problems if it contains an embedded ;
params ()
Failed to install initial SQL data for search.SearchModel model: unterminated dollar-quoted string a
t or near "$$
  '''this will have problems if it contains an embedded ;" at character 355

Attachments

sql-statement-iterator.diff (6.0 kB) - added by shaunc <shaun@cuttshome.net> on 01/19/07 10:30:24.
patch, introduces 'sql-statement-iterator' which correctly deals with quotes
sql-statement-iterator-updated.diff (5.9 kB) - added by aseering@mit.edu on 08/10/08 11:02:52.
Updated version of shaunc's patch, that works with Django SVN as of r8255

Change History

12/30/06 23:37:56 changed by shaunc <shaun@cuttshome.net>

IMHO the sql file should be just fed all at once to a cursor, rather than django attempting to parse it and feeding in pieces.

The python DB API 2.0 doc is a bit hazy on this point, but:

>>> cursor.execute( "create table a ( id int ); insert into a ( id ) values ( 1 ); select * from a;" )
>>> cursor.fetchall()
[(1,)]

seems to show that multiple statements count as one "operation", at least using postgresql_psycopg2

- Shaun

01/17/07 10:42:54 changed by shaunc <shaun@cuttshome.net>

The problem is the following, in django.core.management:341:

    # Some backends can't execute more than one SQL statement at a time,
    # so split into separate statements.
    statements = re.compile(r";[ \t]*$", re.M)

This may be a fine workaround if the backend really doesn't deal with multiple statements, but it breaks things that have ';' in quotes. Easiest fix would be for the db to support a property: "SUPPORTS_MULTIPLE_STATEMENTS_IN_EXECUTE" or whatever, and only split if necessary.

01/19/07 10:30:24 changed by shaunc <shaun@cuttshome.net>

  • attachment sql-statement-iterator.diff added.

patch, introduces 'sql-statement-iterator' which correctly deals with quotes

01/19/07 10:38:00 changed by shaunc <shaun@cuttshome.net>

  • has_patch set to 1.

I've attached a patch for this bug: with patch management.py will use utility "sql_statement_iterator" which splits sql into statements more carefully, respecting quotes and comments.

Also, it will change freestanding '%' to double -- otherwise these cause an error in management.py, as python DB2.0 spec treats all non-doubled percent signs as substitution targets, which isn't wanted in this situation.

With this fix I'm able to use pg_dump (with appropriate flags to use 'insert' rather than 'copy') to dump old tables, then cut and paste resulting inserts (and set sequence) into initial data.

utils/sqltools.py, where "sql_statement_iterator" lives, has doctests. Ideally I would have written unit tests for both this and new management, but, ... well maybe I'll get to it after I deploy next edition of our website :).

01/19/07 14:59:18 changed by Simon G. <dev@simon.net.nz>

  • keywords set to rawsql.
  • summary changed from raw sql file doesn't recognize quotes correctly to [patch] raw sql file doesn't recognize quotes correctly.
  • needs_tests set to 1.
  • stage changed from Unreviewed to Design decision needed.

02/20/07 00:03:01 changed by shaunc <shaun@cuttshome.net>

  • cc set to shaun@cuttshome.net.

I understand this is been relegated to Tumbolia for the moment. If it ends up "accepted", please do ping me, and if I have any time at all, I'll write whatever unit tests are required. (Note: already includes doctest of sql_statement_iterator itself.)

10/03/07 08:54:22 changed by anonymous

  • cc changed from shaun@cuttshome.net to shaun@cuttshome.net, sam@robots.org.uk.

10/03/07 09:05:54 changed by Sam Morris <sam@robots.org.uk>

As a workaround, multi-line SQL statements have to have someting other than whitespace between their semicolons and newline characters. For example:

 CREATE OR REPLACE FUNCTION create_post () RETURNS trigger AS $$
 BEGIN
  NEW.date_created := current_timestamp; --
  RETURN NEW; --
 END;
$$ LANGUAGE plpgsql;

12/02/07 14:14:09 changed by jacob

  • version set to SVN.

12/02/07 22:47:07 changed by shaunc <shaun@cuttshome.net>

Note that the patch requires a slight reorg. to work now that management.py has been changed....

Again, if the powers that be accept this, I'll do my best to get it into shape, as I'm still using it myself.

08/05/08 12:12:12 changed by anonymous

  • cc changed from shaun@cuttshome.net, sam@robots.org.uk to shaun@cuttshome.net, sam@robots.org.uk, andrehcampos@gmail.com.

08/10/08 11:02:52 changed by aseering@mit.edu

  • attachment sql-statement-iterator-updated.diff added.

Updated version of shaunc's patch, that works with Django SVN as of r8255

08/10/08 11:18:04 changed by aseering@mit.edu

  • cc changed from shaun@cuttshome.net, sam@robots.org.uk, andrehcampos@gmail.com to shaun@cuttshome.net, sam@robots.org.uk, andrehcampos@gmail.com, aseering@mit.edu.

This bug currently also causes multiline strings to fail to parse, at least for me with PostgreSQL 8.3. This means that Sam Morris's workaround (above) doesn't work for me; as far as I know, there's no raw-SQL way of inserting stored procedures into a database? I could use a Signals hook; but that doesn't feel particularly DRY to me...

I've submitted an updated patch that works for me with Django SVN. I'm not yet very familiar with the new management-module layout; let me know if I did something silly.

I'm quite interested in seeing this bug resolved; I'm also glad to do some coding if it would help.

Django devs: It seems that there's the will and expertise to fix this bug. Is there any chance someone could spare a few moments to make a decision on it? (or, does anyone know of an alternate workaround?) Thanks; I'd certainly appreciate it!

08/10/08 11:27:47 changed by mtredinnick

  • status changed from new to closed.
  • resolution set to wontfix.

Due to variations between all the different SQL variants accepted by the backends we support (and others provide externally), as well as the complexity of SQL (and we aren't going to include a full SQL parser), it's really chasing rainbows to try and make complex stuff work here.

I think this is a situation we just aren't going to handle directly, since the hooks now exist to do it on a case-by-case basis in client code. If you want to pass in complex SQL, catch the post-syncdb signal and work directly with a django.db.backend.cursor. You will know precisely what database you're talking to and can use the appropriate syntax and calling mechanisms.

08/10/08 12:13:02 changed by shaun cutts <shaun@cuttshome.net>

Grumble grumble...

IMO, it is perfectly reasonable to say "its not our business to understand SQL". But then you shouldn't be trying to break the SQL down into statements. Rather, pass it through as a chunk. If this causes problems, then -- if you want to -- patch (hack) it in the backends that have problems. Or users of those back ends should get their db engines fixed.

-- Shaun Cutts

(follow-up: ↓ 16 ) 08/10/08 12:28:49 changed by mtredinnick

Yes, if the solution was the simple we would be doing it. Not all database wrappers permit multiple statements in one call. So the current stuff works for them and your stuff is possible. Win for everybody.

(follow-up: ↓ 17 ) 08/10/08 12:35:12 changed by aseering@mit.edu

Curious: Does anyone watching this ticket, know offhand which DB backends/engines actually don't support multiple statements? I tend to agree with Shaun that the best long-term fix is to add support into the lower-level code.

Also, mtredinnick: Shaun's solution sounds pretty simple to me. Has it been tried? If not, would you consider accepting a patch that implements it?

(in reply to: ↑ 14 ) 08/13/08 00:46:20 changed by shaun cutts <shaun@cuttshome.net>

Replying to mtredinnick:

Yes, if the solution was the simple we would be doing it. Not all database wrappers permit multiple statements in one call. So the current stuff works for them and your stuff is possible. Win for everybody.

What I meant to suggest is: do just what you're doing now, but only do it if the db wrapper doesn't support multiple statements. (seconding aseering's question) Would you consider a patch that did this?

(in reply to: ↑ 15 ; follow-up: ↓ 18 ) 10/02/09 08:37:47 changed by akaihola

Replying to aseering@mit.edu:

Curious: Does anyone watching this ticket, know offhand which DB backends/engines actually don't support multiple statements? I tend to agree with Shaun that the best long-term fix is to add support into the lower-level code.

I tested PostgreSQL, SQLite and MySQL on Ubuntu 9.04, and SQLite was the only one not supporting multiple statements.

Would it make sense to add a

django.backends.*.DatabaseFeatures.accepts_multiple_statements

attribute and do the statements.split() dance only when needed?

(in reply to: ↑ 17 ) 02/05/10 09:51:59 changed by russellm

  • status changed from closed to reopened.
  • resolution deleted.
  • stage changed from Design decision needed to Accepted.

Replying to akaihola:

Replying to aseering@mit.edu:

Curious: Does anyone watching this ticket, know offhand which DB backends/engines actually don't support multiple statements? I tend to agree with Shaun that the best long-term fix is to add support into the lower-level code.

I tested PostgreSQL, SQLite and MySQL on Ubuntu 9.04, and SQLite was the only one not supporting multiple statements. Would it make sense to add a {{{ django.backends.*.DatabaseFeatures?.accepts_multiple_statements }}} attribute and do the statements.split() dance only when needed?

Reopening (and accepting) on the basis of this suggestion. This sounds workable to me, although it's the python binding version that is important, not the database version itself.

(follow-up: ↓ 20 ) 02/05/10 09:54:43 changed by russellm

Caveat - I'm accepting the "run bulk SQL as a single block if possible" idea. I'd rather see a 'execute_bulk_sql' call on the DatabaseOperations? class than a boolean flag. Making it an overridable method on the backend means that the regular expressions/splitting technique can be modified on a per-backend basis.

(in reply to: ↑ 19 ) 02/05/10 10:15:03 changed by shauncutts

Replying to russellm:

Souunds good to me... -- Thanks, Russell!


Add/Change #3214 ([patch] raw sql file doesn't recognize quotes correctly)




Change Properties
Action