Opened 17 years ago

Closed 10 years ago

Last modified 10 years ago

#3214 closed Bug (fixed)

[patch] raw sql file doesn't recognize quotes correctly

Reported by: shaunc <shaun@…> Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Release blocker Keywords: rawsql migrations
Cc: shaun@…, sam@…, andrehcampos@…, aseering@…, walter+django@…, vlastimil.zima@…, Christophe Pettus, sbarre, vlastimil@…, 4glitch@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Suppose I have an app foo with a model Bar.
in foo/sql/bar.postgresql_psycopg2.sql I have:

CREATE OR REPLACE FUNCTION test() RETURNS trigger LANGUAGE plpythonu AS $$
  '''this will have problems if it contains an embedded ;
  '''
$$;

This generates the following error on syncdb:

sql /* cranedata.db.search.sql.searchmodel.postgresql_psycopg2
 *
 * Setup for 'SearchModel':
 *   want to have a trigger on SearchModels which sets up a trigger
 *   on models searched, such that when rows of these models are
 *   modified, text fields are automatically indexed.
 */

sql CREATE OR REPLACE FUNCTION test() RETURNS trigger LANGUAGE plpythonu AS $$
  '''this will have problems if it contains an embedded ;
params ()
Failed to install initial SQL data for search.SearchModel model: unterminated dollar-quoted string a
t or near "$$
  '''this will have problems if it contains an embedded ;" at character 355

Attachments (2)

sql-statement-iterator.diff (6.0 KB ) - added by shaunc <shaun@…> 17 years ago.
patch, introduces 'sql-statement-iterator' which correctly deals with quotes
sql-statement-iterator-updated.diff (5.9 KB ) - added by aseering@… 16 years ago.
Updated version of shaunc's patch, that works with Django SVN as of r8255

Download all attachments as: .zip

Change History (54)

comment:1 by shaunc <shaun@…>, 17 years ago

IMHO the sql file should be just fed all at once to a cursor, rather than django attempting to parse it and feeding in pieces.

The python DB API 2.0 doc is a bit hazy on this point, but:

>>> cursor.execute( "create table a ( id int ); insert into a ( id ) values ( 1 ); select * from a;" )
>>> cursor.fetchall()
[(1,)]

seems to show that multiple statements count as one "operation", at least using postgresql_psycopg2

  • Shaun

comment:2 by shaunc <shaun@…>, 17 years ago

The problem is the following, in django.core.management:341:

    # Some backends can't execute more than one SQL statement at a time,
    # so split into separate statements.
    statements = re.compile(r";[ \t]*$", re.M)

This may be a fine workaround if the backend really doesn't deal with multiple statements, but it breaks things that have ';' in quotes. Easiest fix would be for the db to support a property: "SUPPORTS_MULTIPLE_STATEMENTS_IN_EXECUTE" or whatever, and only split if necessary.

by shaunc <shaun@…>, 17 years ago

Attachment: sql-statement-iterator.diff added

patch, introduces 'sql-statement-iterator' which correctly deals with quotes

comment:3 by shaunc <shaun@…>, 17 years ago

Has patch: set

I've attached a patch for this bug: with patch management.py will use utility "sql_statement_iterator" which splits sql into statements more carefully, respecting quotes and comments.

Also, it will change freestanding '%' to double -- otherwise these cause an error in management.py, as python DB2.0 spec treats all non-doubled percent signs as substitution targets, which isn't wanted in this situation.

With this fix I'm able to use pg_dump (with appropriate flags to use 'insert' rather than 'copy') to dump old tables, then cut and paste resulting inserts (and set sequence) into initial data.

utils/sqltools.py, where "sql_statement_iterator" lives, has doctests. Ideally I would have written unit tests for both this and new management, but, ... well maybe I'll get to it after I deploy next edition of our website :).

comment:4 by Simon G. <dev@…>, 17 years ago

Keywords: rawsql added
Needs tests: set
Summary: raw sql file doesn't recognize quotes correctly[patch] raw sql file doesn't recognize quotes correctly
Triage Stage: UnreviewedDesign decision needed

comment:5 by shaunc <shaun@…>, 17 years ago

Cc: shaun@… added

I understand this is been relegated to Tumbolia for the moment. If it ends up "accepted", please do ping me, and if I have any time at all, I'll write whatever unit tests are required. (Note: already includes doctest of sql_statement_iterator itself.)

comment:6 by anonymous, 16 years ago

Cc: sam@… added

comment:7 by Sam Morris <sam@…>, 16 years ago

As a workaround, multi-line SQL statements have to have someting other than whitespace between their semicolons and newline characters. For example:

 CREATE OR REPLACE FUNCTION create_post () RETURNS trigger AS $$
 BEGIN
  NEW.date_created := current_timestamp; --
  RETURN NEW; --
 END;
$$ LANGUAGE plpgsql;

comment:8 by Jacob, 16 years ago

Version: SVN

comment:9 by shaunc <shaun@…>, 16 years ago

Note that the patch requires a slight reorg. to work now that management.py has been changed....

Again, if the powers that be accept this, I'll do my best to get it into shape, as I'm still using it myself.

comment:10 by anonymous, 16 years ago

Cc: andrehcampos@… added

by aseering@…, 16 years ago

Updated version of shaunc's patch, that works with Django SVN as of r8255

comment:11 by aseering@…, 16 years ago

Cc: aseering@… added

This bug currently also causes multiline strings to fail to parse, at least for me with PostgreSQL 8.3. This means that Sam Morris's workaround (above) doesn't work for me; as far as I know, there's no raw-SQL way of inserting stored procedures into a database? I could use a Signals hook; but that doesn't feel particularly DRY to me...

I've submitted an updated patch that works for me with Django SVN. I'm not yet very familiar with the new management-module layout; let me know if I did something silly.

I'm quite interested in seeing this bug resolved; I'm also glad to do some coding if it would help.

Django devs: It seems that there's the will and expertise to fix this bug. Is there any chance someone could spare a few moments to make a decision on it? (or, does anyone know of an alternate workaround?) Thanks; I'd certainly appreciate it!

comment:12 by Malcolm Tredinnick, 16 years ago

Resolution: wontfix
Status: newclosed

Due to variations between all the different SQL variants accepted by the backends we support (and others provide externally), as well as the complexity of SQL (and we aren't going to include a full SQL parser), it's really chasing rainbows to try and make complex stuff work here.

I think this is a situation we just aren't going to handle directly, since the hooks now exist to do it on a case-by-case basis in client code. If you want to pass in complex SQL, catch the post-syncdb signal and work directly with a django.db.backend.cursor. You will know precisely what database you're talking to and can use the appropriate syntax and calling mechanisms.

comment:13 by shaun cutts <shaun@…>, 16 years ago

Grumble grumble...

IMO, it is perfectly reasonable to say "its not our business to understand SQL". But then you shouldn't be trying to break the SQL down into statements. Rather, pass it through as a chunk. If this causes problems, then -- if you want to -- patch (hack) it in the backends that have problems. Or users of those back ends should get their db engines fixed.

-- Shaun Cutts

comment:14 by Malcolm Tredinnick, 16 years ago

Yes, if the solution was the simple we would be doing it. Not all database wrappers permit multiple statements in one call. So the current stuff works for them and your stuff is possible. Win for everybody.

comment:15 by aseering@…, 16 years ago

Curious: Does anyone watching this ticket, know offhand which DB backends/engines actually don't support multiple statements? I tend to agree with Shaun that the best long-term fix is to add support into the lower-level code.

Also, mtredinnick: Shaun's solution sounds pretty simple to me. Has it been tried? If not, would you consider accepting a patch that implements it?

in reply to:  14 comment:16 by shaun cutts <shaun@…>, 16 years ago

Replying to mtredinnick:

Yes, if the solution was the simple we would be doing it. Not all database wrappers permit multiple statements in one call. So the current stuff works for them and your stuff is possible. Win for everybody.

What I meant to suggest is: do just what you're doing now, but only do it if the db wrapper doesn't support multiple statements. (seconding aseering's question) Would you consider a patch that did this?

in reply to:  15 ; comment:17 by Antti Kaihola, 14 years ago

Replying to aseering@mit.edu:

Curious: Does anyone watching this ticket, know offhand which DB backends/engines actually don't support multiple statements? I tend to agree with Shaun that the best long-term fix is to add support into the lower-level code.

I tested PostgreSQL, SQLite and MySQL on Ubuntu 9.04, and SQLite was the only one not supporting multiple statements.

Would it make sense to add a

django.backends.*.DatabaseFeatures.accepts_multiple_statements

attribute and do the statements.split() dance only when needed?

in reply to:  17 comment:18 by Russell Keith-Magee, 14 years ago

Resolution: wontfix
Status: closedreopened
Triage Stage: Design decision neededAccepted

Replying to akaihola:

Replying to aseering@mit.edu:

Curious: Does anyone watching this ticket, know offhand which DB backends/engines actually don't support multiple statements? I tend to agree with Shaun that the best long-term fix is to add support into the lower-level code.

I tested PostgreSQL, SQLite and MySQL on Ubuntu 9.04, and SQLite was the only one not supporting multiple statements.

Would it make sense to add a

django.backends.*.DatabaseFeatures.accepts_multiple_statements

attribute and do the statements.split() dance only when needed?

Reopening (and accepting) on the basis of this suggestion. This sounds workable to me, although it's the python binding version that is important, not the database version itself.

comment:19 by Russell Keith-Magee, 14 years ago

Caveat - I'm accepting the "run bulk SQL as a single block if possible" idea. I'd rather see a 'execute_bulk_sql' call on the DatabaseOperations class than a boolean flag. Making it an overridable method on the backend means that the regular expressions/splitting technique can be modified on a per-backend basis.

in reply to:  19 comment:20 by Shaun Cutts, 14 years ago

Replying to russellm:

Souunds good to me... -- Thanks, Russell!

comment:21 by Walter Doekes, 14 years ago

Cc: walter+django@… added

As a workaround, multi-line SQL statements have to have someting other than whitespace between their semicolons and newline characters. For example:

> CREATE OR REPLACE FUNCTION create_post () RETURNS trigger AS $$
>  BEGIN
>   NEW.date_created := current_timestamp; --
>   RETURN NEW; --
>  END;
> $$ LANGUAGE plpgsql;

Is that a bug or a feature?

I was about to file this patch because I was getting quite puzzling "Command out of sync error"s because I had a bit of comment after a statement:

--- django/core/management/sql.py.orig	2010-02-25 12:32:04.660976707 +0100
+++ django/core/management/sql.py	2010-02-25 12:38:06.240954572 +0100
@@ -174,7 +174,7 @@
 
     # Some backends can't execute more than one SQL statement at a time,
     # so split into separate statements.
-    statements = re.compile(r";[ \t]*$", re.M)
+    statements = re.compile(r";[ \t]*(?:--.*)?$", re.M)
 
     # Find custom SQL, if it's available.
     sql_files = [os.path.join(app_dir, "%s.%s.sql" % (opts.object_name.lower(), settings.DATABASE_ENGINE)),

But that fix will break your feature.

comment:22 by Vlastimil Zíma, 13 years ago

Cc: vlastimil.zima@… added

comment:23 by Łukasz Rekucki, 13 years ago

Severity: normalNormal
Type: defectBug

comment:24 by Aymeric Augustin, 12 years ago

UI/UX: unset

Change UI/UX from NULL to False.

comment:25 by Aymeric Augustin, 12 years ago

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:26 by Christophe Pettus, 12 years ago

Cc: Christophe Pettus added

comment:27 by Aymeric Augustin, 11 years ago

Status: reopenednew

comment:28 by sbarre, 11 years ago

Cc: sbarre added

comment:29 by ris, 11 years ago

Can we get this fixed at some point please? It seems quite straightforward now and it causes daily pain to so many people.

comment:30 by Russell Keith-Magee, 11 years ago

Sure - it will be fixed as soon as someone presents a patch. There's a call to action in Comment 18 and Comment 19, but nobody has presented a patch that implements that approach AFAICT from the rest of the ticket discussion.

If someone wants this, you're going to have to write the patch.

comment:31 by Claude Paroz, 11 years ago

Needs tests: unset

comment:32 by Claude Paroz, 11 years ago

Question raised by Mark on the PR: With the migrations support, are .sql files even needed any more?

in reply to:  32 comment:33 by Vlastimil Zíma, 11 years ago

Replying to claudep:

Question raised by Mark on the PR: With the migrations support, are .sql files even needed any more?

I have not check the entire migration and south documentation, but what about custom restraints, triggers and functions?

comment:34 by Vlastimil Zíma, 11 years ago

Cc: vlastimil@… added

comment:35 by Dmitri Bogomolov <4glitch@…>, 11 years ago

Cc: 4glitch@… added

comment:36 by Claude Paroz, 11 years ago

Patch needs improvement: set

Repeating Andrew's answer on the pull request:
They should go away, there's a RunSQL operation you can use in migrations instead (that accepts either single statements or mutiple ones with its own internal splitter regex). I didn't get around to removing the support for them on syncdb-type apps, but they'll be ignored if you have migrations.

We should therefore test (sql like this) and document the new process replacing initial sql before closing this ticket (or close this ticket and open new one).

comment:37 by Tim Graham, 10 years ago

How will the migrations framework handle Database-backend-specific SQL data?

comment:38 by Andrew Godwin, 10 years ago

Yep, this same bug will still apply to the splitter inside RunSQL (though I believe it's marginally better than the old Django one).

timo: Database-specific SQL data will only work with apps without migrations; those with migrations are expected to achieve the same result using a migration to load in the data, which is much more portable and forwards-compatible.

comment:39 by Aymeric Augustin, 10 years ago

At least parts of this must have been fixed in #22401, see 071c9337750b296d198cced56f3ffad0e176afb6.

comment:40 by Walter Doekes, 10 years ago

Assuming sqlparse does the job, this can indeed be closed. Thanks!

comment:41 by Aymeric Augustin, 10 years ago

Has patch: unset
Keywords: migrations added
Patch needs improvement: unset
Severity: NormalRelease blocker

This ticket never tells which backend support multiple statements in cursor.execute() and which don't. It would be interesting to check that on the four officially supported databases before going any further...


If indeed some backends need splitting, in order to close this ticket, we must implement a similar approach as #22401 in RunSQL._split_sql.

Splitting occurs only when explicitly setting multiple=True, which Django itself never does, even in the test suite. We should add a test for that (and skip it when sqlparse isn't available.

The patch will be easier if we include it in 1.7 because we won't need a deprecation path. Therefore, marking as a release blocker.

comment:42 by Claude Paroz, 10 years ago

SQLite: needs splitting
PostgreSQL: does not need splitting
MySQL: does not need splitting, but has issues when some multiline SQL results are not read (cursor.nextset() has to be used to "consume" results).
Oracle: ?

comment:43 by Aymeric Augustin, 10 years ago

It would be interesting to avoid the dependency on sqlparse for backends that do not need it. Can we introduce a feature flag "execute_supports_multiple_statements"?

comment:44 by Claude Paroz, 10 years ago

In my tentative pull request linked in comment:31, I used the approach suggested by Russell in comment:19 (a custom backend method).

comment:45 by Aymeric Augustin, 10 years ago

The fix #22401 was reverted and it was then closed as a duplicate.

comment:46 by Aymeric Augustin, 10 years ago

Oracle seems to choke on \n:

DatabaseError: ORA-00911: invalid character

comment:47 by Aymeric Augustin, 10 years ago

Triage Stage: AcceptedReady for checkin

I think this is good to go: https://github.com/django/django/pull/2608

I ran the "test_run_sql" test on all four core database backends with and without sqlparse. It's skipped in three cases and passes in five cases.

comment:48 by Aymeric Augustin, 10 years ago

Has patch: set

comment:49 by Claude Paroz, 10 years ago

Thanks Aymeric, nice patch!

There are however test failures in initial_sql_regress, I had to apply this:

--- a/tests/initial_sql_regress/tests.py
+++ b/tests/initial_sql_regress/tests.py
@@ -27,7 +27,10 @@ class InitialSQLTests(TestCase):
         """
         connection = connections[DEFAULT_DB_ALIAS]
         custom_sql = custom_sql_for_model(Simple, no_style(), connection)
-        self.assertEqual(len(custom_sql), 9)
+        if connection.features.requires_sqlparse_for_splitting:
+            self.assertEqual(len(custom_sql), 9)
+        else:
+            self.assertEqual(len(custom_sql), 1)
         with connection.cursor() as cursor:
             for sql in custom_sql:
                 cursor.execute(sql)

Apart from that, I think it's RFC.

comment:50 by Aymeric Augustin, 10 years ago

Thanks a lot for the review!

I don't think the assert on the number on statements is useful. I'll just remove it.

What matters is that each statement creates an object and that 9 objects are created.

comment:51 by Aymeric Augustin <aymeric.augustin@…>, 10 years ago

Resolution: fixed
Status: newclosed

In 8b5b199e20ad2d8d3e91873ce0cd5d3035e05ece:

Fixed #3214 -- Stopped parsing SQL with regex.

Avoided introducing a new regex-based SQL splitter in the migrations
framework, before we're bound by backwards compatibility.

Adapted this change to the legacy "initial SQL data" feature, even
though it's already deprecated, in order to facilitate the transition
to migrations.

sqlparse becomes mandatory for RunSQL on some databases (all but
PostgreSQL). There's no API to provide a single statement and tell
Django not to attempt splitting. Since we have a more robust splitting
implementation, that seems like a good tradeoff. It's easier to add a
new keyword argument later if necessary than to remove one.

Many people contributed to both tickets, thank you all, and especially
Claude for the review.

Refs #22401.

comment:52 by Aymeric Augustin <aymeric.augustin@…>, 10 years ago

In 3bb0f118ca375f25cd0c03a5733ee2ef9d79dfa5:

[1.7.x] Fixed #3214 -- Stopped parsing SQL with regex.

Avoided introducing a new regex-based SQL splitter in the migrations
framework, before we're bound by backwards compatibility.

Adapted this change to the legacy "initial SQL data" feature, even
though it's already deprecated, in order to facilitate the transition
to migrations.

sqlparse becomes mandatory for RunSQL on some databases (all but
PostgreSQL). There's no API to provide a single statement and tell
Django not to attempt splitting. Since we have a more robust splitting
implementation, that seems like a good tradeoff. It's easier to add a
new keyword argument later if necessary than to remove one.

Many people contributed to both tickets, thank you all, and especially
Claude for the review.

Refs #22401.

Backport of 8b5b199 from master

Note: See TracTickets for help on using tickets.
Back to Top