Opened 9 years ago

Closed 16 months ago

Last modified 16 months ago

#3214 closed Bug (fixed)

[patch] raw sql file doesn't recognize quotes correctly

Reported by: shaunc <shaun@…> Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Release blocker Keywords: rawsql migrations
Cc: shaun@…, sam@…, andrehcampos@…, aseering@…, walter+django@…, vlastimil.zima@…, Xof, sbarre, vlastimil@…, 4glitch@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Suppose I have an app foo with a model Bar.
in foo/sql/bar.postgresql_psycopg2.sql I have:

CREATE OR REPLACE FUNCTION test() RETURNS trigger LANGUAGE plpythonu AS $$
  '''this will have problems if it contains an embedded ;
  '''
$$;

This generates the following error on syncdb:

sql /* cranedata.db.search.sql.searchmodel.postgresql_psycopg2
 *
 * Setup for 'SearchModel':
 *   want to have a trigger on SearchModels which sets up a trigger
 *   on models searched, such that when rows of these models are
 *   modified, text fields are automatically indexed.
 */

sql CREATE OR REPLACE FUNCTION test() RETURNS trigger LANGUAGE plpythonu AS $$
  '''this will have problems if it contains an embedded ;
params ()
Failed to install initial SQL data for search.SearchModel model: unterminated dollar-quoted string a
t or near "$$
  '''this will have problems if it contains an embedded ;" at character 355

Attachments (2)

sql-statement-iterator.diff (6.0 KB) - added by shaunc <shaun@…> 9 years ago.
patch, introduces 'sql-statement-iterator' which correctly deals with quotes
sql-statement-iterator-updated.diff (5.9 KB) - added by aseering@… 7 years ago.
Updated version of shaunc's patch, that works with Django SVN as of r8255

Download all attachments as: .zip

Change History (54)

comment:1 Changed 9 years ago by shaunc <shaun@…>

IMHO the sql file should be just fed all at once to a cursor, rather than django attempting to parse it and feeding in pieces.

The python DB API 2.0 doc is a bit hazy on this point, but:

>>> cursor.execute( "create table a ( id int ); insert into a ( id ) values ( 1 ); select * from a;" )
>>> cursor.fetchall()
[(1,)]

seems to show that multiple statements count as one "operation", at least using postgresql_psycopg2

  • Shaun

comment:2 Changed 9 years ago by shaunc <shaun@…>

The problem is the following, in django.core.management:341:

    # Some backends can't execute more than one SQL statement at a time,
    # so split into separate statements.
    statements = re.compile(r";[ \t]*$", re.M)

This may be a fine workaround if the backend really doesn't deal with multiple statements, but it breaks things that have ';' in quotes. Easiest fix would be for the db to support a property: "SUPPORTS_MULTIPLE_STATEMENTS_IN_EXECUTE" or whatever, and only split if necessary.

Changed 9 years ago by shaunc <shaun@…>

patch, introduces 'sql-statement-iterator' which correctly deals with quotes

comment:3 Changed 9 years ago by shaunc <shaun@…>

  • Has patch set

I've attached a patch for this bug: with patch management.py will use utility "sql_statement_iterator" which splits sql into statements more carefully, respecting quotes and comments.

Also, it will change freestanding '%' to double -- otherwise these cause an error in management.py, as python DB2.0 spec treats all non-doubled percent signs as substitution targets, which isn't wanted in this situation.

With this fix I'm able to use pg_dump (with appropriate flags to use 'insert' rather than 'copy') to dump old tables, then cut and paste resulting inserts (and set sequence) into initial data.

utils/sqltools.py, where "sql_statement_iterator" lives, has doctests. Ideally I would have written unit tests for both this and new management, but, ... well maybe I'll get to it after I deploy next edition of our website :).

comment:4 Changed 9 years ago by Simon G. <dev@…>

  • Keywords rawsql added
  • Needs tests set
  • Summary changed from raw sql file doesn't recognize quotes correctly to [patch] raw sql file doesn't recognize quotes correctly
  • Triage Stage changed from Unreviewed to Design decision needed

comment:5 Changed 9 years ago by shaunc <shaun@…>

  • Cc shaun@… added

I understand this is been relegated to Tumbolia for the moment. If it ends up "accepted", please do ping me, and if I have any time at all, I'll write whatever unit tests are required. (Note: already includes doctest of sql_statement_iterator itself.)

comment:6 Changed 8 years ago by anonymous

  • Cc sam@… added

comment:7 Changed 8 years ago by Sam Morris <sam@…>

As a workaround, multi-line SQL statements have to have someting other than whitespace between their semicolons and newline characters. For example:

 CREATE OR REPLACE FUNCTION create_post () RETURNS trigger AS $$
 BEGIN
  NEW.date_created := current_timestamp; --
  RETURN NEW; --
 END;
$$ LANGUAGE plpgsql;

comment:8 Changed 8 years ago by jacob

  • Version set to SVN

comment:9 Changed 8 years ago by shaunc <shaun@…>

Note that the patch requires a slight reorg. to work now that management.py has been changed....

Again, if the powers that be accept this, I'll do my best to get it into shape, as I'm still using it myself.

comment:10 Changed 7 years ago by anonymous

  • Cc andrehcampos@… added

Changed 7 years ago by aseering@…

Updated version of shaunc's patch, that works with Django SVN as of r8255

comment:11 Changed 7 years ago by aseering@…

  • Cc aseering@… added

This bug currently also causes multiline strings to fail to parse, at least for me with PostgreSQL 8.3. This means that Sam Morris's workaround (above) doesn't work for me; as far as I know, there's no raw-SQL way of inserting stored procedures into a database? I could use a Signals hook; but that doesn't feel particularly DRY to me...

I've submitted an updated patch that works for me with Django SVN. I'm not yet very familiar with the new management-module layout; let me know if I did something silly.

I'm quite interested in seeing this bug resolved; I'm also glad to do some coding if it would help.

Django devs: It seems that there's the will and expertise to fix this bug. Is there any chance someone could spare a few moments to make a decision on it? (or, does anyone know of an alternate workaround?) Thanks; I'd certainly appreciate it!

comment:12 Changed 7 years ago by mtredinnick

  • Resolution set to wontfix
  • Status changed from new to closed

Due to variations between all the different SQL variants accepted by the backends we support (and others provide externally), as well as the complexity of SQL (and we aren't going to include a full SQL parser), it's really chasing rainbows to try and make complex stuff work here.

I think this is a situation we just aren't going to handle directly, since the hooks now exist to do it on a case-by-case basis in client code. If you want to pass in complex SQL, catch the post-syncdb signal and work directly with a django.db.backend.cursor. You will know precisely what database you're talking to and can use the appropriate syntax and calling mechanisms.

comment:13 Changed 7 years ago by shaun cutts <shaun@…>

Grumble grumble...

IMO, it is perfectly reasonable to say "its not our business to understand SQL". But then you shouldn't be trying to break the SQL down into statements. Rather, pass it through as a chunk. If this causes problems, then -- if you want to -- patch (hack) it in the backends that have problems. Or users of those back ends should get their db engines fixed.

-- Shaun Cutts

comment:14 follow-up: Changed 7 years ago by mtredinnick

Yes, if the solution was the simple we would be doing it. Not all database wrappers permit multiple statements in one call. So the current stuff works for them and your stuff is possible. Win for everybody.

comment:15 follow-up: Changed 7 years ago by aseering@…

Curious: Does anyone watching this ticket, know offhand which DB backends/engines actually don't support multiple statements? I tend to agree with Shaun that the best long-term fix is to add support into the lower-level code.

Also, mtredinnick: Shaun's solution sounds pretty simple to me. Has it been tried? If not, would you consider accepting a patch that implements it?

comment:16 in reply to: ↑ 14 Changed 7 years ago by shaun cutts <shaun@…>

Replying to mtredinnick:

Yes, if the solution was the simple we would be doing it. Not all database wrappers permit multiple statements in one call. So the current stuff works for them and your stuff is possible. Win for everybody.

What I meant to suggest is: do just what you're doing now, but only do it if the db wrapper doesn't support multiple statements. (seconding aseering's question) Would you consider a patch that did this?

comment:17 in reply to: ↑ 15 ; follow-up: Changed 6 years ago by akaihola

Replying to aseering@mit.edu:

Curious: Does anyone watching this ticket, know offhand which DB backends/engines actually don't support multiple statements? I tend to agree with Shaun that the best long-term fix is to add support into the lower-level code.

I tested PostgreSQL, SQLite and MySQL on Ubuntu 9.04, and SQLite was the only one not supporting multiple statements.

Would it make sense to add a

django.backends.*.DatabaseFeatures.accepts_multiple_statements

attribute and do the statements.split() dance only when needed?

comment:18 in reply to: ↑ 17 Changed 6 years ago by russellm

  • Resolution wontfix deleted
  • Status changed from closed to reopened
  • Triage Stage changed from Design decision needed to Accepted

Replying to akaihola:

Replying to aseering@mit.edu:

Curious: Does anyone watching this ticket, know offhand which DB backends/engines actually don't support multiple statements? I tend to agree with Shaun that the best long-term fix is to add support into the lower-level code.

I tested PostgreSQL, SQLite and MySQL on Ubuntu 9.04, and SQLite was the only one not supporting multiple statements.

Would it make sense to add a

django.backends.*.DatabaseFeatures.accepts_multiple_statements

attribute and do the statements.split() dance only when needed?

Reopening (and accepting) on the basis of this suggestion. This sounds workable to me, although it's the python binding version that is important, not the database version itself.

comment:19 follow-up: Changed 6 years ago by russellm

Caveat - I'm accepting the "run bulk SQL as a single block if possible" idea. I'd rather see a 'execute_bulk_sql' call on the DatabaseOperations class than a boolean flag. Making it an overridable method on the backend means that the regular expressions/splitting technique can be modified on a per-backend basis.

comment:20 in reply to: ↑ 19 Changed 6 years ago by shauncutts

Replying to russellm:

Souunds good to me... -- Thanks, Russell!

comment:21 Changed 6 years ago by wdoekes

  • Cc walter+django@… added

As a workaround, multi-line SQL statements have to have someting other than whitespace between their semicolons and newline characters. For example:

> CREATE OR REPLACE FUNCTION create_post () RETURNS trigger AS $$
>  BEGIN
>   NEW.date_created := current_timestamp; --
>   RETURN NEW; --
>  END;
> $$ LANGUAGE plpgsql;

Is that a bug or a feature?

I was about to file this patch because I was getting quite puzzling "Command out of sync error"s because I had a bit of comment after a statement:

--- django/core/management/sql.py.orig	2010-02-25 12:32:04.660976707 +0100
+++ django/core/management/sql.py	2010-02-25 12:38:06.240954572 +0100
@@ -174,7 +174,7 @@
 
     # Some backends can't execute more than one SQL statement at a time,
     # so split into separate statements.
-    statements = re.compile(r";[ \t]*$", re.M)
+    statements = re.compile(r";[ \t]*(?:--.*)?$", re.M)
 
     # Find custom SQL, if it's available.
     sql_files = [os.path.join(app_dir, "%s.%s.sql" % (opts.object_name.lower(), settings.DATABASE_ENGINE)),

But that fix will break your feature.

comment:22 Changed 4 years ago by vzima

  • Cc vlastimil.zima@… added

comment:23 Changed 4 years ago by lrekucki

  • Severity changed from normal to Normal
  • Type changed from defect to Bug

comment:24 Changed 4 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:25 Changed 4 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:26 Changed 3 years ago by Xof

  • Cc Xof added

comment:27 Changed 2 years ago by aaugustin

  • Status changed from reopened to new

comment:28 Changed 2 years ago by sbarre

  • Cc sbarre added

comment:29 Changed 2 years ago by ris

Can we get this fixed at some point please? It seems quite straightforward now and it causes daily pain to so many people.

comment:30 Changed 2 years ago by russellm

Sure - it will be fixed as soon as someone presents a patch. There's a call to action in Comment 18 and Comment 19, but nobody has presented a patch that implements that approach AFAICT from the rest of the ticket discussion.

If someone wants this, you're going to have to write the patch.

comment:31 Changed 2 years ago by claudep

  • Needs tests unset

comment:32 follow-up: Changed 2 years ago by claudep

Question raised by Mark on the PR: With the migrations support, are .sql files even needed any more?

comment:33 in reply to: ↑ 32 Changed 2 years ago by vzima

Replying to claudep:

Question raised by Mark on the PR: With the migrations support, are .sql files even needed any more?

I have not check the entire migration and south documentation, but what about custom restraints, triggers and functions?

comment:34 Changed 2 years ago by vzima

  • Cc vlastimil@… added

comment:35 Changed 2 years ago by Dmitri Bogomolov <4glitch@…>

  • Cc 4glitch@… added

comment:36 Changed 2 years ago by claudep

  • Patch needs improvement set

Repeating Andrew's answer on the pull request:
They should go away, there's a RunSQL operation you can use in migrations instead (that accepts either single statements or mutiple ones with its own internal splitter regex). I didn't get around to removing the support for them on syncdb-type apps, but they'll be ignored if you have migrations.

We should therefore test (sql like this) and document the new process replacing initial sql before closing this ticket (or close this ticket and open new one).

comment:37 Changed 2 years ago by timo

How will the migrations framework handle Database-backend-specific SQL data?

comment:38 Changed 2 years ago by andrewgodwin

Yep, this same bug will still apply to the splitter inside RunSQL (though I believe it's marginally better than the old Django one).

timo: Database-specific SQL data will only work with apps without migrations; those with migrations are expected to achieve the same result using a migration to load in the data, which is much more portable and forwards-compatible.

comment:39 Changed 17 months ago by aaugustin

At least parts of this must have been fixed in #22401, see 071c9337750b296d198cced56f3ffad0e176afb6.

comment:40 Changed 17 months ago by wdoekes

Assuming sqlparse does the job, this can indeed be closed. Thanks!

comment:41 Changed 17 months ago by aaugustin

  • Has patch unset
  • Keywords migrations added
  • Patch needs improvement unset
  • Severity changed from Normal to Release blocker

This ticket never tells which backend support multiple statements in cursor.execute() and which don't. It would be interesting to check that on the four officially supported databases before going any further...


If indeed some backends need splitting, in order to close this ticket, we must implement a similar approach as #22401 in RunSQL._split_sql.

Splitting occurs only when explicitly setting multiple=True, which Django itself never does, even in the test suite. We should add a test for that (and skip it when sqlparse isn't available.

The patch will be easier if we include it in 1.7 because we won't need a deprecation path. Therefore, marking as a release blocker.

comment:42 Changed 17 months ago by claudep

SQLite: needs splitting
PostgreSQL: does not need splitting
MySQL: does not need splitting, but has issues when some multiline SQL results are not read (cursor.nextset() has to be used to "consume" results).
Oracle: ?

comment:43 Changed 17 months ago by aaugustin

It would be interesting to avoid the dependency on sqlparse for backends that do not need it. Can we introduce a feature flag "execute_supports_multiple_statements"?

comment:44 Changed 17 months ago by claudep

In my tentative pull request linked in comment:31, I used the approach suggested by Russell in comment:19 (a custom backend method).

comment:45 Changed 16 months ago by aaugustin

The fix #22401 was reverted and it was then closed as a duplicate.

comment:46 Changed 16 months ago by aaugustin

Oracle seems to choke on \n:

DatabaseError: ORA-00911: invalid character

comment:47 Changed 16 months ago by aaugustin

  • Triage Stage changed from Accepted to Ready for checkin

I think this is good to go: https://github.com/django/django/pull/2608

I ran the "test_run_sql" test on all four core database backends with and without sqlparse. It's skipped in three cases and passes in five cases.

comment:48 Changed 16 months ago by aaugustin

  • Has patch set

comment:49 Changed 16 months ago by claudep

Thanks Aymeric, nice patch!

There are however test failures in initial_sql_regress, I had to apply this:

--- a/tests/initial_sql_regress/tests.py
+++ b/tests/initial_sql_regress/tests.py
@@ -27,7 +27,10 @@ class InitialSQLTests(TestCase):
         """
         connection = connections[DEFAULT_DB_ALIAS]
         custom_sql = custom_sql_for_model(Simple, no_style(), connection)
-        self.assertEqual(len(custom_sql), 9)
+        if connection.features.requires_sqlparse_for_splitting:
+            self.assertEqual(len(custom_sql), 9)
+        else:
+            self.assertEqual(len(custom_sql), 1)
         with connection.cursor() as cursor:
             for sql in custom_sql:
                 cursor.execute(sql)

Apart from that, I think it's RFC.

comment:50 Changed 16 months ago by aaugustin

Thanks a lot for the review!

I don't think the assert on the number on statements is useful. I'll just remove it.

What matters is that each statement creates an object and that 9 objects are created.

comment:51 Changed 16 months ago by Aymeric Augustin <aymeric.augustin@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 8b5b199e20ad2d8d3e91873ce0cd5d3035e05ece:

Fixed #3214 -- Stopped parsing SQL with regex.

Avoided introducing a new regex-based SQL splitter in the migrations
framework, before we're bound by backwards compatibility.

Adapted this change to the legacy "initial SQL data" feature, even
though it's already deprecated, in order to facilitate the transition
to migrations.

sqlparse becomes mandatory for RunSQL on some databases (all but
PostgreSQL). There's no API to provide a single statement and tell
Django not to attempt splitting. Since we have a more robust splitting
implementation, that seems like a good tradeoff. It's easier to add a
new keyword argument later if necessary than to remove one.

Many people contributed to both tickets, thank you all, and especially
Claude for the review.

Refs #22401.

comment:52 Changed 16 months ago by Aymeric Augustin <aymeric.augustin@…>

In 3bb0f118ca375f25cd0c03a5733ee2ef9d79dfa5:

[1.7.x] Fixed #3214 -- Stopped parsing SQL with regex.

Avoided introducing a new regex-based SQL splitter in the migrations
framework, before we're bound by backwards compatibility.

Adapted this change to the legacy "initial SQL data" feature, even
though it's already deprecated, in order to facilitate the transition
to migrations.

sqlparse becomes mandatory for RunSQL on some databases (all but
PostgreSQL). There's no API to provide a single statement and tell
Django not to attempt splitting. Since we have a more robust splitting
implementation, that seems like a good tradeoff. It's easier to add a
new keyword argument later if necessary than to remove one.

Many people contributed to both tickets, thank you all, and especially
Claude for the review.

Refs #22401.

Backport of 8b5b199 from master

Note: See TracTickets for help on using tickets.
Back to Top