Code

Opened 7 years ago

Closed 16 months ago

Last modified 16 months ago

#3485 closed Bug (fixed)

Initial SQL fails when data contains a % in it when settings.DEBUG = True

Reported by: lakin@… Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: me@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

We had some initial sql data that contains %'s in strings exported from a postgresql database. When a fellow developer tried to start the application, not all of the data was properly inserted as the mysql driver interpreted the % symbols as a directive for it to process and they failed.

Attachments (6)

db_do_not_format_string.diff (864 bytes) - added by Craig Ogg <cogg@…> 7 years ago.
Patch
__init__.py (7 bytes) - added by Craig Ogg <cogg@…> 7 years ago.
Should go in trunk/tests/modeltests/initial_sql/
models.py (932 bytes) - added by Craig Ogg <cogg@…> 7 years ago.
Should go in trunk/tests/modeltests/initial_sql/
simple.sql (214 bytes) - added by Craig Ogg <cogg@…> 7 years ago.
Should go in trunk/tests/modeltests/initial_sql/sql/
initial_sql_regress_models.diff (1.4 KB) - added by Craig Ogg <cogg@…> 7 years ago.
Additional test added to trunk/tests/regressiontests/initial_sql_regress/models.py
simple2.sql (241 bytes) - added by Craig Ogg <cogg@…> 7 years ago.
Should be in trunk/tests/regressiontests/initial_sql_regress/sql

Download all attachments as: .zip

Change History (30)

comment:1 Changed 7 years ago by Gary Wilson <gary.wilson@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to invalid
  • Status changed from new to closed

Possibly a quoting issue? What was used to export the SQL from the postgres database? Was it output by Django? Could you post an example? Until we have more info, closing this ticket as invalid.

comment:2 Changed 7 years ago by lakin.wecker@…

  • Resolution invalid deleted
  • Status changed from closed to reopened

From my understanding of SQL, it's not a quoting issue. The initialdata is in raw sql format. The postgres statement is:

INSERT INTO cms_contentblock (id, page_id, name, body) VALUES (4, 3, 'email_format', 'Question From: %(name)s <%(email)s>

%(question)s

');

This query works fine from the MySQL command line utility or through a couple of different mysql administration tools.
In order for Django to properly put it into MySQL we had to change the query to:

INSERT INTO cms_contentblock (id, page_id, name, body) VALUES (4, 3, 'email_format', 'Question From: %%(name)s <%%(email)s>

%%(question)s

');

comment:3 Changed 7 years ago by Simon G. <dev@…>

  • Triage Stage changed from Unreviewed to Accepted

#4801 was marked as a duplicate

Changed 7 years ago by Craig Ogg <cogg@…>

Patch

comment:4 Changed 7 years ago by Craig Ogg <cogg@…>

  • Component changed from Uncategorized to Database wrapper
  • Has patch set
  • Summary changed from Initial SQL fails with MySQLdb driver when data contains a % in it. to Initial SQL fails when data contains a % in it.

Tickets #4465 and #3537 are duplicates (and I will mark as such). Modified Summary to make it clear that this applies to all backends (at least, it was reported against MySQLdb driver and I am seeing it with postgresql_psycopg2).

The patch I submitted is based on the code in #4465 but modified to accommodate the recent unicode changes.

comment:5 Changed 7 years ago by Simon G. <dev@…>

  • Triage Stage changed from Accepted to Ready for checkin

comment:6 follow-up: Changed 7 years ago by Simon G. <dev@…>

Looks good Craig - but we might need some tests.

comment:7 in reply to: ↑ 6 Changed 7 years ago by Craig Ogg <cogg@…>

  • Summary changed from Initial SQL fails when data contains a % in it. to Initial SQL fails when data contains a % in it when settings.DEBUG = True

I have modified the subject to show this is a problem recording the SQL query in debug mode and not a problem with the SQL execution itself. I have written tests for this, but am not quite sure how to make them available, so I will attach them to the ticket. There was no existing test for initial SQL loading except as a regression, so I added a new app for it.

Put the attached files in the following locations on the trunk:

tests\modeltests\initial_sql (new directory)

init.py
models.py

tests\modeltests\initial_sql\sql

simple.sql

I realize in retrospect that actual problem doesn't have anything to do with the initial SQL mechanism (other than that being a convenient way to trigger the problem), it is instead a bug with the CursorDebugWrapper which has much broader applicability. I am afraid that I don't have enough understanding of Django to know if this is a complete fix (IOW, there may be other places where literal %s in SQL will cause Django to fail). I am sure the fix is safe though: the worst effect it could have is to leave a %% in a debug copy of SQL string where a % should be).

The nature of this test may mean it should be in the regression area. Not sure what the convention is.

Note: CursorDebugWrapper.executemany() seems to have the same problem from code inspection. Not sure under what conditions (if any) it would be triggered.

Changed 7 years ago by Craig Ogg <cogg@…>

Should go in trunk/tests/modeltests/initial_sql/

Changed 7 years ago by Craig Ogg <cogg@…>

Should go in trunk/tests/modeltests/initial_sql/

Changed 7 years ago by Craig Ogg <cogg@…>

Should go in trunk/tests/modeltests/initial_sql/sql/

comment:8 follow-up: Changed 7 years ago by mtredinnick

  • Triage Stage changed from Ready for checkin to Accepted

I don't think I like this patch, whereby the quoting behaviour is different depending upon whether or not parameters are present. Consistent quoting behaviour is a good idea. I realise there's the contradiction that print '%s' and print '%s' % 'foo' behave differently in Python, but it's harder to tell if there are going to be parameters included in the SQL query or not (particularly when using, e.g., the extra() method on a QuerySet).

We should probably try and move the tests under regressionstests to where the existing initial data tests are, for performance reasons. There's no real reason it needs a new directory and every new directory adds extra time to test runs (another setup and teardown iteration).

Moving back to "accepted" whilst I think about this a bit more. I probably want to just add extra quoting for initial SQL (which is more likely to be expected to be raw SQL) and leave the cursor behaviour as it is now -- always expecting parameters and you have to escape '%' markers yourself.

comment:9 in reply to: ↑ 8 Changed 7 years ago by Craig Ogg <cogg@…>

Replying to mtredinnick:

I don't think I like this patch, whereby the quoting behaviour is different depending upon whether or not parameters are present. Consistent quoting behaviour is a good idea. I realise there's the contradiction that print '%s' and print '%s' % 'foo' behave differently in Python, but it's harder to tell if there are going to be parameters included in the SQL query or not (particularly when using, e.g., the extra() method on a QuerySet).

After writing the test and thinking about this more, it is obvious what the correction solution is: the placeholder behavior for the debug SQL needs to match that of the SQL driver. This isn't true for the existing CursorDebugWrapper and isn't true for the patch (though the patch is incrementally closer). I will try to take a look at the source for one of the SQL drivers to discover what the behavior is (I couldn't find a good spec, would be happy to follow one if someone can point me at one).

We should probably try and move the tests under regressionstests to where the existing initial data tests are, for performance reasons. There's no real reason it needs a new directory and every new directory adds extra time to test runs (another setup and teardown iteration).

Makes sense.

Moving back to "accepted" whilst I think about this a bit more. I probably want to just add extra quoting for initial SQL (which is more likely to be expected to be raw SQL) and leave the cursor behaviour as it is now -- always expecting parameters and you have to escape '%' markers yourself.

Extra quoting for initial SQL won't work. You'll find that if you escape the % you will end up with %% in your db. The root cause of this bug is that the database drivers' handling of placeholders and print '%s' % 'foo' are different.

comment:10 Changed 7 years ago by Craig Ogg <cogg@…>

On further investigation, I believe the patch is correct. From the source for psycopg2, formatting is only applied if there are parameters. See cursor_type.c:

/* here we are, and we have a sequence or a dictionary filled with
   objects to be substituted (bound variables). we try to be smart and do
   the right thing (i.e., what the user expects) */
    
if (vars && vars != Py_None)
{
   if(_mogrify(vars, operation, self->conn, &cvt) == -1) {
       Py_XDECREF(uoperation);
       return 0;
   }
}

Similar code is in psycopg2 executemany, so a similar change really should be made to the appropriate method for it as well. I haven't made that change because I am not sure on what the policy here is on fixing bugs discovered through code inspection.

I'll update the test code so it can live in the regression directory with the other initial SQL test.

Changed 7 years ago by Craig Ogg <cogg@…>

Additional test added to trunk/tests/regressiontests/initial_sql_regress/models.py

Changed 7 years ago by Craig Ogg <cogg@…>

Should be in trunk/tests/regressiontests/initial_sql_regress/sql

comment:11 Changed 7 years ago by mtredinnick

Doing what the backend cursors do is the right thing if they all behave the same way. I still need to check that behaviour, but assuming they are consistent, I'm happy with this fix.

If various wrappers behave in different ways, we need to think this through carefully, because we try to be as database-neutral as possible (although sometimes differences occur).

comment:12 Changed 7 years ago by Craig Ogg <cogg@…>

MySQLdb does, according to the cursors.py Python source for parameter handling:

if args is not None:
   query = query % connection.literal(args)

I don't have the energy to check the rest right now, but will point out that the unit test I wrote will fail against a backend if its parameter handling is different. It inserts a string with literal percents in it and verifies that they come back intact. Parameter handling behavior is considered back-end specific according to the spec, though it has a footnote on the behavior that says:

The term "bound" refers to the process of binding an input
value to a database execution buffer. In practical terms,
this means that the input value is directly used as a value
in the operation. The client should not be required to
"escape" the value so that it can be used -- the value
should be equal to the actual database value.

This is not clear enough to me to be sure it is speaking of the case we are talking about now.

comment:13 Changed 7 years ago by Craig Ogg <cogg@…>

I have verified that cx_Oracle also doesn't do translation if there are no parameters, from Cursor.c:

perform binds
if (executeArgs && Cursor_SetBindVariables(self, executeArgs, 1, 0) < 0)

return NULL;

Later in the code, Cursor_PerformBind passes through with success if there are no bind variables set. I attempted to verify pysqlite, but was confident in my understanding of the code from a brief pass through.

comment:14 Changed 7 years ago by cogg

Ticket summary:

  • Code in trunk will crash if settings.DEBUG is true and there is initial SQL with a '%' in it when it goes to log the query.
  • Patch only does string substitution (string % args) for query logging if there are query arguments passed in.
  • Actual data stored in DB is correct, this only affects DB logging.

mtredinnick wanted to be sure this match db backend behavior and wanted tests moved to regression tree.

  • Verified through code inspection that this is what cx_Oracle, psycopg2 and MySQLdb do as well.
  • Created tests in regressions (test files dated 8/12/2007 are correct files)

comment:15 Changed 7 years ago by durdinator

  • Cc me@… added

comment:16 Changed 5 years ago by arifamirani

This still continues to affect in the trunk. Is there a workaround or resolution?

comment:17 Changed 4 years ago by adamnelson

  • Patch needs improvement set

comment:18 Changed 3 years ago by lrekucki

  • Severity set to Normal
  • Type set to Bug

comment:19 Changed 3 years ago by julien

See some other issues relating to the use of the % character and escaping: #6343, #12268, #11391, #13648. Perhaps some of these could be tackled at the same time.

comment:20 Changed 2 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:21 Changed 2 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:22 Changed 16 months ago by aaugustin

  • Status changed from reopened to new

comment:23 Changed 16 months ago by claudep

  • Resolution set to fixed
  • Status changed from new to closed

I think this issue has been fixed in the same time as #9055 [76aecfbc4b49f5a]. I will commit the proposed test anyway.

comment:24 Changed 16 months ago by Claude Paroz <claude@…>

In 34a50e99e80295aa0a8970a775f53c62fe220c5a:

Added regression test for custom SQL containing percents

Refs #3485.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.