Opened 7 years ago

Closed 7 years ago

#18977 closed Bug (duplicate)

bad removal of comments in custom_sql_for_model

Reported by: trott@… Owned by: nobody
Component: Uncategorized Version: 1.4
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


When providing initial data for models as SQL files, the function "custom_sql_for_model" in django/core/management/ strips away comments (good) ... but also valid string data (not good).


INSERT INTO myapp_mymodel (somecolumn) VALUES ('Joung--at--Heart');

will result in:

INSERT INTO myapp_mymodel (somecolumn) VALUES ('Joung;

being sent to the database, which in turn complains about an unterminated string.

The source of this dilemma is line 173:

                statement = re.sub(ur"--.*([\n\Z]|$)", "", statement)

Possible solution: anchor the regex to the line start (using the caret):

                statement = re.sub(ur"^--.*([\n\Z]|$)", "", statement)

That should catch almost all comments (and the others should not bother us...)
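To illustrate the bug and the proposed fix, here is a small stand-alone sketch. Note the pattern below is a simplified stand-in for the reported line: the original uses a Python 2 ur-string, and modern Python rejects `\Z` inside a character class, so the `([\n\Z]|$)` tail is dropped here.

```python
import re

stmt = "INSERT INTO myapp_mymodel (somecolumn) VALUES ('Joung--at--Heart');"

# Unanchored pattern, similar in spirit to the reported line: it eats
# everything after '--', including data inside the string literal.
broken = re.sub(r"--.*", "", stmt)
# broken == "INSERT INTO myapp_mymodel (somecolumn) VALUES ('Joung"

# Anchored to the line start with re.MULTILINE, as suggested above:
# only lines that *begin* with '--' are treated as comments, so the
# INSERT statement survives untouched.
fixed = re.sub(r"^--.*$", "", stmt, flags=re.MULTILINE)
# fixed == stmt
```

This anchored version still strips comment-only lines (e.g. a leading "-- header" line) while leaving `--` sequences inside string literals alone.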

Change History (3)

comment:1 Changed 7 years ago by Anssi Kääriäinen

I believe we are removing the comments so that we do not try to execute comment-only strings via cursor.execute(). Could we remove just those comments and leave the rest in, for this ticket?

There is a deeper problem with the raw SQL files. We just can't reliably parse the SQL into statements. We would need an SQL parser for each version of each database vendor to get this correct...

I think we really need a better solution for initial (schema) SQL.

comment:2 Changed 7 years ago by anonymous

Besides this, I'd like to add some comments:

a.) The removal of comments should be reconsidered. I don't think it's Django's job to do this.

b.) Even the split into separate statements at ";<newline>" boundaries can break up string values.

c.) Note that this function collects and rearranges the SQL statements of ALL SQL files.
This can produce a really large list and waste memory. A Python generator-based approach would be better.
(In my case the SQL files total about 200 MB and take 5 minutes to load and execute.)
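A generator-based splitter along the lines of (b) and (c) could look roughly like this. This is a hypothetical sketch, not Django's actual code: it yields one statement at a time instead of building a full list, naively tracks single-quote state so that ';' and '--' inside string literals survive, and ignores SQL '' quote escaping.

```python
def iter_statements(lines):
    """Lazily yield SQL statements from an iterable of lines.

    Comment-only lines are skipped (but only outside string literals),
    and a statement ends at a ';' that closes a line outside a string.
    """
    buf = []
    in_string = False
    for line in lines:
        stripped = line.strip()
        # Skip comment-only lines, but never inside a string literal.
        if not in_string and stripped.startswith("--"):
            continue
        buf.append(line)
        # Toggle string state on an odd number of quotes (naive:
        # does not handle the '' escape sequence).
        if line.count("'") % 2 == 1:
            in_string = not in_string
        if not in_string and stripped.endswith(";"):
            yield "".join(buf)
            buf = []
    if buf:
        yield "".join(buf)
```

Because it is a generator, memory use stays proportional to the largest single statement rather than to the whole 200 MB of input.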

comment:3 Changed 7 years ago by Claude Paroz

Resolution: duplicate
Status: new → closed

In #4680 (recently fixed), we improved SQL parsing and probably fixed the issue mentioned in the description (not to say that there is no room for improvement in custom SQL loading).
