#18977 closed Bug (duplicate)

bad removal of comments in custom_sql_for_model

Reported by: trott@… Owned by: nobody
Component: Uncategorized Version: 1.4
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


When providing initial data for models as SQL files, the function "custom_sql_for_model" in django/core/management/
will cut away comments (good) ... and also valid string data (not good).


INSERT INTO myapp_mymodel (somecolumn) VALUES ('Joung--at--Heart');

will result in:

INSERT INTO myapp_mymodel (somecolumn) VALUES ('Joung;

to be executed by the database .. which in turn complains about unterminated strings.

The source of this dilemma is in line 173 of

                statement = re.sub(ur"--.*([\n\Z]|$)", "", statement)

Possible solution; force the regex to the linestart (using the caret):

                statement = re.sub(ur"^--.*([\n\Z]|$)", "", statement)

That should catch almost all comments (and the others should not bother us...)

Attachments (0)

Change History (3)

comment:1 Changed 19 months ago by akaariai

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I believe we are removing the comments so that we do not try to execute comment-only strings by cursor.execute(). Could we remove just those comments and leave the rest in for this ticket.

There is a deeper problem with the raw SQL files. We just can't reliably parse the SQL into statements. We will need an SQL parser for each version of each database vendor if we want to get this correct...

I think we really need a better solution for initial (schema) SQL.

comment:2 Changed 19 months ago by anonymous

Beside this, I'd like to add some comments:

a.) the removal of comments should be re-considered. I think, it's not Django job to do this.

b.) even the split info separate statements at ";<newline>" boundaries can result into broken up string values.

c.) please recognize, that the SQL statements of ALL SQL files are collected and re-arranged by this function.
This can result into a really large list and memory waste. It would be better to consider a Python generator based approach.
(In my case the size of all SQL files is about 200MBytes and takes 5 minutes to be loaded&executed).

comment:3 Changed 19 months ago by claudep

  • Resolution set to duplicate
  • Status changed from new to closed

In #4680 (recently fixed), we improved sql parsing and probably fixed the issue mentioned in the description (not to say that there is room for improvements in custom SQL loading).

Add Comment

Modify Ticket

Change Properties
<Author field>
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'

E-mail address and user name can be saved in the Preferences.

Note: See TracTickets for help on using tickets.