#36380 closed Bug (fixed)
SQL is formatted for logging regardless of whether it will be logged
Reported by: | Jacob Walls | Owned by: | Natalia Bidart |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Release blocker | Keywords: | debug-sql format_debug_sql |
Cc: | Tim Graham, Simon Charette, Jacob Walls, Mariusz Felisiak | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I have an endpoint that generates several large SQL strings. It performs 10x worse (300ms -> 3000ms) on the main branch. Traced it to eager formatting of SQL.
To reproduce, print or throw inside format_debug_sql()
, and run a project without logging queries (i.e. without a django.db
entry in LOGGING
configured at the DEBUG
level).
-
django/db/backends/base/operations.py
diff --git a/django/db/backends/base/operations.py b/django/db/backends/base/operations.py index fea73bc1e4..4c106f59e3 100644
a b class BaseDatabaseOperations: 791 791 792 792 def format_debug_sql(self, sql): 793 793 # Hook for backends (e.g. NoSQL) to customize formatting. 794 print('eager formatting...') 794 795 return sqlparse.format(sql, reindent=True, keyword_case="upper")
Regression in d8f093908c504ae0dbc39d3f5231f7d7920dde37 (#35448).
We might want a benchmark in django-asv for views with long SQL strings?
This fix works for me; I can submit a patch with tests in the next few days:
-
django/db/backends/utils.py
diff --git a/django/db/backends/utils.py b/django/db/backends/utils.py index 568f510a67..ae810ffd12 100644
a b from hashlib import md5 10 10 from django.apps import apps 11 11 from django.db import NotSupportedError 12 12 from django.utils.dateparse import parse_time 13 from django.utils.functional import lazy 13 14 14 15 logger = logging.getLogger("django.db.backends") 15 16 … … class CursorDebugWrapper(CursorWrapper): 151 152 logger.debug( 152 153 "(%.3f) %s; args=%s; alias=%s", 153 154 duration, 154 self.db.ops.format_debug_sql(sql),155 lazy(self.db.ops.format_debug_sql)(sql), 155 156 params, 156 157 self.db.alias, 157 158 extra={
Change History (28)
comment:1 by , 4 months ago
comment:2 by , 4 months ago
Description: | modified (diff) |
---|---|
Triage Stage: | Unreviewed → Accepted |
I had some reservations that moving the formatting from the test runner to the CursorDebugWrapper
might cause some problem.
comment:3 by , 4 months ago
Have we considered moving the formatting logic to an actual logging formater instead?
I looked into this. Unless I'm missing something, it's not a good fit for our needs, because we don't want to actually change the DEFAULT_LOGGING
.
To get a logging.Formatter
working, I have this in Django:
-
django/db/backends/utils.py
diff --git a/django/db/backends/utils.py b/django/db/backends/utils.py index 568f510a67..fe5b9478ac 100644
a b class CursorDebugWrapper(CursorWrapper): 151 151 logger.debug( 152 152 "(%.3f) %s; args=%s; alias=%s", 153 153 duration, 154 s elf.db.ops.format_debug_sql(sql),154 sql, 155 155 params, 156 156 self.db.alias, 157 157 extra={ … … class CursorDebugWrapper(CursorWrapper): 159 159 "sql": sql, 160 160 "params": params, 161 161 "alias": self.db.alias, 162 "sql_format_function": self.db.ops.format_debug_sql, 162 163 }, 163 164 ) 164 165 -
django/utils/log.py
diff --git a/django/utils/log.py b/django/utils/log.py index a25b97a7d5..09bb7ace50 100644
a b def log_response( 255 255 exc_info=exception, 256 256 ) 257 257 response._has_been_logged = True 258 259 260 class DebugSQLFormatter(logging.Formatter): 261 def format(self, record): 262 record.sql = record.sql_format_function(record.sql) 263 return super().format(record)
And this in my project settings. I've started with a working example of query logging and commented out the changes I'd have to make on upgrade to keep the current behavior. (As far as I can tell, there is no way to "inherit" these definitions from Django.)
LOGGING = { "version": 1, "disable_existing_loggers": False, # "formatters": { # had to be added # "django.db.backends": { # "()": "django.utils.log.DebugSQLFormatter", # "format": "({duration:.3f}) {sql}; args={args}; alias={alias}", # "style": "{", # }, # }, "handlers": { 'file': { 'level': 'DEBUG', 'class': 'logging.FileHandler', 'filename': 'general.log', }, # "django.db.backends": { # had to be added # "class": "logging.FileHandler", # "filename": "general.log", # "formatter": "django.db.backends", # }, }, "loggers": { "django.db.backends": { "handlers": ["file"], # "handlers": ["django.db.backends"], # had to be adjusted "level": "DEBUG", }, }, }
Wouldn't the suggestion to use lazy()
be preferable to maintain the prior behavior without any impact on projects?
comment:4 by , 4 months ago
Has patch: | set |
---|
PR for adding the degree of laziness, but open to other ideas if I didn't evaluate the logging.Formatter
approach sufficiently.
comment:5 by , 3 months ago
Keywords: | format_debug_sq added |
---|---|
Patch needs improvement: | set |
As I just commented in the PR, I think I slightly prefer using logging.Logger.isEnabledFor to conditionally format the sql.
follow-up: 7 comment:6 by , 3 months ago
So the idea is:
if logger.isEnabledFor(logging.DEBUG): logger.debug(...)
It seems like it would work, but I guess I'm still concerned that SQL formatting shouldn't be part of CursorDebugWrapper
if it has the potential to slow down request processing by 10x in any situation. For me, 3283120cca5d5eba5c3619612d0de5ad49dcf054 was too much of a hack and the formatting of --debug-sql
output could simply be removed (i.e. revert d8f093908c504ae0dbc39d3f5231f7d7920dde37 too) unless an elegant solution is proposed.
comment:7 by , 3 months ago
Replying to Tim Graham:
[...] I'm still concerned that SQL formatting shouldn't be part of
CursorDebugWrapper
if it has the potential to slow down request processing by 10x in any situation.
I agree.
For me, 3283120cca5d5eba5c3619612d0de5ad49dcf054 was too much of a hack and the formatting of
--debug-sql
output could simply be removed (i.e. revert d8f093908c504ae0dbc39d3f5231f7d7920dde37 too) unless an elegant solution is proposed.
I think this is worth considering. Reading on the tickets and evaluating "lack of SQL formatting" vs. the complications that this change has brought up, I'm inclined to go this route. Would be interesting to evaluate if we could have a dedicated log formatter (following the suggestion from Simon) to be automatically configured and plugged in when running the tests with --debug-sql
. We could do this analysis in the context of #34111 (which would be reopened if reverted, unless we decide to wontfix
it).
I'll ponder on this, and open to read other's people thoughts.
comment:8 by , 3 months ago
comment:9 by , 3 months ago
Cc: | added |
---|
Tim, I'm now convinced that reverting the revnos you mentioned (plus ec8cbea91ed6362d43298caa43f3d868452f7efa) is a good path forward. I have investigated and proposed an approach for the path of isolating the log formatting in the log utils, always considering customizations (but not per backend, more like per project since it would require customizations in the logging dictConfig).
See alternate PR at https://github.com/django/django/pull/19508 (4 commits, 3 reverts but the alternate fix for #34111).
comment:10 by , 3 months ago
Requiring customizing logging per project doesn't seem ideal (users don't want to deal with it). Also, a project may use a mix and of SQL and Non-SQL backends. Could self.db.ops.format_debug_sql
be passed in extra
and used in QueryFormatter
?
There is still the behavior change (introduced in d8f093908c504ae0dbc39d3f5231f7d7920dde37) that SQL is formatted outside of running tests (with the proposal it's only if the user has configured debug logging), correct? I'm unsure if this is worth the risk of affecting users in some development environments. It would be interesting to know from Jacob exactly how large the SQL statements were that resulted in the 3 second response time.
comment:11 by , 3 months ago
I'm dynamically generating annotations that introspect EAV data from JSON fields. In the worst case, I found up to 400 alias clauses. (Yes, it's on my list to look into pruning unnecessary annotations, but--perhaps unbelievably--it hasn't been costing me enough performance in ordinary cases to move up my priority list). Here's a gist with the string. sqlparse
takes about 13s to format this on an apple silicon chip. (It also produces some strange indentation; I suppose I could raise this with the maintainers.)
comment:12 by , 3 months ago
Requiring customizing logging per project doesn't seem ideal (users don't want to deal with it).
What I like about Natalia's proposal is that since we would no longer format SQL by default outside tests, no customization is necessary to keep a working query logger configuration. Users with existing (custom or default) logging configs will just get the status quo behavior of unformatted SQL in their logs (good from a performance standpoint). Users that want formatted SQL in their logs have an upgrade path via the example formatters Natalia's PR provides: copy the new default config, and change the django.db.backends
logger from ERROR
to DEBUG
, and preserve any other custom handling). This way the "risk" is opt-in.
comment:13 by , 3 months ago
Keywords: | format_debug_sql added; format_debug_sq removed |
---|---|
Owner: | changed from | to
comment:14 by , 3 months ago
Spin off PR based on Tim's comments is at https://github.com/django/django/pull/19509/ (needs tests and docs if we settle on the core solution).
comment:15 by , 3 months ago
Needs documentation: | set |
---|---|
Needs tests: | set |
comment:16 by , 3 months ago
Needs documentation: | unset |
---|---|
Needs tests: | unset |
Patch needs improvement: | unset |
Final attempt at https://github.com/django/django/pull/19512 (I think this is a winner).
comment:17 by , 3 months ago
Patch needs improvement: | set |
---|
comment:18 by , 3 months ago
Thanks for reporting this, Tim Jacob (and Tim for triaging the ticket)! I did notice this when I made my repro for #36143.
You can see that with Django's main
branch, my assertNumQueries
test took more than 5 mins. On Natalia's PR branch for this ticket, it only took less than a second.
Here's a minimal diff to demonstrate the issue on Django's main
branch:
-
tests/bulk_create/tests.py
diff --git a/tests/bulk_create/tests.py b/tests/bulk_create/tests.py index d590a292de..0c9aa65aeb 100644
a b class BulkCreateTests(TestCase): 295 295 296 296 @skipUnlessDBFeature("has_bulk_insert") 297 297 def test_max_batch_size(self): 298 objs = [Country(name=f"Country {i}") for i in range( 1000)]298 objs = [Country(name=f"Country {i}") for i in range(2**12)] 299 299 fields = ["name", "iso_two_letter", "description"] 300 300 max_batch_size = connection.ops.bulk_batch_size(fields, objs) 301 301 with self.assertNumQueries(ceil(len(objs) / max_batch_size)):
Using SQLite, the test took more than 18s on my machine, and less than 0.1s with Natalia's PR branch.
comment:19 by , 3 months ago
(Just in case others were planning to: I've reported the various issues with performance and indentation of the large SQL strings at play here to the maintainer of sqlparse
.)
comment:20 by , 3 months ago
Needs tests: | set |
---|---|
Patch needs improvement: | unset |
comment:22 by , 3 months ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:23 by , 3 months ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
comment:24 by , 3 months ago
Patch needs improvement: | unset |
---|
comment:25 by , 3 months ago
Needs tests: | set |
---|---|
Patch needs improvement: | set |
comment:26 by , 3 months ago
Needs tests: | unset |
---|---|
Patch needs improvement: | unset |
Triage Stage: | Accepted → Ready for checkin |
Have we considered moving the formatting logic to an actual logging formater instead? I believe these only kick in when the log is actually handled?