Opened 4 months ago

Closed 3 months ago

Last modified 3 months ago

#36380 closed Bug (fixed)

SQL is formatted for logging regardless of whether it will be logged

Reported by: Jacob Walls Owned by: Natalia Bidart
Component: Database layer (models, ORM) Version: dev
Severity: Release blocker Keywords: debug-sql format_debug_sql
Cc: Tim Graham, Simon Charette, Jacob Walls, Mariusz Felisiak Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Tim Graham)

I have an endpoint that generates several large SQL strings. It performs 10x worse (300ms -> 3000ms) on the main branch. Traced it to eager formatting of SQL.

To reproduce, print or throw inside format_debug_sql(), and run a project without logging queries (i.e. without a django.db entry in LOGGING configured at the DEBUG level).

  • django/db/backends/base/operations.py

    diff --git a/django/db/backends/base/operations.py b/django/db/backends/base/operations.py
    index fea73bc1e4..4c106f59e3 100644
    a b class BaseDatabaseOperations:  
    791791
    792792    def format_debug_sql(self, sql):
    793793        # Hook for backends (e.g. NoSQL) to customize formatting.
     794        print('eager formatting...')
    794795        return sqlparse.format(sql, reindent=True, keyword_case="upper")

Regression in d8f093908c504ae0dbc39d3f5231f7d7920dde37 (#35448).

We might want a benchmark in django-asv for views with long SQL strings?

This fix works for me; I can submit a patch with tests in the next few days:

  • django/db/backends/utils.py

    diff --git a/django/db/backends/utils.py b/django/db/backends/utils.py
    index 568f510a67..ae810ffd12 100644
    a b from hashlib import md5  
    1010from django.apps import apps
    1111from django.db import NotSupportedError
    1212from django.utils.dateparse import parse_time
     13from django.utils.functional import lazy
    1314
    1415logger = logging.getLogger("django.db.backends")
    1516
    class CursorDebugWrapper(CursorWrapper):  
    151152            logger.debug(
    152153                "(%.3f) %s; args=%s; alias=%s",
    153154                duration,
    154                 self.db.ops.format_debug_sql(sql),
     155                lazy(self.db.ops.format_debug_sql)(sql),
    155156                params,
    156157                self.db.alias,
    157158                extra={

Change History (28)

comment:1 by Simon Charette, 4 months ago

Have we considered moving the formatting logic to an actual logging formater instead? I believe these only kick in when the log is actually handled?

comment:2 by Tim Graham, 4 months ago

Description: modified (diff)
Triage Stage: UnreviewedAccepted

I had some reservations that moving the formatting from the test runner to the CursorDebugWrapper might cause some problem.

comment:3 by Jacob Walls, 4 months ago

Have we considered moving the formatting logic ​to an actual logging formater instead?

I looked into this. Unless I'm missing something, it's not a good fit for our needs, because we don't want to actually change the DEFAULT_LOGGING.

To get a logging.Formatter working, I have this in Django:

  • django/db/backends/utils.py

    diff --git a/django/db/backends/utils.py b/django/db/backends/utils.py
    index 568f510a67..fe5b9478ac 100644
    a b class CursorDebugWrapper(CursorWrapper):  
    151151            logger.debug(
    152152                "(%.3f) %s; args=%s; alias=%s",
    153153                duration,
    154                 self.db.ops.format_debug_sql(sql),
     154                sql,
    155155                params,
    156156                self.db.alias,
    157157                extra={
    class CursorDebugWrapper(CursorWrapper):  
    159159                    "sql": sql,
    160160                    "params": params,
    161161                    "alias": self.db.alias,
     162                    "sql_format_function": self.db.ops.format_debug_sql,
    162163                },
    163164            )
    164165
  • django/utils/log.py

    diff --git a/django/utils/log.py b/django/utils/log.py
    index a25b97a7d5..09bb7ace50 100644
    a b def log_response(  
    255255        exc_info=exception,
    256256    )
    257257    response._has_been_logged = True
     258
     259
     260class DebugSQLFormatter(logging.Formatter):
     261    def format(self, record):
     262        record.sql = record.sql_format_function(record.sql)
     263        return super().format(record)

And this in my project settings. I've started with a working example of query logging and commented out the changes I'd have to make on upgrade to keep the current behavior. (As far as I can tell, there is no way to "inherit" these definitions from Django.)

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    # "formatters": {  # had to be added
    #     "django.db.backends": {
    #         "()": "django.utils.log.DebugSQLFormatter",
    #         "format": "({duration:.3f}) {sql}; args={args}; alias={alias}",
    #         "style": "{",
    #     },
    # },
    "handlers": {
        'file': {
            'level': 'DEBUG',
            'class': 'logging.FileHandler',
            'filename': 'general.log',
        },
        # "django.db.backends": {  # had to be added
        #     "class": "logging.FileHandler",
        #     "filename": "general.log",
        #     "formatter": "django.db.backends",
        # },
    },
    "loggers": {
        "django.db.backends": {
            "handlers": ["file"],
            # "handlers": ["django.db.backends"],  # had to be adjusted
            "level": "DEBUG",
        },
    },
}

Wouldn't the suggestion to use lazy() be preferable to maintain the prior behavior without any impact on projects?

comment:4 by Jacob Walls, 4 months ago

Has patch: set

PR for adding the degree of laziness, but open to other ideas if I didn't evaluate the logging.Formatter approach sufficiently.

comment:5 by Natalia Bidart, 3 months ago

Keywords: format_debug_sq added
Patch needs improvement: set

As I just commented in the PR, I think I slightly prefer using logging.Logger.isEnabledFor to conditionally format the sql.

comment:6 by Tim Graham, 3 months ago

So the idea is:

if logger.isEnabledFor(logging.DEBUG):
    logger.debug(...)

It seems like it would work, but I guess I'm still concerned that SQL formatting shouldn't be part of CursorDebugWrapper if it has the potential to slow down request processing by 10x in any situation. For me, 3283120cca5d5eba5c3619612d0de5ad49dcf054 was too much of a hack and the formatting of --debug-sql output could simply be removed (i.e. revert d8f093908c504ae0dbc39d3f5231f7d7920dde37 too) unless an elegant solution is proposed.

in reply to:  6 comment:7 by Natalia Bidart, 3 months ago

Replying to Tim Graham:

[...] I'm still concerned that SQL formatting shouldn't be part of CursorDebugWrapper if it has the potential to slow down request processing by 10x in any situation.

I agree.

For me, 3283120cca5d5eba5c3619612d0de5ad49dcf054 was too much of a hack and the formatting of --debug-sql output could simply be removed (i.e. revert d8f093908c504ae0dbc39d3f5231f7d7920dde37 too) unless an elegant solution is proposed.

I think this is worth considering. Reading on the tickets and evaluating "lack of SQL formatting" vs. the complications that this change has brought up, I'm inclined to go this route. Would be interesting to evaluate if we could have a dedicated log formatter (following the suggestion from Simon) to be automatically configured and plugged in when running the tests with --debug-sql. We could do this analysis in the context of #34111 (which would be reopened if reverted, unless we decide to wontfix it).

I'll ponder on this, and open to read other's people thoughts.

comment:8 by Natalia Bidart, 3 months ago

It's also worth noting that #36112 is a side effect of the revisions mentioned in comment:6, so that one would also be solved by the proposed reverts.

comment:9 by Natalia Bidart, 3 months ago

Cc: Simon Charette Jacob Walls Mariusz Felisiak added

Tim, I'm now convinced that reverting the revnos you mentioned (plus ec8cbea91ed6362d43298caa43f3d868452f7efa) is a good path forward. I have investigated and proposed an approach for the path of isolating the log formatting in the log utils, always considering customizations (but not per backend, more like per project since it would require customizations in the logging dictConfig).

See alternate PR at https://github.com/django/django/pull/19508 (4 commits, 3 reverts but the alternate fix for #34111).

comment:10 by Tim Graham, 3 months ago

Requiring customizing logging per project doesn't seem ideal (users don't want to deal with it). Also, a project may use a mix and of SQL and Non-SQL backends. Could self.db.ops.format_debug_sql be passed in extra and used in QueryFormatter?

There is still the behavior change (introduced in d8f093908c504ae0dbc39d3f5231f7d7920dde37) that SQL is formatted outside of running tests (with the proposal it's only if the user has configured debug logging), correct? I'm unsure if this is worth the risk of affecting users in some development environments. It would be interesting to know from Jacob exactly how large the SQL statements were that resulted in the 3 second response time.

comment:11 by Jacob Walls, 3 months ago

I'm dynamically generating annotations that introspect EAV data from JSON fields. In the worst case, I found up to 400 alias clauses. (Yes, it's on my list to look into pruning unnecessary annotations, but--perhaps unbelievably--it hasn't been costing me enough performance in ordinary cases to move up my priority list). Here's a gist with the string. sqlparse takes about 13s to format this on an apple silicon chip. (It also produces some strange indentation; I suppose I could raise this with the maintainers.)

comment:12 by Jacob Walls, 3 months ago

Requiring customizing logging per project doesn't seem ideal (users don't want to deal with it).

What I like about Natalia's proposal is that since we would no longer format SQL by default outside tests, no customization is necessary to keep a working query logger configuration. Users with existing (custom or default) logging configs will just get the status quo behavior of unformatted SQL in their logs (good from a performance standpoint). Users that want formatted SQL in their logs have an upgrade path via the example formatters Natalia's PR provides: copy the new default config, and change the django.db.backends logger from ERROR to DEBUG, and preserve any other custom handling). This way the "risk" is opt-in.

Last edited 3 months ago by Jacob Walls (previous) (diff)

comment:13 by Jacob Walls, 3 months ago

Keywords: format_debug_sql added; format_debug_sq removed
Owner: changed from Jacob Walls to Natalia Bidart

comment:14 by Natalia Bidart, 3 months ago

Spin off PR based on Tim's comments is at https://github.com/django/django/pull/19509/ (needs tests and docs if we settle on the core solution).

comment:15 by Natalia Bidart, 3 months ago

Needs documentation: set
Needs tests: set

comment:16 by Natalia Bidart, 3 months ago

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset

Final attempt at https://github.com/django/django/pull/19512 (I think this is a winner).

comment:17 by Natalia Bidart, 3 months ago

Patch needs improvement: set

comment:18 by Sage Abdullah, 3 months ago

Thanks for reporting this, Tim Jacob (and Tim for triaging the ticket)! I did notice this when I made my repro for #36143.

You can see that with Django's main branch, my assertNumQueries test took more than 5 mins. On Natalia's PR branch for this ticket, it only took less than a second.

Here's a minimal diff to demonstrate the issue on Django's main branch:

  • tests/bulk_create/tests.py

    diff --git a/tests/bulk_create/tests.py b/tests/bulk_create/tests.py
    index d590a292de..0c9aa65aeb 100644
    a b class BulkCreateTests(TestCase):  
    295295
    296296    @skipUnlessDBFeature("has_bulk_insert")
    297297    def test_max_batch_size(self):
    298         objs = [Country(name=f"Country {i}") for i in range(1000)]
     298        objs = [Country(name=f"Country {i}") for i in range(2**12)]
    299299        fields = ["name", "iso_two_letter", "description"]
    300300        max_batch_size = connection.ops.bulk_batch_size(fields, objs)
    301301        with self.assertNumQueries(ceil(len(objs) / max_batch_size)):

Using SQLite, the test took more than 18s on my machine, and less than 0.1s with Natalia's PR branch.

Last edited 3 months ago by Sage Abdullah (previous) (diff)

comment:19 by Jacob Walls, 3 months ago

(Just in case others were planning to: I've reported the various issues with performance and indentation of the large SQL strings at play here to the maintainer of sqlparse.)

comment:20 by Natalia Bidart, 3 months ago

Needs tests: set
Patch needs improvement: unset

comment:21 by Natalia Bidart, 3 months ago

Needs tests: unset

PR ready for final review.

comment:22 by Sarah Boyce, 3 months ago

Triage Stage: AcceptedReady for checkin

comment:23 by Sarah Boyce, 3 months ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

comment:24 by Natalia Bidart, 3 months ago

Patch needs improvement: unset

comment:25 by Jacob Walls, 3 months ago

Needs tests: set
Patch needs improvement: set

comment:26 by Jacob Walls, 3 months ago

Needs tests: unset
Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:27 by nessita <124304+nessita@…>, 3 months ago

Resolution: fixed
Status: assignedclosed

In 1a03a984:

Fixed #36380 -- Deferred SQL formatting when running tests with --debug-sql.

Thanks to Jacob Walls for the report and previous iterations of this
fix, to Simon Charette for the logging formatter idea, and to Tim Graham
for testing and ensuring that 3rd party backends remain compatible.

This partially reverts d8f093908c504ae0dbc39d3f5231f7d7920dde37.

Refs #36112, #35448.

Co-authored-by: Jacob Walls <jacobtylerwalls@…>

comment:28 by nessita <124304+nessita@…>, 3 months ago

In d6e0c710:

Refs #36380 -- Replaced mocked logging assertions in tests/backends/tests.py.

Note: See TracTickets for help on using tickets.
Back to Top