Opened 12 months ago

Closed 7 weeks ago

#35728 closed Cleanup/optimization (fixed)

Lazily compute assertion messages

Reported by: Adam Johnson Owned by: Clifford Gama
Component: Testing framework Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Django’s custom test assertions have some rich failure messages. Typically, assertions pass, so most computation of these messages is wasted.

This overhead is quite noticeable for messages based on large strings. For example, assertContains calls repr() on the whole response content, typically thousands of bytes.

To mitigate this, I propose that all custom assert methods pass lazy-computing objects to unittest’s msg arguments. unittest’s _formatMessage() effectively calls str() on the given object when displaying, so this should work well.

To measure the overhead, I created this test script, named benchmark.py:

import unittest

from django.conf import settings
from django.http import HttpResponse
from django.test import SimpleTestCase

if not settings.configured:
    settings.configure()


class ExampleTests(SimpleTestCase):
    def test_example(self):
        response = HttpResponse("Apple\n" * 1_000)
        for _ in range(100_000):
            self.assertContains(response, "Apple")


if __name__ == '__main__':
    unittest.main(module='benchmark')

I ran it under cProfile with:

$ python -m cProfile -o profile -m benchmark

I tried it without and with the below patch, which disables the custom message for assertContains:

diff --git django/test/testcases.py django/test/testcases.py
index cd7e7b45d6..de86cb55ec 100644
--- django/test/testcases.py
+++ django/test/testcases.py
@@ -580,13 +580,13 @@ def _assert_contains(self, response, text, status_code, msg_prefix, html):
             content = b"".join(response.streaming_content)
         else:
             content = response.content
-        content_repr = safe_repr(content)
+        content_repr = ""
         if not isinstance(text, bytes) or html:
             text = str(text)
             content = content.decode(response.charset)
-            text_repr = "'%s'" % text
+            text_repr = ""
         else:
-            text_repr = repr(text)
+            text_repr = ""
         if html:
             content = assert_and_parse_html(
                 self, content, None, "Response's content is not valid HTML:"
@@ -623,10 +623,10 @@ def assertContains(
         else:
             self.assertTrue(
                 real_count != 0,
-                (
-                    f"{msg_prefix}Couldn't find {text_repr} in the following response\n"
-                    f"{content_repr}"
-                ),
+                # (
+                #     f"{msg_prefix}Couldn't find {text_repr} in the following response\n"
+                #     f"{content_repr}"
+                # ),
             )
 
     def assertNotContains(

Here are the stats.

Before:

  • 2,724,770 function calls in 2.280 seconds
  • 2.119 seconds spent in assertContains calls

After:

  • 2,524,805 function calls in 1.117 seconds
  • 0.949 seconds spent in assertContains calls

It looks like ~50% of the cost of calling assertContains is forming the message.

Change History (13)

comment:1 by Natalia Bidart, 12 months ago

Triage Stage: UnreviewedAccepted

Thank you Adam for this ticket and analysis, I agree with your rationale.

I performed a simpler test: I added the proposed testcase in an existing tests.py file, and ran with the usual python -Wall manage.py test testapp.tests.ExampleTests. Then, I replaced assertContains with a call similar to self.assertIn(text, response.content.decode()). My timings are:

  • without the replacement, ExampleTests runs in about 4 seconds
  • with the replacement, ExampleTests runs in around a second (0.950s)

Accepting following the above with the note that we should, ideally, be mindful of potential subclassing of Django's TestCase when designing how the laziness will be incorported to ensure backwards compatibility.

comment:2 by Adam Johnson, 12 months ago

Nice benchmarking idea. Looks like we may be able to trim the overhead further.

For reference, assertIn avoids the overhead by performing the test first:

https://github.com/python/cpython/blob/fbb26f067a7a3cd6dc6eed31cce12892cc0fedbb/Lib/unittest/case.py#L1178-L1183

I think we should be able to adopt that pattern too. No need for a lazy string, really.

Accepting following the above with the note that we should, ideally, be mindful of potential subclassing of Django's TestCase when designing how the laziness will be incorported to ensure backwards compatibility.

Sure, but I don’t think we don’t need to support, say, an overridden _assert_contains, because that’s the private method under the public assertContains.

comment:3 by Ahmed Ibrahim, 12 months ago

What's the status of this ticket? can I help in any way?

comment:4 by Rish, 12 months ago

Owner: set to Rish
Status: newassigned

comment:5 by Clifford Gama, 5 months ago

Owner: changed from Rish to Clifford Gama

comment:6 by Clifford Gama, 5 months ago

Has patch: set

comment:7 by Clifford Gama, 4 months ago

Patch needs improvement: set

comment:8 by Clifford Gama, 3 months ago

Patch needs improvement: unset

comment:9 by Sarah Boyce, 3 months ago

Note that this is a performance regression in 1dae65dc63ae84be5002c37b4ddae0b9220e8808 (Django 5.1) refs #34657

comment:10 by Sarah Boyce, 3 months ago

Patch needs improvement: set

comment:11 by Clifford Gama, 7 weeks ago

Patch needs improvement: unset

comment:12 by Sarah Boyce, 7 weeks ago

Triage Stage: AcceptedReady for checkin

comment:13 by Sarah Boyce <42296566+sarahboyce@…>, 7 weeks ago

Resolution: fixed
Status: assignedclosed

In 449b9f9:

Fixed #35728 -- Computed error messages in assertions only on test failures.

Performance regression in 1dae65dc63ae84be5002c37b4ddae0b9220e8808.

Thanks to Adam Johnson for the report.

Note: See TracTickets for help on using tickets.
Back to Top