Opened 2 months ago

Last modified 2 months ago

#35728 assigned Cleanup/optimization

Lazily compute assertion messages

Reported by: Adam Johnson Owned by: Rish
Component: Testing framework Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Django’s custom test assertions have some rich failure messages. Typically, assertions pass, so most computation of these messages is wasted.

This overhead is quite noticeable for messages based on large strings. For example, assertContains calls repr() on the whole response content, typically thousands of bytes.

To mitigate this, I propose that all custom assert methods pass lazy-computing objects to unittest’s msg arguments. unittest’s _formatMessage() effectively calls str() on the given object when displaying, so this should work well.

To measure the overhead, I created this test script, named benchmark.py:

import unittest

from django.conf import settings
from django.http import HttpResponse
from django.test import SimpleTestCase

if not settings.configured:
    settings.configure()


class ExampleTests(SimpleTestCase):
    def test_example(self):
        response = HttpResponse("Apple\n" * 1_000)
        for _ in range(100_000):
            self.assertContains(response, "Apple")


if __name__ == '__main__':
    unittest.main(module='benchmark')

I ran it under cProfile with:

$ python -m cProfile -o profile -m benchmark

I tried it without and with the below patch, which disables the custom message for assertContains:

diff --git django/test/testcases.py django/test/testcases.py
index cd7e7b45d6..de86cb55ec 100644
--- django/test/testcases.py
+++ django/test/testcases.py
@@ -580,13 +580,13 @@ def _assert_contains(self, response, text, status_code, msg_prefix, html):
             content = b"".join(response.streaming_content)
         else:
             content = response.content
-        content_repr = safe_repr(content)
+        content_repr = ""
         if not isinstance(text, bytes) or html:
             text = str(text)
             content = content.decode(response.charset)
-            text_repr = "'%s'" % text
+            text_repr = ""
         else:
-            text_repr = repr(text)
+            text_repr = ""
         if html:
             content = assert_and_parse_html(
                 self, content, None, "Response's content is not valid HTML:"
@@ -623,10 +623,10 @@ def assertContains(
         else:
             self.assertTrue(
                 real_count != 0,
-                (
-                    f"{msg_prefix}Couldn't find {text_repr} in the following response\n"
-                    f"{content_repr}"
-                ),
+                # (
+                #     f"{msg_prefix}Couldn't find {text_repr} in the following response\n"
+                #     f"{content_repr}"
+                # ),
             )
 
     def assertNotContains(

Here are the stats.

Before:

  • 2,724,770 function calls in 2.280 seconds
  • 2.119 seconds spent in assertContains calls

After:

  • 2,524,805 function calls in 1.117 seconds
  • 0.949 seconds spent in assertContains calls

It looks like ~50% of the cost of calling assertContains is forming the message.

Change History (4)

comment:1 by Natalia Bidart, 2 months ago

Triage Stage: UnreviewedAccepted

Thank you Adam for this ticket and analysis, I agree with your rationale.

I performed a simpler test: I added the proposed testcase in an existing tests.py file, and ran with the usual python -Wall manage.py test testapp.tests.ExampleTests. Then, I replaced assertContains with a call similar to self.assertIn(text, response.content.decode()). My timings are:

  • without the replacement, ExampleTests runs in about 4 seconds
  • with the replacement, ExampleTests runs in around a second (0.950s)

Accepting following the above with the note that we should, ideally, be mindful of potential subclassing of Django's TestCase when designing how the laziness will be incorported to ensure backwards compatibility.

comment:2 by Adam Johnson, 2 months ago

Nice benchmarking idea. Looks like we may be able to trim the overhead further.

For reference, assertIn avoids the overhead by performing the test first:

https://github.com/python/cpython/blob/fbb26f067a7a3cd6dc6eed31cce12892cc0fedbb/Lib/unittest/case.py#L1178-L1183

I think we should be able to adopt that pattern too. No need for a lazy string, really.

Accepting following the above with the note that we should, ideally, be mindful of potential subclassing of Django's TestCase when designing how the laziness will be incorported to ensure backwards compatibility.

Sure, but I don’t think we don’t need to support, say, an overridden _assert_contains, because that’s the private method under the public assertContains.

comment:3 by Ahmed Ibrahim, 2 months ago

What's the status of this ticket? can I help in any way?

comment:4 by Rish, 2 months ago

Owner: set to Rish
Status: newassigned
Note: See TracTickets for help on using tickets.
Back to Top