Opened 5 months ago
Last modified 5 months ago
#35728 assigned Cleanup/optimization
Lazily compute assertion messages
Reported by: | Adam Johnson | Owned by: | Rish |
---|---|---|---|
Component: | Testing framework | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Pull Requests: | How to create a pull request | ||
Description ¶
Django’s custom test assertions have some rich failure messages. Typically, assertions pass, so most computation of these messages is wasted.
This overhead is quite noticeable for messages based on large strings. For example, assertContains
calls repr()
on the whole response content, typically thousands of bytes.
To mitigate this, I propose that all custom assert methods pass lazy-computing objects to unittest’s msg
arguments. unittest’s _formatMessage() effectively calls str()
on the given object when displaying, so this should work well.
To measure the overhead, I created this test script, named benchmark.py
:
import unittest from django.conf import settings from django.http import HttpResponse from django.test import SimpleTestCase if not settings.configured: settings.configure() class ExampleTests(SimpleTestCase): def test_example(self): response = HttpResponse("Apple\n" * 1_000) for _ in range(100_000): self.assertContains(response, "Apple") if __name__ == '__main__': unittest.main(module='benchmark')
I ran it under cProfile with:
$ python -m cProfile -o profile -m benchmark
I tried it without and with the below patch, which disables the custom message for assertContains
:
diff --git django/test/testcases.py django/test/testcases.py index cd7e7b45d6..de86cb55ec 100644 --- django/test/testcases.py +++ django/test/testcases.py @@ -580,13 +580,13 @@ def _assert_contains(self, response, text, status_code, msg_prefix, html): content = b"".join(response.streaming_content) else: content = response.content - content_repr = safe_repr(content) + content_repr = "" if not isinstance(text, bytes) or html: text = str(text) content = content.decode(response.charset) - text_repr = "'%s'" % text + text_repr = "" else: - text_repr = repr(text) + text_repr = "" if html: content = assert_and_parse_html( self, content, None, "Response's content is not valid HTML:" @@ -623,10 +623,10 @@ def assertContains( else: self.assertTrue( real_count != 0, - ( - f"{msg_prefix}Couldn't find {text_repr} in the following response\n" - f"{content_repr}" - ), + # ( + # f"{msg_prefix}Couldn't find {text_repr} in the following response\n" + # f"{content_repr}" + # ), ) def assertNotContains(
Here are the stats.
Before:
- 2,724,770 function calls in 2.280 seconds
- 2.119 seconds spent in
assertContains
calls
After:
- 2,524,805 function calls in 1.117 seconds
- 0.949 seconds spent in
assertContains
calls
It looks like ~50% of the cost of calling assertContains
is forming the message.
According to the ticket's flags, the next step(s) to move this issue forward are:
- To provide a patch by sending a pull request. Claim the ticket when you start working so that someone else doesn't duplicate effort. Before sending a pull request, review your work against the patch review checklist. Check the "Has patch" flag on the ticket after sending a pull request and include a link to the pull request in the ticket comment when making that update. The usual format is:
[https://github.com/django/django/pull/#### PR]
.
Change History (4)
comment:1 by , 5 months ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 5 months ago
Nice benchmarking idea. Looks like we may be able to trim the overhead further.
For reference, assertIn
avoids the overhead by performing the test first:
I think we should be able to adopt that pattern too. No need for a lazy string, really.
Accepting following the above with the note that we should, ideally, be mindful of potential subclassing of Django's TestCase when designing how the laziness will be incorported to ensure backwards compatibility.
Sure, but I don’t think we don’t need to support, say, an overridden _assert_contains
, because that’s the private method under the public assertContains
.
comment:4 by , 5 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
Thank you Adam for this ticket and analysis, I agree with your rationale.
I performed a simpler test: I added the proposed testcase in an existing
tests.py
file, and ran with the usualpython -Wall manage.py test testapp.tests.ExampleTests
. Then, I replacedassertContains
with a call similar toself.assertIn(text, response.content.decode())
. My timings are:ExampleTests
runs in about 4 secondsExampleTests
runs in around a second (0.950s)Accepting following the above with the note that we should, ideally, be mindful of potential subclassing of Django's TestCase when designing how the laziness will be incorported to ensure backwards compatibility.