Opened 4 years ago
Closed 4 years ago
#32556 closed Bug (fixed)
assertHTMLEqual gives a confusing error message with empty attributes
Reported by: | Baptiste Mispelon | Owned by: | Baptiste Mispelon |
---|---|---|---|
Component: | Testing framework | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
If you try the following assertion:
self.assertHTMLEqual('<input value>', '<input value="">')
You get a test failure and the following message:
AssertionError: <input value> != <input value> <input value>
I'm getting mixed signals here: either the test should pass or the error message should show the difference between the two strings.
I'm not actually sure which option would be correct in this case so I'm leaving the ticket as "uncategorized" instead of "bug" or "cleanup".
Change History (16)
comment:1 by , 4 years ago
Description: | modified (diff) |
---|
comment:2 by , 4 years ago
comment:3 by , 4 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Uncategorized → Bug |
Thanks for this report.
A boolean attribute without a value assigned to it (e.g. checked) is implicitly equivalent to one that has the empty string assigned to it (i.e. checked="")
I think we should stick to this, without checking if an attribute is really a boolean attribute or not according to spec. assertHTMLEqual()
already allows invalid HTML (see #32547). I suggest checking if the values are truthy:
diff --git a/django/test/html.py b/django/test/html.py index 486a0d358d..d443c118e2 100644 --- a/django/test/html.py +++ b/django/test/html.py @@ -64,9 +64,9 @@ class Element: for i in range(len(self.attributes)): attr, value = self.attributes[i] other_attr, other_value = element.attributes[i] - if value is None: + if not value: value = attr - if other_value is None: + if not other_value: other_value = other_attr if attr != other_attr or value != other_value: return False
Would you like to provide a patch?
comment:5 by , 4 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:7 by , 4 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
Oh, that was fast. Thanks for the quick turnaround on this, but if I may I'd like to reopen this ticket and argue for a revert.
This test was passing before and now fails (both assertions fail):
def test_boolean_attribute2(self): html1 = '<input value="">' html2 = '<input value="value">' self.assertHTMLNotEqual(html1, html2) self.assertNotEqual(parse_html(html1), parse_html(html2))
I've been trying to think about how I would fix the original issue but I keep coming back to having an exhaustive list of known boolean attributes (the parser already has SELF_CLOSING_TAGS
so there's some precedent). I don't see how else to fix this correctly.
On the other hand I'm not opposed to having an incorrect parser with known limitations but I think the limitations should be documented.
I also think that having __eq__
and __str__
use separate codepaths will lead to more issues. Why can't we have def __eq__(self, other): return str(self) == str(other)
?
comment:8 by , 4 years ago
Triage Stage: | Ready for checkin → Accepted |
---|
I've been trying to think about how I would fix the original issue but I keep coming back to having an exhaustive list of known boolean attributes (the parser already has SELF_CLOSING_TAGS so there's some precedent). I don't see how else to fix this correctly.
Can you prepare such a list? (based on spec). Maybe: allowfullscreen
, async
, autofocus
, autoplay
, checked
, controls
, default
, defer
, disabled
, formnovalidate
, hidden
, ismap
, itemscope
, loop
, multiple
, muted
, nomodule
, novalidate
, open
, playsinline
, readonly
, required
, reversed
, selected
(see https://html.spec.whatwg.org/#attributes-3).
comment:9 by , 4 years ago
Using the table you linked to, I get the same list of 24 attributes. The table is labelled as "non-normative" but I think it would be good enough for this usecase.
I tried various other ways of extracting the list from the spec but this table seems like the best method. It's missing truespeed
which is a boolean attribute for the <marquee>
tag (which is deprecated) so I think it's acceptable.
For reference, here's the script I used to get the list (piped to sort -u
to remove duplicates):
from urllib.request import urlopen from lxml import html # needs to be pip-installed (`pip install lxml`) if __name__ == '__main__': # The spec is an 11Mb file, downloading it takes a while response = urlopen('https://html.spec.whatwg.org') tree = html.parse(response) for i, row in enumerate(tree.xpath('.//table[@id="attributes-1"]/tbody/tr')): assert len(row) == 4, f"Expected 4 columns on row {i}, got {len(rows)}" attr, elements, description, value = (node.text_content().strip() for node in row) if value == "Boolean attribute": print(attr)
comment:10 by , 4 years ago
I wrote a new PR using this approach:
- declaring a list of known boolean attributes
- using a single logic between
__str__
and__eq__
(which is now a one-liner)
While writing the patch I noticed that the previous behavior is sortof documented, as the documentation for assertHTMLEqual
[1] says:
Attributes without an argument are equal to attributes that equal in name and value (see the examples).
But the example only uses checked
which is a boolean attribute so I'd argue it's not a breaking change.
[1] https://docs.djangoproject.com/en/dev/topics/testing/tools/#django.test.SimpleTestCase.assertHTMLEqual
comment:11 by , 4 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:12 by , 4 years ago
Patch needs improvement: | set |
---|
comment:14 by , 4 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Digging into this, the issue appears to be that
django.test.html.Element
has a different logic in its__eq__
vs__str__
.In particular,
__eq__
[1] has this comment:Whereas
__str__
will simply drop the value if it's false-ish.But I'm not sure that's a correct interpretation of the HTML spec on boolean attributes [2].
From what I understand not all HTML attributes can be used without a value, but I can't seem to find an exhaustive list. A random comment on the internet suggests going through the complete HTML5 spec [3] and looking for the phrase "is a boolean".
With Django's current approach, the two elements
<input value>
and<input value="value">
would be considered equivalent but I'm not sure that's correct.[1] https://github.com/django/django/blob/54d91795408c878ad335226a87a44961d7f646a9/django/test/html.py#L61
[2] https://www.w3.org/TR/html52/infrastructure.html#sec-boolean-attributes
[3] https://html.spec.whatwg.org/