Opened 3 years ago

Closed 3 years ago

#32556 closed Bug (fixed)

assertHTMLEqual gives a confusing error message with empty attributes

Reported by: Baptiste Mispelon Owned by: Baptiste Mispelon
Component: Testing framework Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Baptiste Mispelon)

If you try the following assertion:

self.assertHTMLEqual('<input value>', '<input value="">')

You get a test failure and the following message:

AssertionError: <input value> != <input value>
  <input value>

I'm getting mixed signals here: either the test should pass or the error message should show the difference between the two strings.

I'm not actually sure which option would be correct in this case so I'm leaving the ticket as "uncategorized" instead of "bug" or "cleanup".

Change History (16)

comment:1 by Baptiste Mispelon, 3 years ago

Description: modified (diff)

comment:2 by Baptiste Mispelon, 3 years ago

Digging into this, the issue appears to be that django.test.html.Element has a different logic in its __eq__ vs __str__.

In particular, __eq__ [1] has this comment:

# attributes without a value is same as attribute with value that
# equals the attributes name:
# <input checked> == <input checked="checked">

Whereas __str__ will simply drop the value if it's false-ish.

But I'm not sure that's a correct interpretation of the HTML spec on boolean attributes [2].
From what I understand not all HTML attributes can be used without a value, but I can't seem to find an exhaustive list. A random comment on the internet suggests going through the complete HTML5 spec [3] and looking for the phrase "is a boolean".

With Django's current approach, the two elements <input value> and <input value="value"> would be considered equivalent but I'm not sure that's correct.

[1] https://github.com/django/django/blob/54d91795408c878ad335226a87a44961d7f646a9/django/test/html.py#L61
[2] https://www.w3.org/TR/html52/infrastructure.html#sec-boolean-attributes
[3] https://html.spec.whatwg.org/

comment:3 by Mariusz Felisiak, 3 years ago

Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

Thanks for this report.

A boolean attribute without a value assigned to it (e.g. checked) is implicitly equivalent to one that has the empty string assigned to it (i.e. checked="")

I think we should stick to this, without checking if an attribute is really a boolean attribute or not according to spec. assertHTMLEqual() already allows invalid HTML (see #32547). I suggest checking if the values are truthy:

diff --git a/django/test/html.py b/django/test/html.py
index 486a0d358d..d443c118e2 100644
--- a/django/test/html.py
+++ b/django/test/html.py
@@ -64,9 +64,9 @@ class Element:
             for i in range(len(self.attributes)):
                 attr, value = self.attributes[i]
                 other_attr, other_value = element.attributes[i]
-                if value is None:
+                if not value:
                     value = attr
-                if other_value is None:
+                if not other_value:
                     other_value = other_attr
                 if attr != other_attr or value != other_value:
                     return False

Would you like to provide a patch?

comment:4 by Hasan Ramezani, 3 years ago

Has patch: set
Owner: changed from nobody to Hasan Ramezani
Status: newassigned

comment:5 by Mariusz Felisiak, 3 years ago

Triage Stage: AcceptedReady for checkin

comment:6 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

Resolution: fixed
Status: assignedclosed

In 9bf5e941:

Fixed #32556 -- Fixed assertHTMLEqual() to handle empty string as boolean attributes value.

comment:7 by Baptiste Mispelon, 3 years ago

Resolution: fixed
Status: closednew

Oh, that was fast. Thanks for the quick turnaround on this, but if I may I'd like to reopen this ticket and argue for a revert.

This test was passing before and now fails (both assertions fail):

def test_boolean_attribute2(self):
    html1 = '<input value="">'
    html2 = '<input value="value">'
    self.assertHTMLNotEqual(html1, html2)
    self.assertNotEqual(parse_html(html1), parse_html(html2))

I've been trying to think about how I would fix the original issue but I keep coming back to having an exhaustive list of known boolean attributes (the parser already has SELF_CLOSING_TAGS so there's some precedent). I don't see how else to fix this correctly.

On the other hand I'm not opposed to having an incorrect parser with known limitations but I think the limitations should be documented.

I also think that having __eq__ and __str__ use separate codepaths will lead to more issues. Why can't we have def __eq__(self, other): return str(self) == str(other)?

comment:8 by Mariusz Felisiak, 3 years ago

Triage Stage: Ready for checkinAccepted

I've been trying to think about how I would fix the original issue but I keep coming back to having an exhaustive list of known boolean attributes (the parser already has SELF_CLOSING_TAGS so there's some precedent). I don't see how else to fix this correctly.

Can you prepare such a list? (based on spec). Maybe: allowfullscreen, async, autofocus, autoplay, checked, controls, default, defer , disabled, formnovalidate, hidden, ismap, itemscope, loop, multiple, muted, nomodule, novalidate, open, playsinline, readonly, required, reversed, selected (see https://html.spec.whatwg.org/#attributes-3).

comment:9 by Baptiste Mispelon, 3 years ago

Using the table you linked to, I get the same list of 24 attributes. The table is labelled as "non-normative" but I think it would be good enough for this usecase.

I tried various other ways of extracting the list from the spec but this table seems like the best method. It's missing truespeed which is a boolean attribute for the <marquee> tag (which is deprecated) so I think it's acceptable.

For reference, here's the script I used to get the list (piped to sort -u to remove duplicates):

from urllib.request import urlopen

from lxml import html  # needs to be pip-installed (`pip install lxml`)

if __name__ == '__main__':
    # The spec is an 11Mb file, downloading it takes a while
    response = urlopen('https://html.spec.whatwg.org')
    tree = html.parse(response)

    for i, row in enumerate(tree.xpath('.//table[@id="attributes-1"]/tbody/tr')):
        assert len(row) == 4, f"Expected 4 columns on row {i}, got {len(rows)}"
        attr, elements, description, value = (node.text_content().strip() for node in row)
        if value == "Boolean attribute":
            print(attr)

comment:10 by Baptiste Mispelon, 3 years ago

I wrote a new PR using this approach:

  • declaring a list of known boolean attributes
  • using a single logic between __str__ and __eq__ (which is now a one-liner)

While writing the patch I noticed that the previous behavior is sortof documented, as the documentation for assertHTMLEqual [1] says:

Attributes without an argument are equal to attributes that equal in name and value (see the examples).

But the example only uses checked which is a boolean attribute so I'd argue it's not a breaking change.

[1] https://docs.djangoproject.com/en/dev/topics/testing/tools/#django.test.SimpleTestCase.assertHTMLEqual

comment:11 by Mariusz Felisiak, 3 years ago

Owner: changed from Hasan Ramezani to Baptiste Mispelon
Status: newassigned

comment:12 by Mariusz Felisiak, 3 years ago

Patch needs improvement: set

comment:13 by Baptiste Mispelon, 3 years ago

Patch needs improvement: unset

PR updated

comment:14 by Mariusz Felisiak, 3 years ago

Triage Stage: AcceptedReady for checkin

comment:15 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

In 98abf80c:

Refs #32556 -- Added django.test.html.normalize_attributes().

comment:16 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

Resolution: fixed
Status: assignedclosed

In 41e6b2a3:

Fixed #32556 -- Fixed handling empty string as non-boolean attributes value by assertHTMLEqual().

Note: See TracTickets for help on using tickets.
Back to Top