Opened 13 years ago
Closed 13 years ago
#16066 closed Uncategorized (wontfix)
fix_ampersands does not convert abbreviations followed by a semi-colon
Reported by: | Jerry | Owned by: | nobody |
---|---|---|---|
Component: | Uncategorized | Version: | 1.3 |
Severity: | Normal | Keywords: | ampersands fix_ampersands html.py |
Cc: | | Triage Stage: | Unreviewed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In django/utils/html.py, unencoded_ampersands_re will not convert an ampersand if it is followed by one or more word characters and a semicolon. There are no named entities with only a single character, but abbreviations of that form are common in some circles: D&D and R&D, for example.
Each issue has adventures designed for early D&D; there’s the beginnings of a megadungeon in issue 2, “The Darkness Beneath”, and a lot of weirdness.
List of Our Mission in R&D; 1: Foster the creation of new business. 2: Create and accumulate advanced technologies. 3: Extend our value chain globally. 4: Fulfill our social responsibilities.
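The behaviour described above can be reproduced with a minimal sketch. It assumes the Django 1.3 pattern was &(?!(\w+|#\d+);) and that fix_ampersands simply substitutes &amp; for every match; the local fix_ampersands below is a hypothetical stand-in written for illustration, not Django's own function.

```python
import re

# Assumption: this mirrors unencoded_ampersands_re in django/utils/html.py (1.3).
# The negative lookahead skips any "&" followed by one or more word characters
# (or "#" plus digits) and then a semicolon, i.e. anything shaped like an
# entity or numeric character reference.
unencoded_ampersands_re = re.compile(r'&(?!(\w+|#\d+);)')


def fix_ampersands(value):
    """Hypothetical stand-in for django.utils.html.fix_ampersands."""
    return unencoded_ampersands_re.sub('&amp;', value)


# "&D;" looks like a one-letter entity, so the ampersand is left unencoded.
print(fix_ampersands("adventures designed for early D&D; there is"))
# adventures designed for early D&D; there is

# Without the semicolon the same abbreviation is encoded as expected.
print(fix_ampersands("adventures designed for early D&D and more"))
# adventures designed for early D&amp;D and more
```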
Assuming that it is safe to encode what look like one-character entities, the \w+ can be changed to \w{2,}.
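A sketch of the proposed change, assuming the rest of the pattern stays as in the previous sketch; fix_ampersands_patched is a hypothetical helper name, and this is an illustration of the \w{2,} idea rather than the attached diff itself.

```python
import re

# Proposed pattern: require at least two word characters before the
# semicolon, so "&D;" is no longer mistaken for a named entity.
patched_ampersands_re = re.compile(r'&(?!(\w{2,}|#\d+);)')


def fix_ampersands_patched(value):
    """Hypothetical helper applying the patched pattern."""
    return patched_ampersands_re.sub('&amp;', value)


print(fix_ampersands_patched("early D&D; there is"))  # early D&amp;D; there is
print(fix_ampersands_patched("Tom &amp; Jerry"))      # Tom &amp; Jerry
print(fix_ampersands_patched("AT&SF; schedules"))     # AT&SF; schedules
```

The last line shows the remaining limitation: a longer abbreviation such as AT&SF followed by a semicolon is still treated as an entity and left unencoded.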
There are no one-character entities listed on http://www.w3.org/TR/WD-html40-970708/sgml/entities.html, or on the less canonical http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references and http://www.w3schools.com/tags/ref_entities.asp.
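The same claim can be cross-checked against Python's standard library, whose html.entities.name2codepoint table covers the HTML 4.01 named entities. This is an independent check using a module the ticket does not mention, not part of the patch.

```python
from html.entities import name2codepoint

# name2codepoint maps HTML 4.01 entity names (e.g. "amp", "lt") to code
# points. Collect any names that are only a single character long.
one_char_names = [name for name in name2codepoint if len(name) == 1]
shortest = min(len(name) for name in name2codepoint)

print(one_char_names)  # []
print(shortest)        # 2
```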
If it isn't safe to assume that there will never be one-character entities, a note in the documentation (http://docs.djangoproject.com/en/dev/ref/templates/builtins/) would probably be useful. (It may be useful even if this patch does make sense, in case someone uses longer abbreviations, such as F&SF or AT&SF, and follows them with a semicolon.)
Attachments (1)
Change History (2)
by , 13 years ago
Attachment: | ampersands.diff added |
---|
comment:1 by , 13 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
fix_ampersands does have some limitations in that it is not smart enough to distinguish real named entities from non-real ones. But only making it work with one-character words would really just be equivalent to fixing one particular symptom without addressing the core limitations. In this case, what you want to use is in fact django.utils.html.escape.
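A minimal sketch of the escape-based approach. To keep it runnable without Django installed, Python's standard-library html.escape is swapped in as a stand-in for django.utils.html.escape; both encode every ampersand unconditionally, though details such as how quote characters are encoded may differ.

```python
from html import escape

# Stand-in for django.utils.html.escape: html.escape encodes every "&",
# plus "<", ">" and (by default) both quote characters.
raw = "List of Our Mission in R&D; 1: Foster the creation of new business."
print(escape(raw))
# List of Our Mission in R&amp;D; 1: Foster the creation of new business.

# Unlike fix_ampersands, escape does not try to recognise existing entities,
# so it belongs on raw text, not on text that is already HTML-encoded.
print(escape("Tom &amp; Jerry"))  # Tom &amp;amp; Jerry
```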
Change unencoded_ampersands_re to encode &n; as &amp;n; for D&D, R&D, etc.