Opened 14 years ago
Closed 14 years ago
#16066 closed Uncategorized (wontfix)
fix_ampersands does not convert abbreviations followed by a semi-colon
| Reported by: | Jerry | Owned by: | nobody |
|---|---|---|---|
| Component: | Uncategorized | Version: | 1.3 |
| Severity: | Normal | Keywords: | ampersands fix_ampersands html.py |
| Cc: | | Triage Stage: | Unreviewed |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
In django/utils/html.py, unencoded_ampersands_re will not convert an ampersand if it is followed by one or more word characters and a semicolon. There are no named entities with a single-character name, but abbreviations of that form are common in some circles: D&D and R&D, for example.
Each issue has adventures designed for early D&D; there’s the beginnings of a megadungeon in issue 2, “The Darkness Beneath”, and a lot of weirdness.
List of Our Mission in R&D; 1: Foster the creation of new business. 2: Create and accumulate advanced technologies. 3: Extend our value chain globally. 4: Fulfill our social responsibilities.
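The behaviour in the two samples above can be reproduced with a minimal sketch. The pattern below is assumed to match the one in django/utils/html.py for 1.3, and fix_ampersands here is a simplified stand-in that skips Django's string coercion, not the library function itself.

```python
import re

# Assumed copy of the 1.3 pattern: an ampersand is left alone when it is
# followed by word characters (or "#" plus digits) and a semicolon.
unencoded_ampersands_re = re.compile(r'&(?!(\w+|#\d+);)')


def fix_ampersands(value):
    """Simplified stand-in: encode ampersands that do not look like entities."""
    return unencoded_ampersands_re.sub('&amp;', value)


# A bare abbreviation is encoded as expected.
print(fix_ampersands('D&D rules'))

# Followed by a semicolon, "&D;" looks like an entity and is left unchanged.
print(fix_ampersands('early D&D; there'))

# Real entities are (correctly) left unchanged as well.
print(fix_ampersands('caf&eacute;'))
```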
Assuming that it is safe to encode what look like one-character entities, the \w+ can be changed to \w{2,}.
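As a rough illustration of the proposed change, here is a sketch with \w+ replaced by \w{2,}. The names proposed_re and fix_ampersands_proposed are hypothetical and used only for this example; the original pattern is assumed to be as in 1.3.

```python
import re

# Hypothetical variant of the pattern: require at least two word characters
# before the semicolon for the text to count as a named entity.
proposed_re = re.compile(r'&(?!(\w{2,}|#\d+);)')


def fix_ampersands_proposed(value):
    """Illustrative only: encode ampersands using the proposed pattern."""
    return proposed_re.sub('&amp;', value)


# One-character "entities" such as &D; are now encoded.
print(fix_ampersands_proposed('early D&D; there'))

# Named and numeric entities are still left alone.
print(fix_ampersands_proposed('&amp; and &#38;'))

# Abbreviations with two or more characters after the ampersand
# still look like entities and are not encoded.
print(fix_ampersands_proposed('F&SF; magazine'))
```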
There are no one-character entities listed on http://www.w3.org/TR/WD-html40-970708/sgml/entities.html, or on the less-canonical http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references and http://www.w3schools.com/tags/ref_entities.asp.
If it isn't safe to assume that there will not be one-character entities, a note in the documentation (http://docs.djangoproject.com/en/dev/ref/templates/builtins/) will probably be useful. (It may be useful even if this patch does make sense, in case someone tries to use longer abbreviations, such as F&SF or AT&SF, and follows them with a semi-colon.)
Attachments (1)
Change History (2)
by , 14 years ago
| Attachment: | ampersands.diff added |
|---|
comment:1 by , 14 years ago
| Resolution: | → wontfix |
|---|---|
| Status: | new → closed |
fix_ampersands does have some limitations in that it is not smart enough to distinguish real named entities from non-real ones. But only making it work with one-character words would really just be equivalent to fixing one particular symptom without addressing the core limitations. In this case, what you want to use is in fact django.utils.html.escape.
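To show the trade-off in that suggestion, here is a sketch that uses Python's standard-library html.escape as a stand-in for django.utils.html.escape. This is an assumption made so the example runs without Django; the two functions may differ in detail, for instance in how quotes are encoded. The point is that unconditional escaping encodes every ampersand, so it only suits text that is not already HTML-encoded.

```python
from html import escape  # stdlib stand-in, not django.utils.html.escape

# Every ampersand is encoded, so abbreviations followed by a semicolon
# come out correctly.
print(escape('early D&D; there'))

# Markup characters are encoded too.
print(escape('<b>R&D;</b>'))

# The cost: text that already contains entities is double-encoded.
print(escape('caf&eacute;'))
```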
Change unencoded_ampersands_re to encode &n; as &amp;n; for D&D, R&D, etc.