Opened 15 years ago
Closed 13 years ago
#11911 closed Bug (fixed)
urlizetrunc not taking into account last ')' of a link
Reported by: | Stefan_Petrea | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | Template system | Version: | dev |
Severity: | Normal | Keywords: | urlizetrunc |
Cc: | | Triage Stage: | Design decision needed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Hi,
I'm using Django 1.1, the release version.
The urlizetrunc filter is causing some problems; for example, the following link
http://en.wikipedia.org/wiki/Sibiu(Romania)
is rendered as <a href="http://en.wikipedia.org/wiki/Sibiu%28Romania" rel="nofollow">http://en.wikipedia.org/wik...</a>)
(please notice the trailing ')' )
This is not correct; the ')' should be part of the link.
(As a workaround, I'm adding # to my links, which seems to work.)
Best regards,
Stefan
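The reported output can be reproduced with a simplified, standalone model of the filter's steps: split off leading and trailing punctuation, percent-quote the URL for the href, and truncate the link text. This is a sketch, not Django's code. The punctuation lists and the `safe` characters are assumed to mirror the Django 1.1-era `django/utils/html.py`, `urllib.parse.quote` stands in for Django's `urlquote`, URL detection and HTML escaping are skipped, and the limit of 30 is inferred from the length of the truncated text in the report. The helper `model_urlizetrunc` is hypothetical.

```python
import re
from urllib.parse import quote

# ASSUMPTION: these lists mirror django/utils/html.py in the Django 1.1 era.
LEADING_PUNCTUATION = ['(', '<', '&lt;']
TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '&gt;']

# Same shape as punctuation_re: optional lead, lazy middle, greedy trail.
punctuation_re = re.compile(
    '^(?P<lead>(?:%s)*)(?P<middle>.*?)(?P<trail>(?:%s)*)$' % (
        '|'.join(re.escape(x) for x in LEADING_PUNCTUATION),
        '|'.join(re.escape(x) for x in TRAILING_PUNCTUATION),
    )
)


def model_urlizetrunc(word, limit=30):
    """Hypothetical model of urlizetrunc for one word already known to be an http URL."""
    lead, middle, trail = punctuation_re.match(word).group('lead', 'middle', 'trail')
    # Percent-quote the href; '(' and ')' are not in this safe set.
    href = quote(middle, safe='/&=:;#?+*')
    # Truncate the visible text to `limit` characters, counting the '...'.
    text = middle if len(middle) <= limit else middle[:max(0, limit - 3)] + '...'
    return '%s<a href="%s" rel="nofollow">%s</a>%s' % (lead, href, text, trail)


# The reported case: ')' lands in `trail` and is rendered after </a>.
print(model_urlizetrunc('http://en.wikipedia.org/wiki/Sibiu(Romania)'))
# The '#' workaround: the word no longer ends in trailing punctuation.
print(model_urlizetrunc('http://en.wikipedia.org/wiki/Sibiu(Romania)#'))
```

For the first input the model prints the same markup as in the report, with the ')' outside the anchor. For the second, the href ends in `%29#`, which is consistent with the workaround keeping the parenthesis inside the link.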
Attachments (4)
Change History (13)
comment:1 by , 15 years ago
Summary: | [BUG] urlizetrunc not taking into account last ')' of a link (example given) → urlizetrunc not taking into account last ')' of a link |
---|---|
Triage Stage: | Unreviewed → Accepted |
by , 15 years ago
Attachment: | 11911-and-12183-tests.patch added |
---|
by , 15 years ago
Attachment: | 11911-and-12183.patch added |
---|
comment:2 by , 15 years ago
Has patch: | set |
---|
I have attached tests and a patch that fixes this bug as well as #12183.
comment:3 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
by , 14 years ago
Attachment: | 11911.patch added |
---|
Split out just the part for this ticket. Added tests.
comment:4 by , 14 years ago
Easy pickings: | unset |
---|---|
milestone: | → 1.4 |
Version: | 1.1 → SVN |
This is still a problem in current SVN. I've updated the patch to apply cleanly, to be just for this ticket, and to have tests. All the unit tests pass.
comment:5 by , 13 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Accepted → Design decision needed |
UI/UX: | unset |
It seems that the real problem comes from the definition of punctuation_re:
TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '&gt;']
punctuation_re = re.compile('^(?P<lead>(?:%s)*)(?P<middle>.*?)(?P<trail>(?:%s)*)$' % \
    ('|'.join([re.escape(x) for x in LEADING_PUNCTUATION]),
    '|'.join([re.escape(x) for x in TRAILING_PUNCTUATION])))
The ')' character is not interpreted as part of the URL but as an enclosing character.
The provided patch solves the parenthesis problem, but it does not handle the other characters in the TRAILING_PUNCTUATION list that are unreserved in URLs.
RFC 1738 defines the following valid characters as "extra": "!" | "*" | "'" | "(" | ")" | ","
For example, http://en.wikipedia.org/wiki/Sibiu(Romania). is a valid URL (notice the dot at the end).
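The two points in this comment can be checked with a short sketch: which entries of the trailing-punctuation list RFC 1738 allows unescaped in a URL, and how the regex splits the trailing-dot example. The punctuation lists are assumed to be the Django 1.1-era ones; the RFC 1738 sets are transcribed from its `safe` and `extra` productions.

```python
import re

# ASSUMPTION: Django 1.1-era punctuation lists.
LEADING_PUNCTUATION = ['(', '<', '&lt;']
TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '&gt;']

punctuation_re = re.compile(
    '^(?P<lead>(?:%s)*)(?P<middle>.*?)(?P<trail>(?:%s)*)$' % (
        '|'.join(re.escape(x) for x in LEADING_PUNCTUATION),
        '|'.join(re.escape(x) for x in TRAILING_PUNCTUATION),
    )
)

# RFC 1738: unreserved = alpha | digit | safe | extra
RFC1738_SAFE = set('$-_.+')
RFC1738_EXTRA = set("!*'(),")

# Trailing-punctuation entries that may legitimately appear unescaped in a URL.
legal_in_url = sorted(c for c in TRAILING_PUNCTUATION if c in RFC1738_SAFE | RFC1738_EXTRA)
print(legal_in_url)  # [')', ',', '.']

# The trailing-dot example: both '.' and ')' are split off the URL.
m = punctuation_re.match('http://en.wikipedia.org/wiki/Sibiu(Romania).')
print(m.group('middle'))  # http://en.wikipedia.org/wiki/Sibiu(Romania
print(m.group('trail'))   # ).
```

A patch that only special-cases ')' would still leave the '.' in `trail`, which is why the comment argues for a more general rewrite.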
To solve this problem completely, I think punctuation_re needs an in-depth rewrite so that it can distinguish between
http://en.wikipedia.org/wiki/Sibiu(Romania) and (http://en.wikipedia.org/wiki/Sibiu(Romania))
A design decision is needed.
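One possible direction for the rewrite suggested in this comment is to treat a trailing ')' as punctuation only when it is unbalanced within the URL. This is a hypothetical sketch: the name `split_punctuation` and the reduced punctuation sets are illustrative, HTML entities such as `&lt;`/`&gt;` and newlines are ignored, and it is not presented as the fix that was eventually committed. It still treats a final '.' as sentence punctuation, so the trailing-dot case remains a judgment call.

```python
# Hypothetical, reduced punctuation sets for this sketch only.
LEADING = ('(', '<')
TRAILING = ('.', ',', '>')


def split_punctuation(word):
    """Split word into (lead, middle, trail), keeping a ')' that closes a '(' in the URL."""
    lead = ''
    while word.startswith(LEADING):
        lead, word = lead + word[0], word[1:]
    trail = ''
    while word:
        last = word[-1]
        # A ')' is punctuation only if the URL has more ')' than '('.
        unbalanced = last == ')' and word.count(')') > word.count('(')
        if last in TRAILING or unbalanced:
            word, trail = word[:-1], last + trail
        else:
            break
    return lead, word, trail


print(split_punctuation('http://en.wikipedia.org/wiki/Sibiu(Romania)'))
print(split_punctuation('(http://en.wikipedia.org/wiki/Sibiu(Romania))'))
```

With this heuristic the bare URL keeps its closing parenthesis, while the wrapped form gives `lead='('` and `trail=')'` around the same URL.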
comment:6 by , 13 years ago
Could you provide an example of a live URL where this is a problem with characters other than parentheses?
comment:8 by , 13 years ago
Owner: | changed from | to
---|
by , 13 years ago
Attachment: | 11911-2.patch added |
---|
Joint tests for tickets #11911 and #12183