Opened 15 years ago
Closed 13 years ago
#11911 closed Bug (fixed)
urlizetrunc not taking into account last ')' of a link
Reported by: | Stefan_Petrea | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | Template system | Version: | dev |
Severity: | Normal | Keywords: | urlizetrunc |
Cc: | | Triage Stage: | Design decision needed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Hi,
I'm using Django 1.1, the release version.
The urlizetrunc filter is causing some problems. For example, the following link
http://en.wikipedia.org/wiki/Sibiu(Romania)
is rendered as <a href="http://en.wikipedia.org/wiki/Sibiu%28Romania" rel="nofollow">http://en.wikipedia.org/wik...</a>)
(please notice the trailing ')')
This is not correct; the ')' should be included inside the link.
(As a work-around I'm adding # to my links, which seems to work).
Best regards,
Stefan
Attachments (4)
Change History (13)
comment:1 by , 15 years ago
Summary: | [BUG] urlizetrunc not taking into account last ')' of a link (example given) → urlizetrunc not taking into account last ')' of a link |
---|---|
Triage Stage: | Unreviewed → Accepted |
by , 14 years ago
Attachment: | 11911-and-12183-tests.patch added |
---|
by , 14 years ago
Attachment: | 11911-and-12183.patch added |
---|
comment:2 by , 14 years ago
Has patch: | set |
---|
I have attached tests and a patch that fixes this bug as well as #12183.
comment:3 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
by , 13 years ago
Attachment: | 11911.patch added |
---|
Split out just the part for this ticket. Added tests.
comment:4 by , 13 years ago
Easy pickings: | unset |
---|---|
milestone: | → 1.4 |
Version: | 1.1 → SVN |
This is still a problem in current SVN, and the patch fixes it.
I've updated the patch to apply cleanly, to be just for this ticket, and to have tests. All the unit tests pass.
comment:5 by , 13 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Accepted → Design decision needed |
UI/UX: | unset |
It seems that the real problem comes from the definition of punctuation_re:
    TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '&gt;']
    punctuation_re = re.compile('^(?P<lead>(?:%s)*)(?P<middle>.*?)(?P<trail>(?:%s)*)$' % \
        ('|'.join([re.escape(x) for x in LEADING_PUNCTUATION]),
         '|'.join([re.escape(x) for x in TRAILING_PUNCTUATION])))
The ')' character is interpreted as an enclosing character, not as part of the URL.
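To make the behaviour concrete, here is a minimal standalone sketch that rebuilds the regular expression quoted above and applies it to the URL from the report. LEADING_PUNCTUATION is not quoted in this ticket, so its value here is an assumption, as is the HTML-escaped '&gt;' entry in the trailing list; split_word is a hypothetical helper name.

```python
import re

# Assumption: LEADING_PUNCTUATION is not shown in the ticket; these values
# are illustrative. The '&gt;' trailing entry is likewise assumed.
LEADING_PUNCTUATION = ['(', '<', '&lt;']
TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '&gt;']

punctuation_re = re.compile(
    '^(?P<lead>(?:%s)*)(?P<middle>.*?)(?P<trail>(?:%s)*)$' % (
        '|'.join(re.escape(x) for x in LEADING_PUNCTUATION),
        '|'.join(re.escape(x) for x in TRAILING_PUNCTUATION),
    )
)


def split_word(word):
    """Hypothetical helper: return the (lead, middle, trail) groups."""
    return punctuation_re.match(word).groups()


# The closing parenthesis ends up in 'trail', so it is left out of the link.
print(split_word('http://en.wikipedia.org/wiki/Sibiu(Romania)'))
# ('', 'http://en.wikipedia.org/wiki/Sibiu(Romania', ')')
```

Because the middle group is lazy and the trail group is greedy, every run of trailing-punctuation characters at the end of the word is taken away from the URL, whether or not it belongs to it.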
The provided patch solves the parenthesis problem but does not solve the same problem for the other unreserved characters in the TRAILING_PUNCTUATION list.
According to RFC 1738, the valid characters named "extra" are "!" | "*" | "'" | "(" | ")" | ","
For example, http://en.wikipedia.org/wiki/Sibiu(Romania). is a valid URL (notice the dot at the end of the URL).
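As a rough check of which trailing-punctuation entries can also be legitimate URL characters, the sketch below intersects the list with the "unreserved" class from the RFC 1738 grammar (alpha, digit, safe, extra). The list contents follow the snippet quoted in this comment, with the '&gt;' entry being an assumption.

```python
import string

# Character classes from the RFC 1738 BNF.
SAFE = set("$-_.+")
EXTRA = set("!*'(),")
UNRESERVED = set(string.ascii_letters) | set(string.digits) | SAFE | EXTRA

# Trailing punctuation as quoted in this ticket ('&gt;' is an assumption).
TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '&gt;']

# Single characters that are both "trailing punctuation" and valid in a URL.
ambiguous = [p for p in TRAILING_PUNCTUATION if len(p) == 1 and p in UNRESERVED]
print(ambiguous)  # ['.', ',', ')']
```

So three of the entries are ambiguous: each can either end a URL or follow one.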
To solve this problem completely, I think punctuation_re needs an in-depth rewrite so that it can distinguish between
http://en.wikipedia.org/wiki/Sibiu(Romania) and (http://en.wikipedia.org/wiki/Sibiu(Romania))
A design decision is needed.
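One possible way to tell the two cases apart, sketched here purely for discussion, is to count parentheses: a closing ')' is treated as wrapping punctuation only when the remaining text has exactly one more ')' than '('. This is not the attached patch; split_punctuation is a hypothetical name, and only '.', ',' and parentheses are handled.

```python
def split_punctuation(word):
    """Hypothetical sketch: split word into (lead, middle, trail).

    A trailing ')' is stripped only when it is unbalanced, i.e. when it
    cannot be paired with a '(' remaining inside the middle part.
    """
    lead, middle, trail = "", word, ""
    changed = True
    while changed:
        changed = False
        # Sentence punctuation at the very end is never kept in the link.
        stripped = middle.rstrip(".,")
        if stripped != middle:
            trail = middle[len(stripped):] + trail
            middle = stripped
            changed = True
        # An opening parenthesis before the URL is wrapping punctuation.
        if middle.startswith("("):
            lead += "("
            middle = middle[1:]
            changed = True
        # Strip a closing parenthesis only if it has no partner inside.
        if middle.endswith(")") and middle.count(")") == middle.count("(") + 1:
            trail = ")" + trail
            middle = middle[:-1]
            changed = True
    return lead, middle, trail


print(split_punctuation("http://en.wikipedia.org/wiki/Sibiu(Romania)"))
# ('', 'http://en.wikipedia.org/wiki/Sibiu(Romania)', '')
print(split_punctuation("(http://en.wikipedia.org/wiki/Sibiu(Romania))"))
# ('(', 'http://en.wikipedia.org/wiki/Sibiu(Romania)', ')')
```

This heuristic still always treats a final '.' or ',' as sentence punctuation, so a URL that genuinely ends with a dot would remain a design question.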
comment:6 by , 13 years ago
Could you provide an example of a live URL where this is a problem, with characters other than parentheses?
comment:8 by , 13 years ago
Owner: | changed from | to
---|
by , 13 years ago
Attachment: | 11911-2.patch added |
---|
Joint tests for tickets #11911 and #12183