Opened 16 years ago
Closed 14 years ago
#11911 closed Bug (fixed)
urlizetrunc not taking into account last ')' of a link
| Reported by: | Stefan_Petrea | Owned by: | Aymeric Augustin |
|---|---|---|---|
| Component: | Template system | Version: | dev |
| Severity: | Normal | Keywords: | urlizetrunc |
| Cc: | | Triage Stage: | Design decision needed |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | yes |
| Easy pickings: | no | UI/UX: | no |
Description
Hi,
I'm using Django 1.1, the release version.
The urlizetrunc filter is causing some problems. For example, the following link
http://en.wikipedia.org/wiki/Sibiu(Romania)
is rendered as `<a href="http://en.wikipedia.org/wiki/Sibiu%28Romania" rel="nofollow">http://en.wikipedia.org/wik...</a>)`
(notice the trailing ')').
This is not correct: the ')' should be included in the link.
(As a work-around I'm adding # to my links, which seems to work).
Best regards,
Stefan
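The symptom can be reproduced outside Django with the `punctuation_re` pattern quoted in comment:5. This is a minimal sketch, assuming Python 3 and that the punctuation lists below mirror Django 1.x's `django/utils/html.py`; `split_word` is a hypothetical helper, not a Django API.

```python
import re

# Assumed to mirror the urlize() configuration in Django 1.x.
LEADING_PUNCTUATION = ['(', '<', '&lt;']
TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '&gt;']

punctuation_re = re.compile(
    r'^(?P<lead>(?:%s)*)(?P<middle>.*?)(?P<trail>(?:%s)*)$' % (
        '|'.join(re.escape(x) for x in LEADING_PUNCTUATION),
        '|'.join(re.escape(x) for x in TRAILING_PUNCTUATION),
    )
)


def split_word(word):
    """Hypothetical helper: split a word into (lead, middle, trail) the way urlize() does."""
    match = punctuation_re.match(word)
    return match.group('lead'), match.group('middle'), match.group('trail')


# The closing parenthesis is peeled off as trailing punctuation, so only
# "middle" becomes the link and ')' ends up after </a>.
print(split_word('http://en.wikipedia.org/wiki/Sibiu(Romania)'))
# -> ('', 'http://en.wikipedia.org/wiki/Sibiu(Romania', ')')

# The '#' work-around: the word no longer ends with a trailing-punctuation
# character, so ')' stays inside "middle".
print(split_word('http://en.wikipedia.org/wiki/Sibiu(Romania)#'))
# -> ('', 'http://en.wikipedia.org/wiki/Sibiu(Romania)#', '')
```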
Attachments (4)
Change History (13)
comment:1 by , 16 years ago
| Summary: | [BUG] urlizetrunc not taking into account last ')' of a link (example given) → urlizetrunc not taking into account last ')' of a link |
|---|---|
| Triage Stage: | Unreviewed → Accepted |
by , 15 years ago
| Attachment: | 11911-and-12183-tests.patch added |
|---|
by , 15 years ago
| Attachment: | 11911-and-12183.patch added |
|---|
comment:2 by , 15 years ago
| Has patch: | set |
|---|
I have attached tests and a patch that fixes this bug as well as #12183.
comment:3 by , 15 years ago
| Severity: | → Normal |
|---|---|
| Type: | → Bug |
by , 14 years ago
| Attachment: | 11911.patch added |
|---|
Split out just the part for this ticket. Added tests.
comment:4 by , 14 years ago
| Easy pickings: | unset |
|---|---|
| milestone: | → 1.4 |
| Version: | 1.1 → SVN |
This is still a problem in current SVN. I've updated the patch to apply cleanly, to be just for this ticket, and to have tests. All the unit tests pass.
comment:5 by , 14 years ago
| Patch needs improvement: | set |
|---|---|
| Triage Stage: | Accepted → Design decision needed |
| UI/UX: | unset |
It seems that the real problem comes from the definition of punctuation_re:
```python
import re

# urlize() configuration in django/utils/html.py
LEADING_PUNCTUATION = ['(', '<', '&lt;']
TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '&gt;']
punctuation_re = re.compile('^(?P<lead>(?:%s)*)(?P<middle>.*?)(?P<trail>(?:%s)*)$' % (
    '|'.join([re.escape(x) for x in LEADING_PUNCTUATION]),
    '|'.join([re.escape(x) for x in TRAILING_PUNCTUATION])))
```
The ')' character is therefore treated as trailing (enclosing) punctuation rather than as part of the URL.
The provided patch solves the parenthesis problem, but not the other cases: the TRAILING_PUNCTUATION list contains other characters that are also unreserved in URLs.
According to RFC 1738, the "extra" characters "!" | "*" | "'" | "(" | ")" | "," are valid in URLs, and so is "." (one of the "safe" characters).
For example, http://en.wikipedia.org/wiki/Sibiu(Romania). is also a valid URL (notice the dot at the end).
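To make this point concrete, the sketch below checks which entries of the trailing-punctuation list are also legal, unencoded URL characters under RFC 1738's `safe` and `extra` classes. The list contents are assumed from Django 1.x, and `ambiguous` is just an illustrative name.

```python
# Character classes from the RFC 1738 BNF:
#   safe  = "$" | "-" | "_" | "." | "+"
#   extra = "!" | "*" | "'" | "(" | ")" | ","
RFC1738_SAFE = set('$-_.+')
RFC1738_EXTRA = set("!*'(),")

# Assumed Django 1.x trailing punctuation list.
TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '&gt;']

# Entries that may legitimately end a URL: urlize() cannot tell from the
# character alone whether it belongs to the URL or to the surrounding text.
ambiguous = [x for x in TRAILING_PUNCTUATION if x in RFC1738_SAFE | RFC1738_EXTRA]
print(ambiguous)  # -> ['.', ',', ')']
```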
To solve this problem completely, I think punctuation_re needs an in-depth rewrite so that it can distinguish between
http://en.wikipedia.org/wiki/Sibiu(Romania) and (http://en.wikipedia.org/wiki/Sibiu(Romania))
A design decision is needed.
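One way to draw this distinction is to keep a trailing ')' only when it balances an '(' inside the URL. The sketch below is a hypothetical illustration of that heuristic, not the attached patch or Django's code. `split_url` and `TRAILING` are made-up names, and only '.', ',' and parentheses are handled.

```python
TRAILING = '.,'  # hypothetical: plain trailing punctuation that is always stripped


def split_url(word):
    """Hypothetical sketch: split a word into (lead, url, trail), keeping a
    closing ')' only when it balances an '(' inside the URL."""
    lead, trail = '', ''
    # A '(' before the URL is a wrapping parenthesis, i.e. leading punctuation.
    while word.startswith('('):
        lead += '('
        word = word[1:]
    changed = True
    while changed:
        changed = False
        if word and word[-1] in TRAILING:
            trail = word[-1] + trail
            word = word[:-1]
            changed = True
        elif word.endswith(')') and word.count(')') > word.count('('):
            # An unbalanced ')' closes the wrapping parenthesis, not the URL.
            trail = ')' + trail
            word = word[:-1]
            changed = True
    return lead, word, trail


print(split_url('http://en.wikipedia.org/wiki/Sibiu(Romania)'))
# -> ('', 'http://en.wikipedia.org/wiki/Sibiu(Romania)', '')
print(split_url('(http://en.wikipedia.org/wiki/Sibiu(Romania))'))
# -> ('(', 'http://en.wikipedia.org/wiki/Sibiu(Romania)', ')')
print(split_url('http://en.wikipedia.org/wiki/Sibiu(Romania).'))
# -> ('', 'http://en.wikipedia.org/wiki/Sibiu(Romania)', '.')
```

The last case still treats a final '.' as sentence punctuation, so the ambiguity for '.' (and ',') remains a design choice that this heuristic does not resolve.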
comment:6 by , 14 years ago
Could you provide an example of a live URL where this is a problem with characters other than parentheses?
comment:8 by , 14 years ago
| Owner: | changed from to |
|---|
by , 14 years ago
| Attachment: | 11911-2.patch added |
|---|
Joint tests for tickets #11911 and #12183