Opened 7 years ago

Closed 5 years ago

#11911 closed Bug (fixed)

urlizetrunc not taking into account last ')' of a link

Reported by: Stefan_Petrea Owned by: Aymeric Augustin
Component: Template system Version: master
Severity: Normal Keywords: urlizetrunc
Cc: Triage Stage: Design decision needed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Hi,

I'm using Django 1.1 , the release version.

The urlizetrunc filter is causing some problems , for example the following link
http://en.wikipedia.org/wiki/Sibiu(Romania)

is rendered as <a href="http://en.wikipedia.org/wiki/Sibiu%28Romania" rel="nofollow">http://en.wikipedia.org/wik...</a>)
(please notice the trailing ')' )
This is not correct , it should take the ')' inside the link.

(As a work-around I'm adding # to my links, which seems to work).

Best regards,

Stefan

Attachments (4)

11911-and-12183-tests.patch (1.8 KB) - added by Tom Mortimer-Jones 7 years ago.
Joint tests for tickets #11911 and #12183
11911-and-12183.patch (1.8 KB) - added by Tom Mortimer-Jones 7 years ago.
Joint patch for tickets #11911 and #12183
11911.patch (2.5 KB) - added by Graham King 6 years ago.
Split out just the part for this ticket. Added tests.
11911-2.patch (3.8 KB) - added by Aymeric Augustin 5 years ago.

Download all attachments as: .zip

Change History (13)

comment:1 Changed 7 years ago by Russell Keith-Magee

Summary: [BUG] urlizetrunc not taking into account last ')' of a link (example given)urlizetrunc not taking into account last ')' of a link
Triage Stage: UnreviewedAccepted

Changed 7 years ago by Tom Mortimer-Jones

Attachment: 11911-and-12183-tests.patch added

Joint tests for tickets #11911 and #12183

Changed 7 years ago by Tom Mortimer-Jones

Attachment: 11911-and-12183.patch added

Joint patch for tickets #11911 and #12183

comment:2 Changed 7 years ago by Tom Mortimer-Jones

Has patch: set

I have attached tests and a patch that fixes this bug as well as #12183.

comment:3 Changed 6 years ago by Julien Phalip

Severity: Normal
Type: Bug

Changed 6 years ago by Graham King

Attachment: 11911.patch added

Split out just the part for this ticket. Added tests.

comment:4 Changed 6 years ago by Graham King

Easy pickings: unset
milestone: 1.4
Version: 1.1SVN

This is still a problem in current SVN. I've updated the patch to apply cleanly, to be just for this ticket, and to have tests. All the unit tests pass.

Version 0, edited 6 years ago by Graham King (next)

comment:5 Changed 5 years ago by trez

Patch needs improvement: set
Triage Stage: AcceptedDesign decision needed
UI/UX: unset

It seems than the real problem come from the definition of punctuation_re:

TRAILING_PUNCTUATION = ['.', ',', ')', '>', '\n', '&gt;']

punctuation_re = re.compile('^(?P<lead>(?:%s)*)(?P<middle>.*?)(?P<trail>(?:%s)*)$' % \
    ('|'.join([re.escape(x) for x in LEADING_PUNCTUATION]),
    '|'.join([re.escape(x) for x in TRAILING_PUNCTUATION])))

The ')' character is not interpreted as a part of the url but as an enclosing character.
The patch provided solve the parenthesis problem but do not solve other cases for other unreserved characters in the TRAILING_PUNCTUATION list.

Valid character accoding to RFC 1738 named as extra are "!" | "*" | "'" | "(" | ")" | ","

For example: http://en.wikipedia.org/wiki/Sibiu(Romania). which is the valid (notice the dot at the end of the url)

To completely solve this problem, I think an in-depth rewrite of the punctation_re should be done which should be able to distinguish between
http://en.wikipedia.org/wiki/Sibiu(Romania) and (http://en.wikipedia.org/wiki/Sibiu(Romania))

A design decision is needed.

comment:6 Changed 5 years ago by Graham King

Could you provide an example of a live url where this is a problem, with characters other than parenthesis?

comment:7 Changed 5 years ago by Jacob

milestone: 1.4

Milestone 1.4 deleted

comment:8 Changed 5 years ago by Aymeric Augustin

Owner: changed from nobody to Aymeric Augustin

Changed 5 years ago by Aymeric Augustin

Attachment: 11911-2.patch added

comment:9 Changed 5 years ago by Aymeric Augustin

Resolution: fixed
Status: newclosed

In [17362]:

Fixed #11911 -- Made the urlize filter smarter with closing punctuation.

Note: See TracTickets for help on using tickets.
Back to Top