#5606 closed (wontfix)

urlize filter should recognize only the characters which URL RFC specifies.

Reported by:	daybreaker12@…	Owned by:	nobody
Component:	Template system	Version:	dev
Severity:		Keywords:
Cc:		Triage Stage:	Design decision needed
Has patch:	no	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

The current implementation of the urlize filter, which uses the code in utils/html.py, recognizes URLs by splitting the text into words.

In Korean (or Japanese), however, this implementation can cause problems. These languages have '조사 (postpositional words)': one or more characters that follow a word without any intervening space.

For example,
"나는 http://example.com을 추천합니다." means "I recommend http://example.com."
The character '을' is not part of the URL, but the current urlize implementation treats it as part of the URL.
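The effect can be reproduced with a plain whitespace split. This is a simplified illustration only, not Django's actual urlize code; the variable names are made up for the example.

```python
# Simplified illustration only; this is not Django's actual urlize code.
text = "나는 http://example.com을 추천합니다."

# Splitting on whitespace keeps the postposition attached to the URL,
# because no space separates them.
words = text.split()
url_like = [w for w in words if w.startswith(("http://", "https://"))]
print(url_like)
```

The only URL-like word found is the URL with '을' still attached, which is what ends up inside the generated link.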

Of course, URLs containing Unicode Korean characters may exist, so deciding which characters to exclude from a URL is not entirely clear-cut. However, such cases are very rare, because most Korean URLs are written percent-encoded, like 'http://example.com/tags/%EB%B8%94%EB%A1%9C%EA%B7%B8' (the UTF-8 encoding of 'http://example.com/tags/블로그').
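As a quick check of that encoding, the standard library's urllib.parse.quote can be used to percent-encode the path as UTF-8. This is just a sketch with illustrative variable names.

```python
from urllib.parse import quote, unquote

path = "/tags/블로그"

# quote() percent-encodes non-ASCII characters as UTF-8 bytes;
# "/" is left untouched by default.
encoded = quote(path)
print(encoded)

# unquote() reverses the encoding.
decoded = unquote(encoded)
print(decoded)
```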

So I suggest modifying the code so that URL auto-linking treats only US-ASCII characters as part of a URL, as specified in RFC 1738.
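A minimal sketch of what such ASCII-only matching might look like follows. The pattern and the helper name urlize_ascii are hypothetical, not Django's implementation, and the sketch ignores details a real filter needs, such as trailing punctuation and escaping of the surrounding text.

```python
import re
from html import escape

# Hypothetical pattern: a scheme followed by printable, non-space ASCII
# characters (0x21-0x7E). Anything outside that range ends the match.
ASCII_URL_RE = re.compile(r"https?://[\x21-\x7e]+")


def urlize_ascii(text):
    """Wrap ASCII-only URL runs in anchor tags (illustrative helper only)."""
    def link(match):
        url = match.group(0)
        return '<a href="{0}">{1}</a>'.format(escape(url), escape(url))

    return ASCII_URL_RE.sub(link, text)


result = urlize_ascii("나는 http://example.com을 추천합니다.")
print(result)
```

With this pattern the match stops at the first non-ASCII character, so '을' stays outside the link.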

Change History (2)

comment:1 by Simon G <dev@…>, 17 years ago

Triage Stage:	Unreviewed → Design decision needed

comment:2 by Malcolm Tredinnick, 14 years ago

Resolution:	→ wontfix
Status:	new → closed

We cannot really do this. URLs can contain non-ASCII characters; they just have to be URL-encoded at transmission time (not at display time). So without doing semantic analysis, we cannot work out which words are part of the URL. The urlize filter is aiming at the 80% case. If it's not working for your domain, then you'll need to use something else.
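The trade-off can be seen by applying an ASCII-only pattern to a URL that is displayed unencoded. The pattern below is an assumption made for illustration and is not part of Django.

```python
import re

# Hypothetical ASCII-only URL pattern (printable, non-space ASCII after the scheme).
ASCII_URL_RE = re.compile(r"https?://[\x21-\x7e]+")

# A URL shown to the reader with its Korean path segment left unencoded.
text = "http://example.com/tags/블로그"

# The match stops at the first non-ASCII character, dropping the last segment.
matched = ASCII_URL_RE.match(text).group(0)
print(matched)
```

The match ends before '블로그', so the generated link would point at the wrong resource. Nothing in the characters themselves distinguishes this case from a trailing postposition.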
